Exercise 04: Local File Storage for Face Embeddings
Overview​
This exercise teaches you how to build a persistent storage system for face embeddings using local JSON files. You'll implement a complete storage solution that can save, retrieve, and manage embedding records - essential for any face authentication system that needs to remember users between sessions.
Why Storage Matters​
Face authentication systems need persistent storage to:
- Remember Users: Store embeddings from registration for future login attempts
- Enable Comparison: Retrieve stored embeddings to compare against live captures
- Manage Identities: Track multiple embeddings per user for better accuracy
- Persist Data: Maintain user data across application restarts
Architecture Overview​
The storage system uses a trait-based design for flexibility:
// Data structure for each stored embedding
pub struct EmbeddingRecord {
pub id: String, // Unique identifier
pub name: String, // User name
pub embedding: Vec<f32>, // Face embedding vector
pub created_at: chrono::DateTime<chrono::Utc>, // Timestamp
pub metadata: HashMap<String, String>, // Additional data
}
// Storage interface (trait)
pub trait EmbeddingStorage {
fn store_embedding(&mut self, record: EmbeddingRecord) -> Result<()>;
fn get_embedding(&self, id: &str) -> Result<Option<EmbeddingRecord>>;
fn get_all_embeddings(&self) -> Result<Vec<EmbeddingRecord>>;
fn delete_embedding(&mut self, id: &str) -> Result<bool>;
}
// Local file implementation
pub struct LocalFileStorage {
file_path: String,
data: Mutex<HashMap<String, EmbeddingRecord>>,
}
Your Tasks​
Task 1: Implement LocalFileStorage::new()​
pub fn new(file_path: String) -> Result<Self>
This constructor should:
- Create Storage Instance: Initialize the struct with the file path
- Set Up In-Memory Cache: Create a Mutex-protected HashMap for fast access
- Load Existing Data: Call
load_data()to read any existing records
Implementation Approach:​
- Initialize the struct fields with appropriate values
- Set up thread-safe data structures for concurrent access
- Load any existing data from the file system
Hint: Check the CHEATSHEET.md for HashMap and Mutex patterns.
Task 2: Implement load_data()​
fn load_data(&self) -> Result<()>
This method should:
- Check File Existence: Return early if file doesn't exist
- Handle Empty Files: Deal gracefully with zero-byte files
- Parse JSON: Deserialize the file content into a HashMap
- Update Cache: Store the loaded data in the in-memory HashMap
Implementation Approach:​
- Use file system operations to check existence and size
- Handle JSON deserialization with proper error handling
- Update the in-memory cache with loaded data
- Use proper mutex locking for thread safety
Hint: Check the CHEATSHEET.md for JSON deserialization patterns.
Task 3: Implement save_data()​
fn save_data(&self) -> Result<()>
This method should:
- Create Directory: Ensure the parent directory exists
- Open File: Create or truncate the target file
- Serialize Data: Convert the HashMap to JSON format
- Write File: Save the JSON to disk
Implementation Approach:​
- Handle directory creation for the file path
- Use appropriate file opening options for writing
- Serialize the in-memory data to JSON format
- Ensure thread-safe access to the data
Hint: Check the CHEATSHEET.md for JSON serialization and file operations.
Task 4: Implement EmbeddingStorage Trait​
store_embedding()​
fn store_embedding(&mut self, record: EmbeddingRecord) -> Result<()>
- Add the record to in-memory storage and persist to disk
get_embedding()​
fn get_embedding(&self, id: &str) -> Result<Option<EmbeddingRecord>>
- Search for and return a record by its ID
get_all_embeddings()​
fn get_all_embeddings(&self) -> Result<Vec<EmbeddingRecord>>
- Return all stored embedding records
delete_embedding()​
fn delete_embedding(&mut self, id: &str) -> Result<bool>
- Remove a record by ID and return whether deletion was successful
Implementation Approach: Use HashMap operations with proper mutex locking and persistence.
Task 5: Implement open_temp_storage()​
pub fn open_temp_storage() -> Result<(Box<dyn EmbeddingStorage>, String)>
This helper function should:
- Generate Unique Path: Create a temporary filename using UUID
- Create Storage: Initialize a LocalFileStorage instance
- Return Boxed Trait: Return as a trait object for flexibility
Implementation Approach:​
- Generate a unique filename using UUID
- Create a new storage instance with that path
- Return both the storage and path for cleanup
Hint: Use Uuid::new_v4() for unique identifiers.
Technical Details​
JSON File Format:​
The storage saves data as a JSON object where keys are record IDs:
{
"uuid-1": {
"id": "uuid-1",
"name": "Alice",
"embedding": [0.1, 0.2, ...],
"created_at": "2024-01-01T12:00:00Z",
"metadata": {}
},
"uuid-2": { ... }
}
Concurrency Handling:​
- Uses
Mutex<HashMap>for thread-safe access to in-memory data - Locks are held briefly during read/write operations
- File I/O is synchronized through the mutex
Error Handling:​
- Gracefully handles missing files (starts with empty storage)
- Deals with corrupted JSON (logs warning, starts fresh)
- Proper error propagation using
Result<T>
Testing​
The provided tests verify:
- Basic Storage: Can store and retrieve records
- Sorting: Results are returned in correct order
- Limits: Respects k-parameter for top-k queries
- Empty Storage: Handles empty storage gracefully
Run tests with:
cargo test
File Management​
The storage system:
- Auto-creates directories as needed
- Handles missing files gracefully
- Overwrites files completely on each save (simple but safe)
- Uses pretty-printed JSON for human readability
Production Considerations​
This simple file-based approach works well for:
- Development and Testing: Easy to inspect and debug
- Small Datasets: Hundreds to thousands of embeddings
- Single-User Applications: No concurrent access needed
For production systems, consider:
- Database Storage: PostgreSQL with pgvector extension
- Vector Databases: Qdrant, Pinecone, Weaviate
- Concurrent Access: Proper locking mechanisms
- Backup Strategies: Regular data backups
Next Steps​
After completing this exercise, you'll be ready to:
- Implement similarity search and retrieval (Exercise 05)
This storage foundation is crucial for the face authentication system's ability to persist and retrieve user embeddings efficiently.