Third-Party Libraries Cheatsheet for Face Auth Workshop
This cheatsheet covers the essential parts of candle_*, image, and serde libraries used across exercises 01-05.
🔥 Candle Framework
Basic Tensor Operations
Creating Tensors
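use candle_core::{DType, Device, Tensor}; // assumed crate name; some projects alias it as `candle`
// From vector with shape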
let data: Vec<u8> = image_data;
let tensor = Tensor::from_vec(data, (height, width, channels), &Device::Cpu)?;
// From array/slice
let mean = [0.485, 0.456, 0.406];
let mean_tensor = Tensor::new(&mean, &Device::Cpu)?;
// Reshape the 3 per-channel values to (3, 1, 1) so they broadcast over (C, H, W)
let reshaped = mean_tensor.reshape((3, 1, 1))?;
Tensor Shape Manipulation
// Permute dimensions (e.g., HWC to CHW)
let tensor = tensor.permute((2, 0, 1))?;
// Add batch dimension
let batched = tensor.unsqueeze(0)?;
// Remove singleton dimensions
let squeezed = tensor.squeeze(0)?.squeeze(0)?;
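To make the HWC to CHW permutation concrete, here is a minimal plain-Rust sketch of the resulting element order. It uses no Candle APIs; `hwc_to_chw` is a hypothetical helper written for illustration, and a tensor library may achieve the same logical reordering by changing strides rather than copying data.

```rust
/// Reorders an interleaved HWC buffer (RGBRGB...) into planar CHW order
/// (all R values, then all G, then all B). Illustration only.
fn hwc_to_chw(data: &[u8], height: usize, width: usize, channels: usize) -> Vec<u8> {
    assert_eq!(
        data.len(),
        height * width * channels,
        "buffer size must match the given shape"
    );
    let mut out = vec![0u8; data.len()];
    for h in 0..height {
        for w in 0..width {
            for c in 0..channels {
                // Index of element (h, w, c) in HWC layout.
                let src = (h * width + w) * channels + c;
                // Index of the same element (c, h, w) in CHW layout.
                let dst = (c * height + h) * width + w;
                out[dst] = data[src];
            }
        }
    }
    out
}
```

For a 1x2 RGB image stored as `[1, 2, 3, 4, 5, 6]`, the function returns `[1, 4, 2, 5, 3, 6]`: the two red values first, then green, then blue.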
Data Type Conversions
// Convert to different data types
let float_tensor = tensor.to_dtype(DType::F32)?;
// Scale values (e.g., 0-255 to 0-1); tensor-by-scalar division returns a Result, hence the second `?`
let normalized = (tensor.to_dtype(DType::F32)? / 255.0)?;
Mathematical Operations
Broadcasting Operations: These automatically expand tensors to compatible shapes for element-wise operations.
// Broadcasting rules: smaller tensors are "stretched" to match larger ones
// Example: (3, 224, 224) + (3, 1, 1) = (3, 224, 224)
// The (3, 1, 1) tensor gets repeated across all 224x224 pixels
let result = tensor1.broadcast_add(&tensor2)?; // Addition
let result = tensor1.broadcast_sub(&tensor2)?; // Subtraction
let result = tensor1.broadcast_mul(&tensor2)?; // Multiplication
let result = tensor1.broadcast_div(&tensor2)?; // Division
// Matrix multiplication (no broadcasting - strict dimension requirements)
let result = tensor_a.matmul(&tensor_b)?;
// Transpose swaps two dimensions
let transposed = tensor.transpose(0, 1)?; // Swap dims 0 and 1
How Broadcasting Works:
- Dimensions are aligned from the right (trailing dimensions first)
- Missing dimensions are treated as size 1
- Dimensions of size 1 are stretched to match the other tensor
- Example: (224,) + (3, 224, 224) becomes (1, 1, 224) + (3, 224, 224) → (3, 224, 224). A (256,) tensor would fail here because 256 matches neither 224 nor 1.
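The rules above can be sketched as a small shape calculator. This is plain Rust for illustration, not a Candle API; `broadcast_shape` is a hypothetical helper name.

```rust
/// Computes the shape produced by broadcasting `a` against `b`,
/// or `None` if the two shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![1; rank];
    for i in 0..rank {
        // Walk both shapes from the trailing dimension; missing dims count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x, // equal sizes pass through
            (1, y) => y,           // size 1 stretches to match
            (x, 1) => x,
            _ => return None,      // e.g. 256 vs 224
        };
    }
    Some(out)
}
```

Calling `broadcast_shape(&[3, 224, 224], &[3, 1, 1])` yields `Some(vec![3, 224, 224])`, while `broadcast_shape(&[256], &[3, 224, 224])` yields `None`.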
Reduction Operations
What keepdim means: Maintains the original number of dimensions by keeping reduced dims as size 1.
// sum_keepdim example:
// Input: (2, 3, 4) tensor
// .sum(1) → (2, 4) # dimension 1 disappears
// .sum_keepdim(1) → (2, 1, 4) # dimension 1 becomes size 1
let sum = tensor.sum_keepdim(1)?; // Sum along dim 1, keep dim structure
// Element-wise operations
let sqrt_tensor = tensor.sqrt()?; // √x for each element
let squared = tensor.sqr()?; // x² for each element
Why keepdim matters: Preserves tensor shape for broadcasting operations. Without it, you can't broadcast the result back to the original tensor shape.
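A shape-only sketch may help show why the kept dimension matters. The two helpers below are hypothetical, written in plain Rust for illustration, and only mimic the shape bookkeeping rather than any Candle internals.

```rust
/// Shape after reducing axis `dim` of `shape`.
/// With `keepdim` the axis stays as size 1; without it the axis is removed.
fn reduced_shape(shape: &[usize], dim: usize, keepdim: bool) -> Vec<usize> {
    assert!(dim < shape.len(), "dim out of range");
    let mut out = shape.to_vec();
    if keepdim {
        out[dim] = 1;
    } else {
        out.remove(dim);
    }
    out
}

/// True if `small` can be stretched to `target` under right-aligned broadcasting.
fn broadcasts_to(small: &[usize], target: &[usize]) -> bool {
    if small.len() > target.len() {
        return false;
    }
    small
        .iter()
        .rev()
        .zip(target.iter().rev())
        .all(|(&s, &t)| s == t || s == 1)
}
```

For a (2, 3, 4) input reduced along dim 1, the keepdim shape (2, 1, 4) broadcasts back to (2, 3, 4), whereas the plain (2, 4) shape does not, because its leading 2 lines up against the 3.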
Extracting Values
// Single scalar value
let scalar: f32 = tensor.to_vec0()?;
// 1D vector
let values: Vec<f32> = tensor.to_vec1()?;
// Flatten all dimensions and get vector
let flattened: Vec<f32> = tensor.flatten_all()?.to_vec1()?;
L2 Normalization (Essential for Embeddings)
What it does: Scales vectors to unit length while preserving direction. Essential for cosine similarity.
Mathematical Formula: normalized_vector = vector / ||vector||₂
Building Blocks:
// Step by step operations you'll need:
// 1. Square each element
let squared = tensor.sqr()?;
// 2. Sum the squared values along the dimension (keeping dimensions for broadcasting)
let sum_squared = squared.sum_keepdim(1)?;
// 3. Take square root to get L2 norm
let norm = sum_squared.sqrt()?;
// 4. Divide original by norm (broadcasting)
let normalized = tensor.broadcast_div(&norm)?;
Why use it: After L2 normalization, ||v||₂ = 1, which means:
- Cosine similarity becomes just a dot product
- Removes magnitude bias - focuses only on direction
- Essential for fair comparison of embeddings
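The same four steps can be sketched on a plain `Vec<f32>`, which is handy for checking results by hand. This is an illustration in standard Rust, not the workshop's tensor implementation; the `eps` guard against a zero vector is an assumption added for this sketch.

```rust
/// Returns `v` scaled to unit L2 length.
/// `eps` is a lower bound on the norm to avoid dividing by zero (sketch-only choice).
fn l2_normalize(v: &[f32], eps: f32) -> Vec<f32> {
    // Steps 1-3: square, sum, square root.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    // Step 4: divide every element by the norm.
    let denom = norm.max(eps);
    v.iter().map(|x| x / denom).collect()
}

/// Dot product; for unit-length inputs this equals their cosine similarity.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```

For example, `l2_normalize(&[3.0, 4.0], 1e-12)` gives approximately `[0.6, 0.8]`, and the dot product of that result with itself is approximately 1.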
Cosine Similarity Building Blocks
Mathematical Formula: cosine_similarity = (A · B) / (||A|| × ||B||)
Key Operations:
// Matrix multiplication for dot product
let dot_product = tensor_a.matmul(&tensor_b.transpose(0, 1)?)?;
// Transpose for proper matrix multiplication
let transposed = tensor.transpose(0, 1)?;
// Extract scalar from tensor
let scalar_value = tensor.squeeze(0)?.squeeze(0)?.to_vec0::<f32>()?;
// For Vec<f32> similarity (alternative approach):
let dot: f32 = vec_a.iter().zip(vec_b.iter()).map(|(x, y)| x * y).sum();
let mag_a: f32 = vec_a.iter().map(|x| x * x).sum::<f32>().sqrt();
let mag_b: f32 = vec_b.iter().map(|x| x * x).sum::<f32>().sqrt();
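Putting the `Vec<f32>` pieces together, one possible complete function looks like the sketch below. Returning `None` for mismatched lengths or zero-magnitude vectors is a design choice made here for illustration; the exercises may expect a different error strategy.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns `None` when lengths differ, inputs are empty, or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let mag_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let mag_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if mag_a == 0.0 || mag_b == 0.0 {
        return None;
    }
    Some(dot / (mag_a * mag_b))
}
```

Vectors pointing the same way score close to 1, orthogonal vectors close to 0, and opposite vectors close to -1, regardless of their lengths.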
Model Loading & Usage
Core Concepts: Loading pre-trained models and running inference.
Hugging Face Hub API Building Blocks:
// Download model from Hugging Face Hub
let api = hf_hub::api::sync::Api::new()?;
let api = api.model("model-name-here".to_string());
let model_file = api.get("model.safetensors")?;
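// Create VarBuilder (candle_nn::VarBuilder) from downloaded weights.
// `unsafe` because the file is memory-mapped and must not change while in use.
let vb = unsafe {
    VarBuilder::from_mmaped_safetensors(&[model_file], DType::F32, &device)?
};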
// Load specific model architectures (examples):
// ConvNeXt: convnext::convnext_no_final_layer(&config, vb)?
// Other models have similar patterns
Inference Building Blocks:
// Handle batch dimensions
let batched_input = if input.rank() == 3 { // Single image (C,H,W); checking rank avoids misreading a batch of 3 images
input.unsqueeze(0)? // Add batch: (1,C,H,W)
} else {
input.clone() // Already batched (N,C,H,W)
};
// Forward pass through model
let output = model.forward(&batched_input)?;
// Common model interfaces:
// - Module::forward() for neural networks
// - Func::forward() for functional models
Key Points:
- VarBuilder: Loads pre-trained weights from .safetensors files
- Module::forward(): Standard interface for neural network inference
- Batch Dimension: Most models expect (batch_size, channels, height, width)
- Device Management: Ensure model and input tensors are on the same device
🖼️ Image Processing
Dependencies
[dependencies]
image = "0.25.6"
Essential Imports
use image::{ImageReader, ImageFormat};
Loading and Processing Images
// Load image from file path
let img = image::ImageReader::open(path)?
.decode()?;
// Resize image (multiple resize methods)
let img = img.resize_to_fill(
224, // width
224, // height
image::imageops::FilterType::Triangle, // filter type
);
// Convert to RGB8 format
let img = img.to_rgb8();
// Extract raw pixel data
let data: Vec<u8> = img.into_raw(); // Returns Vec<u8> with RGB values
Filter Types
// Available filter types for resizing
image::imageops::FilterType::Triangle // Good general purpose
image::imageops::FilterType::Lanczos3 // High quality
image::imageops::FilterType::Nearest // Fastest, pixelated
image::imageops::FilterType::CatmullRom // Sharp results
Key Concept: When normalizing the image tensor, reshape((3, 1, 1)) turns the per-channel mean and std into tensors that broadcast across all pixels:
- Original image: (3, 224, 224) - 3 channels, 224×224 pixels
- Mean/Std: (3, 1, 1) - 3 values, broadcast to each pixel
- Result: Each of the 224×224 pixels gets normalized using its channel's specific mean/std
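What the broadcast computes can be written out as an explicit loop. The sketch below is plain Rust over a planar CHW `f32` buffer, not Candle code; `normalize_chw` and its parameter names are hypothetical and exist only for illustration.

```rust
/// Normalizes a planar CHW buffer in place: x = (x - mean[c]) / std_dev[c].
/// The number of channels is taken from `mean.len()`.
fn normalize_chw(
    data: &mut [f32],
    height: usize,
    width: usize,
    mean: &[f32],
    std_dev: &[f32],
) {
    let plane = height * width;
    assert!(plane > 0, "image must have at least one pixel");
    assert_eq!(mean.len(), std_dev.len(), "one mean and one std per channel");
    assert_eq!(data.len(), plane * mean.len(), "buffer size must match the shape");
    // Each chunk of `plane` values is one channel; every pixel in it shares
    // that channel's mean and std, which is what the (3, 1, 1) broadcast expresses.
    for (c, channel) in data.chunks_mut(plane).enumerate() {
        for x in channel.iter_mut() {
            *x = (*x - mean[c]) / std_dev[c];
        }
    }
}
```

With tensors, the equivalent is a `broadcast_sub` of the mean followed by a `broadcast_div` by the std, which avoids the explicit loops.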
📦 Serde (Serialization/Deserialization)
Defining Serializable Structs
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
// Required derives for JSON serialization
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct YourStruct {
// Common field types:
pub id: String, // String fields
pub name: String,
pub data: Vec<f32>, // Vector fields
pub timestamp: chrono::DateTime<chrono::Utc>, // DateTime fields
pub metadata: HashMap<String, String>, // HashMap fields
}
JSON Serialization
// Serialize to JSON string (pretty printed)
let json_string = serde_json::to_string_pretty(&records)?;
// Serialize to JSON string (compact)
let json_string = serde_json::to_string(&records)?;
// Write to file
std::fs::write("data.json", json_string)?;
JSON Deserialization
// Read from file
let content = std::fs::read_to_string("data.json")?;
// Handle empty files
if content.trim().is_empty() {
return Ok(Vec::new());
}
// Deserialize from JSON string
let records: Vec<EmbeddingRecord> = serde_json::from_str(&content)?;
Working with DateTime
// Create current timestamp
let timestamp = chrono::Utc::now();
// With chrono's `serde` feature enabled, DateTime<Utc> serializes to an RFC 3339 (ISO 8601) string in JSON
Working with UUIDs
// Generate new UUID
let id = uuid::Uuid::new_v4().to_string();
🚀 Performance Tips
- Tensor Operations: Use broadcast operations instead of loops when possible
- Memory Management: Reuse tensors when possible, avoid unnecessary clones
- Model Loading: Cache loaded models, don't reload for each inference
- Image Processing: Consider batch processing multiple images at once
- Serialization: Use serde_json::to_string_pretty for debugging, regular to_string for production
⚠️ Common Pitfalls
- Tensor Shapes: Always check tensor dimensions before operations
- Data Types: Be consistent with DType (F16 vs F32)
- Error Handling: Use the ? operator and proper Result types
- Empty Files: Always handle empty JSON files in deserialization
- Path Handling: Use proper path validation for file operations