Chapter 7: Caching

Accessing data from its source — whether a database, a remote service, or persistent storage — takes time. Network round trips, disk reads, and computation all contribute to latency. A caching service stores recently or frequently accessed data in memory so that subsequent requests for the same data can be served faster. Caching is one of the most effective techniques for improving the performance and reducing the cost of distributed systems.

The fundamental insight behind caching is the principle of locality: data that was accessed recently is likely to be accessed again soon (temporal locality), and data near recently accessed data is also likely to be accessed (spatial locality). A well-designed cache exploits these patterns to serve a high percentage of requests from fast in-memory storage rather than slower backing stores.

Interface

caching/src/lib.rs The caching service provides four procedures. Get and set form the core read/write interface. Unlike the configuration service, the set operation accepts a time-to-live (TTL) parameter that controls how long the entry remains valid. A delete operation allows explicit cache invalidation, and a stats operation exposes operational metrics.

pub const GET_PROCEDURE: ProcedureId = 1;
pub const SET_PROCEDURE: ProcedureId = 2;
pub const DELETE_PROCEDURE: ProcedureId = 3;
pub const STATS_PROCEDURE: ProcedureId = 4;

#[derive(Debug, Serializable, Deserializable)]
pub struct GetArgs {
    pub key: String,
}

#[derive(Debug, Serializable, Deserializable)]
pub struct GetResult {
    pub value: String,
    pub hit: i32,
}

#[derive(Debug, Serializable, Deserializable)]
pub struct SetArgs {
    pub key: String,
    pub value: String,
    pub ttl_secs: i32,
}

#[derive(Debug, Serializable, Deserializable)]
pub struct StatsResult {
    pub hits: i32,
    pub misses: i32,
    pub size: i32,
}

The GetResult includes a hit field (1 for a cache hit, 0 for a miss), allowing clients to distinguish between a missing key and an empty value. This is important for avoiding cache stampedes, where many clients simultaneously fetch the same missing key from the backing store.

Implementation

caching/src/main.rs The cache server maintains an in-memory data structure that combines a HashMap for fast key lookups with a VecDeque for tracking access order. Each entry stores a value and an expiration time. This implements an LRU (Least Recently Used) eviction policy with time-based expiration.

const MAX_CAPACITY: usize = 10000;

struct CacheEntry {
    value: String,
    expires_at: Instant,
}

struct Cache {
    entries: HashMap<String, CacheEntry>,
    lru_order: VecDeque<String>,
    hits: i32,
    misses: i32,
    max_capacity: usize,
}

The get operation checks the expiration time before returning a value. If the entry has expired, it is removed and treated as a miss. On a hit, the key is moved to the front of the LRU queue, marking it as recently accessed:

fn get(&mut self, key: &str) -> Option<String> {
    if let Some(entry) = self.entries.get(key) {
        if entry.expires_at < Instant::now() {
            self.entries.remove(key);
            self.lru_order.retain(|k| k != key);
            self.misses += 1;
            return None;
        }
        let value = entry.value.clone();
        // Move to front of LRU
        self.lru_order.retain(|k| k != key);
        self.lru_order.push_front(key.to_string());
        self.hits += 1;
        Some(value)
    } else {
        self.misses += 1;
        None
    }
}

The set operation handles capacity management. When the cache is full and a new entry needs to be inserted, the least recently used entry (at the back of the deque) is evicted. This ensures the cache never exceeds its memory budget while keeping the most useful entries in memory:

fn set(&mut self, key: String, value: String, ttl_secs: i32) {
    let ttl = if ttl_secs > 0 {
        Duration::from_secs(ttl_secs as u64)
    } else {
        Duration::from_secs(3600) // Default 1 hour TTL
    };

    let entry = CacheEntry {
        value,
        expires_at: Instant::now() + ttl,
    };

    self.lru_order.retain(|k| k != &key);

    // Evict if at capacity
    while self.entries.len() >= self.max_capacity {
        if let Some(evicted_key) = self.lru_order.pop_back() {
            self.entries.remove(&evicted_key);
        } else {
            break;
        }
    }

    self.entries.insert(key.clone(), entry);
    self.lru_order.push_front(key);
}

A background task runs periodically to clean up expired entries. This prevents the cache from filling up with stale data that might never be accessed again:

// Background cleanup task for expired entries
let cleanup_cache = Arc::clone(&cache);
tokio::spawn(async move {
    loop {
        sleep(CLEANUP_INTERVAL).await;
        cleanup_cache.lock().await.cleanup_expired();
    }
});

Design Discussion

The LRU eviction policy is one of several strategies for managing cache capacity. LRU works well when access patterns exhibit temporal locality — recently accessed items are likely to be accessed again. Other strategies include LFU (Least Frequently Used), which favors items accessed many times even if not recently, and random eviction, which is simpler but less effective.

TTL-based expiration provides a safety net against serving stale data. Without TTLs, a cached value could persist indefinitely even after the backing store has been updated. The choice of TTL represents a trade-off: shorter TTLs reduce staleness but increase load on the backing store; longer TTLs improve hit rates but risk serving outdated data.

The hit/miss tracking exposed through the stats procedure is invaluable for operations. A healthy cache should have a high hit rate (often 90% or above). A sudden drop in hit rate might indicate a change in access patterns, a configuration error, or a capacity problem. These metrics are the kind of signals that the monitoring service (discussed later) can track and alert on.

In a production environment, caching is often deployed as a hierarchy: a local in-process cache for the hottest data, a shared cache service (like this one) for less-hot data, and the backing store for everything else. Each layer trades off latency, capacity, and consistency differently.