Caching
Caching is a mechanism to store data temporarily for quick access. Below are the most common caching strategies, with their descriptions, pros, cons, and typical use cases.
Caching Strategies
Cache Aside
The application checks the cache first. If the data is not there (a cache miss), the application fetches it from the database, updates the cache, and returns the data. The cache sits off to the side, and the application has complete control over it.
Pros
- Flexible: Cache updated only when needed.
- Prevents stale data when combined with appropriate eviction policies.
- Simple to implement.
- Resilient to cache failures: if the cache goes down, reads fall back to the database with higher latency rather than failing.
Cons
- Cache misses may cause latency spikes.
- Requires manual cache maintenance.
- Not ideal for write-heavy workloads.
When to Use
- For read-heavy applications where data changes infrequently (e.g., product catalogs, user profiles).
- Common applications include shopping carts and session management.
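A minimal cache-aside sketch in Python, assuming in-memory dicts stand in for the real cache (e.g., Redis) and the database; the function and key names are illustrative only:

```python
# Minimal cache-aside sketch. `cache` and `database` are in-memory dicts
# standing in for a real cache (e.g., Redis) and a real datastore.
cache = {}
database = {"user:1": {"name": "Ada"}}

def get_user(key):
    # 1. Try the cache first.
    value = cache.get(key)
    if value is not None:
        return value                      # cache hit
    # 2. On a miss, the application itself loads from the database...
    value = database.get(key)
    # 3. ...and populates the cache for subsequent reads.
    if value is not None:
        cache[key] = value
    return value

def update_user(key, value):
    # Writes go to the database; the cache entry is invalidated so the
    # next read repopulates it (avoids serving stale data).
    database[key] = value
    cache.pop(key, None)

print(get_user("user:1"))  # miss -> loads from database, fills cache
print(get_user("user:1"))  # hit  -> served from cache
```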
Read Through
The application fetches data only from the cache. The cache itself is responsible for loading data from the database on a cache miss. The main difference from cache-aside is that the caching library or provider manages the loading logic rather than the application.
Pros
- Centralized data access logic.
- Simplifies the application code.
- Can prefetch data to reduce latency via cache warm-up.
Cons
- Higher complexity in cache implementation.
- Potential stale data if not managed well.
- The data model in the cache must match the one in the database.
When to Use
- When you want centralized control over data retrieval (e.g., CDN-backed systems).
- Often used for database query caching when there is tight control over the system.
- Common for API request caching.
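A minimal read-through sketch, assuming a `loader` callback stands in for the real database query; the `ReadThroughCache` class is illustrative, since in practice the caching library or provider supplies this layer:

```python
# Read-through sketch: the cache object owns the loader, so callers only
# ever talk to the cache. `load_from_db` stands in for a database query.
class ReadThroughCache:
    def __init__(self, loader):
        self._store = {}
        self._loader = loader

    def get(self, key):
        if key not in self._store:
            # Cache miss: the cache (not the application) fetches from
            # the backing store and keeps the result.
            self._store[key] = self._loader(key)
        return self._store[key]

def load_from_db(key):
    return {"key": key, "value": f"row for {key}"}  # illustrative loader

cache = ReadThroughCache(load_from_db)
print(cache.get("product:42"))  # miss -> loader runs, result cached
print(cache.get("product:42"))  # hit  -> no loader call
```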
Write Through
Data is written to both the cache and the database simultaneously. This ensures the cache is always consistent with the database.
Pros
- Ensures cache consistency.
- Reduces latency for subsequent reads.
Cons
- Higher write latency (data is written twice).
- May write unnecessary data to the cache.
When to Use
- For applications requiring strong consistency and frequent reads (e.g., session data, configuration settings).
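A write-through sketch under the same in-memory stand-ins; the point is that the database write and the cache update happen on the same synchronous path:

```python
# Write-through sketch: every write goes to the cache and the database in
# the same call path, so the cache never lags behind the source of truth.
cache = {}
database = {}

def write_through(key, value):
    database[key] = value   # synchronous write to the source of truth
    cache[key] = value      # cache updated in the same operation

def read(key):
    return cache.get(key, database.get(key))

write_through("config:theme", "dark")
print(read("config:theme"))  # served from cache, consistent with the DB
```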
Write Behind
Data is written only to the cache initially. The cache updates the database asynchronously in the background. The key point is that the cache write and the database write are decoupled, which provides the performance boost.
Pros
- Low write latency.
- Efficient for write-heavy workloads.
Cons
- Risk of data loss if the cache fails before syncing.
- Requires a robust mechanism to handle sync failures.
When to Use
- For applications where write performance is critical, and eventual consistency is acceptable (e.g., logging, analytics).
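A write-behind sketch, assuming an in-memory queue and a background thread stand in for the asynchronous flusher a real cache would provide; it omits batching, retries, and durability guarantees:

```python
# Write-behind sketch: writes land in the cache immediately and are
# flushed to the database by a background worker.
import queue
import threading
import time

cache = {}
database = {}
pending = queue.Queue()

def write(key, value):
    cache[key] = value          # fast, in-memory write
    pending.put((key, value))   # defer the database write

def flush_worker():
    while True:
        key, value = pending.get()
        time.sleep(0.01)        # simulate slow database I/O
        database[key] = value   # asynchronous write to the source of truth
        pending.task_done()

threading.Thread(target=flush_worker, daemon=True).start()

write("metric:page_views", 1001)   # returns immediately
pending.join()                     # wait for the flush in this demo
print(database["metric:page_views"])
```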
Choosing a Strategy
- Evaluate read/write patterns: Read-heavy workloads benefit from strategies like Cache Aside or Read Through, while write-heavy workloads suit Write Behind.
- Consider consistency requirements: Choose Write Through for strong consistency, and Write Behind for eventual consistency.
- Analyze latency sensitivity: Strategies like Refresh Ahead minimize latency for frequently accessed data by refreshing entries before they expire.
Common Caching Issues
- Cache Inconsistency: When cached data is stale or out of sync with the source of truth, leading to incorrect results.
- Expiration Issues: Setting inappropriate TTLs (too short or too long) can result in excessive cache misses or stale data.
- Cache Stampede: Many clients simultaneously miss a cache entry and flood the backend with requests, causing performance degradation.
- Over-caching: Caching data that changes frequently adds overhead and negates the benefits of caching.
- Under-caching: Failing to cache frequently used or expensive-to-compute data reduces performance benefits.
- Eviction Policy Problems: Poor choices in eviction strategies (e.g., LRU, FIFO) may lead to important data being evicted prematurely.
- Memory Overhead: Insufficient cache size can lead to frequent evictions, while excessive size can waste resources.
- Distributed Cache Coordination: Challenges arise in ensuring consistency and availability across nodes in distributed systems.
- Security Issues: Sensitive data might inadvertently be cached and exposed.
- Cache Penetration: Repeated requests for nonexistent keys can bypass the cache entirely, overloading the backend.
Consistent Hashing
Consistent hashing is a key distribution strategy used in caching systems to evenly and efficiently distribute data across nodes, especially in distributed environments. Unlike traditional hash functions, consistent hashing minimizes data reallocation when nodes are added or removed. It maps both keys and cache nodes to a circular hash space and assigns each key to the closest node (clockwise) on the ring. When a node changes, only the keys mapped to that node need to be reassigned, reducing disruption. This approach enhances scalability, fault tolerance, and load balancing, making it ideal for distributed caches like Memcached or Redis.
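A sketch of a consistent-hash ring, assuming MD5 as the hash function and 100 virtual nodes per physical node; both choices are illustrative:

```python
# Consistent-hashing sketch: nodes and keys are hashed onto a ring, and a
# key is owned by the first node clockwise from its position. Virtual
# nodes (replicas) smooth out the load distribution.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._ring = []            # sorted list of hash positions
        self._owners = {}          # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._ring, h)
            self._owners[h] = node

    def remove_node(self, node):
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            self._ring.remove(h)
            del self._owners[h]

    def get_node(self, key):
        if not self._ring:
            return None
        h = self._hash(key)
        # First ring position clockwise from the key's hash (wraps around).
        idx = bisect.bisect(self._ring, h) % len(self._ring)
        return self._owners[self._ring[idx]]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:1"))   # same key always maps to the same node
ring.remove_node("cache-b")      # only keys owned by cache-b move
print(ring.get_node("user:1"))
```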
Cache Eviction Strategies
Least Recently Used (LRU)
Description:
LRU evicts the least recently accessed item when the cache reaches its capacity. It is typically implemented using a
combination of a hash map (for O(1) lookup) and a doubly linked list (for maintaining access order). The hash map stores
references to the linked list nodes, and the linked list ensures efficient addition and removal of elements. This
approach balances performance and memory usage, making LRU a popular choice in caching systems.
When to Use:
Use LRU when the most recently accessed items are likely to be reused soon. Practical examples include:
- Web Browsers: Cache pages or resources the user recently visited.
- Database Query Caches: Retain results of queries that are likely to be repeated.
- Operating Systems: Manage page replacement in memory systems.
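A minimal LRU sketch using Python's collections.OrderedDict, which is backed by exactly the hash-map plus doubly-linked-list combination described above:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" is now most recently used
cache.put("c", 3)       # evicts "b"
print(cache.get("b"))   # None
```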
Least Frequently Used (LFU)
Description:
LFU evicts items that have been accessed the least number of times. It is typically implemented using a frequency
counter associated with each item, along with data structures like a hash map and priority queue or frequency list to
organize items by their access counts efficiently. Advanced implementations may combine LFU with other policies to
mitigate stale frequency bias.
When to Use:
Use LFU when access frequency over time is a reliable indicator of usefulness. Practical examples include:
- Recommendation Systems: Cache frequently viewed items or products.
- Machine Learning Model Inference: Retain outputs of frequent predictions.
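A simplified LFU sketch; it scans for the minimum count on eviction, whereas production implementations use frequency lists or a priority queue to keep eviction cheap:

```python
# Simplified LFU sketch: each entry carries an access counter and the
# entry with the lowest count is evicted.
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self._counts = {}

    def get(self, key):
        if key not in self._data:
            return None
        self._counts[key] += 1
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Evict the least frequently used key.
            victim = min(self._counts, key=self._counts.get)
            del self._data[victim]
            del self._counts[victim]
        self._data[key] = value
        self._counts[key] = self._counts.get(key, 0) + 1

cache = LFUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.put("c", 3)       # "b" has the lowest count, so it is evicted
print(cache.get("b"))   # None
```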
First In, First Out (FIFO)
Description:
FIFO evicts the oldest item added to the cache, regardless of access patterns. It is simple to implement using a queue
data structure where items are added to the back and removed from the front as the cache fills up. FIFO is deterministic
and has low computational overhead.
When to Use:
FIFO works well in time-ordered access patterns. Practical examples include:
- Job Queues: Manage tasks in a fair, sequential manner.
- Sensor Data Buffers: Store a fixed number of recent readings.
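A FIFO sketch relying on Python dicts preserving insertion order; reads deliberately have no effect on eviction order:

```python
class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}

    def get(self, key):
        return self._data.get(key)     # reads do not affect eviction order

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            oldest = next(iter(self._data))   # first key inserted
            del self._data[oldest]
        self._data[key] = value

cache = FIFOCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # does not protect "a"
cache.put("c", 3)       # evicts "a", the oldest entry
print(cache.get("a"))   # None
```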
Time-To-Live (TTL)
Description:
TTL-based eviction removes items from the cache once their predefined lifetime expires. It typically uses a timestamp
associated with each item and periodically checks whether any items have expired. Efficient implementations often rely
on priority queues or timers to track expiration.
When to Use:
TTL is ideal for volatile data that becomes stale over time. Practical examples include:
- DNS Caches: Refresh IP-to-domain mappings periodically.
- Authentication Systems: Invalidate session tokens after a fixed duration.
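A TTL sketch that expires entries lazily on read; real caches typically add a periodic sweep or timer wheel so expired entries do not linger:

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._data = {}          # key -> (value, expires_at)

    def put(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

cache = TTLCache(ttl_seconds=0.1)
cache.put("session:abc", "token")
print(cache.get("session:abc"))  # "token"
time.sleep(0.2)
print(cache.get("session:abc"))  # None (expired)
```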
Most Recently Used (MRU)
Description:
MRU evicts the most recently accessed item, assuming older data is more likely to be reused. It uses similar data
structures to LRU (hash map + linked list) but prioritizes recent items for eviction rather than retention.
When to Use:
MRU is suitable for workloads where older data has greater value. Practical examples include:
- Multimedia Streaming: Retain less recently accessed files for potential replay.
- Inventory Management: Cache older, less-demanded stock details for analytics.
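An MRU sketch; the bookkeeping mirrors LRU, only the end of the order chosen for eviction changes:

```python
from collections import OrderedDict

class MRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            self._data.popitem(last=True)    # evict most recently used
        self._data[key] = value
        self._data.move_to_end(key)

cache = MRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" becomes most recently used
cache.put("c", 3)       # evicts "a"
print(cache.get("a"))   # None
```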
Adaptive Replacement Cache (ARC)
Description:
ARC dynamically balances between recency (LRU) and frequency (LFU) by maintaining separate lists for recently accessed
and frequently accessed items. It adapts to changing access patterns without manual tuning, offering better performance
in mixed workloads.
When to Use:
ARC is ideal for systems with unpredictable or mixed access patterns. Practical examples include:
- Database Buffer Pools: Optimize read/write operations.
- Distributed Caching Systems: Handle varying data usage across clients.
Segmented LRU (SLRU)
Description:
SLRU divides the cache into segments, typically “protected” and “probationary.” Items start in the probationary segment
and move to the protected segment if accessed frequently. It combines LRU behavior with prioritization for frequently
accessed items.
When to Use:
SLRU is effective for scenarios with both recency and frequency needs. Practical examples include:
- Web Proxies: Cache content based on access and popularity.
- File Systems: Retain recently and frequently accessed files for performance.
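An SLRU sketch with two LRU segments; the segment sizes and the demotion-back-to-probation behavior are illustrative choices:

```python
# New entries land in a probationary segment; a hit there promotes the
# entry to the protected segment.
from collections import OrderedDict

class SLRUCache:
    def __init__(self, probation_size, protected_size):
        self.probation = OrderedDict()
        self.protected = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            # Promote: a second access moves the entry to protected.
            value = self.probation.pop(key)
            self._insert_protected(key, value)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
            return
        self.probation.pop(key, None)
        self.probation[key] = value
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)   # evict probationary LRU

    def _insert_protected(self, key, value):
        self.protected[key] = value
        self.protected.move_to_end(key)
        if len(self.protected) > self.protected_size:
            # Demote the protected LRU entry back to probation.
            old_key, old_value = self.protected.popitem(last=False)
            self.put(old_key, old_value)

cache = SLRUCache(probation_size=2, protected_size=2)
cache.put("a", 1)        # enters probation
cache.get("a")           # promoted to protected
cache.put("b", 2)
cache.put("c", 3)
cache.put("d", 4)        # probation full: "b" (its LRU entry) is evicted
print(cache.get("b"))    # None
```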
Multi-Tier Caching
In large-scale distributed systems, multi-tier caching is a design pattern that enhances performance, scalability, and cost efficiency by organizing caches into multiple layers or tiers, each optimized for specific roles. This approach is particularly critical for systems serving high volumes of traffic with diverse data access patterns.
Edge Cache (CDNs)
Closest to end-users, often implemented with Content Delivery Networks (e.g., Akamai, Cloudflare). Caches static and semi-dynamic content geographically near users to reduce latency and network load. Ideal for caching frequently requested resources like images, videos, or APIs.
Regional/Intermediate Cache
Deployed within data centers or regional hubs. Aggregates requests from multiple edge locations, reducing load on backend services. Used for session data, intermediate computation results, or less frequently accessed resources.
Application-Level Caches
In-memory caches (e.g., Redis, Memcached) directly accessible by application servers. Stores application-specific or user-session data for ultra-low latency access. Often serves as the first line of caching for query results, computed views, or application-specific state.
Backend Cache
Operates closer to data stores like databases or file systems. Reduces load on the primary storage layer, often with solutions like database query result caching or key-value stores. Can include specialized solutions like sharded caches for handling large datasets.
Workflow in a Distributed System
- Requests pass through the cache tiers in sequence, starting from the edge cache and moving deeper toward the data source.
- Cache hits are served from the closest tier, minimizing latency.
- Cache misses propagate downstream, with data fetched from the source, updated in intermediate tiers, and potentially pushed upstream for future requests.
This hierarchical design ensures high availability, low latency, and cost-effective use of resources, especially in environments handling global traffic or massive data loads.
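A sketch of the lookup workflow described above, assuming three in-memory tiers and an illustrative fetch_from_origin function in place of the real data source:

```python
# Multi-tier lookup sketch: a request walks the tiers from the edge toward
# the data source; on a miss, the value is backfilled into the tiers it
# passed through so future requests hit closer to the user.
class Tier:
    def __init__(self, name):
        self.name = name
        self.store = {}

def fetch_from_origin(key):
    return f"content for {key}"      # stands in for the database / origin

tiers = [Tier("edge"), Tier("regional"), Tier("application")]

def get(key):
    for tier in tiers:
        if key in tier.store:
            print(f"hit at {tier.name}")
            return tier.store[key]
    value = fetch_from_origin(key)   # miss everywhere: go to the source
    for tier in tiers:               # backfill tiers for next time
        tier.store[key] = value
    return value

get("/img/logo.png")    # miss -> fetched from origin, cached in every tier
get("/img/logo.png")    # hit at edge
```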