Redis

Redis (REmote DIctionary Server) is an in-memory, key-value data store, often used as a database, cache, or message broker. Its high performance and low latency make it ideal for real-time applications.

Use Cases

Caching: Speed up application performance.
Session Store: Handle user sessions with TTL.
Rate Limiting: Use TTL + counters.
Pub/Sub: For real-time notifications.

Key Features

Time-to-Live (TTL)

Redis supports setting an expiration time for keys using commands like EXPIRE, PEXPIRE (for millisecond precision), or while creating a key with SETEX.

Expired keys are deleted automatically:

Lazy Expiration: Keys are removed during access if their TTL has expired.
Active Expiration: Redis periodically samples keys that have a TTL and deletes the expired ones, so keys that are never accessed again still get removed.

Commands:

EXPIRE key seconds: Set TTL in seconds.
PEXPIRE key milliseconds: Set TTL in milliseconds.
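TTL key: Returns the remaining TTL in seconds (-1 if the key has no expiration, -2 if the key does not exist).
PTTL key: Returns the remaining TTL in milliseconds (same -1 and -2 conventions).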
PERSIST key: Remove the expiration, making the key permanent.
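
The two expiration strategies and the TTL/PERSIST semantics above can be sketched with a toy in-memory store. This is a minimal illustration, not Redis's implementation: the class name ExpiringStore, its methods, and the injectable clock are all assumptions made for the example.

```python
import random
import time


class ExpiringStore:
    """Toy store mimicking Redis-style key expiration (hypothetical, not real Redis)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so behaviour can be checked without sleeping
        self._data = {}       # key -> value
        self._expires = {}    # key -> absolute deadline in seconds

    def set(self, key, value, ex=None):
        """Store a value; `ex` is an optional TTL in seconds (similar to SETEX)."""
        self._data[key] = value
        self._expires.pop(key, None)
        if ex is not None:
            self._expires[key] = self._clock() + ex

    def _expire_if_due(self, key):
        deadline = self._expires.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            del self._expires[key]

    def get(self, key):
        """Lazy expiration: the key is checked (and dropped) when accessed."""
        self._expire_if_due(key)
        return self._data.get(key)

    def ttl(self, key):
        """Follow TTL's return codes: -2 if missing, -1 if no expiry is set."""
        self._expire_if_due(key)
        if key not in self._data:
            return -2
        if key not in self._expires:
            return -1
        return int(self._expires[key] - self._clock())

    def persist(self, key):
        """Remove the expiry (similar to PERSIST); True if one was removed."""
        self._expire_if_due(key)
        return self._expires.pop(key, None) is not None

    def active_expire_cycle(self, sample_size=20):
        """Active expiration: sample keys with a TTL and delete the expired ones."""
        candidates = list(self._expires)
        sample = random.sample(candidates, min(sample_size, len(candidates)))
        removed = 0
        for key in sample:
            if self._clock() >= self._expires[key]:
                self._expire_if_due(key)
                removed += 1
        return removed
```

Passing a fake clock (for example a lambda reading a mutable value) makes it easy to step time forward and watch a key disappear either on access or during a sweep.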

Atomic Operations

Each individual command executes atomically because Redis runs commands on a single thread. To make a sequence of commands atomic, including operations that touch multiple keys, use Lua scripting or transactions.

Pub/Sub Messaging

Redis supports the publish/subscribe pattern:

Publisher: Sends messages to a channel.
Subscribers: Receive messages from subscribed channels.

Real-time notifications or event systems can leverage this feature.

Commands:

PUBLISH channel message: Sends a message.
SUBSCRIBE channel: Subscribes to a channel.
UNSUBSCRIBE channel: Stops listening to a channel.

Transactions

Redis transactions group multiple commands into a single atomic unit:

Use MULTI to start a transaction.
Commands are queued until EXEC is called.
Before EXEC is called, the queued commands can be abandoned with DISCARD.

Limitations:

No rollback support. If a command fails during EXEC, the other queued commands still run and their changes are kept.
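
The queue-then-execute behaviour and the lack of rollback can be sketched in plain Python. This is a toy model under stated assumptions: ToyTransaction, cmd_set, and cmd_incr are hypothetical names, and a dict stands in for the keyspace.

```python
class ToyTransaction:
    """Queue commands after MULTI and run them all at EXEC (no rollback)."""

    def __init__(self, store):
        self.store = store
        self.queue = None  # None means "not inside MULTI"

    def multi(self):
        self.queue = []

    def call(self, func, *args):
        """Queue a command instead of running it (Redis replies QUEUED)."""
        if self.queue is None:
            raise RuntimeError("call() outside MULTI in this sketch")
        self.queue.append((func, args))
        return "QUEUED"

    def discard(self):
        """Drop the queued commands; nothing has been applied yet."""
        self.queue = None

    def exec(self):
        """Run every queued command in order; a failure does not undo earlier ones."""
        if self.queue is None:
            raise RuntimeError("EXEC without MULTI")
        queued, self.queue = self.queue, None
        results = []
        for func, args in queued:
            try:
                results.append(func(self.store, *args))
            except Exception as exc:  # the error is reported, execution continues
                results.append(exc)
        return results


def cmd_set(store, key, value):
    store[key] = value
    return "OK"


def cmd_incr(store, key):
    store[key] = int(store.get(key, 0)) + 1  # raises ValueError on non-integers
    return store[key]
```

Queuing a cmd_set, then a cmd_incr on the same non-numeric key, then another cmd_set leaves both sets applied after exec(), with the error sitting in the middle of the results list.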

Persistence

Redis provides multiple persistence options:

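RDB Snapshots: Saves the dataset at a point in time. Triggered manually (SAVE, which blocks the server, or BGSAVE, which runs in the background) or periodically through the save configuration directive (e.g., CONFIG SET save).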
AOF (Append-Only File): Logs every write operation. Configurable rewrite policies reduce file size.

Lua Scripting

Redis supports executing Lua scripts directly on the server, reducing the overhead of multiple round trips. Scripts are executed atomically.

Commands:

EVAL "script" numkeys key1 key2 ... arg1 arg2 ...
SCRIPT LOAD and EVALSHA for script caching.

Streams (Advanced Data Structure)

Redis Streams provide an append-only log for message queues, real-time data ingestion, or event sourcing.

Commands:

XADD: Add data to a stream.
XREAD: Read from a stream.
XGROUP: Create consumer groups for processing.

Data Types

Strings

Binary-safe, holding text or binary data (e.g., JSON, images) up to 512 MB per value.

Pros:

Simple and versatile.
Fast read/write operations.
Atomic operations like increment/decrement.

Cons:

Limited to single values.
Inefficient for grouped or complex data structures.

Performance:

O(1) for set/get operations.
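Throughput depends on hardware and workload: a single instance typically serves on the order of hundreds of thousands of operations per second on commodity hardware, and more with pipelining.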

Use Cases:

Caching (e.g., API responses, HTML fragments).
Counters (e.g., page views, likes).
Session management (e.g., SETEX for TTL).

Lists

Collections of strings kept in insertion order, implemented as linked lists (quicklists in current versions).

Pros:

Easy to manipulate with commands (LPUSH, RPUSH, LPOP, RPOP).
Supports range queries (LRANGE).

Cons:

Performance can degrade with very large lists (traversing O(n)).
Less memory-efficient compared to arrays.

Performance:

O(1) for LPUSH, RPUSH, LPOP, RPOP.
O(n) for range queries and operations on middle elements.

Use Cases:

Queues (e.g., task/job queues).
Recent logs or activity feeds.
Message buffers.

Sets

Unordered collections of unique strings.

Pros:

Automatic deduplication.
Supports set operations like union, intersection, and difference.

Cons:

Unordered nature may not suit all use cases.
O(n) for large sets during certain operations.

Performance:

O(1) for add/remove/check membership (SADD, SREM, SISMEMBER).
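O(n) for SUNION and SDIFF, where n is the total number of elements; SINTER is O(n*m) in the worst case, where n is the size of the smallest set and m is the number of sets.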

Use Cases:

Tags and categories.
Unique user tracking (e.g., unique page visits).
Relationship queries (e.g., mutual friends via SINTER).

Sorted Sets

Sets where each element is associated with a score, stored in sorted order.

Pros:

Maintains ordering by score.
Efficient range queries based on scores.

Cons:

Slightly higher memory overhead due to scoring.
Write performance can degrade with extremely large sets.

Performance:

O(log(n)) for insertion, removal, and updates.
O(log(n) + m) for range queries, where m is the number of results.

Use Cases:

Leaderboards (e.g., games or rankings).
Rate-limiting by score (e.g., timestamps).
Priority queues.
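
Rate limiting by timestamp score can be sketched as a sliding window. The version below is a pure-Python stand-in: a sorted list plays the role of the sorted set, and the comments name the Redis commands each step roughly corresponds to. SlidingWindowLimiter and its parameters are hypothetical names for this example.

```python
import bisect


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window_seconds` span."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._hits = {}  # client id -> sorted list of request timestamps

    def allow(self, client_id, now):
        hits = self._hits.setdefault(client_id, [])
        # Roughly ZREMRANGEBYSCORE: drop timestamps that fell out of the window.
        cutoff = now - self.window
        del hits[:bisect.bisect_right(hits, cutoff)]
        # Roughly ZCARD: count the requests still inside the window.
        if len(hits) >= self.limit:
            return False
        # Roughly ZADD with the timestamp as the score.
        bisect.insort(hits, now)
        return True
```

With a real sorted set the members must be unique, so each request would need a distinct member (for example a request ID) with the timestamp as its score, and the three steps would need to run atomically in a transaction or Lua script.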

Hashes

Collections of field-value pairs, like a lightweight key-value store.

Pros:

Efficient for storing small objects.
Access specific fields without fetching the entire object.

Cons:

Less efficient for very large datasets.
Field values are flat strings, so nested structures must be serialized or split across keys.

Performance:

O(1) for field operations (HSET, HGET).
O(n) for iterating all fields.

Use Cases:

User profiles or settings.
Storing small objects with multiple attributes.
Session state management.

Streams

Append-only log structure for handling data streams.

Pros:

Supports real-time data ingestion.
Consumer groups enable distributed processing.

Cons:

Higher complexity compared to simpler data types.
Requires careful management of stream size.

Performance:

O(1) for appending (XADD).
O(log(n) + m) for range queries (XRANGE, XREAD): O(log(n)) to seek to the starting entry, then O(m) to return m entries.

Use Cases:

Event sourcing.
Messaging systems.
Log aggregation.

Bitmaps

Space-efficient representation of binary data using bits.

Pros:

Extremely memory-efficient for binary flags.
Supports bit-level operations.

Cons:

Requires bitwise logic for complex operations.
Limited to binary (on/off) states.

Performance:

O(1) for bit-level operations (SETBIT, GETBIT).
Efficient for very large datasets.

Use Cases:

Feature toggles.
User activity tracking.
Simple binary states.
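
The bit-level operations can be illustrated with a bytearray standing in for the underlying string value. This is a sketch, not the Redis implementation; ToyBitmap is a hypothetical name. It follows the Redis convention that offset 0 is the most significant bit of the first byte.

```python
class ToyBitmap:
    """Bit flags packed into bytes, in the spirit of SETBIT/GETBIT/BITCOUNT."""

    def __init__(self):
        self._bytes = bytearray()

    def setbit(self, offset, value):
        """Set the bit at `offset` and return its previous value."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            # Grow with zero bytes so the offset becomes addressable.
            self._bytes.extend(b"\x00" * (byte_index + 1 - len(self._bytes)))
        mask = 0x80 >> bit_index  # offset 0 is the most significant bit
        previous = 1 if self._bytes[byte_index] & mask else 0
        if value:
            self._bytes[byte_index] |= mask
        else:
            self._bytes[byte_index] &= ~mask & 0xFF
        return previous

    def getbit(self, offset):
        """Return the bit at `offset`; offsets past the end read as 0."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            return 0
        return 1 if self._bytes[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self):
        """Count the set bits across the whole bitmap."""
        return sum(bin(b).count("1") for b in self._bytes)
```

For user activity tracking, the offset would typically be a numeric user ID and each key one day, so a day with user IDs up to one million needs roughly 125 KB.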

HyperLogLog

Probabilistic data structure for cardinality estimation.

Pros:

Requires minimal memory (~12 KB).
Fast and scalable for large datasets.

Cons:

Approximation: small error margin (~0.81%).
Limited to cardinality (no detailed data).

Performance:

O(1) for add/query operations.

Use Cases:

Unique visitor tracking.
Approximate counting of events.
Monitoring systems.
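
The ~12 KB and ~0.81% figures follow from the structure's parameters. A short worked calculation, assuming the dense encoding with 16384 registers of 6 bits each and the usual HyperLogLog standard error of 1.04 / sqrt(m); the function names are made up for the example.

```python
import math

REGISTERS = 2 ** 14       # 16384 registers (assumed dense encoding)
BITS_PER_REGISTER = 6     # each register stores a small counter


def hll_standard_error(registers=REGISTERS):
    """Standard error of the cardinality estimate: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(registers)


def hll_dense_size_bytes(registers=REGISTERS, bits=BITS_PER_REGISTER):
    """Memory for the registers alone, ignoring any header."""
    return registers * bits // 8
```

Here hll_standard_error() evaluates to 1.04 / 128, about 0.81%, and hll_dense_size_bytes() to 12288 bytes, i.e. 12 KB.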

Geospatial Indexes

Handles geospatial data with radius queries and distance calculations.

Pros:

Efficient for geospatial queries.
Built-in commands for geolocation-based operations.

Cons:

Requires additional memory for coordinates.
Limited to geospatial use cases.

Performance:

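O(log(n)) per added item for GEOADD. Radius queries (GEORADIUS, or GEOSEARCH since Redis 6.2) cost O(n + log(m)), where n is the number of elements in the searched area and m is the number of items in the index.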

Use Cases:

Location-based services (e.g., nearest stores).
Tracking and routing.
Real-time geospatial analytics.
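
Distance calculations and radius queries can be sketched with the haversine formula. This is a brute-force illustration that scans every point, whereas the Redis index narrows the search first. The function names are hypothetical, and the Earth radius constant is an assumption chosen to approximate Redis's spherical model.

```python
import math

EARTH_RADIUS_M = 6372797.560856  # assumed sphere radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center, points, radius_m):
    """Return [(name, distance_m)] for points inside the radius, nearest first.

    `points` maps a name to a (lon, lat) pair; this scan is O(n).
    """
    hits = []
    for name, (lon, lat) in points.items():
        distance = haversine_m(center[0], center[1], lon, lat)
        if distance <= radius_m:
            hits.append((name, distance))
    return sorted(hits, key=lambda item: item[1])
```

Note the longitude-first argument order, which matches how GEOADD takes coordinates.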

Common Interview Questions

What are Redis keys, and how do they differ from traditional database keys?

Redis keys are binary-safe, meaning they can contain any kind of data (e.g., strings, numbers, binary data). Unlike traditional database keys in relational databases, Redis keys are used to access data directly without a schema. However, choosing meaningful keys is essential for maintainability.

What is a Redis hash, and when would you use it over other data structures?

A Redis hash is a collection of field-value pairs, similar to a dictionary in programming. It’s efficient for representing objects with multiple attributes, such as a user profile. Hashes consume less memory than storing the same data in individual string keys, especially for a large number of small fields.

How does Redis achieve high performance?

Redis operates in-memory, avoiding disk I/O latency, and uses a single-threaded event loop for predictable performance. It uses efficient data structures like lists and hashes to optimize storage and retrieval. Additionally, it employs techniques like pipelining to batch commands and reduce network overhead.

What is Redis persistence, and what are its two primary methods?

Redis supports two persistence methods:

RDB (Redis Database Backup) creates snapshots of the dataset at specified intervals.
AOF (Append-Only File) logs every write operation for durability.

RDB is faster for recovery but may lose recent changes, while AOF provides better durability with a trade-off in performance.

How does Redis handle concurrency?

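Redis is single-threaded for command execution, so each command runs to completion before the next one starts and no two clients can modify the same data simultaneously. This avoids the need for locks inside the server. Race conditions are still possible across multiple commands (e.g., a GET followed by a SET), so read-modify-write sequences should use atomic commands such as INCR, transactions with WATCH, or Lua scripts.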
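
A deterministic sketch of why single-command atomicity matters: two clients that each read a counter and write it back can lose an update when their commands interleave, while a single atomic increment cannot. ToyServer and the two helper functions are hypothetical and model only the ordering of commands.

```python
class ToyServer:
    """Executes one command at a time, like a single-threaded command loop."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key, 0)

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key):
        """Read, add one, and write back as a single indivisible command."""
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]


def racy_increment_interleaved(server, key):
    """Two clients each do GET then SET, and their commands interleave."""
    a = server.get(key)      # client A reads the current value
    b = server.get(key)      # client B reads the same value before A writes
    server.set(key, a + 1)   # client A writes its result
    server.set(key, b + 1)   # client B overwrites it: one update is lost
    return server.get(key)


def atomic_increment_twice(server, key):
    """Two clients each send one INCR; no interleaving can lose an update."""
    server.incr(key)
    server.incr(key)
    return server.get(key)
```

Starting from an empty server, the interleaved version ends with the counter at 1 and the INCR version at 2.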

What is the difference between Redis pub/sub and Redis Streams?

Pub/Sub is a lightweight messaging system for real-time communication where messages are not stored, and only active subscribers receive them. Redis Streams is a more robust system that persists messages, supports replaying, and provides consumer group management, suitable for event sourcing and long-lived data.
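
The delivery difference can be sketched with two toy classes: one fans messages out to whoever is subscribed at publish time, the other appends to a log that can be read from any position later. Both classes are hypothetical, and the integer entry IDs are a simplification of real stream IDs.

```python
from collections import defaultdict


class ToyPubSub:
    """Fire-and-forget: only subscribers present at publish time get the message."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of inboxes

    def subscribe(self, channel):
        inbox = []
        self._subscribers[channel].append(inbox)
        return inbox

    def publish(self, channel, message):
        for inbox in self._subscribers[channel]:
            inbox.append(message)
        return len(self._subscribers[channel])  # number of receivers


class ToyStream:
    """Append-only log: entries are kept and can be replayed from any ID."""

    def __init__(self):
        self._entries = []  # list of (entry_id, message)

    def add(self, message):
        entry_id = len(self._entries) + 1  # simplified monotonically increasing ID
        self._entries.append((entry_id, message))
        return entry_id

    def read(self, after_id=0):
        """Return every entry with an ID greater than `after_id`."""
        return [entry for entry in self._entries if entry[0] > after_id]
```

A message published before anyone subscribes is simply dropped by ToyPubSub, while ToyStream still returns it to a reader that shows up later and asks from ID 0.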

Explain Redis eviction policies.

Redis applies an eviction policy (maxmemory-policy) when memory usage reaches the configured maxmemory limit.

Policies include:

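noeviction: Evicts nothing and returns errors for write commands that need more memory.
allkeys-lru: Removes the least recently used keys.
allkeys-lfu: Removes the least frequently used keys.
allkeys-random: Removes random keys.
volatile-lru: Removes the least recently used keys among those with an expiration.
volatile-lfu: Removes the least frequently used keys among those with an expiration.
volatile-random: Removes random keys among those with an expiration.
volatile-ttl: Removes the keys with the shortest remaining TTL.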

Proper configuration depends on your use case and data retention needs.
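
The idea behind allkeys-lru can be sketched with an ordered dictionary. Two simplifications are assumed here: the limit is a key count rather than a memory size, and the LRU order is exact, whereas Redis approximates it by sampling. ToyLRUCache is a hypothetical name.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Evict the least recently used key once `max_keys` is exceeded."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self._data = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def set(self, key, value):
        """Store the value and return the list of keys evicted to make room."""
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        evicted = []
        while len(self._data) > self.max_keys:
            old_key, _ = self._data.popitem(last=False)
            evicted.append(old_key)
        return evicted
```

A volatile-lru variant would only consider keys that have an expiration set when choosing a victim.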

What are Redis modules, and why are they useful?

Redis modules extend its functionality, adding custom commands and data types. Examples include RediSearch for full-text search and RedisGraph for graph-based queries (RedisGraph has since been discontinued). They allow Redis to be tailored to specific use cases without altering its core.

How does Redis Cluster ensure high availability and fault tolerance?

Redis Cluster partitions data across multiple nodes using hash slots. It provides redundancy by replicating data to replicas. If a master node fails, a replica is promoted to master automatically, ensuring minimal downtime.
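
The key-to-slot mapping can be sketched as CRC16 of the key modulo 16384, with hash tags letting related keys land in the same slot. The function names below are illustrative; the CRC variant is CRC-16/XMODEM (polynomial 0x1021, initial value 0), computed bit by bit for clarity rather than speed.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to a slot, hashing only the {tag} part when one is present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Use the tag only if it is non-empty; "{}" means hash the whole key.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % HASH_SLOTS
```

Keys such as {user1000}.following and {user1000}.followers share the tag user1000, so they map to the same slot and can be used together in multi-key commands.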

What is Redis Sentinel, and how does it differ from Redis Cluster?

Redis Sentinel monitors master and replica nodes and performs automatic failover when the master fails. It is simpler and provides high availability for non-sharded deployments (a single master with replicas). Redis Cluster, on the other hand, is designed for scaling with sharding and fault tolerance across multiple nodes.

How does Redis handle large keys or values?

Redis can store large keys/values, but it’s not optimal due to memory constraints and potential performance issues. Breaking large data into smaller keys or using data structures like lists and hashes is preferred to leverage Redis’s in-memory nature efficiently.

What are Redis pipelines, and when should you use them?

Pipelining batches multiple commands into a single network request to reduce latency. It’s beneficial for executing many independent operations, such as bulk inserts. However, it doesn’t ensure atomicity across commands, so use transactions when atomicity is required.

What is the difference between Redis and Memcached?

Redis supports complex data structures, persistence, and replication, while Memcached is simpler, focusing on key-value caching. Redis is better for use cases requiring data durability or advanced queries, while Memcached excels in scenarios demanding high-speed caching with simpler requirements.

How do transactions work in Redis?

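Redis transactions are enclosed in MULTI and EXEC commands. The queued commands run sequentially, with no other client's commands interleaved. However, Redis lacks rollback: if a command fails at execution time, the remaining commands are still executed. A command rejected while it is being queued (e.g., a syntax error) causes EXEC to abort the whole transaction. WATCH adds optimistic locking by aborting EXEC if a watched key changed.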

What are Lua scripts in Redis, and how are they executed?

Lua scripts allow executing multiple commands atomically, reducing network latency and ensuring consistency. Redis executes Lua scripts in a single-threaded environment, guaranteeing no interruptions. Scripts are cached for reuse, enhancing performance.

What is the role of the `SCAN` command, and how does it differ from `KEYS`?

SCAN iterates keys incrementally, making it suitable for large datasets without blocking Redis. Unlike KEYS, which retrieves all matching keys at once, SCAN avoids performance degradation and memory overhead in production environments.
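
The cursor pattern can be sketched as follows. This is a deliberate simplification: the cursor here is a list index, whereas the real cursor is an opaque value, COUNT is only a hint, and a real SCAN may return a key more than once. toy_scan and scan_all are hypothetical names.

```python
def toy_scan(keys, cursor=0, count=10):
    """Return (next_cursor, batch); a next_cursor of 0 means iteration is done."""
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(keys):
        next_cursor = 0
    return next_cursor, batch


def scan_all(keys, count=10):
    """Drive the cursor loop: start at 0 and stop when the cursor returns to 0."""
    cursor = 0
    while True:
        cursor, batch = toy_scan(keys, cursor, count)
        yield from batch
        if cursor == 0:
            break
```

Each call does a bounded amount of work, which is the property that keeps the server responsive; a KEYS-style call would be the equivalent of returning the whole list in one step.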

How does Redis handle memory fragmentation?

Redis relies on memory allocators like jemalloc to manage fragmentation. With jemalloc, MEMORY PURGE asks the allocator to release unused pages, and active defragmentation (the activedefrag setting) can compact memory while the server runs. Monitoring the fragmentation ratio reported by INFO memory and tuning the allocator remain essential for long-term stability.

What is the `HyperLogLog` data structure in Redis?

HyperLogLog is a probabilistic data structure used to estimate cardinality (unique elements) of a set. It consumes minimal memory (12 KB) regardless of set size but trades exactness for efficiency, making it ideal for analytics applications.

Explain how Redis handles write-heavy workloads.

Redis handles write-heavy workloads through in-memory operations, replication, and pipelining. Techniques like sharding data across clusters and using AOF with fsync policies tailored to workload requirements further optimize write performance and durability.

How do you secure a Redis instance?

Redis security involves binding to specific IPs, requiring passwords (AUTH), and configuring firewalls. TLS encryption protects data in transit. Since Redis 6, access control lists (ACLs) provide per-user authentication and command/key permissions; older versions offer only a single shared password, so external tools or proxies are needed there for fine-grained access control.