Consistent hashing is a distributed systems technique that maps both data keys and server nodes onto a shared virtual ring, minimizing data redistribution when nodes are added or removed. It is a foundational building block in scalable caches, databases, and load balancers.
Consistent hashing places both cache/storage nodes and data keys onto a circular hash space, typically ranging from 0 to 2^32-1, often called a 'hash ring'. Each key is owned by the first node encountered when moving clockwise around the ring from the key's hash position. This approach decouples the number of nodes from the modulo arithmetic used in naive hashing, so the mapping does not collapse when the cluster size changes.
With naive modular hashing (key % N), adding or removing a single node changes N and forces nearly all keys to remap to different nodes, causing a massive cache miss storm or data migration event. Consistent hashing limits remapping to only K/N keys on average, where K is the total number of keys and N is the number of nodes. This property makes it essential for systems like Amazon DynamoDB, Apache Cassandra, and distributed CDNs that must scale elastically without downtime.
First, each node is hashed using a deterministic function (e.g., MD5 or SHA-1 applied to its IP or hostname) to place it at one or more positions on the ring. Each incoming key is hashed with the same function, and the system performs a clockwise lookup to find the nearest node — typically implemented with a sorted array and binary search in O(log N) time. When a node is added, only the keys between the new node and its predecessor are migrated to it; when a node is removed, only its keys move to its successor.
A single physical node assigned one ring position often leads to uneven load because hash positions cluster unpredictably. The standard solution is virtual nodes: each physical node is assigned multiple (e.g., 100–200) hash positions under different labels such as 'node-A-1', 'node-A-2', etc. This spreads load more uniformly across the ring and also allows heterogeneous hardware to be weighted — a more powerful server simply receives more virtual node slots.
Even with vnodes, a poorly chosen hash function or skewed key distribution can create hot spots where one node handles disproportionate traffic. Most production systems pair consistent hashing with a replication strategy — Cassandra, for example, replicates each key to the next N successor nodes on the ring for fault tolerance. Always benchmark your key distribution in staging; a hash function like xxHash or MurmurHash3 generally provides better uniformity than MD5 for non-cryptographic use cases.
Use a well-tested library (e.g., ketama for Memcached, or built-in implementations in client SDKs for Redis Cluster and Cassandra) rather than rolling your own ring logic. Tune the number of virtual nodes based on your cluster size — too few causes imbalance, too many wastes memory and slows lookups. Consistent hashing is the right tool when you need horizontal scalability with minimal data movement; for small, static clusters, simpler strategies like random partitioning or range-based sharding may suffice.
© RM Full Stack & AI Engineer · All guides · Roadmaps · Open the app