Sharding & Replication

Understand database sharding strategies (horizontal, vertical, directory-based) and replication topologies (master-replica, multi-master) for scaling data systems.

Advanced · 16 min read

Why Shard?

When a single database server cannot handle the data volume or query load, you split the data across multiple servers — this is sharding (horizontal partitioning). Each shard holds a subset of the data.

Sharding Strategies

Strategy	How It Works	Pros	Cons
Range-based	Shard by value range (A-M, N-Z)	Simple, range queries easy	Hot spots if data is skewed
Hash-based	Hash the shard key, mod by shard count	Even distribution	Range queries need scatter-gather
Directory-based	Lookup table maps keys to shards	Flexible, custom placement	Lookup service is a SPOF
Geo-based	Shard by geographic region	Data locality	Cross-region queries are expensive

Sharded Cluster Architecture

Sharded cluster with 3 shards, each having a primary and replica.

Sharding vs Replication

Sharding (Partition)	Replication (Copy)
Splits data across nodes	Copies data to multiple nodes
Each shard has unique data	Each replica has the same data
Scales write throughput	Scales read throughput
Increases total storage capacity	Provides redundancy / fault tolerance
Complex: cross-shard queries, rebalancing	Simpler: replication lag is the main concern

Replication Topologies

Topology	How It Works	Consistency	Use Case
Single-leader	One primary accepts writes; replicas for reads	Strong (sync) or Eventual (async)	Most OLTP workloads
Multi-leader	Multiple primaries accept writes	Conflict resolution needed	Multi-region deployments
Leaderless	Any node accepts reads and writes	Quorum-based (R + W > N)	Cassandra, DynamoDB

Choosing a Shard Key

The shard key determines which shard stores each record. A bad shard key creates hot spots (one shard gets most of the traffic). Good shard keys have high cardinality and even distribution.

CAUTION: Avoid using timestamps as shard keys! All recent writes would go to the same shard. Instead, use a hash of the user ID or a compound key like user_id + timestamp.

Good shard keys: user_id, order_id, tenant_id
Bad shard keys: timestamp, country (low cardinality), auto-increment ID

Rebalancing Shards

When you add or remove shards, data must be redistributed. Consistent hashing minimizes data movement by only moving keys that map to the changed portion of the hash ring. Virtual nodes improve balance further.

Key Takeaways

Use replication for read scaling and HA; use sharding for write scaling and storage.
Choose shard keys with high cardinality and even distribution.
Consistent hashing minimizes data movement during rebalancing.
Combine sharding + replication for both scale and fault tolerance.

Part of the System Design series on Tekivex. Browse all tutorials or explore our open-source products.