Design a Cloud Storage Solution (Google Drive / Dropbox)

File chunking, deduplication, versioning, sync protocol, conflict resolution, and sharing — designing a scalable cloud file storage system.

Advanced · 46 min read

Requirements

  • Functional: Upload/download files; sync across devices; share files/folders; version history
  • Non-functional: 1B users; 10M daily active; max file size 50 GB; 99.99% availability; strong consistency for metadata

File Chunking

Split files into 4 MB chunks. Each chunk is hashed (SHA-256). Benefits: (1) resume interrupted uploads, (2) only upload changed chunks on edits, (3) deduplicate identical chunks across all users.
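The chunking step and benefit (2) can be sketched in a few lines of Python. This is a minimal illustration, not the actual client: the function names `chunk_hashes` and `changed_chunks` are hypothetical, the 4 MB size is taken as 4 MiB, and the whole file is assumed to fit in memory (a real client would stream from disk).

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB from the text, assumed here to mean 4 MiB


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: list[str], new: list[str]) -> list[int]:
    """Return the indices of chunks in `new` that differ from `old` at the same position."""
    return [
        i for i, digest in enumerate(new)
        if i >= len(old) or old[i] != digest
    ]


# Tiny 4-byte chunks make the behaviour easy to see.
before = chunk_hashes(b"aaaabbbbcccc", chunk_size=4)
after = chunk_hashes(b"aaaaXbbbcccc", chunk_size=4)
to_upload = changed_chunks(before, after)  # only the second chunk changed
```

Note that with fixed-size chunks an in-place overwrite changes only the affected chunk, whereas inserting bytes shifts every later chunk boundary, so all following chunks would hash differently.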

Component     | Technology            | Role
Metadata DB   | PostgreSQL            | Files, folders, chunks, versions, sharing permissions
Chunk Storage | S3 + CDN              | Store chunk bytes keyed by SHA-256 hash
Block Service | Custom service        | Chunk, hash, upload, deduplicate
Sync Service  | Long poll / WebSocket | Notify devices of remote changes
Cache         | Redis                 | Hot chunk metadata; delta calculation

Deduplication

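Before uploading a chunk, check whether a chunk with the same hash already exists in storage. If it does, record a reference to the existing chunk instead of uploading the bytes again. This is content-addressable storage: each chunk is keyed by the hash of its content, so identical chunks collapse into a single stored object. Dropbox is reported to save 40–70% of storage through cross-user deduplication, though the actual figure depends on the workload.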
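The check-then-upload flow might look like the sketch below. `ChunkStore` is a hypothetical in-memory stand-in for the real chunk storage (S3 in the table above), and the reference counter is an assumption about how "record the reference" could be tracked, not a description of any particular product.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory content-addressable store keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # digest -> chunk bytes
        self._refs: dict[str, int] = {}     # digest -> number of references

    def has(self, digest: str) -> bool:
        return digest in self._blobs

    def add_reference(self, digest: str) -> None:
        self._refs[digest] = self._refs.get(digest, 0) + 1

    def put(self, digest: str, chunk: bytes) -> None:
        self._blobs[digest] = chunk

    def ref_count(self, digest: str) -> int:
        return self._refs.get(digest, 0)

    def blob_count(self) -> int:
        return len(self._blobs)


def upload_chunk(store: ChunkStore, chunk: bytes) -> bool:
    """Upload a chunk unless it is already stored; return True if bytes were sent."""
    digest = hashlib.sha256(chunk).hexdigest()
    sent = not store.has(digest)
    if sent:
        store.put(digest, chunk)    # first copy: the bytes are transferred
    store.add_reference(digest)     # every uploader gets a reference either way
    return sent


store = ChunkStore()
first = upload_chunk(store, b"same bytes")   # stored
second = upload_chunk(store, b"same bytes")  # deduplicated: reference only
```

Here the second upload transfers nothing, yet both uploaders hold a reference to the same stored chunk.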

Conflict Resolution

When two devices edit the same file offline, a conflict occurs once both try to sync. Strategy: last-writer-wins with a conflict copy. Both edits are accepted: the later write becomes the current version, and the other edit is saved as a "Conflicted copy" file so no data is lost. The conflict is then surfaced to the user.
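One way to detect the conflict is to have each device send the version its edit was based on. The sketch below assumes exactly that: `FileState`, `apply_edit`, and the copy-naming scheme are hypothetical, and the choice that the later write keeps the original name while the displaced content becomes the conflict copy is an assumption of this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileState:
    """Hypothetical server-side state for a single file."""
    name: str
    version: int = 0
    content: bytes = b""
    conflict_copies: dict[str, bytes] = field(default_factory=dict)


def apply_edit(state: FileState, base_version: int, content: bytes) -> Optional[str]:
    """Apply an edit that was made on top of `base_version`.

    Returns the name of the conflicted copy if a conflict was detected, else None.
    """
    conflict_name = None
    if base_version != state.version:
        # Another device synced first; preserve its content so nothing is lost.
        conflict_name = f"{state.name} (Conflicted copy v{state.version})"
        state.conflict_copies[conflict_name] = state.content
    state.content = content  # last writer wins under the original name
    state.version += 1
    return conflict_name


# Devices A and B both edit version 1 while offline, then sync in turn.
state = FileState(name="notes.txt", version=1, content=b"base")
result_a = apply_edit(state, base_version=1, content=b"from A")  # no conflict
result_b = apply_edit(state, base_version=1, content=b"from B")  # stale base
```

After both syncs, `notes.txt` holds B's edit and A's edit survives as a conflicted copy, which the client can then show to the user.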


Part of the System Design series on Tekivex.