File Systems and Distributed Storage

Local filesystems, NFS/NAS, distributed filesystems (HDFS, GlusterFS, Ceph), and how storage is architected in large-scale data platforms.

Advanced · 11 min read

Storage Hierarchy

Layer Latency Size Examples
CPU registers <1 ns Bytes In-chip
L1/L2/L3 Cache 1–30 ns KB–MB In-chip SRAM
RAM 100 ns GB DDR5
NVMe SSD 100 µs TB Local SSD, AWS GP3
SATA SSD / HDD 1–10 ms TB–PB Object storage backends
Distributed storage 1–100 ms Exabytes HDFS, Ceph, S3
Tape / Archive Hours Unlimited Glacier, LTO tape

HDFS (Hadoop Distributed File System)

HDFS stores very large files across many commodity machines by splitting them into 128 MB blocks. A NameNode (metadata) tracks where each block lives. Multiple DataNodes store the actual blocks with a 3x replication factor.

  • Designed for write-once, read-many access patterns
  • Data locality — MapReduce/Spark moves computation to the data node, not vice versa
  • Rack awareness — replicas placed across racks for fault tolerance
  • NameNode is a SPOF — use High-Availability NameNode with Zookeeper in production

Ceph

Ceph is an open-source, unified distributed storage system that provides object storage (compatible with S3), block storage (RBD for VMs), and file storage (CephFS). Used by OpenStack, Kubernetes, and many cloud providers.

NOTE: Ceph's CRUSH algorithm maps data to storage devices deterministically without a central lookup table — removing the single-point-of-failure bottleneck that plagues other distributed filesystems.


Part of the System Design series on Tekivex. Browse all tutorials or explore our open-source products.