Recipe / Storage Engines

LSM Tree Primer

Log-structured merge trees power most modern write-heavy databases: RocksDB, LevelDB, Cassandra, ScyllaDB, and Meridian's own event log layer. This primer walks through the three moving parts you need to understand before tuning one in production.

1. The MemTable

All writes hit an in-memory sorted structure first — usually a skip list or a red-black tree. Writes are O(log n) and append-only to a write-ahead log for durability. When the MemTable crosses a size threshold, it is frozen and flushed to disk as an immutable sorted run called an SSTable.

2. SSTables and Levels

SSTables are sorted, immutable files on disk. They are organized into levels (L0, L1, L2…) where each level is exponentially larger than the one above. A read may need to consult one file per level, so bloom filters are attached to each SSTable to skip files that cannot contain the requested key.

3. Compaction

Background workers merge overlapping SSTables, dropping tombstones and superseded versions. The two dominant strategies are leveled compaction (low read amp, high write amp) and size-tiered (the inverse). Pick based on whether your workload is read- or write-bound.

Minimal write path

def put(key, value):
    wal.append(key, value)      # durability
    memtable.insert(key, value) # O(log n)
    if memtable.size > THRESHOLD:
        sstable = memtable.flush_to_disk()
        levels[0].append(sstable)
        schedule_compaction()