Recipe
Delta Lake primer
Delta Lake brings ACID transactions, time travel, and schema enforcement to object storage. This recipe walks through the three building blocks Meridian uses when wiring a Delta-backed table into a streaming or batch pipeline, with copy-paste snippets for the common cases.
1. Create a managed Delta table
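Start from a Spark session with the Delta extension wired in. The table's data files and transaction log land on the object store path given to save(); the lake.events name used in the later steps assumes the table is also registered in the catalog at that location. Partitioning on a low-cardinality column (here event_date) keeps the number of partition directories small, so partition pruning stays cheap.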
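from pyspark.sql import SparkSession

# Delta needs both the SQL extension and the Delta catalog on the session.
spark = (SparkSession.builder
    .appName("meridian-delta")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate())

# Read the raw JSON events and write them as a Delta table partitioned by date.
# mode("overwrite") replaces any data already at the target path.
(spark.read.json("s3://meridian/raw/events/")
    .write.format("delta")
    .partitionBy("event_date")
    .mode("overwrite")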
    .save("s3://meridian/lake/events"))
2. Merge upserts without rewriting the world
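The MERGE statement matches incoming change records against the target on the key column (event_id below), deleting rows flagged with op = 'D', updating other matched rows, and inserting new ones in a single transaction. The WHEN MATCHED clauses are evaluated in order, so the delete check must come before the catch-all update. Run OPTIMIZE after high-volume merges to compact the small files they leave behind, so the file count stays bounded.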
MERGE INTO lake.events t
USING staging.cdc s
  ON t.event_id = s.event_id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
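The branch logic of the MERGE above can be sketched as a toy in-memory model in plain Python. This is only an illustration of the statement's semantics, not how Delta executes a merge. The function name apply_merge, the dict-based table, and the op codes 'U' and 'I' are hypothetical. The sketch assumes each event_id appears at most once per batch and that op is a control column the target does not store.

```python
from typing import Dict, Hashable, Iterable, Mapping

Row = Dict[str, object]


def apply_merge(target: Mapping[Hashable, Row],
                changes: Iterable[Row]) -> Dict[Hashable, Row]:
    """Toy model of the MERGE statement; returns a new table keyed by event_id."""
    result = dict(target)  # work on a copy so the input table stays untouched
    seen = set()
    for change in changes:
        key = change["event_id"]
        # Assumption: one change per key per batch; this toy model rejects duplicates.
        if key in seen:
            raise ValueError(f"duplicate change for event_id={key!r}")
        seen.add(key)
        # Assumption: `op` is a CDC control column that the target does not store.
        row = {k: v for k, v in change.items() if k != "op"}
        if key in target:
            if change["op"] == "D":
                del result[key]    # WHEN MATCHED AND s.op = 'D' THEN DELETE
            else:
                result[key] = row  # WHEN MATCHED THEN UPDATE SET *
        else:
            result[key] = row      # WHEN NOT MATCHED THEN INSERT *
    return result


# Hypothetical sample data: event 1 is updated, event 2 deleted, event 3 inserted.
events = {1: {"event_id": 1, "payload": "a"}, 2: {"event_id": 2, "payload": "b"}}
batch = [
    {"event_id": 1, "payload": "a2", "op": "U"},
    {"event_id": 2, "payload": "b", "op": "D"},
    {"event_id": 3, "payload": "c", "op": "I"},
]
merged = apply_merge(events, batch)
print(merged)
```

As in the SQL, a 'D' record whose key is absent from the target falls through to the insert branch. Whether that is acceptable depends on the CDC feed.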
3. Time travel for audit and replay
Every commit writes a numbered version into the transaction log. Querying a prior version recovers the exact rows seen by a downstream consumer at that point in time, which makes regression diffs and accidental-delete recovery a one-liner.
-- inspect the log
DESCRIBE HISTORY lake.events;
-- read a prior version
SELECT * FROM lake.events VERSION AS OF 142;
-- read at a wall-clock instant
SELECT * FROM lake.events TIMESTAMP AS OF '2026-06-01 00:00:00';
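The idea behind versioned reads can be sketched with a toy append-only log in plain Python. This is a simplified model, not Delta's implementation: the real transaction log records file-level add and remove actions, not individual rows. The class name ToyLog and its methods are hypothetical. The sketch assumes commits arrive in timestamp order and resolves a timestamp to the latest version committed at or before it.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# One commit: (commit timestamp, {key: new row, or None to delete that key}).
Commit = Tuple[float, Dict[int, Optional[Row]]]


class ToyLog:
    """Append-only commit list; a commit's list index is its version number."""

    def __init__(self) -> None:
        self.commits: List[Commit] = []

    def commit(self, timestamp: float, changes: Dict[int, Optional[Row]]) -> int:
        """Append a commit and return its version number."""
        # Assumption: commits arrive in non-decreasing timestamp order.
        self.commits.append((timestamp, dict(changes)))
        return len(self.commits) - 1

    def snapshot(self, version: int) -> Dict[int, Row]:
        """Replay commits 0..version to rebuild the table as of that version."""
        if not 0 <= version < len(self.commits):
            raise ValueError(f"unknown version {version}")
        rows: Dict[int, Row] = {}
        for _, changes in self.commits[: version + 1]:
            for key, row in changes.items():
                if row is None:
                    rows.pop(key, None)  # delete
                else:
                    rows[key] = row      # insert or update
        return rows

    def version_at(self, timestamp: float) -> int:
        """Latest version committed at or before `timestamp`."""
        versions = [v for v, (ts, _) in enumerate(self.commits) if ts <= timestamp]
        if not versions:
            raise ValueError("timestamp is earlier than the first commit")
        return versions[-1]


log = ToyLog()
log.commit(100.0, {1: {"v": "a"}, 2: {"v": "b"}})  # version 0
log.commit(200.0, {2: None})                        # version 1: accidental delete
# Rows present in version 0 but missing from version 1 are the ones to restore.
lost = {k: r for k, r in log.snapshot(0).items() if k not in log.snapshot(1)}
print(lost)
```

Diffing two snapshots this way mirrors the accidental-delete recovery described above: the earlier version still holds the rows that a later commit removed.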