Recipe
Data lake architecture
Ingest, store, catalog, and query heterogeneous telemetry at scale without locking into a single schema.
Ingredients
- Object storage (S3-compatible) — raw zone
- Apache Iceberg or Delta Lake — table format
- Trino (federated) or DuckDB (local, embedded) — query engine
- Apache Kafka — streaming ingest bus
- Hive Metastore or Unity Catalog — metadata catalog
- Parquet + Snappy — columnar storage layer
Layers
Bronze — raw ingest
Append-only, no schema enforcement. JSON, CSV, and Avro files land as-is, partitioned by ingest date.
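A minimal sketch of the landing step, assuming a local directory stands in for the object store. The `ingest_date=YYYY-MM-DD` directory naming, the `land_raw` helper, and the random file names are illustrative choices, not part of the recipe.

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def bronze_path(root: Path, source: str, ext: str,
                now: Optional[datetime] = None) -> Path:
    """Build an ingest-date-partitioned path for one raw payload."""
    now = now or datetime.now(timezone.utc)
    partition = f"ingest_date={now:%Y-%m-%d}"  # assumed naming convention
    # A random file name means a new write never targets an existing file.
    return root / source / partition / f"{uuid.uuid4().hex}.{ext}"


def land_raw(root: Path, source: str, ext: str, payload: bytes,
             now: Optional[datetime] = None) -> Path:
    """Write the payload byte-for-byte: no parsing, no schema checks."""
    path = bronze_path(root, source, ext, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "xb" raises if the file already exists, so nothing is overwritten.
    with open(path, "xb") as fh:
        fh.write(payload)
    return path
```

Against a real S3-compatible store the same key layout would apply; only the write call changes.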
Silver — cleansed
Deduplicated and type-coerced, with nullable columns resolved. Parquet compaction runs hourly.
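One way the cleansing pass might look, in plain Python. The `SCHEMA` columns, the `(device_id, ts)` dedup key, and the choice to turn unparseable values into explicit `None` are all assumptions for illustration; the recipe does not prescribe them.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List

# Hypothetical target schema: column name -> coercion function.
SCHEMA = {
    "device_id": str,
    "ts": datetime.fromisoformat,
    "temp_c": float,
}


def coerce(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Cast each column to its target type; missing or bad values become None."""
    row = {}
    for col, cast in SCHEMA.items():
        value = raw.get(col)
        try:
            row[col] = cast(value) if value not in (None, "") else None
        except (TypeError, ValueError):
            row[col] = None
    return row


def to_silver(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Coerce types, then keep the first record seen per (device_id, ts)."""
    seen = set()
    out = []
    for raw in records:
        row = coerce(raw)
        key = (row["device_id"], row["ts"])
        if None in key or key in seen:
            continue  # skip rows without a usable key, and duplicates
        seen.add(key)
        out.append(row)
    return out
```

Keeping the first record per key is the simplest policy; a last-write-wins rule keyed on ingest time is an equally plausible alternative.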
Gold — aggregated
Materialized views, denormalized fact tables, business-level aggregates refreshed on merge.
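A small sketch of a business-level aggregate: one row per device per day. The column names (`device_id`, `ts`, `temp_c`) and the chosen metrics are hypothetical, and this version recomputes the whole aggregate rather than refreshing incrementally on merge.

```python
from collections import defaultdict
from statistics import mean


def daily_temperature(silver_rows):
    """Roll cleansed rows up to one row per (device_id, day)."""
    buckets = defaultdict(list)
    for row in silver_rows:
        if row["temp_c"] is None:
            continue  # nulls do not contribute to the aggregate
        buckets[(row["device_id"], row["ts"].date())].append(row["temp_c"])
    return [
        {
            "device_id": device,
            "day": day,
            "readings": len(values),
            "avg_temp_c": mean(values),
            "max_temp_c": max(values),
        }
        for (device, day), values in sorted(buckets.items())
    ]
```

In the table format itself, the same rollup would more likely be a SQL statement run after each merge into the cleansed table.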
Query pattern
Trino federates queries across Iceberg tables in S3; its catalog points to Hive Metastore. DuckDB is used for local exploratory work on Parquet snapshots. No ETL tool lock-in.
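For the local exploration half, a hedged sketch using DuckDB's Python package. The snapshot layout (Parquet files under `ingest_date=...` directories) and the helper names `snapshot_query` and `explore` are assumptions; `read_parquet` with `hive_partitioning` is DuckDB's own function.

```python
def snapshot_query(parquet_glob: str) -> str:
    """Build a DuckDB SQL statement that counts rows per ingest date."""
    # Escape single quotes so the glob is a valid SQL string literal.
    safe_glob = parquet_glob.replace("'", "''")
    # hive_partitioning exposes key=value directory names as columns
    # (ingest_date here, assuming the snapshot keeps that layout).
    return (
        "SELECT ingest_date, count(*) AS n_rows "
        f"FROM read_parquet('{safe_glob}', hive_partitioning = true) "
        "GROUP BY ingest_date ORDER BY ingest_date"
    )


def explore(parquet_glob: str):
    """Run the query in an in-memory DuckDB session (needs the duckdb package)."""
    import duckdb  # imported lazily so the module loads without DuckDB

    con = duckdb.connect()  # no path given: in-memory database
    try:
        return con.execute(snapshot_query(parquet_glob)).fetchall()
    finally:
        con.close()
```

Usage might look like `explore("snapshots/silver/*/*.parquet")`, where the path is a placeholder for wherever the snapshot was copied.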
Published under Meridian Recipes. Adapt partitioning strategy to your query shape.