Recipe
Data catalog & lineage
Build a searchable inventory of every dataset flowing through Meridian, with column-level provenance traced back to its source.
Overview
This recipe wires Meridian's ingestion pipeline into a lightweight catalog service. Every table, view, and file ingested is registered with schema snapshots, freshness metrics, and upstream lineage pointers. Downstream consumers query the catalog to discover available data and trace how a column was derived.
Ingredients
- Meridian ingestion pipeline (Kafka or S3 event source)
- Schema registry (Avro or Protobuf)
- Catalog store (PostgreSQL with JSONB)
- Lineage collector sidecar
- Search index (Elasticsearch or Meilisearch)
Steps
- Register sources. Configure Meridian to emit a catalog event after every successful ingestion batch. Each event carries the schema fingerprint, row count, and source URI.
- Capture lineage. Attach a lightweight sidecar that intercepts transformation steps and records input→output column mappings as a directed acyclic graph.
- Index for search. Push catalog entries into a full-text index keyed by table name, column name, and free-text description tags.
- Expose API. Serve a REST endpoint that returns dataset metadata, freshness timestamps, and the full upstream lineage tree for any column.
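The first three steps can be sketched in Python as follows. This is a minimal illustration, not Meridian's actual event format: `CatalogEvent`, `LineageEdge`, `build_catalog_event`, and `to_search_document` are hypothetical names, the schema is assumed to be an Avro-style dict with a `fields` list, and the fingerprint is a plain SHA-256 over canonical JSON rather than Avro's own fingerprinting scheme.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def schema_fingerprint(schema: dict) -> str:
    """Hash a schema so that dict key order does not change the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class CatalogEvent:
    """Hypothetical payload emitted after a successful ingestion batch."""
    dataset: str
    source_uri: str
    row_count: int
    schema_fingerprint: str
    ingested_at: str  # ISO 8601 timestamp, usable as a freshness marker


@dataclass(frozen=True)
class LineageEdge:
    """One input -> output column mapping recorded by the sidecar."""
    input_column: str
    output_column: str
    transform: str  # free-text label for the transformation step


def build_catalog_event(dataset: str, source_uri: str,
                        schema: dict, row_count: int) -> CatalogEvent:
    """Assemble the fields that the 'Register sources' step lists."""
    return CatalogEvent(
        dataset=dataset,
        source_uri=source_uri,
        row_count=row_count,
        schema_fingerprint=schema_fingerprint(schema),
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )


def to_search_document(event: CatalogEvent, schema: dict, tags: list) -> dict:
    """Flatten an event into the keys named in the 'Index for search' step."""
    return {
        "table_name": event.dataset,
        "column_names": [f["name"] for f in schema["fields"]],
        "tags": list(tags),
        "schema_fingerprint": event.schema_fingerprint,
        "ingested_at": event.ingested_at,
    }
```

In this sketch, a sidecar would emit one `LineageEdge` per column mapping, for example `LineageEdge("raw.orders.amount_cents", "mart.revenue.amount_usd", "cents_to_usd")`. The document returned by `to_search_document` could then be sent to whichever search index is in use. The client call for that is left out because it depends on the index chosen.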
Pro tip
Store lineage edges as adjacency lists in JSONB. A single recursive CTE can resolve the full upstream lineage for any column in under 10 ms on a warm catalog.
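A rough sketch of that lookup follows. It assumes a hypothetical table `lineage_nodes(column_id text primary key, upstream jsonb)`, where `upstream` holds a JSON array of direct parent column ids. The real catalog schema may differ. The SQL is written for PostgreSQL with psycopg-style named parameters and has not been benchmarked here. The Python function performs the same traversal in memory, which is handy for unit tests. Both assume the lineage graph is acyclic.

```python
from collections import deque

# PostgreSQL sketch. Table and column names are assumptions, not Meridian's schema.
UPSTREAM_SQL = """
WITH RECURSIVE walk(column_id, depth) AS (
    SELECT %(column_id)s::text, 0
  UNION
    SELECT p.parent_id, walk.depth + 1
    FROM walk
    JOIN lineage_nodes AS n ON n.column_id = walk.column_id
    CROSS JOIN LATERAL jsonb_array_elements_text(n.upstream) AS p(parent_id)
)
SELECT column_id, min(depth) AS depth
FROM walk
WHERE depth > 0
GROUP BY column_id
ORDER BY depth, column_id;
"""


def resolve_upstream(adjacency: dict, column_id: str) -> dict:
    """Breadth-first walk over adjacency lists.

    `adjacency` maps a column id to the list of its direct upstream columns.
    Returns {ancestor_column_id: shortest distance from `column_id`}.
    """
    depths = {}
    queue = deque([(column_id, 0)])
    while queue:
        node, depth = queue.popleft()
        for parent in adjacency.get(node, []):
            if parent not in depths:  # first visit is the shortest path
                depths[parent] = depth + 1
                queue.append((parent, depth + 1))
    return depths
```

For example, with `{"mart.revenue.amount_usd": ["staging.orders.amount_cents", "ref.fx.rate"], "staging.orders.amount_cents": ["raw.orders.amount_cents"]}`, resolving `mart.revenue.amount_usd` yields the two direct parents at depth 1 and `raw.orders.amount_cents` at depth 2.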