Recipe

Debezium Primer

Debezium is a distributed platform for change data capture (CDC) that turns your existing databases into event streams. This primer walks through wiring Debezium into a Meridian pipeline so downstream services react to row-level changes in milliseconds instead of polling every few minutes.

1. Why CDC over polling

Polling burns query budget, lags behind writes, and silently drops updates when two rows change between polls. Debezium tails the transaction log directly (binlog on MySQL, WAL on Postgres, oplog on Mongo) so every committed change becomes a Kafka event with full before/after payload, transaction id, and source LSN.

  • Sub-second propagation from commit to consumer.
  • Exactly-once semantics via Kafka offsets + LSN checkpoints.
  • Zero application code change on the source side.

2. Minimal connector config

Register a Postgres connector against Kafka Connect. The snapshot mode controls whether Debezium replays the existing table state before streaming live changes.

{
  "name": "meridian-orders-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "pg.meridian.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.dbname": "meridian",
    "topic.prefix": "meridian.cdc",
    "table.include.list": "public.orders,public.invoices",
    "plugin.name": "pgoutput",
    "snapshot.mode": "initial"
  }
}

3. Consuming events in Meridian

Each row change lands on a topic named meridian.cdc.public.orders. Meridian workers subscribe via the standard Kafka consumer group, deserialize the envelope, and route the after-state into the embedding pipeline or downstream aggregator. Use the op field to discriminate between create, update, delete, and snapshot reads.