RECIPE

Avro primer

Avro is a compact, schema-first binary serialization format widely used in Kafka pipelines, lakehouse tables, and event logs. This primer walks through reading and writing Avro records inside Meridian without leaving the editor.

1. Why Avro is different from JSON

JSON ships its keys with every record, which balloons payload size and forces consumers to guess types. Avro keeps the schema out-of-band, so each record on the wire is a tight binary blob. The schema is fetched once, then every downstream consumer decodes against it. The trade-off: you cannot read Avro without its schema, so register it somewhere durable.
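
To make the size difference concrete, here is a minimal sketch that hand-encodes one record the way Avro's binary encoding lays it out (zigzag varints for long values, length-prefixed UTF-8 for strings) and compares it with the JSON form. The helper names and the two-field record are illustrative assumptions, not part of Meridian; real code would normally go through an Avro library.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as Avro encodes int and long values."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro strings are a long byte count followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


# Illustrative record; the field order is fixed by the (assumed) schema.
record = {"order_id": "A-1001", "amount_cents": 2599}

# A record body is just the field values in schema order: no keys, no delimiters.
avro_bytes = encode_string(record["order_id"]) + zigzag_varint(record["amount_cents"])
json_bytes = json.dumps(record).encode("utf-8")

print(len(avro_bytes), "bytes as Avro vs", len(json_bytes), "bytes as JSON")
```

The Avro bytes carry no field names at all, which is exactly why a reader needs the schema to know where one value ends and the next begins.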

Meridian auto-registers schemas to the workspace registry the first time it sees a new fingerprint, so you only pay that cost once per shape.
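
The register-once behaviour can be sketched with a fingerprint-keyed dictionary. Everything here is an assumption for illustration: InMemoryRegistry and ensure_registered are hypothetical names, not Meridian's API, and the fingerprint is a simplified stand-in (SHA-256 over a key-sorted JSON dump) rather than Avro's Parsing Canonical Form.

```python
import hashlib
import json


def fingerprint(schema: dict) -> str:
    # Simplified stand-in: hash a key-sorted, whitespace-free JSON dump.
    # Avro's own fingerprints are defined over its Parsing Canonical Form.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryRegistry:
    """Hypothetical registry keyed by fingerprint (not Meridian's API)."""

    def __init__(self):
        self.schemas = {}
        self.registrations = 0

    def ensure_registered(self, schema: dict) -> str:
        fp = fingerprint(schema)
        if fp not in self.schemas:  # only the first sighting costs a write
            self.schemas[fp] = schema
            self.registrations += 1
        return fp


registry = InMemoryRegistry()
order_v1 = {"type": "record", "name": "Order",
            "fields": [{"name": "order_id", "type": "string"}]}
# Same shape, different key order: it should hash to the same fingerprint.
same_shape = {"name": "Order", "type": "record",
              "fields": [{"type": "string", "name": "order_id"}]}

fp_a = registry.ensure_registered(order_v1)
fp_b = registry.ensure_registered(same_shape)
print(fp_a == fp_b, registry.registrations)
```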

2. Declaring a schema

Avro schemas are JSON documents that describe the record shape. For a record, the three attributes you cannot omit are type, name, and fields. Everything else is optional but conventional.

{
  "type": "record",
  "name": "Order",
  "namespace": "shop.events",
  "fields": [
    {"name": "order_id",  "type": "string"},
    {"name": "amount_cents", "type": "long"},
    {"name": "currency",  "type": {"type": "enum", "name": "Ccy", "symbols": ["USD","EUR","GBP"]}},
    {"name": "placed_at", "type": {"type": "long", "logicalType": "timestamp-micros"}}
  ]
}
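
Because the schema is plain JSON, it can be inspected with nothing but the standard library. The sketch below loads the Order schema, checks the attributes the Avro specification requires on a record (type, name, fields), and prints a one-line summary per field. The helper names are illustrative assumptions, and this is a sanity check, not a full schema validator.

```python
import json

ORDER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "Order",
  "namespace": "shop.events",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount_cents", "type": "long"},
    {"name": "currency", "type": {"type": "enum", "name": "Ccy", "symbols": ["USD","EUR","GBP"]}},
    {"name": "placed_at", "type": {"type": "long", "logicalType": "timestamp-micros"}}
  ]
}
"""


def missing_record_attributes(schema: dict) -> list:
    """Return the required record attributes that are absent."""
    return [key for key in ("type", "name", "fields") if key not in schema]


def describe_field(field: dict) -> str:
    """Summarize one field, unpacking enums and logical types."""
    name, ftype = field["name"], field["type"]
    if isinstance(ftype, dict):
        if ftype["type"] == "enum":
            return f"{name}: enum {ftype['name']} {ftype['symbols']}"
        if "logicalType" in ftype:
            return f"{name}: {ftype['type']} ({ftype['logicalType']})"
        return f"{name}: {ftype['type']}"
    return f"{name}: {ftype}"


schema = json.loads(ORDER_SCHEMA_JSON)
missing = missing_record_attributes(schema)
full_name = f"{schema['namespace']}.{schema['name']}"
summary = [describe_field(f) for f in schema["fields"]]

print(full_name, "missing:", missing)
for line in summary:
    print(" ", line)
```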

3. Evolution rules

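Avro's superpower is forward and backward compatibility. Add a field with a default and new readers can still parse old writes, because the default fills the gap; old readers simply ignore the extra field in new writes. Drop a field and new readers skip it in old writes, but old readers can parse new writes only if their copy of the field declared a default. Renaming requires an alias entry on the new name; never just edit the name in place or you will silently desync the registry.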

  • Adding optional fields with defaults is safe in both directions.
  • Type widening (int to long, float to double) is backward-compatible only.
  • Removing a field that has no default breaks every old reader that still expects it, so bump the major version.
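
These rules can be sketched as a small resolver that projects a record decoded with the writer's schema onto the reader's schema. This is a simplified illustration over already-decoded Python dicts, not Meridian's or any Avro library's implementation: the resolve function and the v1/v2 schemas are hypothetical, only primitive type names are compared, and the promotion table covers the numeric widenings from the Avro specification.

```python
# Numeric promotions a reader may apply to a writer's type.
PROMOTIONS = {
    ("int", "long"), ("int", "float"), ("int", "double"),
    ("long", "float"), ("long", "double"), ("float", "double"),
}


def resolve(datum: dict, writer: dict, reader: dict) -> dict:
    """Project a record written with `writer` onto the `reader` schema."""
    writer_types = {f["name"]: f["type"] for f in writer["fields"]}
    result = {}
    for field in reader["fields"]:
        # A reader field matches a writer field by name or by one of its aliases.
        candidates = [field["name"], *field.get("aliases", [])]
        source = next((n for n in candidates if n in writer_types), None)
        if source is not None:
            w_type, r_type = writer_types[source], field["type"]
            promotable = (isinstance(w_type, str) and isinstance(r_type, str)
                          and (w_type, r_type) in PROMOTIONS)
            if w_type != r_type and not promotable:
                raise ValueError(f"cannot read {w_type!r} as {r_type!r}")
            result[field["name"]] = datum[source]
        elif "default" in field:
            result[field["name"]] = field["default"]  # field absent in old data
        else:
            raise ValueError(f"no value or default for {field['name']!r}")
    return result  # writer-only fields are silently dropped


v1 = {"fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount_cents", "type": "int"},
]}
v2 = {"fields": [
    {"name": "id", "type": "string", "aliases": ["order_id"]},  # rename via alias
    {"name": "amount_cents", "type": "long"},                   # widened int to long
    {"name": "channel", "type": "string", "default": "web"},    # added with default
]}

# Backward direction: the new reader (v2) parses an old write (v1).
upgraded = resolve({"order_id": "A-1001", "amount_cents": 2599}, writer=v1, reader=v2)
print(upgraded)

# Forward direction: the old reader (v1) cannot find order_id (aliases live on
# the reader side) and could not narrow long to int either.
try:
    resolve({"id": "A-1002", "amount_cents": 100, "channel": "app"}, writer=v2, reader=v1)
    forward_ok = True
except ValueError:
    forward_ok = False
print("old reader can parse new write:", forward_ok)
```

The alias and the widening both work only for the new reader, which is why those changes count as backward-compatible but not forward-compatible.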