Apache Beam primer
Apache Beam is a unified programming model for batch and streaming data pipelines. This recipe walks through wiring a Beam pipeline against Meridian, so you can express transformations once and run them on Dataflow, Flink, or Spark without rewrites.
1. Install the SDK
Beam ships Python, Java, and Go SDKs. For most Meridian users, the Python SDK is the fastest path. Pin the version to avoid runner mismatches in production.
pip install apache-beam==2.55.0
pip install meridian-sdk
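To confirm that the pin took effect in a given environment, a small check along these lines may help. It is only a sketch: `check_pin` is a hypothetical helper name, not part of Beam or Meridian, and it relies solely on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pin(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed in this environment at all.
        return False


if __name__ == "__main__":
    # 2.55.0 is the version pinned in the install command above.
    print("apache-beam pinned correctly:", check_pin("apache-beam", "2.55.0"))
```

Running a check like this at deploy time is one way to catch a mismatch early, before a job is submitted to a runner.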
2. Build a pipeline
A pipeline is a directed graph of PTransforms over PCollections. The example below reads prompts from a file, scores them with Meridian, and writes the results back.
import apache_beam as beam
from meridian import Client

client = Client(api_key="mrd_live_...")

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("prompts.txt")
        | "Score" >> beam.Map(lambda x: client.score(x))
        | "Write" >> beam.io.WriteToText("scored.jsonl")
    )
3. Pick a runner
The DirectRunner is fine for local testing. For production, point at Dataflow for autoscaling, Flink for low-latency streaming, or Spark when you already operate a cluster. The same Beam code runs on all three.
- DirectRunner — local dev, single process
- DataflowRunner — Google Cloud, fully managed
- FlinkRunner — self-hosted streaming workloads
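Beam's Python SDK selects a runner through command-line style flags such as `--runner=DataflowRunner`. The sketch below shows one way to assemble those flags in code. `runner_flags` and `KNOWN_RUNNERS` are hypothetical names introduced for illustration, and the project, region, and bucket values are placeholders rather than real resources.

```python
from typing import List

# Runner names mentioned in this recipe, including SparkRunner.
KNOWN_RUNNERS = {"DirectRunner", "DataflowRunner", "FlinkRunner", "SparkRunner"}


def runner_flags(runner: str = "DirectRunner", **options: str) -> List[str]:
    """Build flags such as ['--runner=DataflowRunner', '--region=...']."""
    if runner not in KNOWN_RUNNERS:
        raise ValueError(f"unknown runner: {runner!r}")
    flags = [f"--runner={runner}"]
    # Remaining keyword arguments become --key=value flags, sorted by key.
    flags += [f"--{key}={value}" for key, value in sorted(options.items())]
    return flags


# Placeholder values; substitute a real project, region, and bucket.
dataflow_flags = runner_flags(
    "DataflowRunner",
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)
```

A list like `dataflow_flags` can then be passed to `PipelineOptions` from `apache_beam.options.pipeline_options` and handed to the pipeline as `beam.Pipeline(options=PipelineOptions(dataflow_flags))`. Note that running on Dataflow typically also requires installing the `apache-beam[gcp]` extra.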