Broadway (Elixir) Primer
Broadway is a concurrent and multi-stage data ingestion framework for Elixir, built atop GenStage. It excels at consuming high-throughput data from sources like Kafka, RabbitMQ, SQS, and Google PubSub with built-in batching, backpressure, fault tolerance, and graceful shutdown.
1. Why Broadway
Traditional consumer loops require manual orchestration of concurrency, error handling, and backpressure. Broadway abstracts these concerns into a declarative pipeline: producers emit messages, processors transform them, and batchers group results for downstream sinks. The runtime supervises every stage and automatically applies retries.
- Backpressure-aware via GenStage demand
- Batching with size and timeout thresholds
- Rate limiting at the producer layer
- Graceful shutdown drains in-flight messages
2. Pipeline Anatomy
A Broadway pipeline is a single module implementing the Broadway behaviour. It declares a producer, a set of processors, and optional batchers. Each stage runs in its own supervised process tree, so a crash in one processor never collapses the whole pipeline.
3. Minimal Example
The snippet below sketches a pipeline that pulls from SQS, processes each message concurrently across ten workers, and batches results before persisting them.
defmodule MyApp.Pipeline do
use Broadway
def start_link(_opts) do
Broadway.start_link(__MODULE__,
name: __MODULE__,
producer: [
module: {BroadwaySQS.Producer, queue_url: "..."},
concurrency: 1
],
processors: [default: [concurrency: 10]],
batchers: [default: [batch_size: 50, batch_timeout: 2000]]
)
end
def handle_message(_, message, _) do
message
end
def handle_batch(_, messages, _, _) do
Enum.each(messages, &persist/1)
messages
end
defp persist(_msg), do: :ok
end