Async LLM Job Queue
Long-running LLM calls block HTTP responses, time out at the edge, and leave users staring at spinners. This recipe shows how to offload generation work to a background queue, return a job id immediately, and let clients poll or stream results when they are ready.
1. Submit the job
Accept the prompt at a POST endpoint, write a row to your jobs table with status "pending", enqueue the id on a Redis stream or an SQS queue, and return the id. Because the handler only does a database write and an enqueue, the request typically finishes in under 50 ms, well below typical edge timeouts.
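The submit step described above might look roughly like the sketch below. It is a minimal illustration, not a definitive implementation: the jobs table and the queue are replaced by in-memory stand-ins, and the names submitJob, jobs, and queue, as well as the UUID-based id format, are assumptions made for this example.

```javascript
const { randomUUID } = require("node:crypto");

// In-memory stand-ins for the jobs table and the queue (assumptions for
// illustration; a real deployment would use a database and Redis or SQS).
const jobs = new Map();
const queue = [];

// Hypothetical handler body for POST /api/jobs.
async function submitJob(body) {
  // The "job_" prefix plus a UUID is an assumed id format.
  const id = `job_${randomUUID()}`;
  // Write the row first so a worker never pops an id with no matching job.
  jobs.set(id, { id, prompt: body.prompt, status: "pending" });
  // Enqueue only the id; the worker reads the prompt from the job row.
  queue.push(id);
  // 202 Accepted: the work has been queued, not completed.
  return { status: 202, body: { id, status: "pending" } };
}
```

Writing the row before enqueueing is a deliberate ordering choice in this sketch: if the enqueue fails, the row is still there to retry or clean up, whereas the reverse order could hand a worker an id it cannot look up.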
POST /api/jobs
{ "prompt": "Summarize this PDF..." }
=> 202 Accepted
{ "id": "job_01HXYZ...", "status": "pending" }
2. Process in a worker
A separate worker process pulls ids off the queue, calls Meridian with the full reasoning budget it needs, and writes the final completion plus token counts back to the job row. Workers scale horizontally without touching your web tier.
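// Declare id up front: assigning to an undeclared name throws in strict mode and ES modules.
let id;
// Loop until queue.pop() resolves to a falsy value.
while ((id = await queue.pop())) {
  const job = await db.jobs.get(id);
  // Call the model with the token budget the job needs.
  const out = await meridian.complete({
    model: "azure/model-router",
    prompt: job.prompt,
    max_tokens: 4096,
  });
  // Persist the completion and token counts on the job row.
  await db.jobs.update(id, {
    status: "done",
    output: out.text,
    tokens: out.usage,
  });
}
3. Poll or stream the result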
Clients either hit GET /api/jobs/:id every second or open an SSE channel that pushes status transitions. Once status is "done", the output is included in that same response. For interactive UIs, write partial tokens to Redis as the worker receives them and fan them out to subscribers.
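A client-side polling loop for the first option could be sketched as follows. The function name pollJob, the injected fetchJob callback, and the intervalMs and maxAttempts options are all hypothetical. The attempt cap is an addition for safety, not something the recipe prescribes.

```javascript
// Poll a job until its status is "done". `fetchJob` is injected so the sketch
// runs without a server; in a browser it might wrap fetch(`/api/jobs/${id}`).
async function pollJob(id, fetchJob, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(id);
    // The output ships in the same response once the job is done.
    if (job.status === "done") return job;
    // Wait before the next poll so the API is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // Give up rather than poll forever if the job never completes.
  throw new Error(`job ${id} still not done after ${maxAttempts} polls`);
}
```

Passing the fetch function in keeps the loop testable and leaves authentication, base URLs, and error handling to the caller.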
GET /api/jobs/job_01HXYZ...
=> 200 OK
{
"id": "job_01HXYZ...",
"status": "done",
"output": "The PDF describes...",
"tokens": { "in": 2104, "out": 312 }
}
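For the streaming variant mentioned above, the fan-out of partial tokens can be illustrated with the sketch below. It assumes an in-memory channel in place of Redis, so it only works inside a single process. The class name TokenChannels, the helper sseFrame, and the event names "token" and "status" are hypothetical. The frame layout (an event line, a data line, then a blank line) follows the Server-Sent Events wire format.

```javascript
// Format one Server-Sent Events frame: an event name, a JSON data line,
// and the blank line that terminates the frame.
function sseFrame(event, data) {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// In-memory stand-in for a per-job pub/sub channel (an assumption for
// illustration; Redis would play this role across processes).
class TokenChannels {
  constructor() {
    this.subscribers = new Map(); // job id -> Set of send callbacks
  }

  // Register a callback that receives each formatted frame for a job.
  subscribe(id, send) {
    if (!this.subscribers.has(id)) this.subscribers.set(id, new Set());
    this.subscribers.get(id).add(send);
    // Return an unsubscribe function for when the client disconnects.
    return () => this.subscribers.get(id)?.delete(send);
  }

  // Fan one event out to every current subscriber of the job.
  publish(id, event, data) {
    for (const send of this.subscribers.get(id) ?? []) {
      send(sseFrame(event, data));
    }
  }
}
```

In this arrangement the worker would call publish for each partial token and once more on the final status transition, while each open SSE connection holds one subscription and writes the frames it receives to its response stream.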