Recipe

LLM streaming patterns

Streaming a response token by token keeps perceived latency low and gives users something to read while the model finishes. This recipe walks through three production patterns we use on Meridian to wire SSE, chunked fetch, and backpressure into a Next.js 14 App Router stack.

1. Server-Sent Events from a route handler

Return a ReadableStream with the text/event-stream content type. Frame each event as data: {json}\n\n; the blank line terminates the event, so the EventSource on the client can dispatch each message as soon as it arrives.
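The framing rule above can be sketched as follows. This is a minimal illustration, not Meridian's implementation: the helper names formatSseEvent and sseStreamFromTokens and the { token } payload shape are assumptions made for the example.

```typescript
// Hypothetical helper: frame one JSON-serializable payload as a single SSE event.
export function formatSseEvent(payload: unknown): string {
  // JSON.stringify escapes newlines inside strings, so the payload stays on one
  // "data:" line. The trailing blank line marks the end of the event.
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical helper: turn an async iterable of tokens into an SSE byte stream
// that can be passed to `new Response(...)` in a route handler.
export function sseStreamFromTokens(
  tokens: AsyncIterable<string>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = tokens[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    // pull() is called only when the consumer has room for more data.
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) {
        controller.close();
        return;
      }
      // The { token } shape is an assumption; use whatever your client expects.
      controller.enqueue(encoder.encode(formatSseEvent({ token: value })));
    },
    async cancel() {
      // Stop the token source when the client disconnects.
      await iterator.return?.();
    },
  });
}
```

On the client, an EventSource pointed at the route would receive each frame as a message event whose data property holds the JSON string, ready for JSON.parse.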

2. Chunked fetch with a TextDecoder

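When you need POST bodies or custom headers, which EventSource does not support, drop EventSource and read the response body as a stream with fetch. Decode bytes with a single shared TextDecoder, passing { stream: true } on every decode call, so a multi-byte UTF-8 character split across chunk boundaries is reassembled instead of corrupted.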
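A minimal sketch of the shared-decoder approach, assuming a runtime with global fetch and TextDecoder (modern browsers, Node 18+). The names createChunkDecoder and streamPost, and the onText callback, are hypothetical and exist only for this example.

```typescript
// Hypothetical helper: one decoder per response, reused for every chunk.
export function createChunkDecoder() {
  const decoder = new TextDecoder("utf-8");
  return {
    // stream: true makes the decoder hold back an incomplete multi-byte
    // sequence until a later chunk completes it.
    push: (bytes: Uint8Array): string => decoder.decode(bytes, { stream: true }),
    // Call once at end of stream to flush anything still buffered.
    flush: (): string => decoder.decode(),
  };
}

// Hypothetical client: POST a JSON payload and pass decoded text to onText
// as it arrives. The URL and payload shape are up to the caller.
export async function streamPost(
  url: string,
  payload: unknown,
  onText: (text: string) => void,
  signal?: AbortSignal,
): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
    signal,
  });
  if (!res.ok || !res.body) {
    throw new Error(`Request failed with status ${res.status}`);
  }

  const reader = res.body.getReader();
  const chunks = createChunkDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = chunks.push(value);
    if (text) onText(text);
  }
  const tail = chunks.flush();
  if (tail) onText(tail);
}
```

Creating a fresh TextDecoder for each chunk would discard the buffered partial sequence and emit replacement characters, which is why the decoder is created once per response.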

3. Backpressure and cancellation

Forward the client's AbortSignal (req.signal in a route handler) into the upstream model call so a closed tab stops billable generation. To throttle bursts from fast providers, pipe the response through a TransformStream; a slow transform applies backpressure to the source.

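export async function POST(req: Request) {
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        // Forward the client's AbortSignal so a closed tab cancels the upstream call.
        const upstream = await fetch('https://llm.getnimbus.net/v1/chat/completions', {
          method: 'POST',
          signal: req.signal,
          headers: { 'content-type': 'application/json' },
          body: JSON.stringify({ model: 'azure/model-router', stream: true }),
        });
        if (!upstream.ok || !upstream.body) {
          throw new Error(`Upstream request failed with status ${upstream.status}`);
        }
        const reader = upstream.body.getReader();
        // Relay upstream bytes unchanged until the provider closes the stream.
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          controller.enqueue(value);
        }
        controller.close();
      } catch (err) {
        // Surface upstream failures and aborts to the consumer instead of hanging.
        controller.error(err);
      }
    },
  });
  return new Response(stream, {
    headers: { 'content-type': 'text/event-stream' },
  });
}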
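
The route handler relays chunks as fast as the provider sends them. The throttling idea from section 3 could be sketched as a TransformStream like the one below. The name createThrottle and the minIntervalMs parameter are assumptions for illustration, and a fixed minimum interval is only one possible throttling policy.

```typescript
// Hypothetical helper: pass chunks through unchanged, at most one per minIntervalMs.
export function createThrottle<T>(minIntervalMs: number): TransformStream<T, T> {
  // Earliest time (ms since epoch) at which the next chunk may be emitted.
  let nextSlot = 0;
  return new TransformStream<T, T>({
    async transform(chunk, controller) {
      const wait = nextSlot - Date.now();
      // While this promise is pending, the writable side does not accept more
      // chunks, so a piped source is paused instead of piling up in memory.
      if (wait > 0) {
        await new Promise<void>((resolve) => setTimeout(resolve, wait));
      }
      nextSlot = Date.now() + minIntervalMs;
      controller.enqueue(chunk);
    },
  });
}
```

Under these assumptions, a handler could return new Response(upstream.body.pipeThrough(createThrottle(20)), ...) instead of copying chunks by hand. Because pipeThrough propagates cancellation, a client disconnect also cancels the upstream body.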