Seed + determinism

How Meridian uses the seed parameter and system_fingerprint header to produce reproducible outputs for evals and testing.

Overview

Every generation endpoint accepts an optional seed query parameter. When it is provided, Meridian seeds its internal PRNG with this value before sampling. Combined with the system_fingerprint request header, this guarantees byte-for-byte identical outputs across repeated calls, which is critical for deterministic evals, regression testing, and CI pipelines.

The seed parameter

Pass an integer in the range [0, 2^32 - 1] as a query parameter:

GET /v1/generate?seed=42&prompt=hello
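As a rough client-side sketch, the request URL above could be built and range-checked before it is sent. The helper name build_generate_url and the https scheme are assumptions for illustration; the host is the one used in the header example later on this page.

```python
from typing import Optional
from urllib.parse import urlencode

# Host taken from the request example on this page; the https scheme is an assumption.
BASE_URL = "https://api.getnimbus.net"
MAX_SEED = 2**32 - 1  # upper bound of the documented seed range


def build_generate_url(prompt: str, seed: Optional[int] = None) -> str:
    """Build a /v1/generate URL, validating the seed range on the client side."""
    params = {}
    if seed is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int):
            raise TypeError("seed must be an integer")
        if not 0 <= seed <= MAX_SEED:
            raise ValueError(f"seed must be in [0, {MAX_SEED}]")
        params["seed"] = seed
    params["prompt"] = prompt
    return f"{BASE_URL}/v1/generate?{urlencode(params)}"


print(build_generate_url("hello", seed=42))
```

Leaving seed as None produces a URL without the parameter, which corresponds to the non-deterministic fallback.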

The seed is fed into a ChaCha20-based deterministic RNG. Temperature scaling and top-p truncation are deterministic transforms of the model's output distribution, so the only source of randomness is the token draw itself, and every draw derives from this single seed. No entropy is mixed in from hardware sources when a seed is present.
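The idea can be illustrated with a toy sampler. This is not Meridian's implementation: it uses Python's random.Random (a Mersenne Twister) as a stand-in for the ChaCha20-based generator, draws every token from the same fixed logits instead of running a model, and the function name sample_tokens is made up for the example.

```python
import math
import random
from typing import Dict, List


def sample_tokens(logits: Dict[str, float], seed: int, n: int,
                  temperature: float = 1.0, top_p: float = 1.0) -> List[str]:
    """Draw n tokens from a fixed toy distribution using a single seeded RNG."""
    rng = random.Random(seed)  # stand-in for a ChaCha20-based generator

    # Temperature scaling followed by a numerically stable softmax (no randomness).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)

    # Top-p truncation: keep the most likely tokens until their mass reaches top_p.
    kept, cumulative = [], 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break

    tokens = [tok for _, tok in kept]
    probs = [prob for prob, _ in kept]
    # The only random step: every draw comes from the one seeded generator.
    return [rng.choices(tokens, weights=probs)[0] for _ in range(n)]


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.5, "d": -1.0}
first = sample_tokens(toy_logits, seed=42, n=10, temperature=0.8, top_p=0.9)
second = sample_tokens(toy_logits, seed=42, n=10, temperature=0.8, top_p=0.9)
print(first == second)
```

Because the scaling and truncation steps are pure functions of their inputs, fixing the seed is enough to make the whole toy pipeline repeatable.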

Note: If the seed parameter is omitted, Meridian falls back to non-deterministic generation using hardware entropy, and outputs will vary across calls.

The system_fingerprint header

Determinism requires identical model weights, tokenizer, and runtime configuration. The system_fingerprint header pins all of these. Send the fingerprint returned from a previous response to lock the backend configuration.

GET /v1/generate?seed=42&prompt=hello
Host: api.getnimbus.net
system_fingerprint: fp_abc123def456

The fingerprint encodes a hash of the active model checkpoint, tokenizer vocabulary, and inference hyperparameters. If any of these change (for example, a model update or a tokenizer patch), the fingerprint changes. A request that still sends the old fingerprint then receives 409 Conflict, with the new fingerprint in the response body.
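A client might handle the 409 case along the following lines. The format of the response body is not specified here, so this sketch assumes a JSON object with a system_fingerprint field; both that field name and the helper resolve_fingerprint are hypothetical.

```python
import json
from typing import Tuple


def resolve_fingerprint(status: int, body: str, pinned: str) -> Tuple[str, bool]:
    """Return (fingerprint to use next, whether the backend configuration changed)."""
    if status != 409:
        # Any other status leaves the pinned fingerprint untouched.
        return pinned, False
    # Assumed body shape: {"system_fingerprint": "<new value>"}.
    new_fingerprint = json.loads(body)["system_fingerprint"]
    return new_fingerprint, True


fingerprint, changed = resolve_fingerprint(
    409, '{"system_fingerprint": "fp_new000000"}', pinned="fp_abc123def456"
)
print(fingerprint, changed)
```

Returning the changed flag separately lets the caller decide whether to retry immediately or to flag stored outputs for review first.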

Evals workflow

  1. Send a generation request with a fixed seed and capture the returned system_fingerprint from the response headers.
  2. Store the (seed, fingerprint, output) tuple, together with the prompt, as a golden test case.
  3. In CI, replay the request with the same seed and fingerprint header. Assert the output matches byte-for-byte.
  4. If the fingerprint has changed (for example, after a model update), verify that the new output is correct, then update the golden tuple.
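The steps above can be sketched as a small replay check. GoldenCase, check_golden, and the generate callable are hypothetical names; the transport is injected so the logic can run without network access, and the prompt is stored alongside the tuple because the replay needs it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class GoldenCase:
    seed: int
    fingerprint: str
    prompt: str
    output: str


# Hypothetical transport: (prompt, seed, fingerprint) -> (status, fingerprint, output).
Generate = Callable[[str, int, str], Tuple[int, str, str]]


def check_golden(case: GoldenCase, generate: Generate) -> str:
    """Replay one golden case and classify the result."""
    status, fingerprint, output = generate(case.prompt, case.seed, case.fingerprint)
    if status == 409 or fingerprint != case.fingerprint:
        # Backend changed: the golden output needs manual review (step 4).
        return "fingerprint_changed"
    if output.encode("utf-8") != case.output.encode("utf-8"):
        # Same fingerprint but different bytes: a determinism regression.
        return "mismatch"
    return "ok"


case = GoldenCase(seed=42, fingerprint="fp_abc123def456", prompt="hello", output="Hi there")


def fake_generate(prompt: str, seed: int, fingerprint: str) -> Tuple[int, str, str]:
    # Stand-in for a real HTTP call, returning a fixed response.
    return 200, "fp_abc123def456", "Hi there"


print(check_golden(case, fake_generate))
```

Keeping "fingerprint_changed" distinct from "mismatch" lets CI treat a model update as a review task instead of a failing test.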

Rate limits & caching

Deterministic requests with identical (seed, fingerprint, prompt) tuples may be served from a short-lived cache with a 60-second TTL. This reduces load during eval suite runs. Cache hits return the same response body and headers, including the original x-request-id, so two calls served from the same cache entry share one x-request-id.
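The caching behavior can be modeled with a toy in-memory cache keyed by the (seed, fingerprint, prompt) tuple. This is an illustration of the described semantics, not Meridian's server code; the class name DeterministicCache and its injectable clock are assumptions made so the example is testable.

```python
import time
import uuid
from typing import Callable, Dict, Tuple


class DeterministicCache:
    """Toy TTL cache keyed by (seed, fingerprint, prompt)."""

    def __init__(self, ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[Tuple[int, str, str], Tuple[float, dict]] = {}

    def fetch(self, seed: int, fingerprint: str, prompt: str,
              compute: Callable[[], str]) -> dict:
        key = (seed, fingerprint, prompt)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            # Cache hit: the stored response, including its x-request-id, is reused.
            return entry[1]
        # Cache miss or expired entry: generate and store a fresh response.
        response = {"x-request-id": str(uuid.uuid4()), "body": compute()}
        self._entries[key] = (now, response)
        return response


cache = DeterministicCache()
first = cache.fetch(42, "fp_abc123def456", "hello", lambda: "Hi there")
second = cache.fetch(42, "fp_abc123def456", "hello", lambda: "Hi there")
print(first["x-request-id"] == second["x-request-id"])
```

A hit does not refresh the stored timestamp in this model, so an entry expires 60 seconds after it was first computed.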