Recipe
LLM Explainability Patterns
Production LLM systems are opaque by default. This recipe walks through three practical patterns Meridian customers use to surface why a model produced a given output (token-level confidence, counterfactual prompt probing, and chain-of-thought capture) without sacrificing throughput on the gateway.
1. Token-level confidence via logprobs
The cheapest form of explainability is asking the model for its own probability distribution. Pass logprobs: true and a small top_logprobs value to every chat completion you intend to audit. Meridian forwards the field to Azure's router transparently, so the cost overhead is a few percent on output tokens.
// Pattern: token-level attribution via logprobs
import { Meridian } from '@meridian/sdk';

const m = new Meridian({ apiKey: process.env.MERIDIAN_KEY });

const result = await m.chat.completions.create({
  model: 'azure/model-router',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
  logprobs: true,
  top_logprobs: 5,
});

// Each token carries the model's confidence + alternates.
// Guard against a missing logprobs payload rather than assuming it is always set.
for (const tok of result.choices[0].logprobs?.content ?? []) {
  console.log(tok.token, Math.exp(tok.logprob).toFixed(3));
  for (const alt of tok.top_logprobs) {
    console.log(' alt:', alt.token, Math.exp(alt.logprob).toFixed(3));
  }
}
2. Counterfactual prompt probing
For higher-stakes outputs, run the same prompt with one input field ablated and compare the two outputs. A material divergence tells you the field was load-bearing; a near-identical answer tells you the model ignored it. Meridian's model: azure/model-router alias makes this cheap because routine probes route to small models while the canonical run goes to the full one.
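The ablate-and-compare loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Meridian's implementation: the `Complete` callback stands in for whatever client call you already use, and the helpers `renderPrompt`, `ablate`, `jaccard`, and `probeFields` are hypothetical names. Word-set overlap is used here only as a cheap stand-in for "material divergence".

```typescript
// Sketch only: none of these names come from the Meridian SDK.
type Fields = Record<string, string>;
type Complete = (prompt: string) => Promise<string>;

// Render the input fields as "name: value" lines, followed by the question.
function renderPrompt(question: string, fields: Fields): string {
  const context = Object.keys(fields)
    .map((name) => `${name}: ${fields[name]}`)
    .join('\n');
  return `${context}\n\n${question}`;
}

// Return a copy of the fields with one key removed; the input is not mutated.
function ablate(fields: Fields, key: string): Fields {
  const copy: Fields = { ...fields };
  delete copy[key];
  return copy;
}

// Jaccard similarity over lowercase word sets: 1 = same vocabulary, 0 = disjoint.
function jaccard(a: string, b: string): number {
  const words = (s: string) => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const wa = words(a);
  const wb = words(b);
  if (wa.size === 0 && wb.size === 0) return 1;
  const shared = Array.from(wa).filter((w) => wb.has(w)).length;
  return shared / (wa.size + wb.size - shared);
}

// Run the baseline once, then once per ablated field, and score each divergence
// as 1 - similarity. Higher scores suggest the field was load-bearing.
async function probeFields(
  complete: Complete,
  question: string,
  fields: Fields,
): Promise<Record<string, number>> {
  const baseline = await complete(renderPrompt(question, fields));
  const divergence: Record<string, number> = {};
  for (const key of Object.keys(fields)) {
    const answer = await complete(renderPrompt(question, ablate(fields, key)));
    divergence[key] = 1 - jaccard(baseline, answer);
  }
  return divergence;
}
```

Lexical overlap is a crude signal, and sampling noise alone can make two runs of the same prompt differ. It may be worth repeating the baseline a few times to see how much divergence you get with nothing ablated before choosing a threshold.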
3. Structured reasoning capture
For reasoning models (gpt-5 family, o4-mini, grok reasoning), set reasoning_effort to low or medium and persist the visible chain alongside the final answer. Pair the stored trace with the logprobs from pattern 1 and you have a two-layer audit record: what the model considered, and how confident it was at each emitted token.
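One possible shape for that two-layer record is sketched below. The `AuditRecord` schema, the `buildAuditRecord` and `toJsonLine` helpers, and the 0.5 low-confidence cutoff are all illustrative assumptions, not part of any SDK. The sketch also takes the reasoning trace as a plain string, since where that trace appears in a response depends on the model and provider.

```typescript
// Shape of one token entry, matching the fields read in pattern 1.
interface TokenLogprob {
  token: string;
  logprob: number;
}

// Hypothetical storage schema for one audited completion.
interface AuditRecord {
  prompt: string;
  answer: string;
  reasoning: string | null; // layer 1: what the model considered
  tokens: { token: string; probability: number }[]; // layer 2: per-token confidence
  minProbability: number | null; // null when no logprobs were returned
  lowConfidenceTokens: string[];
}

// Combine the reasoning trace and the logprobs into a single record.
// The default cutoff of 0.5 is an arbitrary illustrative value.
function buildAuditRecord(
  prompt: string,
  answer: string,
  reasoning: string | null,
  logprobs: TokenLogprob[],
  lowConfidenceBelow = 0.5,
): AuditRecord {
  // Convert log-probabilities to plain probabilities for easier review.
  const tokens = logprobs.map((t) => ({
    token: t.token,
    probability: Math.exp(t.logprob),
  }));
  const probabilities = tokens.map((t) => t.probability);
  return {
    prompt,
    answer,
    reasoning,
    tokens,
    minProbability: probabilities.length > 0 ? Math.min(...probabilities) : null,
    lowConfidenceTokens: tokens
      .filter((t) => t.probability < lowConfidenceBelow)
      .map((t) => t.token),
  };
}

// Serialize as one JSON object per line so records can be appended to a log.
function toJsonLine(record: AuditRecord): string {
  return JSON.stringify(record);
}
```

Precomputing `minProbability` and `lowConfidenceTokens` at write time is a design choice: it lets a reviewer filter for shaky completions without re-reading every token array.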