Custom eval runner

Ship a bespoke evaluation harness that scores model outputs against your own rubrics — no third-party eval framework required.

Overview

Meridian lets you define a custom runner that invokes your model, collects completions, and applies scoring logic you control. The runner is a single TypeScript module that exports a standard interface — drop it into your project and Meridian handles orchestration, caching, and result storage.

Runner contract

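// Contract every custom runner must satisfy.
export interface EvalRunner {
  id: string;          // the id you reference in your eval config
  run(input: EvalInput): Promise<EvalResult>;
}

// Exported so runner modules can import the input and result types.
export type EvalInput = {
  prompt: string;
  model: string;
  params?: Record<string, unknown>;
};

export type EvalResult = {
  score: number;       // 0.0 – 1.0
  latencyMs: number;
  output: string;
  metadata?: Record<string, unknown>;
};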
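
To make the contract concrete, here is a minimal sketch of a runner that satisfies it. The `callModel` stub, the `keywordScore` rubric, the `required` entry in `params`, and the `keyword-rubric` id are all hypothetical names chosen for illustration, not part of Meridian. How Meridian expects the runner object to be exported from the module is not covered in this recipe, so the sketch leaves that out.

```typescript
// Types repeated from the runner contract so the sketch compiles on its own.
interface EvalRunner {
  id: string;
  run(input: EvalInput): Promise<EvalResult>;
}
type EvalInput = { prompt: string; model: string; params?: Record<string, unknown> };
type EvalResult = {
  score: number;
  latencyMs: number;
  output: string;
  metadata?: Record<string, unknown>;
};

// Hypothetical stand-in for a real model client; swap in your own call.
async function callModel(model: string, prompt: string): Promise<string> {
  return `echo(${model}): ${prompt}`;
}

// Hypothetical rubric: fraction of required keywords found in the output.
function keywordScore(output: string, required: string[]): number {
  if (required.length === 0) return 1;
  const lower = output.toLowerCase();
  const hits = required.filter((k) => lower.includes(k.toLowerCase())).length;
  return hits / required.length;
}

const keywordRunner: EvalRunner = {
  id: "keyword-rubric", // assumed id; must match whatever your eval config references
  async run(input) {
    // `params.required` is an assumed convention for passing rubric keywords.
    const raw = input.params?.required;
    const required: string[] = Array.isArray(raw)
      ? raw.filter((k): k is string => typeof k === "string")
      : [];

    // Time only the model call so latencyMs excludes scoring overhead.
    const start = Date.now();
    const output = await callModel(input.model, input.prompt);
    const latencyMs = Date.now() - start;

    return {
      score: keywordScore(output, required),
      latencyMs,
      output,
      metadata: { required },
    };
  },
};
```

Keeping the scoring function separate from `run` means the rubric can be unit-tested without invoking a model at all.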

Wiring it up

Place your runner in evals/runners/ and reference it by id in your eval config. Meridian discovers runners automatically at build time — no manual registration needed.

Built-in scoring helpers

  • exactMatch — string equality
  • containsAll — substring presence
  • levenshtein — normalized edit distance
  • jsonSchema — structural validation
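
For intuition about what the first three helpers compute, the sketch below re-implements them as standalone functions. These are illustrative only: the `*Sketch` names, the signatures, and the choice to return a 0-to-1 score are assumptions, and Meridian's built-in helpers may differ in signature and edge-case handling. In particular, "normalized edit distance" is interpreted here as a similarity, 1 minus distance divided by the longer string's length, so that identical strings score 1. `jsonSchema` is omitted because it would need a schema validation library.

```typescript
// Illustrative re-implementations; not Meridian's actual helpers.

// 1 when the strings are identical, otherwise 0.
function exactMatchSketch(output: string, expected: string): number {
  return output === expected ? 1 : 0;
}

// 1 when every needle occurs somewhere in the output, otherwise 0.
function containsAllSketch(output: string, needles: string[]): number {
  return needles.every((n) => output.includes(n)) ? 1 : 0;
}

// Levenshtein distance via dynamic programming with two rolling rows.
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (free when characters match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 - distance / max(length); two empty strings score 1.
function levenshteinSketch(output: string, expected: string): number {
  const longest = Math.max(output.length, expected.length);
  return longest === 0 ? 1 : 1 - editDistance(output, expected) / longest;
}
```

Under this normalization, a score near 1 means the output is almost identical to the expected string, which lines up with the 0.0 to 1.0 range of `EvalResult.score`.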

Next step

Read the eval config reference to connect your runner to a test suite.