Recipe

Embedding search patterns

Semantic search turns a question into a vector, then finds the closest matching documents from a pre-embedded corpus. This recipe walks through the three-step pattern that works for almost any knowledge base: embed once, store, and query against the same embedding model used at index time.
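To make "closest" concrete, here is a minimal sketch that assumes cosine similarity as the metric (the recipe does not say which metric your vector store uses). The `Doc` type, the `cosine` and `nearest` helpers, and the three-dimensional vectors are all illustrative, not part of any SDK.

```typescript
// Hypothetical in-memory stand-in for a vector store; names are illustrative.
type Doc = { id: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k documents most similar to the query vector, best first.
function nearest(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}

// Tiny hand-made vectors; real embeddings have hundreds or thousands of dimensions.
const corpus: Doc[] = [
  { id: 'rotate-keys', vector: [0.9, 0.1, 0.0] },
  { id: 'billing', vector: [0.0, 0.2, 0.9] },
  { id: 'create-keys', vector: [0.7, 0.6, 0.1] },
];

const closest = nearest([1, 0, 0], corpus, 2);
console.log(closest.map((d) => d.id)); // [ 'rotate-keys', 'create-keys' ]
```

A production vector store does the same ranking with an approximate index instead of a full sort, but the ordering it aims for is the one shown here.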

1. Pick one embedding model and stick with it

Vectors from different models are not comparable. If you embed queries with a newer model than the one used to build the index, similarity scores become meaningless until you re-embed the entire corpus. Pin the model name in code and treat any upgrade as a deliberate, full re-index.
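One way to enforce this, sketched under the assumption that you can store a small metadata record next to your vectors: write down the model name at index time and refuse to query if it differs. `IndexMeta` and `assertSameModel` are hypothetical names introduced for illustration.

```typescript
// Single source of truth for the model id used in this recipe.
const EMBED_MODEL = 'meridian/embed-large-v1';

// Hypothetical metadata record stored alongside the vectors when the index is built.
type IndexMeta = { model: string; builtAt: string };

// Throw if the index was built with a different model than the one queries will use.
function assertSameModel(meta: IndexMeta, queryModel: string): void {
  if (meta.model !== queryModel) {
    throw new Error(
      `Index built with ${meta.model} but queries use ${queryModel}; re-index the corpus first.`,
    );
  }
}

const meta: IndexMeta = { model: EMBED_MODEL, builtAt: new Date().toISOString() };
assertSameModel(meta, EMBED_MODEL); // passes: same model on both sides
```

Running the check once at startup turns a silent relevance regression into an immediate, explainable failure.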

2. Batch the index build

Meridian's batch embed endpoint accepts up to 2,048 inputs per call and is billed at the same per-token rate as single calls. For a 10,000-doc corpus that is roughly five round trips instead of ten thousand, which is the difference between minutes and hours.
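The round-trip arithmetic can be checked with a small chunking helper. This is a sketch: `chunk` and `MAX_BATCH_INPUTS` are illustrative names, and the placeholder strings stand in for real document bodies.

```typescript
// Per-call input limit quoted above for the batch endpoint.
const MAX_BATCH_INPUTS = 2048;

// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Placeholder corpus of 10,000 document bodies.
const bodies = Array.from({ length: 10_000 }, (_, i) => `doc ${i}`);
const batches = chunk(bodies, MAX_BATCH_INPUTS);

console.log(batches.length); // 5: four full batches of 2,048 plus one of 1,808
```

Each slice then becomes the input list for one batch call, so the number of slices is the number of round trips.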

3. Query, rerank, then generate

Top-k vector hits are a recall layer, not a final answer. Pull twenty candidates, rerank them with a cross-encoder or a small chat model, then pass the top three or four into your generation prompt. Recall plus rerank beats raw similarity on every internal benchmark we have run.
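The recall-then-rerank step might look like the sketch below. Everything here is an assumption made for illustration: the `Hit` shape, the `rerank` helper, and especially the `overlap` scorer, which is a toy word-overlap function standing in for a cross-encoder or small chat model. It is synchronous to keep the example short; a real reranker call would normally be asynchronous.

```typescript
// Hypothetical shapes; a real vector store and reranker will differ.
type Hit = { id: string; text: string; similarity: number };
type Scorer = (query: string, text: string) => number;

// Re-score every candidate against the query and keep the best `keep` of them.
function rerank(query: string, hits: Hit[], score: Scorer, keep: number): Hit[] {
  return hits
    .map((hit) => ({ hit, relevance: score(query, hit.text) }))
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, keep)
    .map((entry) => entry.hit);
}

// Toy scorer standing in for a cross-encoder: fraction of query words found in the text.
const overlap: Scorer = (query, text) => {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  const haystack = text.toLowerCase();
  return words.filter((w) => haystack.includes(w)).length / words.length;
};

// Candidates in vector-similarity order; the top hit is not the most relevant one.
const candidates: Hit[] = [
  { id: 'a', text: 'Billing cycles and invoices', similarity: 0.81 },
  { id: 'b', text: 'How to rotate an API key safely', similarity: 0.79 },
  { id: 'c', text: 'Creating your first API key', similarity: 0.78 },
];

const context = rerank('rotate API key', candidates, overlap, 2);
console.log(context.map((h) => h.id)); // [ 'b', 'c' ]
```

In practice `candidates` would be the twenty vector hits and `keep` would be three or four, with the surviving texts placed into the generation prompt.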

embedding-search.ts

import { Meridian } from '@meridian/sdk';

const meridian = new Meridian({ apiKey: process.env.MERIDIAN_KEY });

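// 1. Embed the corpus once at index time.
// loadDocs and vectorStore are assumed to be application helpers defined elsewhere.
const docs = await loadDocs();

// The batch endpoint accepts at most 2,048 inputs per call, so send the corpus in slices.
const BATCH_SIZE = 2048;
const embedded = [];
for (let start = 0; start < docs.length; start += BATCH_SIZE) {
  const batch = await meridian.embed.batch({
    model: 'meridian/embed-large-v1',
    inputs: docs.slice(start, start + BATCH_SIZE).map((d) => d.body),
  });
  embedded.push(...batch);
}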

// 2. Persist vectors in your store of choice
await vectorStore.upsert(
  embedded.map((vec, i) => ({
    id: docs[i].id,
    vector: vec.embedding,
    metadata: { title: docs[i].title },
  })),
);

// 3. At query time, embed the question and search
const query = await meridian.embed.single({
  model: 'meridian/embed-large-v1',
  input: 'How do I rotate my API key?',
});

// Pull a wide candidate set for recall; rerank it and keep the best few for generation
const hits = await vectorStore.query({
  vector: query.embedding,
  topK: 20,
});
