← Back to docs

Recipe

Ollama Primer

Run open-weight LLMs locally, then expose them to Meridian as a drop-in OpenAI-compatible backend. Zero cloud cost, full data isolation, sub-200ms first token on consumer hardware.

1. Install & pull a model

Ollama ships a single binary for macOS, Linux, and Windows. After install, pull any model from the registry. Llama 3.1 8B is the sweet spot for a 16GB box; Qwen 2.5 14B if you have 24GB+ VRAM.

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
ollama run llama3.1:8b "Hello"

2. Expose the OpenAI shim

Ollama serves an OpenAI-compatible endpoint at http://localhost:11434/v1. Point any SDK at it with a placeholder API key. Meridian routes requests transparently when you register the host as a custom upstream.

3. Wire it into Meridian

Open the Meridian dashboard, add a new upstream with base URL http://your-host:11434/v1, and tag it local. Route rules can then prefer the local model for cheap traffic and fall back to a hosted model on overflow.

Need a hosted fallback? See the Azure routing recipe next.