Recipe
Ollama Primer
Run open-weight LLMs locally, then expose them to Meridian as a drop-in OpenAI-compatible backend. Zero cloud cost, full data isolation, sub-200ms first token on consumer hardware.
1. Install & pull a model
Ollama ships a single binary for macOS, Linux, and Windows. After install, pull any model from the registry. Llama 3.1 8B is the sweet spot for a 16GB box; Qwen 2.5 14B if you have 24GB+ VRAM.
curl -fsSL https://ollama.com/install.sh | sh ollama pull llama3.1:8b ollama run llama3.1:8b "Hello"
2. Expose the OpenAI shim
Ollama serves an OpenAI-compatible endpoint at http://localhost:11434/v1. Point any SDK at it with a placeholder API key. Meridian routes requests transparently when you register the host as a custom upstream.
3. Wire it into Meridian
Open the Meridian dashboard, add a new upstream with base URL http://your-host:11434/v1, and tag it local. Route rules can then prefer the local model for cheap traffic and fall back to a hosted model on overflow.