Gemini routing

Choose the right Gemini model for every request.

gemini-3.1-pro

Sub‑2s native inference. Use when latency matters and the prompt fits within the model’s native context window. Ideal for real‑time chat, inline autocomplete, and API responses that block a user‑facing UI.

• No bridge overhead
• Best for prompts under 8k tokens
• Highest throughput tier

gemini-3-flash

Lowest per‑token cost. Use for bulk summarization, log analysis, or background jobs where a 300–800 ms response is acceptable. Flash uses the same safety filters as Pro but runs on reduced‑precision hardware.

• ~40% cheaper than Pro
• Slightly higher p99 latency than Pro
• Great for batch workloads
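
To make the "~40% cheaper" figure concrete, here is a rough cost estimate for a batch job. It is only a sketch: the per‑token price is a placeholder (this page gives a relative discount, not actual prices), and estimate_cost is a hypothetical helper, not part of any Gemini SDK.

```python
# Placeholder price in arbitrary units; NOT a real Gemini price.
PRO_PRICE_PER_1K_TOKENS = 1.00
# Relative discount taken from the "~40% cheaper than Pro" bullet.
FLASH_DISCOUNT = 0.40


def estimate_cost(tokens: int, model: str) -> float:
    """Return an approximate cost for `tokens` tokens on `model` (hypothetical helper)."""
    pro_cost = tokens / 1000 * PRO_PRICE_PER_1K_TOKENS
    if model == "gemini-3.1-pro":
        return pro_cost
    if model == "gemini-3-flash":
        # Flash is modeled as Pro's cost minus the approximate discount.
        return pro_cost * (1 - FLASH_DISCOUNT)
    raise ValueError(f"no price assumption for model: {model}")


if __name__ == "__main__":
    batch_tokens = 10_000
    for name in ("gemini-3.1-pro", "gemini-3-flash"):
        print(name, round(estimate_cost(batch_tokens, name), 2))
```

Replace the placeholder price with your actual rate; the comparison only holds as far as the ~40% figure does for your workload.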

gemini-chrome

Bridge mode that carries your full Google profile: Drive files, Gmail context, and Calendar awareness. Use when the prompt needs personal data or cross‑app reasoning. Latency is higher because the bridge resolves OAuth scopes before inference.

• Full Google identity
• Drive / Gmail / Calendar access
• Expect a 2–5 s cold start

Rule of thumb: start with gemini-3.1-pro. Fall back to gemini-3-flash when cost dominates. Reach for gemini-chrome only when the prompt explicitly needs your Google data.
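
The rule of thumb can be sketched as a small routing function. This is a minimal illustration, not an official client: the Request fields (needs_google_data, cost_sensitive) and the choose_model helper are hypothetical names, and how you detect either condition is up to your application.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request descriptor; field names are illustrative assumptions."""
    prompt: str
    needs_google_data: bool = False  # prompt explicitly needs Drive / Gmail / Calendar
    cost_sensitive: bool = False     # cost matters more than latency


def choose_model(req: Request) -> str:
    """Pick a model name following the rule of thumb above."""
    # Only the bridge can reach Google profile data, so check this first.
    if req.needs_google_data:
        return "gemini-chrome"
    # Fall back to Flash when cost dominates.
    if req.cost_sensitive:
        return "gemini-3-flash"
    # Default: start with Pro.
    return "gemini-3.1-pro"


if __name__ == "__main__":
    print(choose_model(Request("Complete this line of code")))
    print(choose_model(Request("Summarize last night's logs", cost_sensitive=True)))
    print(choose_model(Request("What is on my calendar today?", needs_google_data=True)))
```

Checking needs_google_data before cost_sensitive is a design choice in this sketch: a prompt that depends on Google data cannot be served by the other two models, so that requirement takes priority over cost.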