Gemini routing
Choose the right Gemini model for every request.
gemini-3.1-pro
Sub‑2s native inference. Use when latency matters and the prompt fits within the model’s native context window. Ideal for real‑time chat, inline autocomplete, and API responses that block a user‑facing UI.
- No bridge overhead
- Best for <8k token prompts
- Highest throughput tier
gemini-3-flash
Lowest per‑token cost. Use for bulk summarization, log analysis, or background jobs where a 300ms–800ms response is acceptable. Flash shares the same safety filters as Pro but runs on reduced‑precision hardware.
- ~40% cheaper than Pro
- Slightly higher p99 latency
- Great for batch workloads
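As a rough illustration of the ~40% figure, the sketch below compares the cost of one batch job on the two tiers. The price passed in is a placeholder unit, not a published rate, and `FLASH_DISCOUNT` and `estimate_costs` are hypothetical names introduced only for this example.

```python
# Assumption: Flash costs ~40% less per token than Pro, as stated in the list above.
FLASH_DISCOUNT = 0.40


def estimate_costs(total_tokens: int, pro_price_per_1k: float) -> dict:
    """Estimate the cost of a job on each tier and the savings from using Flash.

    pro_price_per_1k is a placeholder; substitute your actual per-1k-token rate.
    """
    pro_cost = total_tokens / 1000 * pro_price_per_1k
    flash_cost = pro_cost * (1 - FLASH_DISCOUNT)
    return {"pro": pro_cost, "flash": flash_cost, "savings": pro_cost - flash_cost}


# Example: a 1M-token batch job at a placeholder price of 1.0 unit per 1k tokens.
costs = estimate_costs(1_000_000, 1.0)
print(costs)
```

Because the discount is approximate, treat the result as an order-of-magnitude estimate when deciding whether a workload is worth moving to Flash.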
gemini-chrome
Bridge mode that carries your full Google profile: Drive files, Gmail context, and Calendar awareness. Use when the prompt needs personal data or cross‑app reasoning. Latency is higher because the bridge resolves OAuth scopes before inference.
- Full Google identity
- Drive / Gmail / Calendar access
- Expect 2s–5s cold start
Rule of thumb: start with gemini-3.1-pro. Fall back to gemini-3-flash when cost dominates. Reach for gemini-chrome only when the prompt explicitly needs your Google data.
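The rule of thumb can be sketched as a small routing function. This is a minimal illustration, not an official client: the `Request` fields and the `choose_model` name are hypothetical, and the choice to let Google data take priority over cost is an assumption based on gemini-chrome being the only option described as having profile access.

```python
from dataclasses import dataclass

# Model names as listed in this guide.
PRO = "gemini-3.1-pro"
FLASH = "gemini-3-flash"
CHROME = "gemini-chrome"


@dataclass
class Request:
    """Hypothetical request descriptor; both fields are illustrative."""
    needs_google_data: bool = False  # prompt needs Drive, Gmail, or Calendar
    cost_sensitive: bool = False     # bulk or background work where cost dominates


def choose_model(req: Request) -> str:
    """Pick a model following the rule of thumb."""
    # Assumption: Google data wins over cost, since only the bridge can reach it.
    if req.needs_google_data:
        return CHROME
    if req.cost_sensitive:
        return FLASH
    # Default: start with Pro.
    return PRO


print(choose_model(Request()))
print(choose_model(Request(cost_sensitive=True)))
print(choose_model(Request(needs_google_data=True)))
```

Keeping the decision in one function makes it easy to add further signals later, such as a prompt-size check against the <8k token guidance for Pro.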