Gemini routing

Choose the right Gemini model for every request.

gemini-3.1-pro

Sub‑2s native inference. Use when latency matters and the prompt fits within the model’s native context window. Ideal for real‑time chat, inline autocomplete, and API responses that block a user‑facing UI.

  • No bridge overhead
  • Best for <8k token prompts
  • Highest throughput tier

gemini-3-flash

Lowest per‑token cost. Use for bulk summarization, log analysis, or background jobs where a response time of 300–800 ms is acceptable. Flash shares the same safety filters as Pro but runs on reduced‑precision hardware.

  • ~40% cheaper than Pro
  • Slightly higher p99 latency
  • Great for batch workloads
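
To see what the ~40% figure means for a batch job, here is a rough arithmetic sketch. It assumes the discount applies uniformly to the whole bill, which the text above does not confirm. The names FLASH_DISCOUNT, flash_cost, and batch_savings are illustrative only, and the costs are in whatever unit your Pro bill uses.

```python
# Approximate discount taken from the "~40% cheaper than Pro" bullet.
# Assumption: it applies uniformly to the whole bill.
FLASH_DISCOUNT = 0.40


def flash_cost(pro_cost: float) -> float:
    """Estimate what a workload costs on Flash, given its cost on Pro."""
    return pro_cost * (1.0 - FLASH_DISCOUNT)


def batch_savings(pro_cost: float) -> float:
    """Estimate the amount saved by moving a workload from Pro to Flash."""
    return pro_cost - flash_cost(pro_cost)


if __name__ == "__main__":
    pro_bill = 100.0  # hypothetical Pro cost for a nightly batch, any currency
    print(round(flash_cost(pro_bill), 2))
    print(round(batch_savings(pro_bill), 2))
```

Treat the result as an estimate: the discount is approximate, and Flash's slightly higher p99 latency may matter more than the savings for some jobs.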

gemini-chrome

Bridge mode that carries your full Google profile: Drive files, Gmail context, and Calendar awareness. Use when the prompt needs personal data or cross‑app reasoning. Latency is higher because the bridge resolves OAuth scopes before inference.

  • Full Google identity
  • Drive / Gmail / Calendar access
  • Expect 2s–5s cold start

Rule of thumb: start with gemini-3.1-pro. Fall back to gemini-3-flash when cost dominates. Reach for gemini-chrome only when the prompt explicitly needs your Google data.
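
The rule of thumb can be sketched as a small routing function. This is a minimal illustration, not an official client: the Request type and its two flags (needs_google_data, cost_sensitive) are hypothetical, and your application would have to set them itself. Only the three model names come from the text above.

```python
from dataclasses import dataclass

# Model names as listed in this guide.
PRO = "gemini-3.1-pro"
FLASH = "gemini-3-flash"
CHROME = "gemini-chrome"


@dataclass(frozen=True)
class Request:
    """Hypothetical request descriptor; both flags are set by the caller."""
    needs_google_data: bool = False  # prompt explicitly needs Drive/Gmail/Calendar
    cost_sensitive: bool = False     # batch or background work where cost dominates


def choose_model(req: Request) -> str:
    """Apply the rule of thumb: Pro by default, Flash for cost, Chrome for Google data."""
    if req.needs_google_data:
        # Only the bridge carries the Google profile, so this check comes first.
        return CHROME
    if req.cost_sensitive:
        return FLASH
    return PRO


if __name__ == "__main__":
    print(choose_model(Request()))                        # gemini-3.1-pro
    print(choose_model(Request(cost_sensitive=True)))     # gemini-3-flash
    print(choose_model(Request(needs_google_data=True)))  # gemini-chrome
```

The Google-data check runs first because neither Pro nor Flash is described as having profile access, so cost cannot override it in this sketch.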