max_tokensvsmax_completion_tokens
Understanding the two token-limit parameters in the OpenAI API and how Meridian handles both transparently.
Why two parameters?
OpenAI originally exposed a single max_tokens field to cap the total tokens in a response. With the introduction of reasoning models (o1, o3, o4-mini), the API gained a new field: max_completion_tokens. This newer parameter controls only the visible completion tokens, excluding internal reasoning tokens that reasoning models consume behind the scenes.
| Parameter | Applies to | Counts reasoning tokens? |
|---|---|---|
max_tokens | Legacy models (GPT-4, GPT-3.5) | N/A — no reasoning tokens |
max_completion_tokens | Reasoning models (o1, o3, o4-mini) | No — visible output only |
How Meridian handles it
Meridian acts as a transparent proxy. When your application sends either parameter — or both — the gateway passes them through to the upstream provider unchanged. You control which field to use based on the model you are targeting.
- If you send
max_tokensto a reasoning model, the provider may ignore it or treat it as an alias. Meridian does not rewrite the field. - If you send
max_completion_tokensto a legacy model, the provider will reject the request with a validation error. Meridian surfaces that error directly. - Sending both is valid for models that accept both. The provider decides precedence; typically
max_completion_tokenswins.
Recommendation
For new integrations, prefer max_completion_tokens when targeting reasoning models, and fall back to max_tokens for GPT-4 and earlier. Meridian does not enforce a default — you retain full control over token budgeting at the edge.