Docs/max_tokens vs max_completion_tokens

`max_tokens`vs`max_completion_tokens`

Understanding the two token-limit parameters in the OpenAI API and how Meridian handles both transparently.

Why two parameters?

OpenAI originally exposed a single max_tokens field to cap the total tokens in a response. With the introduction of reasoning models (o1, o3, o4-mini), the API gained a new field: max_completion_tokens. This newer parameter controls only the visible completion tokens, excluding internal reasoning tokens that reasoning models consume behind the scenes.

Parameter	Applies to	Counts reasoning tokens?
`max_tokens`	Legacy models (GPT-4, GPT-3.5)	N/A — no reasoning tokens
`max_completion_tokens`	Reasoning models (o1, o3, o4-mini)	No — visible output only

How Meridian handles it

Meridian acts as a transparent proxy. When your application sends either parameter — or both — the gateway passes them through to the upstream provider unchanged. You control which field to use based on the model you are targeting.

If you send max_tokens to a reasoning model, the provider may ignore it or treat it as an alias. Meridian does not rewrite the field.
If you send max_completion_tokens to a legacy model, the provider will reject the request with a validation error. Meridian surfaces that error directly.
Sending both is valid for models that accept both. The provider decides precedence; typically max_completion_tokens wins.

Recommendation

For new integrations, prefer max_completion_tokens when targeting reasoning models, and fall back to max_tokens for GPT-4 and earlier. Meridian does not enforce a default — you retain full control over token budgeting at the edge.

← Back to docs Try in Playground →

max_tokensvsmax_completion_tokens

Why two parameters?

How Meridian handles it

Recommendation

`max_tokens`vs`max_completion_tokens`