Docs/max_tokens vs max_completion_tokens

max_tokensvsmax_completion_tokens

Understanding the two token-limit parameters in the OpenAI API and how Meridian handles both transparently.

Why two parameters?

OpenAI originally exposed a single max_tokens field to cap the total tokens in a response. With the introduction of reasoning models (o1, o3, o4-mini), the API gained a new field: max_completion_tokens. This newer parameter controls only the visible completion tokens, excluding internal reasoning tokens that reasoning models consume behind the scenes.

ParameterApplies toCounts reasoning tokens?
max_tokensLegacy models (GPT-4, GPT-3.5)N/A — no reasoning tokens
max_completion_tokensReasoning models (o1, o3, o4-mini)No — visible output only

How Meridian handles it

Meridian acts as a transparent proxy. When your application sends either parameter — or both — the gateway passes them through to the upstream provider unchanged. You control which field to use based on the model you are targeting.

  • If you send max_tokens to a reasoning model, the provider may ignore it or treat it as an alias. Meridian does not rewrite the field.
  • If you send max_completion_tokens to a legacy model, the provider will reject the request with a validation error. Meridian surfaces that error directly.
  • Sending both is valid for models that accept both. The provider decides precedence; typically max_completion_tokens wins.

Recommendation

For new integrations, prefer max_completion_tokens when targeting reasoning models, and fall back to max_tokens for GPT-4 and earlier. Meridian does not enforce a default — you retain full control over token budgeting at the edge.