
Common anti-patterns

Patterns that degrade security, reliability, and cost-efficiency in AI-integrated applications — and what to do instead.

ANTI-PATTERN 01

Storing API keys in client-side code

Embedding provider keys in browser bundles or in environment variables prefixed with NEXT_PUBLIC_ (which Next.js inlines into the client bundle at build time), or hardcoding secrets in JavaScript shipped to the client, exposes credentials to every end user. Any key that reaches the browser is extractable via DevTools, network inspection, or source maps.

Route all model calls through a backend proxy. Store keys server-side only, injected at runtime via sealed secrets or a vault. The client never sees the credential — it sends a request to your API, and your API forwards it with the key attached.
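A minimal sketch of such a proxy handler, written framework-agnostic so it can sit behind whatever server you use. The endpoint URL, the `handleChatRequest` name, the `{ prompt }` request shape, and the `Authorization: Bearer` header scheme are all assumptions; providers differ in how they expect keys to be sent. The HTTP client is injected as `fetchFn` so the handler can be exercised without a network; in production you would pass the global `fetch`.

```typescript
// Server-side only: this module must never be imported into client bundles.
// Hypothetical: the endpoint, request shape, and Bearer auth scheme below.

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; text(): Promise<string> }>;

const UPSTREAM_URL = "https://llm-provider.example/v1/chat"; // placeholder endpoint

interface ProxyResponse {
  status: number;
  body: string;
}

async function handleChatRequest(
  clientBody: unknown,
  apiKey: string | undefined, // read from a secret store at runtime, e.g. process.env.LLM_API_KEY
  fetchFn: FetchLike,
): Promise<ProxyResponse> {
  if (!apiKey) {
    // Fail closed: never fall back to a key supplied by the client.
    return { status: 500, body: JSON.stringify({ error: "server misconfigured" }) };
  }
  const prompt = (clientBody as { prompt?: unknown } | null)?.prompt;
  if (typeof prompt !== "string" || prompt.length === 0) {
    return { status: 400, body: JSON.stringify({ error: "prompt is required" }) };
  }
  // The credential is attached here, on the server. The browser only ever
  // talks to this handler and never receives the key.
  const upstream = await fetchFn(UPSTREAM_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ prompt }),
  });
  // Forward only the provider's status and body, never the outbound headers.
  return { status: upstream.status, body: await upstream.text() };
}
```

In a route handler you might call `handleChatRequest(await req.json(), process.env.LLM_API_KEY, fetch)`. The variable name is hypothetical; in Next.js it must not carry the NEXT_PUBLIC_ prefix, or the key would be inlined into the client bundle.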

ANTI-PATTERN 02

Infinite retry loops without backoff

Wrapping a failed API call in a while(true) loop with no delay, no max attempts, and no circuit breaker turns a transient 429 into a self-inflicted denial-of-service. Each retry compounds load on the upstream provider, guarantees rate-limit hits, and burns quota on requests that will never succeed.

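Cap retries at 3–5 attempts. Use exponential backoff with jitter: double the base delay on each attempt (e.g., 1s → 2s → 4s) and randomize the actual wait so that many clients don't retry in lockstep. On 429 responses, respect the Retry-After header. Implement a circuit breaker that halts all requests after a threshold of consecutive failures and lets traffic through again only after a cooldown period.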
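A sketch of these rules, assuming the HTTP layer throws a hypothetical `HttpError` carrying the status code and the raw `Retry-After` header. `withRetry`, `CircuitBreaker`, and the defaults (4 attempts, 1s base delay, 30s cap, 5-failure threshold, 30s cooldown) are illustrative choices, not prescribed values. It uses "full jitter", picking a random delay below the exponential ceiling, which is one of several common jitter schemes. The `sleep` function is injectable so the logic can be tested without real waiting.

```typescript
// Hypothetical error type: assumes your HTTP layer throws this for non-2xx responses.
class HttpError extends Error {
  status: number;
  retryAfter: string | null; // raw Retry-After header value, if present
  constructor(status: number, retryAfter: string | null = null) {
    super(`HTTP ${status}`);
    this.status = status;
    this.retryAfter = retryAfter;
  }
}

// Retry-After is either a number of seconds or an HTTP-date.
function parseRetryAfterMs(value: string | null, now: number = Date.now()): number | null {
  if (value === null || value.trim() === "") return null;
  const seconds = Number(value);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(value);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}

// Exponential backoff with "full jitter": a random delay in [0, min(max, base * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30_000, rand = Math.random): number {
  return rand() * Math.min(maxMs, baseMs * 2 ** attempt);
}

// Simplified breaker: opens after `threshold` consecutive failures and rejects
// calls until `cooldownMs` has passed. There is no explicit half-open state;
// the first failure after the cooldown reopens it, and a success resets it.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openUntil = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 5, cooldownMs = 30_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }
  canRequest(now: number = Date.now()): boolean {
    return now >= this.openUntil;
  }
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }
  recordFailure(now: number = Date.now()): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.threshold) this.openUntil = now + this.cooldownMs;
  }
}

async function withRetry<T>(
  call: () => Promise<T>,
  breaker: CircuitBreaker,
  maxAttempts = 4, // within the 3-5 range suggested above
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    if (!breaker.canRequest()) throw new Error("circuit open: not calling upstream");
    try {
      const result = await call();
      breaker.recordSuccess();
      return result;
    } catch (err) {
      // Only 429 and 5xx are treated as transient; everything else fails fast.
      if (!(err instanceof HttpError) || !(err.status === 429 || err.status >= 500)) throw err;
      breaker.recordFailure();
      if (attempt + 1 >= maxAttempts) throw err;
      // Prefer the server's Retry-After hint; otherwise back off with jitter.
      await sleep(parseRetryAfterMs(err.retryAfter) ?? backoffDelayMs(attempt));
    }
  }
}
```

Sharing one `CircuitBreaker` instance across all calls to the same provider is what lets it observe consecutive failures from many requests, rather than from a single retry sequence.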

ANTI-PATTERN 03

No request timeout configured

Omitting a timeout on fetch calls, streaming connections, or SDK clients means a hung connection blocks the request thread indefinitely. In serverless environments this burns function duration. In user-facing UIs it produces a spinner that never resolves. Long-running generations with no deadline also accumulate cost silently.

Set an explicit timeout on every outbound call: 30s for standard completions, 120s for long generations. Use AbortController in the browser and in Node.js, and equivalent deadline mechanisms in other server-side runtimes. Surface timeout errors to the user with a clear retry affordance.
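A minimal sketch of a deadline wrapper built on AbortController. `withDeadline` and `TimeoutError` are hypothetical names; the two constants mirror the budgets above. The task receives the signal so it can cancel the underlying request, and the race ensures the caller is released even if a task ignores the signal.

```typescript
// Budgets from the guidance above.
const STANDARD_TIMEOUT_MS = 30_000;
const LONG_GENERATION_TIMEOUT_MS = 120_000;

class TimeoutError extends Error {
  constructor(ms: number) {
    super(`request timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Runs `task` with an AbortSignal that fires after `ms`. The task should pass the
// signal to fetch (or its SDK) so the underlying connection is actually torn down;
// the race below guarantees the caller is released even if the task ignores it.
async function withDeadline<T>(ms: number, task: (signal: AbortSignal) => Promise<T>): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  const timedOut = new Promise<never>((_, reject) => {
    controller.signal.addEventListener("abort", () => reject(new TimeoutError(ms)), { once: true });
  });
  try {
    return await Promise.race([task(controller.signal), timedOut]);
  } finally {
    clearTimeout(timer); // don't leave the timer running after a fast response
  }
}
```

A call might look like `withDeadline(STANDARD_TIMEOUT_MS, (signal) => fetch(url, { method: "POST", body, signal }))`, with `LONG_GENERATION_TIMEOUT_MS` for long generations. For a single fetch, the built-in `AbortSignal.timeout(ms)` is a shorter alternative; it aborts with a `TimeoutError` DOMException.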

ANTI-PATTERN 04

Ignoring streaming response chunks

Requesting a streaming completion but buffering the entire response before rendering defeats the purpose of streaming. Users stare at a blank screen for the full generation duration, then receive the output in one burst. Buffering also holds the entire payload in memory when it could be processed and released chunk by chunk.

Process chunks as they arrive. Use ReadableStream readers or async iterators to yield tokens incrementally. Update the UI on each chunk — even a single character at a time dramatically improves perceived responsiveness.
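A minimal sketch of incremental consumption with a `ReadableStream` reader. It assumes a plain UTF-8 text stream; most providers wrap tokens in Server-Sent Events or newline-delimited JSON, which would need an extra parsing step before `onText`. `consumeStream` and `onText` are hypothetical names.

```typescript
// Reads a byte stream (for example `response.body` from fetch) and passes each
// decoded piece to `onText` as soon as it arrives, instead of awaiting the whole body.
async function consumeStream(
  body: ReadableStream<Uint8Array>,
  onText: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  // One decoder for the whole stream, so a multi-byte UTF-8 character split
  // across two chunks is held back until its remaining bytes arrive.
  const decoder = new TextDecoder();
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      const text = decoder.decode(value, { stream: true });
      if (text) onText(text); // e.g. append to the rendered message
    }
    const tail = decoder.decode(); // flush any bytes still buffered
    if (tail) onText(tail);
  } finally {
    reader.releaseLock();
  }
}
```

In a UI, `onText` would typically append to the message being rendered, so the reply grows as chunks arrive rather than appearing all at once.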

ANTI-PATTERN 05

Prompts containing embedded secrets

Concatenating database connection strings, internal hostnames, customer PII, or proprietary system prompts directly into the messages sent to a third-party model leaks that data to the provider. Logging, training-data inclusion, and provider-side retention policies are outside your control once the bytes leave your boundary.

Sanitize all prompt inputs. Replace secrets with placeholder tokens and resolve them post-generation. Never include raw credentials, internal IPs, or unmasked user data in model requests. Treat every prompt as if it will be stored and reviewed by the provider.
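A minimal sketch of placeholder redaction. The three regex rules (credentialed URLs, email addresses, IPv4 addresses), the `[LABEL_n]` token format, and the `redact`/`restore` names are illustrative assumptions; pattern matching catches only what it is written for, so real deployments need rules tuned to their own data. The vault that maps tokens back to originals never leaves your server.

```typescript
// Hypothetical redaction rules. Order matters: credentialed URLs are matched
// first so the email rule doesn't consume their user:password@host part.
const RULES: Array<{ label: string; pattern: RegExp }> = [
  { label: "CONN", pattern: /\b[a-z][a-z0-9+.-]*:\/\/[^\s:@\/]+:[^\s@\/]+@\S+/gi },
  { label: "EMAIL", pattern: /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g },
  { label: "IP", pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g },
];

interface Redaction {
  sanitized: string; // safe to send to the provider
  vault: Map<string, string>; // token -> original value, kept server-side
}

function redact(input: string): Redaction {
  const vault = new Map<string, string>();
  let counter = 0;
  let sanitized = input;
  for (const { label, pattern } of RULES) {
    sanitized = sanitized.replace(pattern, (match) => {
      const token = `[${label}_${++counter}]`;
      vault.set(token, match);
      return token;
    });
  }
  return { sanitized, vault };
}

// Resolves placeholders in the model's output after generation.
function restore(output: string, vault: Map<string, string>): string {
  let result = output;
  for (const [token, original] of vault) result = result.split(token).join(original);
  return result;
}
```

For example, `"mail jane@example.com"` is sent as `"mail [EMAIL_1]"`, and any `[EMAIL_1]` the model echoes back is resolved locally after the response arrives.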

ANTI-PATTERN 06

Omitting max_tokens from generation requests

Leaving max_tokens unset or null lets generation run until the model stops on its own or reaches the provider's default limit, which is often the model's full maximum output length. A runaway generation, triggered by a malformed prompt, a repetition loop, or adversarial input, can consume thousands of tokens before stopping. Under pay-per-token billing, this translates directly into cost overruns with zero user value.

Always set a sensible max_tokens bound. Match it to your UI constraints — 256 for classifications, 1024 for summaries, 4096 for long-form generation. Treat it as a hard cost-control gate, not an optional parameter.
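A sketch of making the bound impossible to omit: every request goes through a builder that looks up a ceiling per task type. The task labels, the `buildRequest` name, and the `{ model, max_tokens, messages }` request shape are assumptions modeled on common chat-completion APIs; check the exact parameter name your provider and endpoint expect.

```typescript
// Per-task ceilings, matching the budgets suggested above.
const MAX_TOKENS = {
  classification: 256,
  summary: 1024,
  longform: 4096,
} as const;

type TaskKind = keyof typeof MAX_TOKENS;

// Hypothetical request shape; adapt field names to your provider.
interface GenerationRequest {
  model: string;
  max_tokens: number;
  messages: Array<{ role: "user"; content: string }>;
}

function buildRequest(model: string, task: TaskKind, prompt: string, requested?: number): GenerationRequest {
  const ceiling = MAX_TOKENS[task];
  // Callers may ask for less than the task ceiling, never more.
  const maxTokens =
    requested === undefined || !Number.isFinite(requested)
      ? ceiling
      : Math.max(1, Math.min(Math.floor(requested), ceiling));
  return { model, max_tokens: maxTokens, messages: [{ role: "user", content: prompt }] };
}
```

Because the task type is required and the ceiling is applied in one place, a call site cannot silently drop the limit or raise it past the budget for its task.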