Recipe

GDPR design for LLM apps

A practical recipe for shipping LLM features that respect data-subject rights, minimize PII exposure, and survive a data protection authority audit.

Large language models complicate GDPR compliance because prompts and completions routinely contain personal data, model providers act as processors, and embeddings can leak identifying signal. This recipe walks through three design moves that turn a compliant LLM app from aspiration into architecture: data minimization at the prompt boundary, lawful basis selection for inference, and erasure plumbing that reaches into vector stores and logs.

1. Minimize at the prompt boundary

Strip or pseudonymize PII before it crosses into the model context. Treat the prompt as a hostile log line: redact emails, names, account numbers, and free-text identifiers with deterministic tokens you can reverse server-side.

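// pseudocode: redactPII, rehydrate, and the meridian client are placeholders.
// redactPII is assumed to return { text, map }, where map links each token
// back to the original value it replaced.
const redacted = redactPII(userInput, {
  email: "<EMAIL_0>",
  name:  "<NAME_0>",
  iban:  "<IBAN_0>"
});
// Only the redacted text crosses into the model context.
const completion = await meridian.chat({
  model: "azure/model-router",
  messages: [{ role: "user", content: redacted.text }]
});
// Restore the original values server-side, after the model call.
const final = rehydrate(completion, redacted.map);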
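One way the two placeholder helpers above could look is sketched below in TypeScript. This is a minimal sketch under stated assumptions, not a production redactor: the `PATTERNS` table, the `RedactionResult` type, and both function bodies are hypothetical, and the regular expressions are illustrative rather than exhaustive. Names and free-text identifiers cannot be caught reliably with regular expressions and would need a dedicated entity-recognition step, which is omitted here. The same value always maps to the same token within one call, and the mapping never leaves the server.

```typescript
// Hypothetical sketch: pattern-based redaction with reversible tokens.
type RedactionResult = { text: string; map: Map<string, string> };

// Illustrative patterns only; real detectors need broader coverage.
const PATTERNS: Array<[string, RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["IBAN", /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g],
];

function redactPII(input: string): RedactionResult {
  const map = new Map<string, string>();      // token -> original value
  const seen = new Map<string, string>();     // original value -> token
  const counters = new Map<string, number>(); // label -> next index
  let text = input;
  for (const [label, pattern] of PATTERNS) {
    text = text.replace(pattern, (match: string): string => {
      const existing = seen.get(match);
      if (existing !== undefined) return existing; // same value, same token
      const n = counters.get(label) ?? 0;
      counters.set(label, n + 1);
      const token = `<${label}_${n}>`;
      seen.set(match, token);
      map.set(token, match);
      return token;
    });
  }
  return { text, map };
}

function rehydrate(text: string, map: Map<string, string>): string {
  let out = text;
  // Replace every occurrence of each token with its original value.
  map.forEach((original, token) => {
    out = out.split(token).join(original);
  });
  return out;
}
```

In this sketch `rehydrate` takes the completion text as a plain string, so a caller would first extract the text from whatever response object the model client returns.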

2. Pick a lawful basis you can defend

Contract (Article 6(1)(b)) is the cleanest basis for product-core LLM features the user explicitly invoked. Legitimate interest (Article 6(1)(f)) works for fraud and abuse detection if you document the balancing test. Consent (Article 6(1)(a)) is fragile for inference because it can be withdrawn at any time, but it is usually the basis you will need for training-data reuse. Record the basis per feature and per data category, and surface it in the privacy notice.
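Recording the basis per feature and per data category can be as simple as a lookup table that inference code must consult before processing. The TypeScript sketch below shows one possible shape. Every name in it (`ProcessingRecord`, `REGISTRY`, `resolveBasis`, the feature names, and the `LIA-001` reference) is hypothetical and chosen for illustration only.

```typescript
// Hypothetical sketch: a per-feature, per-category lawful-basis registry.
type LawfulBasis = "contract" | "legitimate_interest" | "consent";

interface ProcessingRecord {
  feature: string;
  dataCategory: string;
  basis: LawfulBasis;
  balancingTestRef?: string; // pointer to the documented balancing test
}

// Example entries; real features and categories will differ.
const REGISTRY: ProcessingRecord[] = [
  { feature: "support-chat", dataCategory: "chat-transcript", basis: "contract" },
  {
    feature: "abuse-detection",
    dataCategory: "chat-transcript",
    basis: "legitimate_interest",
    balancingTestRef: "LIA-001",
  },
  { feature: "model-training", dataCategory: "chat-transcript", basis: "consent" },
];

function resolveBasis(
  feature: string,
  dataCategory: string,
  hasConsent: boolean
): LawfulBasis {
  const record = REGISTRY.find(
    (r) => r.feature === feature && r.dataCategory === dataCategory
  );
  if (record === undefined) {
    // No recorded basis means no processing.
    throw new Error(`No lawful basis recorded for ${feature}/${dataCategory}`);
  }
  if (record.basis === "legitimate_interest" && !record.balancingTestRef) {
    throw new Error(`Missing balancing test for ${feature}/${dataCategory}`);
  }
  if (record.basis === "consent" && !hasConsent) {
    throw new Error(`Consent required for ${feature}/${dataCategory}`);
  }
  return record.basis;
}
```

Failing closed when no record exists keeps undocumented features from quietly processing personal data, and the same table can feed the privacy notice.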

3. Wire erasure end to end

Article 17 erasure requests must reach every store touched by inference: transcript tables, vector indexes, evaluation traces, and provider-side retention. Tag every embedding with a subject ID, expose a hard-delete worker, and confirm each processor's retention window (Azure OpenAI, Anthropic, OpenAI) in your data processing agreement.
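The hard-delete worker can be sketched as a fan-out over every store that holds subject-tagged records. The TypeScript below is a minimal illustration under stated assumptions: the `SubjectStore` interface, the `InMemoryStore` stand-in, and `eraseSubject` are all hypothetical names, and real transcript tables, vector indexes, and trace stores would each implement `deleteBySubject` with their own client. Provider-side retention is not covered here, because it is governed by the processor's terms rather than by your own code.

```typescript
// Hypothetical sketch: fan an erasure request out to every store.
interface SubjectStore {
  name: string;
  // Hard-deletes all records for the subject; resolves to the number removed.
  deleteBySubject(subjectId: string): Promise<number>;
}

// In-memory stand-in for a transcript table, vector index, or trace store.
class InMemoryStore implements SubjectStore {
  name: string;
  private rows: Array<{ subjectId: string; payload: string }> = [];

  constructor(name: string) {
    this.name = name;
  }

  add(subjectId: string, payload: string): void {
    this.rows.push({ subjectId, payload });
  }

  count(): number {
    return this.rows.length;
  }

  async deleteBySubject(subjectId: string): Promise<number> {
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => r.subjectId !== subjectId);
    return before - this.rows.length;
  }
}

// Deletes the subject from each store in turn and returns a per-store
// receipt. A failure in any store rejects the promise, so the request
// is never reported as complete when a store was skipped.
async function eraseSubject(
  subjectId: string,
  stores: SubjectStore[]
): Promise<Record<string, number>> {
  const receipt: Record<string, number> = {};
  for (const store of stores) {
    receipt[store.name] = await store.deleteBySubject(subjectId);
  }
  return receipt;
}
```

Returning a per-store receipt gives you something to file against the erasure request as evidence that each store was reached.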