Safety filters
Meridian routes every generation request through Azure AI Content Safety. Configure severity thresholds to control what your application allows through.
Filter categories
- Hate: speech that attacks or discriminates based on protected characteristics.
- Sexual: sexually explicit content, innuendo, or suggestive material.
- Violence: depictions of physical harm, gore, or weapons-related content.
- Self-harm: content that promotes or depicts self-injury or suicide.
Severity thresholds
Each category accepts a threshold from 0 (most permissive) to 7 (most restrictive). The default is 4 — moderate filtering that catches obvious violations while allowing edge cases through.
| Level | Behavior |
|---|---|
| 0–1 | Off — no filtering applied. |
| 2–3 | Low — blocks only severe content. |
| 4–5 | Medium — balanced filtering (default). |
| 6–7 | High — aggressive filtering; may trigger false positives. |
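The bands in the table can be checked in code before a configuration is deployed. The following is a minimal sketch, not Meridian's API: `threshold_band`, `resolve_thresholds`, and the constant names are hypothetical, and the category keys are assumed to match the ones that appear in `content_filter_results`.

```python
# Bands copied from the threshold table; all names here are illustrative.
CATEGORIES = ("hate", "sexual", "violence", "self_harm")
BANDS = ((0, 1, "Off"), (2, 3, "Low"), (4, 5, "Medium"), (6, 7, "High"))
DEFAULT_THRESHOLD = 4  # the documented default


def threshold_band(level):
    """Return the band name for an integer threshold from 0 to 7."""
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError(f"threshold must be an integer, got {level!r}")
    for low, high, name in BANDS:
        if low <= level <= high:
            return name
    raise ValueError(f"threshold must be between 0 and 7, got {level}")


def resolve_thresholds(overrides=None):
    """Fill unset categories with the default and reject unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    resolved = {c: overrides.get(c, DEFAULT_THRESHOLD) for c in CATEGORIES}
    for level in resolved.values():
        threshold_band(level)  # raises on non-integer or out-of-range values
    return resolved


config = resolve_thresholds({"hate": 5, "self_harm": 6})
for category, level in config.items():
    print(f"{category}: {level} ({threshold_band(level)})")
```

Validating up front means a typo such as a threshold of 8 fails when the configuration is loaded rather than silently at request time.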
Understanding content_filter responses
When a threshold is set high (6–7), Azure may return an HTTP 200 with finish_reason=content_filter instead of a generation. The request itself succeeded; the content filter withheld the output, and content_filter_results reports which category triggered it.
```json
{
  "choices": [{
    "index": 0,
    "finish_reason": "content_filter",
    "content_filter_results": {
      "hate": { "filtered": true, "severity": "high" },
      "sexual": { "filtered": false, "severity": "safe" },
      "violence": { "filtered": false, "severity": "low" },
      "self_harm": { "filtered": false, "severity": "safe" }
    }
  }]
}
```

Meridian surfaces the filtered categories and severities in your dashboard logs so you can tune thresholds without guesswork.
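Because a filtered response still arrives as an HTTP 200, client code has to inspect finish_reason rather than rely on the status code. A minimal sketch of that check follows, assuming the response has already been decoded into a dict with the shape shown above; `filtered_categories` is a hypothetical helper, not part of any SDK.

```python
import json


def filtered_categories(response):
    """Map each filtered category to its reported severity, across all choices."""
    hits = {}
    for choice in response.get("choices", []):
        # Only choices cut off by the filter are of interest here.
        if choice.get("finish_reason") != "content_filter":
            continue
        results = choice.get("content_filter_results", {})
        for category, result in results.items():
            if result.get("filtered"):
                hits[category] = result.get("severity", "unknown")
    return hits


# Sample payload copied from the example response above.
sample = json.loads("""
{
  "choices": [{
    "index": 0,
    "finish_reason": "content_filter",
    "content_filter_results": {
      "hate": { "filtered": true, "severity": "high" },
      "sexual": { "filtered": false, "severity": "safe" },
      "violence": { "filtered": false, "severity": "low" },
      "self_harm": { "filtered": false, "severity": "safe" }
    }
  }]
}
""")

print(filtered_categories(sample))  # {'hate': 'high'}
```

An empty result means no choice was blocked, so the caller can proceed to read the generated content as usual.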
Recommendations
- Start at severity 4 and monitor logs for one week before adjusting.
- If your application handles user-generated prompts, keep hate and self-harm at 5+.
- Avoid severity 7 in production, where it frequently blocks benign requests that contain flagged keywords out of context.
- Use the dashboard's per-category breakdown to identify which filter triggers most often, then tune that single category rather than raising all thresholds globally.
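If you collect filtered responses yourself, the per-category breakdown in the last recommendation can be approximated with a simple tally. This is a sketch under assumptions: each log entry is taken to be a dict carrying a `content_filter_results` object like the one in the example response. The dashboard's actual export format is not described here, and both function names are hypothetical.

```python
from collections import Counter


def count_filter_triggers(log_entries):
    """Count how many entries each category filtered."""
    counts = Counter()
    for entry in log_entries:
        results = entry.get("content_filter_results", {})
        for category, result in results.items():
            if result.get("filtered"):
                counts[category] += 1
    return counts


def noisiest_category(log_entries):
    """Return the category that triggered most often, or None if none did."""
    counts = count_filter_triggers(log_entries)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical entries for illustration only; not real log data.
entries = [
    {"content_filter_results": {"hate": {"filtered": True}, "violence": {"filtered": False}}},
    {"content_filter_results": {"violence": {"filtered": True}}},
    {"content_filter_results": {"violence": {"filtered": True}}},
]

print(noisiest_category(entries))  # violence
```

Tuning only the category this returns keeps the other thresholds where they are, which matches the advice to avoid raising all thresholds globally.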