Content moderation
Pre-screen user inputs before they reach your application. The moderation endpoint returns per-category probability scores so you can decide what to flag, block, or review.
Quickstart
Send a POST request to /v1/moderations with model set to text-moderation-stable and the text to check in input.
curl https://api.meridian.sh/v1/moderations \
  -H "Authorization: Bearer $MERIDIAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-moderation-stable",
    "input": "I want to hurt someone."
  }'
Response
Each category returns a score between 0 and 1 in category_scores. Higher scores indicate a greater likelihood that the content violates the policy. The flagged boolean is true when any category exceeds the threshold, and categories shows which ones did.
{
  "id": "modr-9gH7xK2LpQ",
  "model": "text-moderation-stable",
  "results": [
    {
      "flagged": true,
      "categories": {
        "harassment": false,
        "harassment/threatening": false,
        "hate": false,
        "hate/threatening": false,
        "self-harm": false,
        "self-harm/intent": false,
        "self-harm/instructions": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": true,
        "violence/graphic": false
      },
      "category_scores": {
        "harassment": 0.0003,
        "harassment/threatening": 0.0001,
        "hate": 0.0002,
        "hate/threatening": 0.0001,
        "self-harm": 0.0001,
        "self-harm/intent": 0.0001,
        "self-harm/instructions": 0.0001,
        "sexual": 0.0001,
        "sexual/minors": 0.0001,
        "violence": 0.987,
        "violence/graphic": 0.002
      }
    }
  ]
}
Categories
| Category key | Description |
|---|---|
| harassment | Harassment |
| harassment/threatening | Harassment / Threatening |
| hate | Hate |
| hate/threatening | Hate / Threatening |
| self-harm | Self-harm |
| self-harm/intent | Self-harm / Intent |
| self-harm/instructions | Self-harm / Instructions |
| sexual | Sexual |
| sexual/minors | Sexual / Minors |
| violence | Violence |
| violence/graphic | Violence / Graphic |
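The category keys in this table are the same keys used in the categories and category_scores objects of the response. As a minimal sketch, assuming the response has already been parsed into a Python dict shaped like the sample above, the helper below collects every category marked true together with its score. The function name flagged_categories and the trimmed sample dict are illustrative, not part of the API.

```python
from typing import Any


def flagged_categories(response: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (category key, score) pairs for every category marked true,
    highest score first, across all entries in `results`."""
    hits: list[tuple[str, float]] = []
    for result in response.get("results", []):
        scores = result.get("category_scores", {})
        for key, is_flagged in result.get("categories", {}).items():
            if is_flagged:
                # Fall back to 0.0 if a score is missing for a flagged key.
                hits.append((key, scores.get(key, 0.0)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the sample response (only three categories kept).
sample = {
    "results": [
        {
            "flagged": True,
            "categories": {
                "harassment": False,
                "violence": True,
                "violence/graphic": False,
            },
            "category_scores": {
                "harassment": 0.0003,
                "violence": 0.987,
                "violence/graphic": 0.002,
            },
        }
    ]
}

print(flagged_categories(sample))  # [('violence', 0.987)]
```

Because keys such as harassment/threatening contain a slash, they are easiest to handle as plain dictionary keys rather than attribute names.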
Best practices
- Run moderation before persisting or displaying user-generated content.
- Use per-category scores to build graduated responses: warn on low-confidence flags and block on high-confidence ones.
- Combine moderation with a human-review queue for edge cases where scores fall in a middle band.
- The model is optimized for English. For other languages, test accuracy before deploying to production.
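One way to put the first three practices together is sketched below: call the endpoint before storing the content, then map the highest category score to an action. The endpoint, headers, and model name come from the quickstart. The helper names moderate and decide and the two thresholds (0.4 and 0.9) are assumptions chosen for illustration, not recommended values; pick cut-offs by testing on your own traffic.

```python
import json
import os
import urllib.request

API_URL = "https://api.meridian.sh/v1/moderations"

# Hypothetical cut-offs for illustration only; tune them on your own data.
REVIEW_THRESHOLD = 0.4
BLOCK_THRESHOLD = 0.9


def moderate(text: str) -> dict:
    """POST the text to the moderation endpoint and return the parsed JSON."""
    body = json.dumps(
        {"model": "text-moderation-stable", "input": text}
    ).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['MERIDIAN_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.load(reply)


def decide(category_scores: dict[str, float]) -> str:
    """Map the highest category score to 'allow', 'review', or 'block'."""
    top = max(category_scores.values(), default=0.0)
    if top >= BLOCK_THRESHOLD:
        return "block"
    if top >= REVIEW_THRESHOLD:
        # Middle band: route to a human-review queue instead of auto-acting.
        return "review"
    return "allow"


# Offline check using scores like those in the sample response.
print(decide({"violence": 0.987, "hate": 0.0002}))  # block
```

In an application, the scores would come from moderate(user_text)["results"][0]["category_scores"], and content would be persisted only when decide returns "allow". Using a single pair of thresholds for all categories keeps the sketch short; per-category thresholds are a natural refinement.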