Recipe: Kid-safe chatbot

Build a chatbot with strict safety filters and real-time moderation so children interact only with age-appropriate content.

Overview

This recipe layers four defenses: a pre-flight keyword blocklist, a toxicity classifier on incoming messages, the same classifier as a guard on the model’s output, and an audit log of every blocked or flagged interaction. Together they reduce the chance that the model produces harmful, violent, or adult content, even under adversarial prompting.

Step 1 — Blocklist filter

Before any prompt reaches the LLM, scan it against a curated blocklist of 2,400+ unsafe terms covering self-harm, violence, grooming language, and explicit material. If any term matches, reject the request immediately with a generic “I can’t answer that” message, and never echo the blocked term back to the child.
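A minimal sketch of the pre-flight check, assuming the blocklist fits in memory and that whole-word matching is acceptable. The two entries in `BLOCKLIST` are hypothetical placeholders, not terms from the curated list, and the helper names (`preflight`, `is_blocked`) are illustrative only.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical placeholder terms; a real deployment would load the curated list.
BLOCKLIST = {"badword", "unsafe phrase"}

GENERIC_REFUSAL = "I can't answer that"


def _normalize(text: str) -> str:
    """Fold Unicode compatibility forms, lowercase, and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", folded).strip()


# One alternation with word boundaries, so a term embedded in a longer
# word does not trigger a match. Longer terms are tried first.
_terms = sorted((_normalize(t) for t in BLOCKLIST), key=len, reverse=True)
_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in _terms) + r")\b")


def is_blocked(prompt: str) -> bool:
    """True if the normalized prompt contains any blocklisted term."""
    return _PATTERN.search(_normalize(prompt)) is not None


def preflight(prompt: str) -> Optional[str]:
    """Return the generic refusal if the prompt is blocked, else None.

    The refusal text is fixed, so the matched term is never echoed back.
    """
    return GENERIC_REFUSAL if is_blocked(prompt) else None
```

Whole-word matching keeps false positives down but misses deliberate misspellings and spacing tricks, which is one reason the later classifier stages exist.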

Step 2 — Toxicity classifier

Route every user message that passes the blocklist through a lightweight toxicity model (e.g. Perspective API or a fine-tuned BERT classifier). Set a conservative threshold of 0.3 on the toxicity score, which ranges from 0 to 1. If the score exceeds the threshold, skip the LLM call and surface a gentle redirection: “Let’s talk about something else!”
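The threshold check might look like the sketch below. The classifier is passed in as a plain callable that returns a score between 0 and 1, so the same function works with whichever model is chosen. `stub_scorer` is a hypothetical stand-in used only to make the example runnable; it is not a real toxicity model and does not call Perspective API.

```python
from typing import Callable, Optional

TOXICITY_THRESHOLD = 0.3
REDIRECT_MESSAGE = "Let's talk about something else!"

# Any function mapping text to a toxicity score in [0, 1].
Scorer = Callable[[str], float]


def check_input(message: str, score_toxicity: Scorer) -> Optional[str]:
    """Return the redirect message if the score exceeds the threshold, else None."""
    score = score_toxicity(message)
    # Reject malformed scores (including NaN) instead of silently passing them.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"toxicity score out of range: {score}")
    return REDIRECT_MESSAGE if score > TOXICITY_THRESHOLD else None


def stub_scorer(message: str) -> float:
    """Hypothetical stand-in for a real classifier call."""
    return 0.9 if "hate" in message.lower() else 0.05
```

A score exactly equal to 0.3 passes in this sketch, because the recipe says to act when the score exceeds the threshold.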

Step 3 — Output guard

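Run the same classifier on the model’s complete response before any of it reaches the child. Buffer the response rather than forwarding tokens as they arrive, because text that has already been shown cannot be recalled. If the response scores above the 0.3 threshold, replace it with a safe fallback and flag the interaction for human review.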
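One way to sketch the output guard is to collect the model's chunks into a single string, score it once, and only then decide what to send. The `GuardedResponse` type and the fallback wording are assumptions made for illustration; the recipe does not specify either.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

TOXICITY_THRESHOLD = 0.3
# Hypothetical fallback text; the recipe only says "a safe fallback".
SAFE_FALLBACK = "Hmm, let's try a different question!"


@dataclass
class GuardedResponse:
    text: str      # what the child actually sees
    flagged: bool  # True if the interaction should go to human review
    score: float   # classifier score of the original model response


def guard_output(
    chunks: Iterable[str],
    score_toxicity: Callable[[str], float],
) -> GuardedResponse:
    """Buffer the whole response, score it, and swap in the fallback if needed."""
    full_response = "".join(chunks)  # nothing is sent until scoring is done
    score = score_toxicity(full_response)
    if score > TOXICITY_THRESHOLD:
        return GuardedResponse(text=SAFE_FALLBACK, flagged=True, score=score)
    return GuardedResponse(text=full_response, flagged=False, score=score)
```

Buffering adds latency equal to the full generation time, which is the trade-off for never showing text that later fails the check.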

Step 4 — Audit logging

Log every blocked prompt, flagged response, and classifier score to an append-only, tamper-evident audit table. Key each entry by a session-bound pseudonymous ID so parents or moderators can review patterns without the log storing personally identifiable information.
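The sketch below shows one possible shape for such a log, under two assumptions that are not from the recipe: the pseudonymous ID is an HMAC of the session ID under a server-side secret, and tamper evidence comes from chaining each entry's hash to the previous one. The log lives in memory here; `AuditLog`, `pseudonymous_id`, and `SECRET_KEY` are hypothetical names.

```python
import hashlib
import hmac
import json
import time
from typing import Any, Dict, List, Optional

# Hypothetical secret; in practice this would come from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"
GENESIS_HASH = "0" * 64


def pseudonymous_id(session_id: str) -> str:
    """Derive a stable ID for the session without storing the session ID itself."""
    digest = hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


class AuditLog:
    """Append-only log where each entry commits to the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    @staticmethod
    def _digest(entry: Dict[str, Any]) -> str:
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(
        self, session_id: str, event: str, text: str, score: Optional[float]
    ) -> Dict[str, Any]:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry: Dict[str, Any] = {
            "ts": time.time(),
            "pid": pseudonymous_id(session_id),
            "event": event,   # e.g. "blocked_prompt" or "flagged_response"
            "text": text,
            "score": score,   # None for blocklist rejections
            "prev": prev_hash,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True
```

Hash chaining detects changes after the fact but does not prevent them, and the logged text can itself contain personal details a child typed, so redaction before logging may still be needed.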

Sample guard pipeline

user_msg → blocklist? → match → reject
              ↓ pass
           toxicity? → score > 0.3 → redirect
              ↓ pass
           LLM response → toxicity? → score > 0.3 → fallback + flag for review
                             ↓ pass
                          stream to child
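The flow above can be read as a single function with early returns. The sketch below takes the blocklist check, the classifier, and the LLM call as injected callables, so it makes no assumptions about which model or API sits behind them. The function name and the fallback wording are hypothetical.

```python
from typing import Callable

THRESHOLD = 0.3
REJECT = "I can't answer that"
REDIRECT = "Let's talk about something else!"
FALLBACK = "Hmm, let's try a different question!"  # hypothetical wording


def guard_pipeline(
    user_msg: str,
    is_blocked: Callable[[str], bool],
    score_toxicity: Callable[[str], float],
    generate: Callable[[str], str],
) -> str:
    """Apply the guards in diagram order and return the text shown to the child."""
    # 1. Blocklist: reject before anything else runs.
    if is_blocked(user_msg):
        return REJECT
    # 2. Input classifier: redirect without calling the LLM.
    if score_toxicity(user_msg) > THRESHOLD:
        return REDIRECT
    # 3. The LLM is called only for messages that passed both input checks.
    response = generate(user_msg)
    # 4. Output guard: same classifier, same threshold.
    if score_toxicity(response) > THRESHOLD:
        return FALLBACK
    return response
```

The ordering matters for cost as well as safety: the cheap blocklist runs first, and the LLM is never invoked for input that has already been rejected or redirected.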

Next steps

Combine this recipe with the parental-consent gateway and time-limit enforcer for a complete child-safe AI experience. See the linked recipes in the sidebar.

Meridian tip: Run the blocklist and classifier on a separate lightweight worker so guard latency stays under 50 ms. Children abandon slow chatbots.
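As a rough illustration of enforcing a latency budget, the sketch below runs a guard check on a small thread pool and stops waiting after 50 ms. The thread pool is only a stand-in for a separate worker, and treating a timeout as a block (failing closed) is an assumption, not something the tip specifies. `run_guard` is a hypothetical name.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

GUARD_BUDGET_SECONDS = 0.050  # the 50 ms target from the tip

# Small dedicated pool so guard checks do not queue behind other work.
_guard_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="guard")


def run_guard(
    check: Callable[[str], bool],
    message: str,
    budget: float = GUARD_BUDGET_SECONDS,
) -> bool:
    """Return True if the message is allowed.

    If the check does not finish within the budget, the message is treated
    as not allowed (fail closed) rather than let through unchecked.
    """
    future = _guard_pool.submit(check, message)
    try:
        return future.result(timeout=budget)
    except FutureTimeout:
        future.cancel()  # no effect if the check is already running
        return False
```

Failing closed trades some false rejections for safety; a deployment that prefers availability would need a different policy and should measure how often the budget is actually exceeded.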