Recipe: Voice cloning workflow

A step-by-step guide to creating a high-fidelity voice clone using Meridian. This workflow assumes you have explicit, documented consent from the voice subject.

⚠️ Ethics & Consent

Voice cloning is a powerful technology that demands responsibility. You must obtain clear, informed consent from any individual whose voice you clone. Never clone a voice without permission. Never use cloned voices for deception, fraud, impersonation, or harassment. Meridian reserves the right to suspend accounts that violate these principles. When in doubt, don't clone it.

1. Source audio

Record or upload 3–5 minutes of clean, dry speech: a single speaker at a natural cadence, with no background noise, music, or reverb. WAV or FLAC at 24 kHz or higher yields the best results. Avoid phone-call audio and compressed Zoom recordings.
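As a quick sanity check before uploading, the sample rate and duration guidance above can be verified locally. The sketch below uses Python's standard `wave` module, so it covers uncompressed PCM WAV files only (not FLAC). The function name `check_source_wav` and its default thresholds are illustrative, taken from the numbers in this step, and are not part of Meridian.

```python
import wave


def check_source_wav(path, min_rate_hz=24_000, min_seconds=180.0, max_seconds=300.0):
    """Return a list of problems found in a PCM WAV file (empty list if none).

    Thresholds mirror this guide: 24 kHz or higher, 3-5 minutes of speech.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate

    problems = []
    if rate < min_rate_hz:
        problems.append(f"sample rate {rate} Hz is below {min_rate_hz} Hz")
    if seconds < min_seconds:
        problems.append(f"only {seconds:.1f} s of audio; aim for at least {min_seconds:.0f} s")
    elif seconds > max_seconds:
        problems.append(f"{seconds:.1f} s of audio; guide suggests at most {max_seconds:.0f} s")
    return problems
```

This checks format only. It cannot tell you whether the recording is free of noise, music, or reverb, so listen to the file as well.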

2. Preprocessing

Trim silence from head and tail. Normalize to -23 LUFS. Remove breaths and mouth clicks with a light de-noise pass. Split into 10–15 second segments. Meridian's ingestion pipeline handles the rest automatically.
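Two of the steps above, trimming silence and splitting into segments, can be sketched in plain Python. This is a minimal illustration, not Meridian's pipeline: it assumes audio is already decoded into a list of floats in the range -1.0 to 1.0, and the function names and the 0.01 silence threshold are illustrative choices. Loudness normalization to -23 LUFS is not covered here, since it needs a proper ITU-R BS.1770 loudness meter.

```python
import math


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than the threshold."""
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def plan_segments(total_seconds, max_len=15.0):
    """Split a duration into equal-length (start, end) segments of at most max_len seconds."""
    if total_seconds <= 0:
        return []
    count = math.ceil(total_seconds / max_len)
    length = total_seconds / count
    return [(i * length, (i + 1) * length) for i in range(count)]
```

For the 3–5 minute recordings this guide recommends, the equal-length plan always lands inside the 10–15 second window. The cut points are purely arithmetic, though, so in practice you would nudge each boundary to the nearest pause to avoid splitting a word.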

3. Training

Submit your preprocessed clips via the dashboard. Training typically completes in 8–12 minutes. You'll receive a notification when the model is ready. Quality scales with source audio fidelity: garbage in, garbage out.

4. Inference

Use the API or playground to generate speech. Provide clean text input. Adjust the stability and similarity sliders to taste: lower stability gives more expressive output, higher stability gives more consistent output. Export as WAV or MP3.
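If you script generation rather than using the playground, it helps to validate inputs before sending them. The sketch below only builds and checks a request body; it makes no network call. Every field name here (`text`, `stability`, `similarity`, `format`) and the 0.0–1.0 slider range are assumptions made for illustration, not Meridian's documented API, so check the API reference for the real schema.

```python
import json


def build_tts_request(text, stability=0.5, similarity=0.75, audio_format="wav"):
    """Build a JSON request body for speech generation.

    Hypothetical schema: field names and the 0.0-1.0 range are assumptions.
    """
    # "Clean text input": collapse runs of whitespace and strip the ends.
    cleaned = " ".join(text.split())
    if not cleaned:
        raise ValueError("text must not be empty")

    for name, value in (("stability", stability), ("similarity", similarity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    if audio_format not in ("wav", "mp3"):
        raise ValueError("audio_format must be 'wav' or 'mp3'")

    return json.dumps({
        "text": cleaned,
        "stability": stability,
        "similarity": similarity,
        "format": audio_format,
    })
```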

5. Verification

Always A/B test the output against the original speaker. Have a third party listen blind. If they can reliably tell the clone from the original, revisit your source audio quality and preprocessing steps.
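One way to make the blind test less subjective is to score it. The sketch below is an illustrative approach, not a Meridian feature: it shuffles which clip the listener hears on each trial, then computes how often they identified it correctly and how likely that score would be from pure guessing (a one-sided binomial test). The function names are hypothetical.

```python
import math
import random


def make_trials(n_trials, seed=None):
    """Randomly choose which clip ('original' or 'clone') is played on each trial."""
    rng = random.Random(seed)
    return [rng.choice(("original", "clone")) for _ in range(n_trials)]


def score_listener(trials, guesses):
    """Return (accuracy, p_value) for a listener's guesses.

    p_value is the chance of getting at least this many right by coin-flipping.
    """
    if not trials or len(trials) != len(guesses):
        raise ValueError("trials and guesses must be non-empty and the same length")
    n = len(trials)
    correct = sum(truth == guess for truth, guess in zip(trials, guesses))
    p_value = sum(math.comb(n, k) for k in range(correct, n + 1)) / 2 ** n
    return correct / n, p_value
```

Accuracy near 50% with a large p-value suggests the listener cannot tell the clone from the original. A small p-value (for example, below 0.05) means they can, which is the cue to go back to source audio and preprocessing. Use at least 20 trials or so; with only a handful, even a perfect score is weak evidence.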

Questions? Reach out on Discord.