Recipe

Voice chat design

A complete blueprint for building real-time voice rooms with spatial audio, speaker indicators, and low-latency transport.

Overview

Voice chat is one of the most bandwidth- and latency-sensitive social features in a multiplayer product. This recipe covers WebRTC signaling, Opus encoding, the tradeoffs between server-side mixing and peer-to-peer transport, and the UI patterns that make a voice room feel alive.

Architecture decision

For rooms of up to 25 participants, a selective forwarding unit (SFU) gives the best balance of quality and server cost. Each client sends one Opus stream, and the SFU forwards the N loudest streams to each participant. Above 25, switch to server-side mixing, where each client receives a single downmixed Opus stream.
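
The topology choice and the "N loudest" rule can be sketched as two small functions. This is a minimal illustration, not a real SFU API: the type names, the `level` field, and the decision to treat a room of exactly 25 as SFU-sized are all assumptions.

```typescript
// Hypothetical types and names; only the 25-participant threshold comes from the recipe.
type Topology = "sfu" | "mixer";

interface StreamLevel {
  participantId: string;
  // Recent audio level in [0, 1], e.g. a smoothed RMS value (assumed to be tracked server-side).
  level: number;
}

const SFU_MAX_PARTICIPANTS = 25;

// Pick the server topology for a room of the given size.
function chooseTopology(participantCount: number): Topology {
  return participantCount <= SFU_MAX_PARTICIPANTS ? "sfu" : "mixer";
}

// Return the ids of the n loudest streams to forward to one listener,
// excluding the listener's own stream. The input array is not mutated.
function selectLoudest(levels: StreamLevel[], listenerId: string, n: number): string[] {
  return levels
    .filter((s) => s.participantId !== listenerId)
    .sort((a, b) => b.level - a.level)
    .slice(0, Math.max(0, n))
    .map((s) => s.participantId);
}
```

Running the selection once per listener keeps a speaker from being sent their own voice, at the cost of one sort per listener per update.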

Spatial audio

Position each speaker in a virtual 2D room. Apply HRTF panning and distance attenuation client-side via the Web Audio API. Store speaker positions in a flat array synced over the data channel at 10 Hz. Mute speakers beyond a configurable radius.
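
As a rough sketch of the data side of this, the functions below pack positions into a flat array and compute a per-speaker gain. The `[x0, y0, x1, y1, ...]` layout, the function names, and the parameters are assumptions. The gain curve follows the same formula as the Web Audio "inverse" distance model; in a browser the panning itself would typically be delegated to a PannerNode with its panning model set to HRTF.

```typescript
// Assumed flat layout: [x0, y0, x1, y1, ...], one pair per speaker,
// in the same order as a separately agreed list of speaker ids.
interface Vec2 {
  x: number;
  y: number;
}

function packPositions(positions: Vec2[]): Float32Array {
  const flat = new Float32Array(positions.length * 2);
  positions.forEach((p, i) => {
    flat[2 * i] = p.x;
    flat[2 * i + 1] = p.y;
  });
  return flat;
}

function unpackPositions(flat: Float32Array): Vec2[] {
  const out: Vec2[] = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    out.push({ x: flat[i], y: flat[i + 1] });
  }
  return out;
}

// Gain for one speaker as heard by the listener: inverse-distance rolloff,
// hard-muted beyond muteRadius. All parameters are illustrative tuning values.
function speakerGain(
  listener: Vec2,
  speaker: Vec2,
  refDistance: number,
  rolloff: number,
  muteRadius: number
): number {
  const dx = speaker.x - listener.x;
  const dy = speaker.y - listener.y;
  const distance = Math.sqrt(dx * dx + dy * dy);
  if (distance > muteRadius) return 0;
  const clamped = Math.max(distance, refDistance);
  return refDistance / (refDistance + rolloff * (clamped - refDistance));
}
```

At 10 Hz, a room of 25 speakers costs 25 × 2 × 4 bytes = 200 bytes per update in this layout, before any framing overhead.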

UI patterns

Speaker ring: circular avatars with a pulsing violet border when active.
Voice activity: fill the avatar border in proportion to RMS energy.
Push-to-talk: hold Space or a configurable hotkey; show a subtle overlay.
Mute states: local mute, server mute, deafen — each with a distinct icon.
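
The voice-activity indicator can be driven by a per-frame RMS value. The sketch below assumes PCM samples in the range [-1, 1] (in a browser these could come from an AnalyserNode via getFloatTimeDomainData); the `floor` and `ceiling` defaults are hypothetical tuning values, not part of the recipe.

```typescript
// Root-mean-square energy of one frame of PCM samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Map an RMS value to a border fill fraction in [0, 1].
// Values at or below `floor` show an empty border; values at or above
// `ceiling` show a full one. Both defaults are assumptions.
function borderFill(rmsValue: number, floor: number = 0.01, ceiling: number = 0.3): number {
  const t = (rmsValue - floor) / (ceiling - floor);
  return Math.min(1, Math.max(0, t));
}
```

The floor keeps background noise from making the border flicker, and the ceiling lets normal speech fill the border without needing to shout.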

Transport

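Use UDP for media, with SRTP keys negotiated via DTLS-SRTP. Fall back to a TURN relay only when a direct path cannot be established (for example, behind symmetric NAT), and to TURN over TCP only when UDP is blocked. Keep the signaling channel on a reliable WebSocket. Reconnect with exponential backoff capped at 2 seconds.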
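
The reconnect schedule can be sketched as a pure function. Only the 2-second cap comes from the recipe; the 250 ms base delay, the doubling factor, and the optional jitter helper are assumptions chosen for illustration.

```typescript
const BASE_DELAY_MS = 250; // assumed starting delay; the recipe only fixes the cap
const MAX_DELAY_MS = 2000; // cap from the recipe

// Delay before reconnect attempt number `attempt` (0-based), without jitter.
function backoffDelayMs(attempt: number): number {
  const uncapped = BASE_DELAY_MS * Math.pow(2, Math.max(0, attempt));
  return Math.min(MAX_DELAY_MS, uncapped);
}

// Optional "full jitter" variant: a random delay in [0, backoffDelayMs(attempt)),
// which spreads out reconnects when many clients drop at once.
function jitteredDelayMs(attempt: number, random: () => number = Math.random): number {
  return random() * backoffDelayMs(attempt);
}
```

With these values the delays run 250, 500, 1000, 2000 ms and then stay at 2000 ms, so a client that keeps failing still retries every two seconds.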

Next: read the Presence recipe to layer online status and typing indicators on top of your voice infrastructure.