Voice chat design
A complete blueprint for building real-time voice rooms with spatial audio, speaker indicators, and low-latency transport.
Overview
Voice chat is one of the most bandwidth-intensive social features in a multiplayer product. This recipe covers WebRTC signaling, Opus encoding, the tradeoffs between selective forwarding (SFU) and server-side mixing, and the UI patterns that make a voice room feel alive.
Architecture decision
For rooms of up to 25 participants, a selective forwarding unit (SFU) gives the best balance of quality and server cost: each client sends one Opus stream, and the SFU forwards only the N loudest streams to each participant. Above 25 participants, switch to server-side mixing, where the server decodes the incoming streams and sends each client a single downmixed Opus stream.
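A minimal TypeScript sketch of the two server-side policies follows. It assumes the SFU already tracks a smoothed per-stream loudness (`level`, where higher means louder) and, for mixing, already holds each participant's decoded PCM frame. `StreamLevel`, `selectLoudest`, and `mixMinus` are hypothetical names. If levels come from the RFC 6464 audio-level RTP header extension, those values are in -dBov with 0 as the loudest, so invert them before sorting. Leaving each listener's own voice out of their mix is an added assumption (a common "mix-minus" choice that avoids echoing speakers back to themselves). Opus decoding and re-encoding are not shown.

```typescript
// Hypothetical per-stream state tracked by the SFU.
// `level` is a smoothed loudness estimate where higher means louder.
interface StreamLevel {
  participantId: string;
  level: number;
}

// SFU mode: choose which streams to forward to one listener.
// Returns up to `n` participant ids, loudest first, never including
// the listener's own stream.
function selectLoudest(
  streams: StreamLevel[],
  listenerId: string,
  n: number,
): string[] {
  return streams
    .filter((s) => s.participantId !== listenerId) // filter() copies, so sort() leaves the input intact
    .sort((a, b) => b.level - a.level)
    .slice(0, n)
    .map((s) => s.participantId);
}

// Mixing mode: build one listener's downmix from decoded PCM frames
// (Float32 samples in [-1, 1]), skipping the listener's own voice and
// hard-clipping the sum. The result would then be Opus-encoded.
function mixMinus(
  frames: Map<string, Float32Array>,
  listenerId: string,
  frameSize: number,
): Float32Array {
  const out = new Float32Array(frameSize);
  frames.forEach((pcm, id) => {
    if (id === listenerId) return;
    const len = Math.min(frameSize, pcm.length);
    for (let i = 0; i < len; i++) out[i] += pcm[i];
  });
  for (let i = 0; i < frameSize; i++) {
    out[i] = Math.max(-1, Math.min(1, out[i]));
  }
  return out;
}
```

Hard clipping keeps the sketch short; a production mixer would more likely apply a limiter or scale the sum so that several simultaneous speakers do not distort.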
Spatial audio
Position each speaker in a virtual 2D room. Apply HRTF panning and distance attenuation client-side via the Web Audio API. Store speaker positions in a flat array synced over the data channel at 10 Hz. Mute speakers beyond a configurable radius.
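The client-side steps can be sketched in TypeScript as follows. The sketch assumes room coordinates in metres, a fixed speaker slot order that all clients agree on, and an `AudioListener` left at the origin in its default orientation (facing -z), so each panner is positioned relative to the local player; it ignores which way the player is facing. `PannerLike`, `ParamLike`, and `ChannelLike` are hypothetical structural types: in the browser a real `PannerNode`, a `GainNode`'s `gain` param, and an `RTCDataChannel` can be passed in, while plain objects keep the sketch testable elsewhere. The inverse distance model and 1 m reference distance are assumptions, not values from this recipe.

```typescript
type Vec2 = [number, number];

// Flat array layout: [x0, y0, x1, y1, ...], one slot per speaker.
function encodePositions(positions: Vec2[]): Float32Array {
  const flat = new Float32Array(positions.length * 2);
  positions.forEach(([x, y], i) => {
    flat[2 * i] = x;
    flat[2 * i + 1] = y;
  });
  return flat;
}

function decodePositions(flat: Float32Array): Vec2[] {
  const out: Vec2[] = [];
  for (let i = 0; i + 1 < flat.length; i += 2) out.push([flat[i], flat[i + 1]]);
  return out;
}

interface ParamLike {
  value: number;
}

// The subset of PannerNode used here.
interface PannerLike {
  panningModel: string;
  distanceModel: string;
  refDistance: number;
  rolloffFactor: number;
  positionX: ParamLike;
  positionY: ParamLike;
  positionZ: ParamLike;
}

// HRTF panning plus distance attenuation, both computed by the panner.
function configurePanner(p: PannerLike): void {
  p.panningModel = "HRTF";
  p.distanceModel = "inverse";
  p.refDistance = 1;
  p.rolloffFactor = 1;
}

// Positions one speaker relative to the listener and silences it (via a
// separate gain param) beyond `muteRadius`. Room +y maps to Web Audio -z,
// i.e. "ahead" of a listener in the default orientation.
function placeSpeaker(
  p: PannerLike,
  gain: ParamLike,
  listener: Vec2,
  speaker: Vec2,
  muteRadius: number,
): boolean {
  const dx = speaker[0] - listener[0];
  const dy = speaker[1] - listener[1];
  p.positionX.value = dx;
  p.positionY.value = 0;
  p.positionZ.value = -dy;
  const audible = Math.hypot(dx, dy) <= muteRadius;
  gain.value = audible ? 1 : 0;
  return audible;
}

interface ChannelLike {
  readyState: string;
  send(data: Float32Array): void;
}

// Sends the position table at `hz` (10 Hz in this recipe) while the
// channel is open. Returns a function that stops the timer.
function startPositionSync(
  channel: ChannelLike,
  read: () => Vec2[],
  hz = 10,
): () => void {
  const timer = setInterval(() => {
    if (channel.readyState === "open") channel.send(encodePositions(read()));
  }, 1000 / hz);
  return () => clearInterval(timer);
}
```

Assigning `.value` makes positions jump at each 10 Hz update; `AudioParam.setTargetAtTime` or `linearRampToValueAtTime` can smooth the motion between updates. A receiver can rebuild the table with `decodePositions(new Float32Array(event.data))` when the channel's `binaryType` is set to `"arraybuffer"`.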
UI patterns
- Speaker ring: circular avatars with a pulsing violet border when active.
- Voice activity: fill the avatar border proportional to RMS energy.
- Push-to-talk: hold Space or a configurable hotkey; show a subtle overlay.
- Mute states: local mute, server mute, deafen — each with a distinct icon.
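The voice-activity, push-to-talk, and mute-state behaviour above can be sketched as plain TypeScript state logic. It assumes samples come from `AnalyserNode.getFloatTimeDomainData()` and key events from `keydown`/`keyup` listeners. The `VoiceState` shape, the 0.3 RMS ceiling, the smoothing factor, and the icon names are hypothetical; the smoothing is an added assumption to keep the ring from flickering, not part of the recipe. Drawing the ring, overlay, and icons is left to the UI layer.

```typescript
// RMS of one frame of time-domain samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Border fill in [0, 1]: proportional to RMS, clamped at a hypothetical
// `ceiling`, then eased toward the target by `alpha` each frame.
function nextBorderFill(prevFill: number, level: number, ceiling = 0.3, alpha = 0.3): number {
  const target = Math.min(1, level / ceiling);
  return prevFill + alpha * (target - prevFill);
}

interface VoiceState {
  localMuted: boolean;
  serverMuted: boolean;
  deafened: boolean;
  pushToTalk: boolean; // true when push-to-talk mode is enabled
  pttHeld: boolean;
}

// Whether the local microphone should be sending audio.
function canTransmit(s: VoiceState): boolean {
  if (s.localMuted || s.serverMuted) return false;
  return s.pushToTalk ? s.pttHeld : true;
}

// Hypothetical icon names, one per mute state. Server mute wins because
// the user cannot clear it themselves.
function muteIcon(s: VoiceState): "server-muted" | "deafened" | "muted" | "none" {
  if (s.serverMuted) return "server-muted";
  if (s.deafened) return "deafened";
  if (s.localMuted) return "muted";
  return "none";
}

interface KeyLike {
  code: string; // KeyboardEvent.code, e.g. "Space"
  repeat: boolean; // true for auto-repeat keydown events
}

// Push-to-talk: transmit while the hotkey is held. Auto-repeat keydowns
// are ignored. Returns true when the held state changed.
function handlePttKey(
  s: VoiceState,
  e: KeyLike,
  type: "keydown" | "keyup",
  hotkey = "Space",
): boolean {
  if (!s.pushToTalk || e.code !== hotkey) return false;
  if (type === "keydown") {
    if (e.repeat || s.pttHeld) return false;
    s.pttHeld = true;
    return true;
  }
  if (!s.pttHeld) return false;
  s.pttHeld = false;
  return true;
}
```

In the browser, `canTransmit` can drive the local `MediaStreamTrack.enabled` flag, and `deafened` can gate playback of remote audio. Server mute should also be enforced on the server, since a client-side flag alone can be bypassed.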
Transport
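Use UDP for media, with SRTP keys negotiated over DTLS (DTLS-SRTP). Let ICE fall back to a TURN relay only when no direct path connects, for example when a symmetric NAT makes STUN-derived candidates unusable, and use TURN over TCP or TLS only on networks that block UDP. Keep the signaling channel on a reliable WebSocket. Reconnect with exponential backoff, capping the delay between attempts at 2 seconds.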
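A sketch of the client-side transport settings, assuming a browser WebRTC client. `iceConfig` is a plain object shaped like `RTCConfiguration` (it would be passed to `new RTCPeerConnection(iceConfig)`), and every hostname and credential in it is a placeholder. ICE ranks relay candidates lowest, so the TURN server is normally used only when no direct path succeeds; listing TCP and TLS TURN URLs covers networks that block UDP. `backoffDelayMs` and `reconnect` are hypothetical helpers; the 100 ms base delay is an assumption, while the 2-second cap comes from this recipe.

```typescript
// Placeholder servers and credentials, shaped like RTCConfiguration.
const iceConfig = {
  iceServers: [
    { urls: "stun:stun.example.com:3478" },
    {
      urls: [
        "turn:turn.example.com:3478?transport=udp", // relay over UDP
        "turn:turn.example.com:3478?transport=tcp", // for networks that block UDP
        "turns:turn.example.com:5349?transport=tcp", // TURN over TLS, for TLS-only networks
      ],
      username: "example-user",
      credential: "example-secret",
    },
  ],
};

// Exponential backoff: base * 2^attempt, capped (2 s in this recipe).
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 2000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Retries `open` (a hypothetical function that resolves once the signaling
// WebSocket is open and rejects on failure) until it succeeds or
// `maxAttempts` is reached. Resolves true on success, false otherwise.
async function reconnect(
  open: () => Promise<void>,
  maxAttempts = Number.POSITIVE_INFINITY,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await open();
      return true;
    } catch {
      // Wait before the next attempt; skip the wait after the last one.
      if (attempt + 1 < maxAttempts) {
        const delay = backoffDelayMs(attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  return false;
}
```

With these defaults the delays run 100, 200, 400, 800, 1600 ms and then stay at 2000 ms, so a dropped signaling connection is retried quickly without hammering the server.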
Next: read the Presence recipe to layer online status and typing indicators on top of your voice infrastructure.