Overview
Smart Turn Detection uses an advanced machine learning model to determine when a user has finished speaking and your bot should respond. Unlike basic Voice Activity Detection (VAD), which only detects speech vs. non-speech, Smart Turn Detection recognizes natural conversational cues like intonation patterns and linguistic signals for more natural conversations.

- GitHub Repository: open source model for conversational turn detection
- Model weights: ONNX weights file for Smart Turn v3
Key Benefits
- Natural conversations: More human-like turn-taking patterns
- Free to use: The model is fully open source
- Scalable: Smart Turn v3 supports fast CPU inference directly inside your Pipecat Cloud instance
Quick Start
To enable Smart Turn Detection in your Pipecat Cloud bot, add the LocalSmartTurnAnalyzerV3 analyzer to your transport configuration.
The model weights are bundled with Pipecat, so there’s no need to download them separately.
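As a minimal sketch, the transport configuration might look like the following. The exact import paths and parameter names are based on Pipecat's public API and may differ slightly between releases, so check the version you have installed:

```python
# Sketch: enabling Smart Turn v3 alongside VAD in a Pipecat transport.
# Import paths below assume a recent Pipecat release and may vary by version.
from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.transports.base_transport import TransportParams

transport_params = TransportParams(
    audio_in_enabled=True,
    audio_out_enabled=True,
    # Smart Turn requires VAD with stop_secs=0.2 to mirror its training data
    vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
    turn_analyzer=LocalSmartTurnAnalyzerV3(),
)
```

Because the model weights ship with Pipecat, no separate download step is needed before constructing the analyzer.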
Smart Turn Detection requires VAD to be enabled with stop_secs=0.2. This value mimics the training data and allows Smart Turn to dynamically adjust timing based on the model's predictions.

How It Works
- Audio Analysis: The system continuously analyzes incoming audio for speech patterns
- VAD Processing: Voice Activity Detection segments audio into speech and silence
- Turn Classification: When VAD detects a pause, the ML model analyzes the speech segment for natural completion cues
- Smart Response: The model determines if the turn is complete or if the user is likely to continue speaking
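The four steps above can be sketched as a small state machine. This is an illustrative simplification, not Pipecat's implementation; predict_turn_complete() is a hypothetical stand-in for the Smart Turn model:

```python
# Simplified sketch of the turn-detection flow: buffer speech, and when VAD
# reports a pause, ask a classifier whether the turn is complete.
from dataclasses import dataclass, field


def predict_turn_complete(segment: bytes) -> bool:
    # Hypothetical stand-in for the ML model. Here we pretend longer
    # segments are complete turns, just to make the control flow runnable.
    return len(segment) > 4


@dataclass
class TurnDetector:
    speech_buffer: list = field(default_factory=list)

    def on_audio(self, frame: bytes, is_speech: bool) -> str:
        if is_speech:
            # Steps 1-2: incoming audio is segmented into speech by VAD
            self.speech_buffer.append(frame)
            return "listening"
        if not self.speech_buffer:
            return "idle"
        # Step 3: VAD detected a pause; classify the buffered speech segment
        if predict_turn_complete(b"".join(self.speech_buffer)):
            self.speech_buffer.clear()
            return "respond"  # Step 4: turn complete, bot should reply
        return "waiting"      # user is likely to continue speaking
```

A short pause after an apparently unfinished utterance yields "waiting" rather than triggering a response, which is the behavior that distinguishes Smart Turn from plain VAD timeouts.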
Training Data Collection
The smart-turn model is trained on real conversational data collected through these applications. Help us improve the model by contributing your own data or classifying existing data:

- Data Collector: contribute conversational data to improve the model
- Data Classifier: help classify turn completion patterns in conversations