Overview
UltravoxSTTService provides real-time speech-to-text using the Ultravox multimodal model running locally. Ultravox directly encodes audio into the LLM’s embedding space, eliminating traditional ASR components and providing faster, more efficient transcription with built-in conversational understanding.
Ultravox STT API Reference
Pipecat’s API methods for Ultravox STT integration
Example Implementation
Complete example with GPU optimization
Ultravox Documentation
Official Ultravox documentation and features
Hugging Face Models
Access Ultravox models and get HF tokens
Installation
To use Ultravox services, install the required dependency.
Prerequisites
Ultravox Model Setup
Before using Ultravox STT services, you need:
- Hugging Face Account: Sign up at Hugging Face
- HF Token: Generate a Hugging Face token for model access
- GPU Resources: Recommended for optimal performance with local model inference
Required Environment Variables
- HF_TOKEN: Your Hugging Face token for model access
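The prerequisites above can be checked before the model is loaded. The following is a minimal sketch, not part of Pipecat: the helper names `require_hf_token` and `cuda_available` are hypothetical, and the GPU check assumes PyTorch is the inference runtime (it reports no GPU if PyTorch is not installed).

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from HF_TOKEN, or raise if it is missing.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; generate a Hugging Face token and export it."
        )
    return token


def cuda_available() -> bool:
    """Report whether a CUDA GPU is visible.

    Assumes PyTorch is the runtime; returns False if it is not installed.
    """
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())
```

Calling `require_hf_token()` at startup fails early with a clear message instead of a model download error later, and `cuda_available()` can be used to log a warning when inference would fall back to CPU.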