Ultravox

Overview

UltravoxSTTService provides real-time speech-to-text using the Ultravox multimodal model, running locally. Ultravox encodes audio directly into the LLM’s embedding space, so no separate ASR component is needed. The result is intended to be faster, more efficient transcription with built-in conversational understanding.

Ultravox STT API Reference: Pipecat’s API methods for Ultravox STT integration
Example Implementation: Complete example with GPU optimization
Ultravox Documentation: Official Ultravox documentation and features
Hugging Face Models: Access Ultravox models and get HF tokens

Installation

To use Ultravox services, install the required dependency:

pip install "pipecat-ai[ultravox]"
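
As a quick sanity check after installing, the sketch below tests whether the package can be located by Python. It assumes the distribution exposes a top-level module named `pipecat`; `is_installed` is a hypothetical helper written for illustration, not part of Pipecat.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pipecat" is assumed to be the import name provided by pipecat-ai.
    print("pipecat importable:", is_installed("pipecat"))
```

This only confirms that the base package is discoverable; it does not verify that the optional Ultravox dependencies were installed correctly.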

Prerequisites

Ultravox Model Setup

Before using Ultravox STT services, you need:

Hugging Face Account: Sign up at Hugging Face
HF Token: Generate a Hugging Face token for model access
GPU Resources: Recommended for optimal performance with local model inference

Required Environment Variables

HF_TOKEN: Your Hugging Face token for model access
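
One way to catch a missing token early is to validate the variable at startup. The following is a minimal sketch, not Pipecat's own mechanism: `get_hf_token` is a hypothetical helper, and the optional `env` argument exists only so the function can be exercised without touching the real environment.

```python
import os


def get_hf_token(env=None) -> str:
    """Return the Hugging Face token from HF_TOKEN, or raise if unset or blank."""
    # Fall back to the process environment when no mapping is supplied.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before starting the service.")
    return token
```

Calling a check like this before loading the model should surface a configuration problem immediately rather than partway through a model download.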
