Overview

WhisperSTTService provides offline speech recognition using OpenAI’s Whisper models running locally. It supports multiple model sizes and hardware acceleration options (CPU, CUDA, and Apple Silicon via MLX), enabling privacy-focused transcription without external API calls.

Installation

Choose your installation based on your hardware:

Standard Whisper (CPU/CUDA)

pip install "pipecat-ai[whisper]"

MLX Whisper (Apple Silicon)

pip install "pipecat-ai[mlx-whisper]"

Prerequisites

Local Model Setup

Before using Whisper STT services, you need the following (see the configuration sketch after this list):
  1. Model Selection: Choose appropriate Whisper model size (tiny, base, small, medium, large)
  2. Hardware Configuration: Set up CPU, CUDA, or Apple Silicon acceleration
  3. Storage Space: Ensure sufficient disk space for model downloads
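
The sketch below shows how these choices typically come together when constructing the service. Treat it as a minimal sketch: the import path, Model enum members, and the device and compute_type parameters are assumptions that may differ between Pipecat versions, so check the API reference for your installed release.

# Minimal sketch: picking a model size and hardware target for WhisperSTTService.
# Import path, enum members, and parameter names are assumptions -- verify them
# against your installed Pipecat version.
from pipecat.services.whisper.stt import Model, WhisperSTTService

stt = WhisperSTTService(
    model=Model.MEDIUM,      # tiny/base/small trade accuracy for speed; large maximizes accuracy
    device="cuda",           # "cpu", "cuda", or "auto" depending on your hardware
    compute_type="float16",  # reduced precision keeps larger models within GPU memory
)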

Configuration Options

  • Model Size: Balance between accuracy and performance based on your hardware
  • Hardware Acceleration: Configure CUDA for NVIDIA GPUs or MLX for Apple Silicon
  • Language Support: Whisper supports about 99 languages out of the box (see the sketch below for setting a language explicitly)

No API keys are required; Whisper runs entirely locally for complete privacy.
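
On Apple Silicon, the MLX variant follows the same pattern, and a language hint can be supplied to skip auto-detection. As above, the class and enum names here (WhisperSTTServiceMLX, MLXModel, Language) are assumptions based on Pipecat’s usual layout and may vary by version.

# Minimal sketch: MLX-accelerated Whisper on Apple Silicon with a fixed language.
# Class and enum names are assumptions -- check Pipecat's API reference.
from pipecat.services.whisper.stt import MLXModel, WhisperSTTServiceMLX
from pipecat.transcriptions.language import Language

stt = WhisperSTTServiceMLX(
    model=MLXModel.LARGE_V3_TURBO,  # turbo/quantized variants trade some accuracy for speed
    language=Language.EN,           # skip language auto-detection when the input language is known
)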