Coqui, the XTTS maintainer, has shut down. XTTS may not receive future updates or support.

Overview

XTTSTTSService provides multilingual voice synthesis with voice cloning capabilities through a locally hosted streaming server. The service supports real-time streaming and custom voice training using Coqui’s XTTS-v2 model for cross-lingual text-to-speech.

Installation

XTTS requires a running streaming server. Start the server using Docker:
docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 8000:80 \
  ghcr.io/coqui-ai/xtts-streaming-server:latest-cuda121
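
Once the container is running, it can be useful to confirm that the server answers on the mapped port before connecting a pipeline to it. The sketch below uses only the Python standard library. It assumes the server exposes a GET /studio_speakers endpoint that returns JSON; that endpoint name is an assumption here and should be checked against the server version you run. The helper names endpoint_url and server_is_up are illustrative and not part of any library.

```python
import http.client
import json
import urllib.request


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def server_is_up(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if the probe endpoint answers with HTTP 200 and a JSON body."""
    # Assumption: GET /studio_speakers exists on the server and returns JSON.
    url = endpoint_url(base_url, "studio_speakers")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # raises ValueError if the body is not JSON
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses;
        # ValueError covers malformed URLs and non-JSON bodies.
        return False
```

Calling server_is_up() with no arguments probes http://localhost:8000, which matches the -p 8000:80 port mapping in the command above. The model can take a while to load after the container starts, so a False result right after startup does not necessarily mean the setup is broken.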

Prerequisites

XTTS Server Setup

Before using XTTSTTSService, you need:
  1. Docker Environment: Set up Docker with GPU support for optimal performance
  2. XTTS Server: Run the XTTS streaming server container
  3. Voice Models: Configure voice models and cloning samples as needed
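
As a rough first check of the Docker and GPU prerequisites, the following sketch looks for the docker and nvidia-smi executables on PATH. It is only a heuristic: finding both tools does not prove that containers can access the GPU, which also depends on the NVIDIA container runtime being installed. The function names are illustrative.

```python
import shutil


def find_tools(names=("docker", "nvidia-smi")) -> dict:
    """Map each tool name to its location on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found: dict) -> list:
    """Return the names of tools that were not found, sorted alphabetically."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    missing = missing_tools(find_tools())
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("docker and nvidia-smi are both on PATH")
```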

Required Configuration

  • Server URL: Configure the XTTS server endpoint (default: http://localhost:8000)
  • Voice Selection: Set up voice models or voice cloning samples

GPU acceleration is recommended: the server performs best on a machine with CUDA support.
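
One way to keep the two required settings out of application code is to read them from the environment. The sketch below is a minimal example under stated assumptions: the variable names XTTS_BASE_URL and XTTS_VOICE_ID and the XTTSSettings class are hypothetical and are not defined by XTTSTTSService or the XTTS server. Only the default URL comes from the configuration described above.

```python
import os
from dataclasses import dataclass

DEFAULT_BASE_URL = "http://localhost:8000"  # default endpoint listed above


@dataclass(frozen=True)
class XTTSSettings:
    """Hypothetical settings holder for illustration; not a library class."""
    base_url: str
    voice_id: str


def load_settings(env=None) -> XTTSSettings:
    """Read the server URL and voice from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    # Hypothetical variable names; pick whatever fits your deployment.
    base_url = env.get("XTTS_BASE_URL", DEFAULT_BASE_URL).strip().rstrip("/")
    voice_id = env.get("XTTS_VOICE_ID", "").strip()
    if not base_url.startswith(("http://", "https://")):
        raise ValueError(f"XTTS_BASE_URL must be an http(s) URL, got {base_url!r}")
    if not voice_id:
        raise ValueError("XTTS_VOICE_ID is not set")
    return XTTSSettings(base_url=base_url, voice_id=voice_id)
```

The resulting base_url and voice_id values can then be passed to the service when it is constructed. Failing early on a missing voice or a malformed URL tends to be easier to debug than a connection error in the middle of a session.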