✓ Verified ✍️ Content Creation ✓ Enhanced Data

Lb Pocket Tts Skill

Generate speech from text using Kyutai Pocket TTS - lightweight, CPU-friendly, streaming TTS with vo

Rating
4.3 (261 reviews)
Downloads
11,528 downloads
Version
1.0.0

Overview

Generate speech from text using Kyutai Pocket TTS - lightweight, CPU-friendly, streaming TTS with voice cloning.

Key Features

1

100M parameters - Small, efficient model

2

CPU-optimized - No GPU needed, uses only 2 cores

3

~6x real-time - Fast generation on modern CPUs

4

~200ms latency - To first audio chunk (streaming)

5

Voice cloning - From 3-10s audio samples

6

24kHz mono WAV - High-quality output

7

English only - More languages planned

8

--

Complete Documentation

View Source →

Pocket TTS

Lightweight CPU-friendly text-to-speech with voice cloning. No GPU required.

When to Use

  • Generating speech from text on CPU without GPU
  • Voice cloning from audio samples
  • Streaming audio generation (low latency)
  • Local TTS without API dependencies
  • Real-time speech synthesis (~6x faster than real-time)

Key Features

  • 100M parameters - Small, efficient model
  • CPU-optimized - No GPU needed, uses only 2 cores
  • ~6x real-time - Fast generation on modern CPUs
  • ~200ms latency - To first audio chunk (streaming)
  • Voice cloning - From 3-10s audio samples
  • 24kHz mono WAV - High-quality output
  • English only - More languages planned

Installation

bash
pip install pocket-tts
# or
uv add pocket-tts


CLI Commands

Generate Speech

bash
# Basic generation (default voice)
pocket-tts generate --text "Hello world"

# Custom voice (local file, URL, or safetensors)
pocket-tts generate --voice ./my_voice.wav
pocket-tts generate --voice "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
pocket-tts generate --voice ./voice.safetensors

# Quality tuning
pocket-tts generate --temperature 0.7 --lsd-decode-steps 3

See docs/generate.md for full CLI reference.

Start Web Server

bash
# Start FastAPI server with web UI
pocket-tts serve

# Custom host/port
pocket-tts serve --host localhost --port 8080

See docs/serve.md for server options.

Export Voice Embeddings

Convert audio files to .safetensors for faster loading:

bash
# Single file
pocket-tts export-voice voice.mp3 voice.safetensors

# Batch conversion
pocket-tts export-voice voices/ embeddings/ --truncate

See docs/export_voice.md for export options.


Python API

Basic Usage

python
from pocket_tts import TTSModel
import scipy.io.wavfile

# Load model
model = TTSModel.load_model()

# Get voice state
voice = model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)

# Generate audio
audio = model.generate_audio(voice, "Hello world!")

# Save
scipy.io.wavfile.write("output.wav", model.sample_rate, audio.numpy())

Load Model

python
model = TTSModel.load_model(
    config="b6369a24",       # Model variant
    temp=0.7,                # Temperature (0.5-1.0)
    lsd_decode_steps=1,      # Generation steps (1-5)
    eos_threshold=-4.0       # End-of-sequence threshold
)

Voice State

python
# From audio file/URL
voice = model.get_state_for_audio_prompt("./voice.wav")
voice = model.get_state_for_audio_prompt("hf://kyutai/tts-voices/alba-mackenna/casual.wav")

# From safetensors (fast loading)
voice = model.get_state_for_audio_prompt("./voice.safetensors")

Streaming Generation

python
# Stream audio chunks
for chunk in model.generate_audio_stream(voice, "Long text..."):
    # Process/save/play each chunk as generated
    print(f"Chunk: {chunk.shape[0]} samples")

Multi-Voice Management

python
# Preload multiple voices
voices = {
    "casual": model.get_state_for_audio_prompt("hf://kyutai/tts-voices/alba-mackenna/casual.wav"),
    "announcer": model.get_state_for_audio_prompt("./announcer.safetensors"),
}

# Use different voices
audio1 = model.generate_audio(voices["casual"], "Hey there!")
audio2 = model.generate_audio(voices["announcer"], "Breaking news!")

See docs/python-api.md for complete API reference.


Available Voices

Pre-made voices from hf://kyutai/tts-voices/:

  • alba-mackenna/casual.wav (default, female)
  • jessica-jian/casual.wav (female)
  • voice-donations/Selfie.wav (male, marius)
  • voice-donations/Butter.wav (male, javert)
  • ears/p010/freeform_speech_01.wav (male, jean)
  • vctk/p244_023.wav (female, fantine)
  • vctk/p262_023.wav (female, eponine)
  • vctk/p303_023.wav (female, azelma)
Or clone any voice from your own audio samples.


Voice Cloning Tips

  • Clean audio - Remove background noise (use Adobe Podcast Enhance)
  • Length - 3-10 seconds of speech is ideal
  • Quality - Input quality affects output quality
  • Format - WAV, MP3, or any common audio format supported

Performance Tips

  • CPU-only - GPU provides no speedup (model too small, batch size 1)
  • 2 cores - Uses only 2 CPU cores efficiently
  • Streaming - Low latency (<200ms to first chunk)
  • Safetensors - Pre-process voices to .safetensors for instant loading

Output Format

All commands output WAV files:

  • Sample rate: 24 kHz
  • Channels: Mono
  • Bit depth: 16-bit PCM

Links

Installation

Terminal bash

openclaw install lb-pocket-tts-skill
    
Copied!

💻Code Examples

uv add pocket-tts

uv-add-pocket-tts.txt
---

## CLI Commands

### Generate Speech

pocket-tts generate --temperature 0.7 --lsd-decode-steps 3

pocket-tts-generate---temperature-07---lsd-decode-steps-3.txt
**See** `docs/generate.md` for full CLI reference.

### Start Web Server

pocket-tts serve --host localhost --port 8080

pocket-tts-serve---host-localhost---port-8080.txt
**See** `docs/serve.md` for server options.

### Export Voice Embeddings

Convert audio files to `.safetensors` for faster loading:

pocket-tts export-voice voices/ embeddings/ --truncate

pocket-tts-export-voice-voices-embeddings---truncate.txt
**See** `docs/export_voice.md` for export options.

---

## Python API

### Basic Usage
example.sh
pip install pocket-tts
# or
uv add pocket-tts
example.sh
# Basic generation (default voice)
pocket-tts generate --text "Hello world"

# Custom voice (local file, URL, or safetensors)
pocket-tts generate --voice ./my_voice.wav
pocket-tts generate --voice "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
pocket-tts generate --voice ./voice.safetensors

# Quality tuning
pocket-tts generate --temperature 0.7 --lsd-decode-steps 3
example.sh
# Start FastAPI server with web UI
pocket-tts serve

# Custom host/port
pocket-tts serve --host localhost --port 8080
example.sh
# Single file
pocket-tts export-voice voice.mp3 voice.safetensors

# Batch conversion
pocket-tts export-voice voices/ embeddings/ --truncate
example.py
from pocket_tts import TTSModel
import scipy.io.wavfile

# Load model
model = TTSModel.load_model()

# Get voice state
voice = model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)

# Generate audio
audio = model.generate_audio(voice, "Hello world!")

# Save
scipy.io.wavfile.write("output.wav", model.sample_rate, audio.numpy())
example.py
model = TTSModel.load_model(
    config="b6369a24",       # Model variant
    temp=0.7,                # Temperature (0.5-1.0)
    lsd_decode_steps=1,      # Generation steps (1-5)
    eos_threshold=-4.0       # End-of-sequence threshold
)

Tags

#media_and-streaming

Quick Info

Category Content Creation
Model Claude 3.5
Complexity One-Click
Author leonaaardob
Last Updated 3/10/2026
🚀
Optimized for
Claude 3.5
🧠

Ready to Install?

Get started with this skill in seconds

openclaw install lb-pocket-tts-skill