Local Piper TTS Multilang Secure
Local offline text-to-speech via Piper TTS.
- Rating: 4.3 (205 reviews)
- Downloads: 1,305
- Version: 1.0.0
Overview
Local offline text-to-speech via Piper TTS.
✨Key Features
Fully offline (no API keys)
Self-contained setup via setup() — installs Piper into an isolated venv, no system-wide changes
Automatic language detection for 20+ languages with English as default
Per-call voice selection via voice parameter
On-demand voice download via downloadVoices() — no models bundled, choose what you need
Voice removal via removeVoice() — clean up voices you no longer want
Extensible: add any language by installing a Piper .onnx model
Writes outputs into OpenClaw workspace
Complete Documentation
local-piper-tts-multilang-secure
Description
Local (offline) text-to-speech via Piper. Purpose: generate audio files (OGG/Opus by default) from text, fully offline. No sending is performed by the skill — sending is handled by the agent after the file is ready.
Features
- Fully offline (no API keys)
- Self-contained setup via setup() — installs Piper into an isolated venv, no system-wide changes
- Automatic language detection for 20+ languages with English as default
- Per-call voice selection via the voice parameter
- On-demand voice download via downloadVoices() — no models bundled, choose what you need
- Voice removal via removeVoice() — clean up voices you no longer want
- Extensible: add any language by installing a Piper .onnx model
- Writes outputs into OpenClaw workspace
First-run flow — full agent procedure
Follow this sequence exactly when the user asks to use TTS for the first time in a setup context.
Step 1 — check status
const s = await status();
Step 2 — install Piper if needed
If s.stage is not-setup or no-piper:
- Tell the user: "To use local TTS I need to install piper-tts into the skill's venv (~30 seconds, one-time). OK to proceed?"
- Wait for confirmation, then call setup().
- If setup returns a step containing "WARNING: espeak-ng not found", relay the warning and install instructions to the user.
- Call status() again after setup completes.
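The Step 2 sequence can be sketched as a small helper. This is a hypothetical wrapper: status, setup, confirm, and notify are passed in as stand-ins for the real skill functions, and the steps field on the setup result is an assumed return shape.

```javascript
// Sketch of Step 2. status/setup/confirm/notify are injected stand-ins for
// the real skill functions; `result.steps` is an assumed return shape.
async function ensurePiper({ status, setup, confirm, notify }) {
  let s = await status();
  if (s.stage === 'not-setup' || s.stage === 'no-piper') {
    const ok = await confirm(
      "To use local TTS I need to install piper-tts into the skill's venv (~30 seconds, one-time). OK to proceed?"
    );
    if (!ok) return s; // user declined; leave state unchanged
    const result = await setup();
    // Relay the espeak-ng warning if setup reports one
    const warning = (result.steps || []).find((step) =>
      step.includes('WARNING: espeak-ng not found')
    );
    if (warning && notify) await notify(warning);
    s = await status(); // re-check after setup completes
  }
  return s;
}
```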
Step 3 — offer voice download if no models present
If s.stage is no-model (Piper installed but no .onnx files):
3a. Offer English defaults: Explain that two English voices are available as defaults (~65 MB each):
- en_US-ryan-medium — male, American
- en_US-amy-medium — female, American
3b. Ask about other languages: After the English choice, ask: "Do you need any other languages? For example German, French, Spanish, Polish, Italian, Portuguese, Russian… Just tell me and I'll check what's available."
If the user names a language, look up the available models at https://github.com/rhasspy/piper/blob/master/VOICES.md and list the options. Download whatever the user picks using the same downloadVoices() call.
3c. Download everything at once:
const result = await downloadVoices(['en_US-ryan-medium', 'en_US-amy-medium', /* + any others */]);
// result.downloaded — succeeded
// result.failed — [{stem, error}] if any failed
If any downloads fail:
- Check internet connectivity
- Verify the stem exists at https://github.com/rhasspy/piper/blob/master/VOICES.md
- Offer to retry
Step 4 — play samples so the user can choose
After downloading, generate a short audio sample for each downloaded voice and send it to the user. For each voice, use a greeting in the voice's language:
- English: "Hello, I'm [name]. How can I help you today?"
- German: "Hallo, ich heiße [Name]. Wie kann ich Ihnen helfen?"
- French: "Bonjour, je m'appelle [prénom]. Comment puis-je vous aider?"
- Spanish: "Hola, me llamo [nombre]. ¿Cómo puedo ayudarte?"
- Polish: "Cześć, mam na imię [imię]. Jak mogę Ci pomóc?"
- Italian: "Ciao, mi chiamo [nome]. Come posso aiutarti?"
- Portuguese: "Olá, meu nome é [nome]. Como posso ajudar?"
- Russian: "Привет, меня зовут [имя]. Чем могу помочь?"
- For other languages: use an equivalent native greeting.
Replace [name] with the voice name (e.g. Ryan, Amy, Thorsten).
const sample = await tts({ text: 'Hello, I\'m Ryan. How can I help you today?', voice: 'en_US-ryan-medium' });
// send sample.path to the user as a voice message
Send all samples, then ask: "Which voice do you prefer? Or shall I download a different one?"
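Picking the right greeting for each downloaded voice can be sketched like this. sampleGreeting and the stem-parsing rules are hypothetical helpers; the greeting strings are the table above.

```javascript
// Sketch: choose a sample greeting from a voice stem's language prefix.
// sampleGreeting and the stem parsing are hypothetical; the greetings
// come from the table above.
const GREETINGS = {
  en: "Hello, I'm [name]. How can I help you today?",
  de: 'Hallo, ich heiße [Name]. Wie kann ich Ihnen helfen?',
  fr: "Bonjour, je m'appelle [prénom]. Comment puis-je vous aider?",
  es: 'Hola, me llamo [nombre]. ¿Cómo puedo ayudarte?',
  pl: 'Cześć, mam na imię [imię]. Jak mogę Ci pomóc?',
  it: 'Ciao, mi chiamo [nome]. Come posso aiutarti?',
  pt: 'Olá, meu nome é [nome]. Como posso ajudar?',
  ru: 'Привет, меня зовут [имя]. Чем могу помочь?',
};

function sampleGreeting(stem) {
  const lang = stem.split('_')[0];               // 'en_US-ryan-medium' → 'en'
  const rawName = stem.split('-')[1] || 'there'; // 'ryan'
  const name = rawName.charAt(0).toUpperCase() + rawName.slice(1);
  const template = GREETINGS[lang] || GREETINGS.en; // fall back to English
  return template.replace(/\[[^\]]+\]/, name);      // substitute [name]
}
```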
Step 5 — choose speech speed
After the user picks a voice, ask: "How fast should I speak? Normal is 100%. Some options: 125% (faster), 115% (slightly faster), 100% (normal), 80% (slower) — or tell me a percentage."
Always present speed as a percentage to the user. Never mention lengthScale directly.
lengthScale is the internal duration multiplier — lower = faster. To convert: lengthScale = 1 / (speed% / 100).
Examples:
- 125% speed → lengthScale 0.8
- 115% speed → lengthScale 0.87
- 100% speed → lengthScale 1.0 (default)
- 80% speed → lengthScale 1.25
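The conversion can be sketched as a tiny helper (speedToLengthScale is a hypothetical name), rounding to two decimals to match the examples above:

```javascript
// Convert a user-facing speed percentage to Piper's lengthScale
// (lower = faster). Rounded to two decimals, matching the examples above.
function speedToLengthScale(speedPercent) {
  return Math.round((100 / speedPercent) * 100) / 100;
}
```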
Generate a short sample at the chosen speed so the user can hear the difference:
const sample = await tts({ text: 'This is how I sound at this speed.', voice: 'chosen-voice', lengthScale: 0.8 });
// send sample.path to the user
Confirm with the user, then offer to save it permanently: "Should I save this as your default speed? It'll be used automatically every session."
If the user agrees:
await saveConfig({ lengthScale: 0.8 });
Once saved, tts() reads it from config.json in the skill directory automatically — no need to pass lengthScale on every call.
Step 6 — note the preferred voice and speed
Once confirmed, remember both voice and lengthScale for the session. Pass them to every subsequent tts() call unless the user asks to change them.
Before first use — always call status()
Always call status() before the first tts() call in a session to determine what is needed.
| stage | Meaning | What to do |
|---|---|---|
| ready | Fully installed, at least one voice model present | Proceed with tts() |
| not-setup | Piper not installed | Ask user for confirmation, then call setup() |
| no-piper | Venv exists but piper binary missing | Ask user for confirmation, then call setup() |
| no-model | Piper installed but no voice model downloaded | Follow Steps 3–5 of first-run flow above |
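The table collapses to a small dispatch. The returned action labels here are illustrative, not part of the skill API.

```javascript
// Map status().stage to the next action per the table above.
// The returned labels are illustrative, not skill API.
function nextAction(stage) {
  switch (stage) {
    case 'ready':     return 'tts';           // proceed with tts()
    case 'not-setup':
    case 'no-piper':  return 'confirm-setup'; // ask the user, then setup()
    case 'no-model':  return 'offer-voices';  // Steps 3–5 of the first-run flow
    default:          throw new Error(`unknown stage: ${stage}`);
  }
}
```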
IMPORTANT: Always ask the user for confirmation before calling setup().
It installs the piper-tts package from PyPI into a venv inside the skill directory.
Usage
- Input: text, optional format ("ogg" or "wav"), optional voice (model stem), optional lengthScale (speech speed, default 1.0)
- Output: path to generated file (usually .ogg)
Controlling voice and language
To list installed voices, call listVoices() — returns stems of all installed .onnx models.
Never assume a fixed list; it varies per user and installation.
Auto-detection (no voice param):
The script detects language from the text using character and script analysis:
- Non-Latin scripts: Cyrillic (Russian, Ukrainian, Bulgarian), Greek, Arabic, Persian, Chinese, Japanese, Korean, Georgian
- Latin-script languages: Vietnamese, Polish, Romanian, Turkish, Czech, Slovak, Hungarian, Portuguese, Spanish, Catalan, German, Finnish, Scandinavian (Swedish, Norwegian, Danish), French, Italian
- Fallback: English keywords → first English model → any installed model
Auto-detection is best-effort. For reliable results with a specific language, always pass the voice parameter explicitly.
Explicit override: set the PIPER_VOICE_MODEL env var to a full .onnx path (overrides everything).
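A much-simplified sketch of the script-based detection described above (guessLanguage is hypothetical; the real detector covers many more languages and adds keyword heuristics):

```javascript
// Simplified sketch of script-based language detection. The real detector
// covers many more languages and adds keyword heuristics.
function guessLanguage(text) {
  if (/[\u0400-\u04FF]/.test(text)) return 'ru'; // Cyrillic
  if (/[\u0370-\u03FF]/.test(text)) return 'el'; // Greek
  if (/[\u0600-\u06FF]/.test(text)) return 'ar'; // Arabic/Persian
  if (/[\u3040-\u30FF]/.test(text)) return 'ja'; // Japanese kana
  if (/[\u4E00-\u9FFF]/.test(text)) return 'zh'; // CJK ideographs
  if (/[\uAC00-\uD7AF]/.test(text)) return 'ko'; // Hangul
  if (/[ąćęłńśźż]/i.test(text))     return 'pl'; // Polish diacritics
  if (/[äöüß]/.test(text))          return 'de'; // German diacritics
  return 'en';                                   // fallback
}
```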
When the user requests a specific voice or language:
1. Call listVoices() to see what is installed
2. Pass the matching stem as voice to tts(), e.g. voice: "en_US-amy-medium"
3. If the requested voice is not installed, offer to download it with downloadVoices([stem])
To switch back to auto-detect, omit the voice parameter.
Downloading additional voices
The user may say things like "I don't like this voice, use a female one" or "Download a German voice". When this happens:
1. Find the model at https://github.com/rhasspy/piper/blob/master/VOICES.md
2. Confirm the stem (e.g. de_DE-thorsten-medium) and call downloadVoices([stem])
3. Generate a sample and send it to the user
4. Confirm with listVoices() — the new voice is immediately usable
Removing voices
The user may say "remove that voice" or "I don't need the German voice anymore". When this happens:
1. Call listVoices() to confirm which voices are installed
2. Confirm with the user which voice to remove
3. Call removeVoice(stem) — e.g. removeVoice('de_DE-thorsten-medium')
4. Returns { removed, filesDeleted } on success
5. If the removed voice was the user's preferred voice, ask them to pick a new one
Never remove the last remaining voice without warning the user that TTS will stop working.
Changing speech speed
The user may say things like "speak faster", "too slow", or "speed it up". When this happens:
1. Ask what speed they want in %, or suggest: 125% (faster), 115%, 100% (normal), 80% (slower)
2. Convert their % to lengthScale: lengthScale = 1 / (speed% / 100)
3. Generate a short sample: await tts({ text: '...', voice: 'current-voice', lengthScale: 0.8 })
4. Send the sample and confirm
5. Offer to persist: "Save this as default?" — if yes, call saveConfig({ lengthScale: 0.8 })
6. Use the new lengthScale for all subsequent tts() calls in the session
Where files are written
- OPENCLAW_WORKSPACE/tts/ if the OPENCLAW_WORKSPACE env var is set
- otherwise: ~/.openclaw/workspace/tts/
Dependencies
- python3 (3.8+) — required for setup() to create the venv
- ffmpeg — for WAV → OGG/Opus conversion
- espeak-ng — system library used by Piper internally; setup() checks for it and warns if missing.
  Install: sudo apt install espeak-ng (Debian/Ubuntu), sudo dnf install espeak-ng (Fedora), brew install espeak (macOS)
- At least one Piper .onnx + .onnx.json voice model pair in the skill directory
Platform support
- Linux x86_64: fully supported
- macOS x86_64 / arm64: fully supported
- Linux ARM: may require building piper-tts from source
- Windows: not supported
Remove
rm -rf ~/.openclaw/skills/local-piper-tts-multilang-secure
Installation
openclaw install local-piper-tts-multilang-secure