ElevenLabs TTS
ElevenLabs TTS - the best ElevenLabs integration for OpenClaw.
- Rating: 4.6 (77 reviews)
- Downloads: 1,815
- Version: 1.0.0
ElevenLabs TTS (Text-to-Speech)
Generate expressive voice messages using ElevenLabs v3 with audio tags.
Prerequisites
- ElevenLabs API Key (`ELEVENLABS_API_KEY`): Required. Get one at elevenlabs.io → Profile → API Keys. Configure in `openclaw.json` under `messages.tts.elevenlabs.apiKey`.
- ffmpeg: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on PATH.
Quick Start Examples
Storytelling (emotional journey):
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
Horror/Suspense (building dread):
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
Conversation with reactions:
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
Hebrew (romantic moment):
[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?
Spanish (celebration to reflection):
[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
Configuration (OpenClaw)
In openclaw.json, configure TTS under messages.tts:
{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
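As a rough sanity check, a config like the one above can be validated before it is used. The sketch below is illustrative only: `validate_tts_config` is a hypothetical helper, not part of OpenClaw, and the ranges it checks (stability 0 to 1, speed 0.7 to 1.2, `eleven_v3` for audio tags) are taken from this document rather than from an official schema.

```python
import json


def validate_tts_config(config: dict) -> list[str]:
    """Return human-readable problems found in an openclaw.json-style dict."""
    problems = []
    eleven = config.get("messages", {}).get("tts", {}).get("elevenlabs", {})
    if not eleven.get("apiKey"):
        problems.append("messages.tts.elevenlabs.apiKey is missing")
    if eleven.get("modelId") != "eleven_v3":
        problems.append("audio tags need modelId 'eleven_v3'")
    settings = eleven.get("voiceSettings", {})
    # Defaults below mirror the example config; they are assumptions.
    stability = settings.get("stability", 0.5)
    if not 0.0 <= stability <= 1.0:
        problems.append(f"stability {stability} is outside 0-1")
    speed = settings.get("speed", 1.0)
    if not 0.7 <= speed <= 1.2:
        problems.append(f"speed {speed} is outside 0.7-1.2")
    return problems


example = json.loads("""
{"messages": {"tts": {"provider": "elevenlabs", "elevenlabs": {
  "apiKey": "sk_your_api_key_here", "modelId": "eleven_v3",
  "voiceSettings": {"stability": 0.5, "speed": 1}}}}}
""")
print(validate_tts_config(example))
```

An empty list means none of the checks fired; it does not prove the API key or voice ID is valid.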
Getting your API Key:
- Go to https://elevenlabs.io
- Sign up/login
- Click profile → API Keys
- Copy your key
Recommended Voices for v3
These premade voices are optimized for v3 and work well with audio tags:
| Voice | ID | Gender | Accent | Best For |
|---|---|---|---|---|
| Adam | pNInz6obpgDQGcFmaJgB | Male | American | Deep narration, general use |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Female | American | Calm narration, conversational |
| Brian | nPczCjzI2devNBz1zQrb | Male | American | Deep narration, podcasts |
| Charlotte | XB0fDUnXU5powFXDhCwa | Female | English-Swedish | Expressive, video games |
| George | JBFqnCBsd6RMkjVDRZzb | Male | British | Raspy narration, storytelling |
Finding more voices:
- Browse: https://elevenlabs.io/voice-library
- v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH
- API: `GET https://api.elevenlabs.io/v1/voices`
Voice selection tips:
- Use IVC (Instant Voice Clone) or premade voices; PVC (Professional Voice Clone) is not yet optimized for v3
- Match the voice's character to your use case (a whispering voice won't shout well)
- For expressive IVCs, include varied emotional tones in the training samples
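The voices endpoint listed above can be queried with the standard library alone. This is a minimal sketch, assuming the key is sent in the `xi-api-key` header and that the response is a JSON object with a `voices` array whose items carry `name` and `voice_id`; the helper names are hypothetical.

```python
import json
import os
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build the GET request, passing the key in the xi-api-key header."""
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def summarize_voices(payload: dict) -> list[tuple[str, str]]:
    """Extract (name, voice_id) pairs from a voices response body."""
    return [(v["name"], v["voice_id"]) for v in payload.get("voices", [])]


if __name__ == "__main__":
    key = os.environ.get("ELEVENLABS_API_KEY")
    if key:  # Only touch the network when a key is configured.
        request = build_voices_request(key)
        with urllib.request.urlopen(request, timeout=30) as response:
            for name, voice_id in summarize_voices(json.load(response)):
                print(f"{name}: {voice_id}")
```

The `voice_id` values printed this way are what goes into `voiceId` in `openclaw.json`.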
Model Settings
- Model: `eleven_v3` (alpha), the ONLY model supporting audio tags
- Languages: 70+ supported with full audio tag control
Stability Modes
| Mode | Stability | Description |
|---|---|---|
| Creative | 0.3-0.5 | More emotional/expressive, may hallucinate |
| Natural | 0.5-0.7 | Balanced, closest to original voice |
| Robust | 0.7-1.0 | Highly stable, less responsive to tags |
For audio tags, use Creative (0.5) or Natural. Higher stability reduces tag responsiveness.
Speed Control
Range: 0.7 (slow) to 1.2 (fast), default 1.0
Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].
Critical Rules
Length Limits
- Optimal: <800 characters per segment (best quality)
- Maximum: 10,000 characters (API hard limit)
- Quality degrades with longer text - voice becomes inconsistent
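One way to stay under the 800-character guideline is to pack whole sentences into segments. The function below is a sketch, not part of the skill: `split_into_segments` is a hypothetical name, and splitting after `.`, `!`, `?` or `…` followed by whitespace is an assumption about where it is safe to cut.

```python
import re


def split_into_segments(text: str, limit: int = 800) -> list[str]:
    """Greedily pack whole sentences into segments shorter than `limit`."""
    # Audio tags such as [pause] stay attached to the sentence that follows them.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) < limit or not current:
            # Note: a single sentence longer than `limit` is kept whole.
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


story = "[soft] It started like any other day... [pause] But something felt different."
for segment in split_into_segments(story, limit=50):
    print(len(segment), segment)
```

Cutting only at sentence boundaries keeps each tag next to the text it colors, which matters because a tag persists until the next tag within a generation but not across separate requests.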
Audio Tags - Best Practices for Natural Sound
How many tags to use:
- 1-2 tags per sentence or phrase (not more!)
- Tags persist until the next tag, so there is no need to repeat them
- Overusing tags sounds unnatural and robotic
Where to place tags:
- At emotional transition points
- Before key dramatic moments
- When energy/pace changes
Context matters:
- Write text that matches the tag emotion
- Longer text with context = better interpretation
- Example: `[nervous] I... I'm not sure about this. What if it doesn't work?` works better than `[nervous] Hello.`
Combine tags for nuance:
- `[nervously][whispers]` = nervous whispering
- `[excited][laughs]` = excited laughter
- Keep combinations to 2 tags max
Regenerate for best results:
- v3 is non-deterministic: the same text produces different outputs
- Generate 3+ versions, pick the best
- Small text tweaks can improve results
Match tag to voice:
- Don't use `[shouts]` on a whispering voice
- Don't use `[whispers]` on a loud/energetic voice
- Test tags with your chosen voice
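The "1-2 tags per sentence" guideline can be checked mechanically before sending text to the API. The following is a rough sketch under two assumptions: anything in square brackets counts as a tag, and sentences end at `.`, `!`, `?` or `…` followed by whitespace. `overtagged_sentences` is a hypothetical helper name.

```python
import re

# Matches one bracketed tag such as [excited] or [drawn out].
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")


def overtagged_sentences(text: str, max_tags: int = 2) -> list[str]:
    """Return the sentences that contain more than `max_tags` audio tags."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [s for s in sentences if len(TAG_PATTERN.findall(s)) > max_tags]


draft = "[excited][happy][laughs] We did it! [soft] Thank you."
for sentence in overtagged_sentences(draft):
    print("Too many tags:", sentence)
```

A flagged sentence is a hint to trim tags, not an error; whether a given combination sounds natural still depends on the voice.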
SSML Not Supported
v3 does NOT support SSML break tags. Use audio tags and punctuation instead.
Punctuation Effects (use with tags!)
Punctuation enhances audio tags:
- Ellipses (...) → dramatic pauses: `[nervous] I... I don't know...`
- CAPS → emphasis: `[excited] That's AMAZING!`
- Dashes (—) → interruptions: `[explaining] So what you do is— [interrupting] Wait!`
- Question marks → uncertainty: `[nervous] Are you sure about this?`
- Exclamation marks → energy boost: `[happy] We did it!`
Combine tags + punctuation for maximum effect:
[tired] It was a long day... [sighs] Nobody listens anymore.
WhatsApp Voice Messages
Complete Workflow
- Generate with `tts` tool (returns MP3)
- Convert to Opus (required for Android!)
- Send with `message` tool
Step-by-Step
1. Generate TTS (add [pause] at end to prevent cutoff):
tts text="[excited] This is amazing! [pause]" channel=whatsapp
Returns: `MEDIA:/tmp/tts-xxx/voice-123.mp3`
2. Convert MP3 → Opus:
ffmpeg -i /tmp/tts-xxx/voice-123.mp3 -c:a libopus -b:a 64k -vbr on -application voip /tmp/tts-xxx/voice-123.ogg
3. Send the Opus file:
Note: The message field below contains a Unicode Left-to-Right Mark (U+200E) between the quotes.
This is intentional — WhatsApp requires a non-empty message body to send voice notes.
The LTR mark is invisible but satisfies this requirement without displaying any text.
message action=send channel=whatsapp target="+972..." filePath="/tmp/tts-xxx/voice-123.ogg" asVoice=true message=""
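The conversion in step 2 can be wrapped in a small script so the flags are not retyped each time. This sketch only reuses the ffmpeg arguments shown above; `opus_command` and `convert_to_opus` are hypothetical helper names, and ffmpeg is assumed to be on PATH.

```python
import os
import subprocess


def opus_command(mp3_path: str) -> list[str]:
    """Build the ffmpeg argument list, writing a .ogg file next to the input."""
    base, _ = os.path.splitext(mp3_path)
    return [
        "ffmpeg", "-i", mp3_path,
        "-c:a", "libopus", "-b:a", "64k",
        "-vbr", "on", "-application", "voip",
        base + ".ogg",
    ]


def convert_to_opus(mp3_path: str) -> str:
    """Run ffmpeg and return the path of the resulting Opus file."""
    command = opus_command(mp3_path)
    subprocess.run(command, check=True)  # Raises CalledProcessError on failure.
    return command[-1]


print(" ".join(opus_command("/tmp/tts-xxx/voice-123.mp3")))
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.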
Why Opus?
| Format | iOS | Android | Transcribe |
|---|---|---|---|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |
Always convert to Opus. It is the only format that:
- Works on all devices (iOS + Android)
- Supports WhatsApp's transcribe button
Audio Cutoff Fix
ElevenLabs sometimes cuts off the last word. Always add [pause] or ... at the end:
[excited] This is amazing! [pause]
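To avoid forgetting the trailing pause, the text can be normalized before generation. A minimal sketch, with `with_trailing_pause` as a hypothetical helper name:

```python
def with_trailing_pause(text: str) -> str:
    """Append [pause] unless the text already ends with [pause] or an ellipsis."""
    stripped = text.rstrip()
    if stripped.endswith(("[pause]", "...", "…")):
        return stripped
    return f"{stripped} [pause]"


print(with_trailing_pause("[excited] This is amazing!"))
```

The function leaves text alone when it already ends in `[pause]` or an ellipsis, so it is safe to apply to every segment.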
Long-Form Audio (Podcasts)
For content >800 chars:
- Split into short segments (<800 chars each)
- Generate each with `tts` tool
- Concatenate with ffmpeg:
cat > list.txt << EOF
file '/path/file1.mp3'
file '/path/file2.mp3'
EOF
ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3
- Convert to Opus for WhatsApp
- Send as single voice message
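The concatenation step can also be scripted when the number of segments varies. This sketch writes the same `list.txt` format as the heredoc above and builds the same ffmpeg command; the function names are hypothetical, and paths containing single quotes are assumed not to occur (they would need escaping).

```python
import subprocess


def concat_list_text(mp3_paths: list[str]) -> str:
    """Render the file list read by ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in mp3_paths)


def concat_command(list_path: str, output_path: str = "final.mp3") -> list[str]:
    """Build the ffmpeg command that joins the listed files without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output_path]


def concatenate(mp3_paths: list[str], output_path: str = "final.mp3") -> str:
    """Write list.txt in the current directory, run ffmpeg, return the output path."""
    with open("list.txt", "w", encoding="utf-8") as handle:
        handle.write(concat_list_text(mp3_paths))
    subprocess.run(concat_command("list.txt", output_path), check=True)
    return output_path


print(concat_list_text(["/path/file1.mp3", "/path/file2.mp3"]), end="")
```

Segments must be listed in playback order, since ffmpeg joins them exactly as they appear in the list.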
Multi-Speaker Dialogue
v3 can handle multiple characters in one generation:
Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!
Dialogue tags: [interrupting], [overlapping], [cuts in], [interjecting]
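When dialogue comes from structured data, the script format above can be generated rather than typed. A small sketch, assuming one tag per line; `format_dialogue` is a hypothetical helper name.

```python
def format_dialogue(lines: list[tuple[str, str, str]]) -> str:
    """Render (speaker, tag, text) triples as 'Speaker: [tag] text' lines."""
    return "\n".join(f"{speaker}: [{tag}] {text}" for speaker, tag, text in lines)


script = format_dialogue([
    ("Jessica", "whispers", "Did you hear that?"),
    ("Chris", "interrupting", "I heard it too!"),
    ("Jessica", "panicking", "We need to hide!"),
])
print(script)
```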
Audio Tags Quick Reference
| Category | Tags | When to Use |
|---|---|---|
| Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| Character | [French accent], [British accent], [robotic tone] | Character voice shifts |
| Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |
- Emotions: `[excited]`, `[nervous]`, `[sad]`, `[happy]`
- Reactions: `[laughs]`, `[sighs]`, `[whispers]`
- Pacing: `[pause]`
- Sound effects: `[explosion]`, `[gunshot]`
- Accents: results vary by voice
Troubleshooting
Tags read aloud?
- Verify you are using the `eleven_v3` model
- Use IVC/premade voices, not PVC
- Simplify tags (no "tone" suffix)
- Increase text length (250+ chars)
- Segment is too long - split at <800 chars
- Regenerate (v3 is non-deterministic)
- Try lower stability setting
- Convert to Opus format (see above)
- Voice may not match tag style
- Try Creative stability mode (0.5)
- Add more context around the tag
Installation
openclaw install elevenlabs-tts