Zhipu TTS
Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model.
- Rating: 3.8 (372 reviews)
- Downloads: 16,968
- Version: 1.0.0
Zhipu AI Text-to-Speech
Convert Chinese text to natural-sounding speech using Zhipu AI's GLM-TTS model.
Setup
1. Get an API key from the Zhipu AI Console
2. Set it in your environment:
export ZHIPU_API_KEY="your-key-here"
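A wrapper script can check for the key before it calls the skill, so a missing variable fails early with a clear message. The helper below is a minimal sketch: `require_api_key` is a hypothetical name and is not part of the skill.

```shell
#!/bin/sh
# Hypothetical helper (not part of the skill): stop early when the key is missing.
require_api_key() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set; export it before calling the script." >&2
    return 1
  fi
}

# Usage: require_api_key && bash scripts/text_to_speech.sh "你好"
```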
Available Voices
System Voices (Pre-built)
- tongtong (彤彤) - Default voice, balanced tone
- chuichui (锤锤) - Male voice, deeper tone
- xiaochen (小陈) - Young professional voice
- jam - 动动动物圈 Jam voice
- kazi - 动动动物圈 Kazi voice
- douji - 动动动物圈 Douji voice
- luodo - 动动动物圈 Luodo voice
Usage
Basic Text-to-Speech
Convert text to speech with default settings (tongtong voice, normal speed, WAV format):
bash scripts/text_to_speech.sh "你好,今天天气怎么样"
Advanced Options
Specify voice, speed, format, and output filename:
bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav
Parameters:
- text (required): Chinese text to convert (max 1024 characters)
- voice (optional): tongtong (default), chuichui, xiaochen, jam, kazi, douji, luodo
- speed (optional): Speech speed from 0.5 to 2.0 (default: 1.0)
- output_format (optional): wav (default), pcm
- output_file (optional): Output filename (default: output.{format})
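The documented limits can be checked before a request is sent. The function below is a sketch under that assumption: `validate_tts_args` is a hypothetical helper, not part of the skill, and it may duplicate checks the script already performs. It does not check the 1024-character limit, because character counting in shell depends on the locale.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the skill) mirroring the documented limits.
# validate_tts_args VOICE SPEED FORMAT   (omitted arguments use the documented defaults)
validate_tts_args() {
  voice=${1:-tongtong}
  speed=${2:-1.0}
  format=${3:-wav}

  case $voice in
    tongtong|chuichui|xiaochen|jam|kazi|douji|luodo) ;;
    *) echo "unknown voice: $voice" >&2; return 1 ;;
  esac

  # awk does the decimal comparison; the regex rejects non-numeric input first.
  if ! awk -v s="$speed" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2.0) }'; then
    echo "speed must be between 0.5 and 2.0: $speed" >&2
    return 1
  fi

  case $format in
    wav|pcm) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
}

# Usage: validate_tts_args xiaochen 1.2 wav && \
#   bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav
```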
Voice Selection Guide
Choose tongtong (default) for:
- General purpose narration
- Professional presentations
- Balanced tone requirements
Choose chuichui for:
- Male voice needed
- Deeper, authoritative tone
- Documentary or formal content
Choose xiaochen for:
- Young, energetic tone
- Modern, casual content
- Friendly assistant vibe
Choose jam/kazi/douji/luodo for:
- Entertainment content
- Character voices
- Creative projects
Speed Control
Recommended speeds:
- 0.8-1.0: Clear, professional narration
- 1.0-1.2: Natural conversational pace (default: 1.0)
- 1.2-1.5: Energetic, upbeat delivery
- 1.5-2.0: Fast-paced summaries (may reduce clarity)
Output Formats
WAV (recommended):
- Standard audio format
- Widely compatible
- Better quality preservation
PCM:
- Raw audio format
- Smaller file size
- Requires additional processing for playback
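The additional processing for PCM usually means adding a header so that players know how to read the samples. The sketch below prepends a standard 44-byte WAV header to a raw PCM file. The defaults (24000 Hz, mono, 16-bit little-endian) are assumptions, not values confirmed here, so check them against the API documentation; a wrong sample rate makes the audio play too fast or too slow. `pcm_to_wav` and its helpers are hypothetical names.

```shell
#!/bin/sh
# Write one byte given as a decimal value (0-255), using a POSIX octal escape.
put_byte() {
  printf "\\$(printf '%03o' "$1")"
}

# put_le VALUE NBYTES: write an unsigned integer as little-endian bytes.
put_le() {
  value=$1
  count=$2
  while [ "$count" -gt 0 ]; do
    put_byte $((value % 256))
    value=$((value / 256))
    count=$((count - 1))
  done
}

# pcm_to_wav SRC.pcm DST.wav [SAMPLE_RATE] [CHANNELS] [BITS]
# Defaults are ASSUMED (24000 Hz, mono, 16-bit); verify them for your audio.
pcm_to_wav() {
  src=$1
  dst=$2
  rate=${3:-24000}
  channels=${4:-1}
  bits=${5:-16}
  data_size=$(wc -c < "$src")
  data_size=$((data_size + 0))            # drop any padding wc adds
  block_align=$((channels * bits / 8))
  byte_rate=$((rate * block_align))
  {
    printf 'RIFF'; put_le $((36 + data_size)) 4; printf 'WAVE'
    printf 'fmt '; put_le 16 4            # size of the fmt chunk
    put_le 1 2                            # format tag 1 = uncompressed PCM
    put_le "$channels" 2
    put_le "$rate" 4
    put_le "$byte_rate" 4
    put_le "$block_align" 2
    put_le "$bits" 2
    printf 'data'; put_le "$data_size" 4
    cat "$src"
  } > "$dst"
}

# Usage: pcm_to_wav output.pcm output.wav
```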
Examples
Create a professional greeting:
bash scripts/text_to_speech.sh "您好,感谢致电智能客服,请按1选择中文服务" tongtong 1.0 wav greeting.wav
Generate an energetic announcement:
bash scripts/text_to_speech.sh "热烈欢迎各位嘉宾参加今天的活动!" xiaochen 1.3 wav announcement.wav
Create a calm narration:
bash scripts/text_to_speech.sh "在这个宁静的夜晚,让我们一起欣赏美丽的星空" chuichui 0.9 wav narration.wav
Character Limits
- Maximum input: 1024 characters per request
- For longer texts, split the input into segments of at most 1024 characters and convert each one separately
- Combine the generated audio files afterwards
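One way to split a long text is to pack whole sentences into segments that stay under the limit. The sketch below assumes the input has already been broken into one sentence per line; `pack_segments` is a hypothetical helper, not part of the skill. `${#var}` counts characters in a UTF-8 locale and bytes otherwise, so segments never exceed the limit in characters either way. Sentences are joined without a separator, which suits Chinese text.

```shell
#!/bin/sh
# pack_segments LIMIT
# Reads sentences from stdin (one per line) and prints one segment per line,
# each at most LIMIT long. A single sentence longer than LIMIT is passed
# through unchanged and still needs manual splitting.
pack_segments() {
  limit=$1
  segment=""
  while IFS= read -r sentence || [ -n "$sentence" ]; do
    [ -z "$sentence" ] && continue
    candidate="$segment$sentence"
    if [ -n "$segment" ] && [ "${#candidate}" -gt "$limit" ]; then
      printf '%s\n' "$segment"
      segment=$sentence
    else
      segment=$candidate
    fi
  done
  [ -n "$segment" ] && printf '%s\n' "$segment"
  return 0
}

# Usage (sentences.txt is a hypothetical input file):
#   i=0
#   pack_segments 1024 < sentences.txt | while IFS= read -r seg; do
#     i=$((i + 1))
#     bash scripts/text_to_speech.sh "$seg" tongtong 1.0 wav "part_$i.wav"
#   done
```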
Audio Quality Tips
Best practices:
- Use punctuation for natural pauses (commas, periods)
- Break long sentences into shorter segments
- Use appropriate line breaks for paragraph pauses
- Test speed settings for your specific content
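To compare speed settings on your own content, the same text can be rendered once per speed and the results listened to side by side. This is a minimal sketch: `speed_samples` and the `TTS_SCRIPT` variable are hypothetical, and the output names `sample_<speed>.wav` are only an illustrative convention.

```shell
#!/bin/sh
# Hypothetical helper (not part of the skill): render one sample per speed.
# TTS_SCRIPT defaults to the script path used in this document.
TTS_SCRIPT=${TTS_SCRIPT:-scripts/text_to_speech.sh}

# speed_samples TEXT VOICE SPEED...
speed_samples() {
  text=$1
  voice=$2
  shift 2
  for speed in "$@"; do
    # Stop at the first failed request so the error is not buried.
    bash "$TTS_SCRIPT" "$text" "$voice" "$speed" wav "sample_${speed}.wav" || return 1
  done
}

# Usage: speed_samples "欢迎使用智能语音服务" tongtong 0.8 1.0 1.2 1.5
```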
Troubleshooting
Text Length Issues:
- Split texts longer than 1024 characters
- Process segments separately
- Combine using audio editing tools
Audio Quality Issues:
- Check text encoding (use UTF-8)
- Verify punctuation placement
- Adjust speed settings
- Try different voices
Playback Issues:
- Ensure format compatibility with your player
- WAV format works on most systems
- PCM may require conversion
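The encoding check can be scripted when the text comes from a file. The sketch below uses the standard `iconv` utility to test whether a file decodes cleanly as UTF-8; `is_utf8` is a hypothetical helper name, and the file name in the usage line is illustrative.

```shell
#!/bin/sh
# is_utf8 FILE: succeed only if FILE decodes cleanly as UTF-8.
# iconv exits non-zero when it meets a byte sequence that is not valid UTF-8.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage: is_utf8 input.txt || echo "input.txt is not valid UTF-8" >&2
```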
API Notes
- Responses are returned as audio files
- Watermarking enabled by default (can be disabled in account settings)
- No strict rate limiting documented
- Audio generation typically completes in 1-3 seconds
Installation
openclaw install zhipu-tts