Qwenspeak
Text-to-speech generation via Qwen3-TTS over SSH.
- Rating: 4.5 (133 reviews)
- Downloads: 42,954
- Version: 1.0.0
qwenspeak
YAML-driven text-to-speech over SSH using Qwen3-TTS models.
For installation and deployment, see references/setup.md.
SSH Wrapper
Use scripts/qwenspeak.sh for all commands. It reads the host and port from the QWENSPEAK_HOST and QWENSPEAK_PORT env vars and handles host key acceptance.
scripts/qwenspeak.sh <command> [args]
scripts/qwenspeak.sh <command> < input_file
scripts/qwenspeak.sh <command> > output_file
TTS Generation
Submit YAML, get a job UUID back immediately, poll for progress. Jobs run sequentially — one at a time, the rest queue up.
# Get the YAML template
scripts/qwenspeak.sh "tts print-yaml" > job.yaml
# Submit job
scripts/qwenspeak.sh "tts" < job.yaml
# {"id": "550e8400-...", "status": "queued", "total_steps": 3, "total_generations": 7}
# Check progress
scripts/qwenspeak.sh "tts get-job 550e8400"
# Follow job log
scripts/qwenspeak.sh "tts get-job-log 550e8400 -f"
# Download result
scripts/qwenspeak.sh "get hello.wav" > hello.wav
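The submit-then-poll flow above can be scripted. The sketch below assumes that `tts get-job` prints JSON containing a `status` field, as the submit response does; the documentation does not confirm that output format. `QWENSPEAK_CMD`, `job_status`, and `wait_for_job` are hypothetical names introduced for illustration.

```shell
# Sketch: poll a job until it reaches a terminal status.
# ASSUMPTION: "tts get-job <id>" prints JSON with a "status" field.
# QWENSPEAK_CMD is a hypothetical override for the wrapper path.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

# Print the value of "status" from JSON on stdin (sed-based, no jq).
job_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# wait_for_job <id> [interval_seconds]
# Returns 0 on completed, 1 on failed/cancelled, 2 on an unrecognised status.
wait_for_job() {
  job="$1"
  interval="${2:-5}"
  while :; do
    state="$("$QWENSPEAK_CMD" "tts get-job $job" | job_status)"
    case "$state" in
      completed) return 0 ;;
      failed|cancelled) echo "job $job ended as $state" >&2; return 1 ;;
      queued|running) sleep "$interval" ;;
      *) echo "unexpected status: '$state'" >&2; return 2 ;;
    esac
  done
}
```

With this in place, `wait_for_job 550e8400` could gate the `get` step so the download only runs once the job has completed.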
YAML Structure
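Global settings plus a list of steps. Each step loads a model, runs all its generations, then unloads. Settings cascade from global to step to generation, and the most specific level wins: in the first step of the example, the second generation's speaker overrides the step's.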
steps:
  - mode: custom-voice
    model_size: 1.7b
    speaker: Ryan
    language: English
    generate:
      - text: "Hello world"
        output: hello.wav
      - text: "I cannot believe this!"
        speaker: Vivian
        instruct: "Speak angrily"
        output: angry.wav
  - mode: voice-design
    generate:
      - text: "Welcome to our store."
        instruct: "A warm, friendly young female voice with a cheerful tone"
        output: welcome.wav
  - mode: voice-clone
    model_size: 1.7b
    ref_audio: ref.wav
    ref_text: "Transcript of reference"
    generate:
      - text: "First line in cloned voice"
        output: clone1.wav
      - text: "Second line"
        output: clone2.wav
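One way to read the cascade, taking the first step of the example as evidence: the most specific level that sets a field wins, so a generation's `speaker` overrides the step's. The helper below is a hypothetical illustration of that lookup order, not the tool's own code; `resolve_setting` is an invented name.

```shell
# Sketch of the cascade lookup: print the first non-empty value,
# checking the most specific level first.
# resolve_setting <generation> <step> <global> <default>
resolve_setting() {
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

# Second generation of the first step: it sets its own speaker.
resolve_setting "Vivian" "Ryan" "" "Vivian"   # prints Vivian
# First generation: no speaker of its own, so the step's value applies.
resolve_setting "" "Ryan" "" "Vivian"         # prints Ryan
```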
Modes
custom-voice — Pick from 9 preset speakers. 1.7B supports emotion/style via instruct.
voice-design — Describe the voice in natural language via instruct. 1.7B only.
voice-clone — Clone from reference audio. Set ref_audio and ref_text at step level to reuse across generations. x_vector_only: true skips transcript.
Emotion trick for cloned voices
Upload references with different emotions, use separate steps:
scripts/qwenspeak.sh "create-dir refs"
scripts/qwenspeak.sh "put refs/happy.wav" < me_happy.wav
scripts/qwenspeak.sh "put refs/angry.wav" < me_angry.wav
steps:
  - mode: voice-clone
    ref_audio: refs/happy.wav
    ref_text: "transcript of happy ref"
    generate:
      - text: "Great news everyone!"
        output: happy1.wav
  - mode: voice-clone
    ref_audio: refs/angry.wav
    ref_text: "transcript of angry ref"
    generate:
      - text: "This is unacceptable"
        output: angry1.wav
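When there are more than a couple of reference clips, the upload step can be looped. This is a minimal sketch using only the `create-dir` and `put` commands; `upload_refs` and `QWENSPEAK_CMD` are hypothetical names, and it assumes file names without spaces, since each command is passed as a single string.

```shell
# Sketch: upload every .wav in a local directory into refs/ on the server.
# QWENSPEAK_CMD is a hypothetical override for the wrapper path.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

# upload_refs <local_dir>
upload_refs() {
  src_dir="$1"
  # Whether create-dir fails on an existing directory is not documented,
  # so its exit status is ignored here.
  "$QWENSPEAK_CMD" "create-dir refs" || true
  for wav in "$src_dir"/*.wav; do
    [ -e "$wav" ] || continue   # the glob matched nothing
    # Keep the local file name as the remote name under refs/.
    "$QWENSPEAK_CMD" "put refs/$(basename "$wav")" < "$wav" || return 1
  done
}
```

Each uploaded clip can then be referenced as `ref_audio: refs/<name>.wav` in its own step.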
Job Management
scripts/qwenspeak.sh "tts list-jobs" # list all
scripts/qwenspeak.sh "tts list-jobs --json" # JSON output
scripts/qwenspeak.sh "tts get-job <id>" # job details
scripts/qwenspeak.sh "tts get-job-log <id>" # view log
scripts/qwenspeak.sh "tts get-job-log <id> -f" # follow log
scripts/qwenspeak.sh "tts cancel-job <id>" # cancel
Statuses: queued → running → completed | failed | cancelled
Completed jobs are cleaned up automatically after 1 day, and all jobs after 1 week. UUID prefixes are accepted in place of the full ID (e.g. the first 8 characters).
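Since the submit response is JSON and UUID prefixes are accepted, a script can capture the ID at submission and keep only the short form. The sketch below parses the `id` field with `sed` rather than a JSON tool; `job_id`, `short_id`, `submit_job`, and `QWENSPEAK_CMD` are hypothetical names, not part of the wrapper.

```shell
# Sketch: submit a job file and print an 8-character job ID prefix.
# QWENSPEAK_CMD is a hypothetical override for the wrapper path.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

# Print the "id" value from a JSON object on stdin (sed-based, no jq).
job_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Print the first 8 characters of a job UUID.
short_id() {
  printf '%.8s\n' "$1"
}

# submit_job <job.yaml>: submit the file and print the short job ID.
submit_job() {
  full_id="$("$QWENSPEAK_CMD" "tts" < "$1" | job_id)"
  [ -n "$full_id" ] || return 1   # no id found in the response
  short_id "$full_id"
}
```

The printed prefix can be passed straight to `tts get-job`, `tts get-job-log`, or `tts cancel-job`.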
File Operations
All paths are relative to the work directory. Path traversal is blocked.
| Command | Description |
|---|---|
| put | Upload file from stdin |
| get | Download file to stdout |
| list-files [--json] | List directory |
| remove-file | Delete a file |
| create-dir | Create directory |
| remove-dir | Remove empty directory |
| move-file | Move or rename |
| copy-file | Copy a file |
| file-exists | Check if file exists (true/false) |
| search-files | Glob search (recursive) |
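Because `get` writes to stdout, fetching several outputs takes one redirect per file. A minimal sketch of a batch download, using only the `get` command from the table; `fetch_outputs` and `QWENSPEAK_CMD` are hypothetical names, and remote paths are assumed to contain no spaces.

```shell
# Sketch: download several remote files into a local directory.
# QWENSPEAK_CMD is a hypothetical override for the wrapper path.
QWENSPEAK_CMD="${QWENSPEAK_CMD:-scripts/qwenspeak.sh}"

# fetch_outputs <local_dir> <remote_path>...
fetch_outputs() {
  dest="$1"
  shift
  mkdir -p "$dest" || return 1
  for remote in "$@"; do
    # Save each file under its base name; stop at the first failure.
    "$QWENSPEAK_CMD" "get $remote" > "$dest/$(basename "$remote")" || return 1
  done
}
```

For the first example job, `fetch_outputs out hello.wav angry.wav welcome.wav` would place the three files in `out/`.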
Speakers
| Speaker | Gender | Language | Description |
|---|---|---|---|
| Vivian | Female | Chinese | Bright, slightly edgy young voice |
| Serena | Female | Chinese | Warm, gentle young voice |
| Uncle_Fu | Male | Chinese | Seasoned, low mellow timbre |
| Dylan | Male | Chinese | Youthful Beijing dialect, clear natural timbre |
| Eric | Male | Chinese | Lively Chengdu/Sichuan dialect, slightly husky |
| Ryan | Male | English | Dynamic with strong rhythmic drive |
| Aiden | Male | English | Sunny American, clear midrange |
| Ono_Anna | Female | Japanese | Playful, light nimble timbre |
| Sohee | Female | Korean | Warm with rich emotion |
YAML Options
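Settings cascade from global to step to generation; the most specific level that sets a field wins. Fields marked "Step only" can be set only on a step.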
| Field | Default | Description |
|---|---|---|
| dtype | float32 | float32, float16, bfloat16 (float16/bfloat16 GPU only) |
| flash_attn | auto | FlashAttention-2: auto-detects, auto-switches float32→bfloat16 |
| temperature | 0.9 | Sampling temperature |
| top_k | 50 | Top-k sampling |
| top_p | 1.0 | Top-p / nucleus sampling |
| repetition_penalty | 1.05 | Repetition penalty |
| max_new_tokens | 2048 | Max codec tokens to generate |
| no_sample | false | Greedy decoding |
| streaming | false | Streaming mode (lower latency) |
| mode | required | Step only: custom-voice, voice-design, or voice-clone |
| model_size | 1.7b | Step only: 1.7b or 0.6b |
| text | required | Text to synthesize |
| output | required | Output file path |
| speaker | Vivian | custom-voice: speaker name |
| language | Auto | Language for synthesis |
| instruct | - | custom-voice: emotion/style; voice-design: voice description |
| ref_audio | - | voice-clone: reference audio file path |
| ref_text | - | voice-clone: transcript of reference audio |
| x_vector_only | false | voice-clone: use speaker embedding only |
Installation
openclaw install qwenspeak