Voice To Text
Convert voice messages and audio files to text using Vosk offline speech recognition.
- Rating: 5 (394 reviews)
- Downloads: 2,558
- Version: 1.0.0
Voice to Text
Convert voice messages and audio files to text using Vosk, an offline speech recognition toolkit.
Setup
- Install dependencies:
bash
# macOS
brew install ffmpeg
pip install vosk
# Linux (Debian/Ubuntu; prefix with sudo if not running as root)
apt-get install ffmpeg
pip install vosk
- Download a Vosk model:
bash
mkdir -p ~/.vosk/models && cd ~/.vosk/models
# Chinese (small, fast)
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-cn-0.22.zip
unzip vosk-model-small-cn-0.22.zip
# English (small)
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
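To confirm that the download and unzip steps worked, a short check can list the model directories now present. This is a minimal sketch, assuming models live under ~/.vosk/models as in the commands above and that each archive unpacks into a directory of the same name. The helper name `list_installed_models` is hypothetical and not part of the skill.

```python
from pathlib import Path


def list_installed_models(models_dir="~/.vosk/models"):
    """Return the names of unpacked Vosk model directories, sorted."""
    root = Path(models_dir).expanduser()
    if not root.is_dir():
        return []
    # Unpacked archives are directories named like "vosk-model-small-en-us-0.15";
    # the leftover .zip files are plain files and are skipped here.
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name.startswith("vosk-model")
    )


if __name__ == "__main__":
    for name in list_installed_models():
        print(name)
```

An empty result suggests the archive was downloaded but not unzipped, or was unpacked somewhere else.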
Usage
When the user provides a voice message or audio file path, run the transcription:
bash
python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
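To select a specific model, set the VOSK_MODEL_PATH environment variable to the unpacked model directory. The example uses the large Chinese model, which must be downloaded and unzipped first: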
bash
VOSK_MODEL_PATH=~/.vosk/models/vosk-model-cn-0.22 python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
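How a script might honor that variable can be sketched as below. The source does not show the logic inside transcribe.py, so this is an assumption for illustration only: the function name `resolve_model_path` is hypothetical, and falling back to the first model directory in alphabetical order is a guess, not confirmed behavior.

```python
import os
from pathlib import Path


def resolve_model_path(env=None, models_dir="~/.vosk/models"):
    """Pick a model directory: VOSK_MODEL_PATH first, else the first one found."""
    env = os.environ if env is None else env
    override = env.get("VOSK_MODEL_PATH")
    if override:
        # An explicit setting always wins over auto-discovery.
        return Path(override).expanduser()

    root = Path(models_dir).expanduser()
    candidates = []
    if root.is_dir():
        candidates = sorted(p for p in root.glob("vosk-model*") if p.is_dir())
    if not candidates:
        raise FileNotFoundError(f"No Vosk model found in {root}")
    # Assumed fallback: the alphabetically first unpacked model.
    return candidates[0]
```

Passing `env` explicitly keeps the function easy to exercise without touching the real environment.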
Supported Audio Formats
- MP3, WAV, M4A, OGG, FLAC, AAC, WEBM
- Voice messages from WeChat, Telegram, WhatsApp, etc.
Available Models
| Model | Language | Size | Notes |
|---|---|---|---|
| vosk-model-small-cn-0.22 | Chinese | 42M | Fast, good accuracy |
| vosk-model-cn-0.22 | Chinese | 1.3G | High accuracy |
| vosk-model-small-en-us-0.15 | English | 40M | Fast, good accuracy |
| vosk-model-en-us-0.22 | English | 1.8G | High accuracy |
Example Workflow
- User sends a voice message via WeChat/Telegram
- OpenClaw receives the audio file
- Run: python3 transcribe.py /path/to/voice.ogg
- Return the transcribed text to the user
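The run-and-return steps of this workflow could be wrapped as in the sketch below. It assumes that transcribe.py prints the transcript to standard output and exits non-zero on failure, neither of which the source confirms. The wrapper name `transcribe_file` is hypothetical.

```python
import subprocess
import sys


def transcribe_file(audio_path, script="transcribe.py"):
    """Run the transcription script on one file and return its stdout as text."""
    result = subprocess.run(
        [sys.executable, script, audio_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Usage: python3 wrapper.py /path/to/voice.ogg
    print(transcribe_file(sys.argv[1]))
```

Passing the arguments as a list avoids shell quoting problems when the audio path contains spaces.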
Troubleshooting
- No model found: Download a model to ~/.vosk/models/
- ffmpeg not found: Install via brew install ffmpeg or apt install ffmpeg
- Poor accuracy: Try a larger model for better results
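The first two problems in this list can be detected before any audio is processed. The sketch below is one possible pre-flight check, assuming the default model location used in Setup. The function name `preflight` and its message wording are hypothetical.

```python
import shutil
from pathlib import Path


def preflight(models_dir="~/.vosk/models", ffmpeg="ffmpeg"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(ffmpeg) is None:
        problems.append(f"{ffmpeg} not found on PATH")
    root = Path(models_dir).expanduser()
    has_model = root.is_dir() and any(p.is_dir() for p in root.iterdir())
    if not has_model:
        problems.append(f"no model directory under {root}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```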
Notes
- Works completely offline after model download
- Supports multiple languages (download the appropriate model)
- Audio is converted to 16 kHz mono WAV before recognition
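The conversion and recognition steps described in these notes can be sketched as follows. This is a sketch under assumptions, not the skill's actual transcribe.py. The function names `ffmpeg_command` and `transcribe` and the file name `converted.wav` are hypothetical. It relies on the standard ffmpeg options `-ar` and `-ac` and on the Vosk Python classes `Model` and `KaldiRecognizer`.

```python
import json
import subprocess
import wave


def ffmpeg_command(src, dst):
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to a single channel
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM samples
        dst,
    ]


def transcribe(src, model_path, wav_path="converted.wav"):
    """Convert src with ffmpeg, decode it with Vosk, and return the text."""
    # Imported lazily so the helper above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    subprocess.run(ffmpeg_command(src, wav_path), check=True, capture_output=True)
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(str(model_path)), wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform is truthy when an utterance has ended.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result()).get("text", ""))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(json.loads(rec.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

Feeding the audio in chunks keeps memory use flat for long recordings, and the final `FinalResult()` call picks up the last partial utterance.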
Installation
bash
openclaw install voice-to-text
Tags
#media_and-streaming
Quick Info
- Category: Content Creation
- Model: Claude 3.5
- Complexity: One-Click
- Author: vae999
- Last Updated: 3/10/2026