Voice To Text

Convert voice messages and audio files to text using Vosk offline speech recognition.

Version: 1.0.0

Complete Documentation

Voice to Text

Convert voice messages and audio files to text using Vosk, an offline speech recognition toolkit.

Setup

Install dependencies:

bash

# macOS
brew install ffmpeg
pip install vosk

# Linux
apt-get install ffmpeg
pip install vosk

Download a Vosk model:

bash

mkdir -p ~/.vosk/models && cd ~/.vosk/models

# Chinese (small, fast)
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-cn-0.22.zip
unzip vosk-model-small-cn-0.22.zip

# English (small)
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

Usage

When the user provides a voice message or audio file path, run the transcription:

bash

python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"

For specific model selection, set the environment variable:

bash

VOSK_MODEL_PATH=~/.vosk/models/vosk-model-cn-0.22 python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
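
How transcribe.py chooses a model is not shown on this page. The following is a minimal sketch of one plausible lookup, assuming the script prefers VOSK_MODEL_PATH and otherwise falls back to the first unpacked model under ~/.vosk/models. The function name resolve_model_path and the fallback order are hypothetical, not taken from the skill's source.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_path(
    env: Mapping[str, str] = os.environ,
    models_dir: Path = Path.home() / ".vosk" / "models",
) -> Optional[Path]:
    """Return a model directory, or None if nothing usable is found."""
    # An explicit VOSK_MODEL_PATH wins, but only if it points at a directory.
    explicit = env.get("VOSK_MODEL_PATH")
    if explicit:
        path = Path(explicit).expanduser()
        return path if path.is_dir() else None
    # Assumed fallback: the first unpacked model directory, in name order.
    if models_dir.is_dir():
        candidates = sorted(p for p in models_dir.iterdir() if p.is_dir())
        if candidates:
            return candidates[0]
    return None


if __name__ == "__main__":
    print(resolve_model_path() or "No model found")
```

Returning None rather than raising lets the caller print the "No model found" hint from the Troubleshooting section.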

Supported Audio Formats

MP3, WAV, M4A, OGG, FLAC, AAC, WEBM
Voice messages from WeChat, Telegram, WhatsApp, etc.

Available Models

Model	Language	Size	Notes
vosk-model-small-cn-0.22	Chinese	42M	Fast, good accuracy
vosk-model-cn-0.22	Chinese	1.3G	High accuracy
vosk-model-small-en-us-0.15	English	40M	Fast, good accuracy
vosk-model-en-us-0.22	English	1.8G	High accuracy

Download models from: https://alphacephei.com/vosk/models
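
For scripted setups, the download step can also be done from Python. This is a sketch only: it assumes every model in the table follows the same `<name>.zip` URL pattern used in Setup, and that each archive unpacks into a directory named after the model. The helpers model_url and download_model are hypothetical names.

```python
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "https://alphacephei.com/vosk/models"


def model_url(name: str) -> str:
    """Build the archive URL, assuming the <name>.zip pattern from Setup."""
    return f"{BASE_URL}/{name}.zip"


def download_model(
    name: str,
    models_dir: Path = Path.home() / ".vosk" / "models",
) -> Path:
    """Download and unpack a model unless its directory already exists."""
    target = models_dir / name
    if target.is_dir():
        return target  # already unpacked, nothing to do
    models_dir.mkdir(parents=True, exist_ok=True)
    archive = models_dir / f"{name}.zip"
    urllib.request.urlretrieve(model_url(name), archive)
    # Assumption: the zip contains a single top-level directory called `name`.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(models_dir)
    archive.unlink()  # remove the zip once it is unpacked
    return target
```

Calling `download_model("vosk-model-small-en-us-0.15")` would then mirror the curl and unzip commands from Setup. The large models are over 1G, so expect the download to take a while.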

Example Workflow

1. User sends a voice message via WeChat/Telegram
2. OpenClaw receives the audio file
3. Run: python3 transcribe.py /path/to/voice.ogg
4. Return transcribed text to user

Troubleshooting

No model found: Download a model to ~/.vosk/models/
ffmpeg not found: Install via brew install ffmpeg (macOS) or apt-get install ffmpeg (Linux)
Poor accuracy: Try a larger model for better results
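
The first two checks above can be automated before running a transcription. This is a small sketch, not part of the skill itself; the function name preflight is hypothetical, and it assumes models are unpacked as directories under ~/.vosk/models.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(models_dir: Path = Path.home() / ".vosk" / "models") -> List[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    # shutil.which searches PATH the same way the shell would.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found: install it with brew or apt-get")
    # Any subdirectory counts as an unpacked model (an assumption).
    has_model = models_dir.is_dir() and any(
        p.is_dir() for p in models_dir.iterdir()
    )
    if not has_model:
        problems.append(f"No model found: download one to {models_dir}")
    return problems


if __name__ == "__main__":
    for line in preflight() or ["OK: ffmpeg and at least one model are present"]:
        print(line)
```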

Notes

Works completely offline after model download
Supports multiple languages (download appropriate model)
Audio is converted to 16kHz mono WAV for processing
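
To make the conversion note concrete, here is a sketch of how a script might convert audio with ffmpeg and then feed it to Vosk. The source of transcribe.py is not shown on this page, so this is an illustration built on Vosk's Python API (Model, KaldiRecognizer) and standard ffmpeg options, not the skill's actual code. The names ffmpeg_command and transcribe, and the fixed output path converted.wav, are hypothetical.

```python
import json
import subprocess
import wave
from typing import List


def ffmpeg_command(src: str, dst: str) -> List[str]:
    """ffmpeg arguments for a 16 kHz, mono, 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-loglevel", "error", "-i", src,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def transcribe(src: str, model_path: str, wav_path: str = "converted.wav") -> str:
    """Convert src with ffmpeg, then run Vosk over the resulting WAV."""
    # Imported lazily so the helper above works without vosk installed.
    from vosk import KaldiRecognizer, Model

    subprocess.run(ffmpeg_command(src, wav_path), check=True)
    model = Model(model_path)
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns true when an utterance is complete.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result()).get("text", ""))
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(json.loads(rec.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

A call such as `transcribe("/path/to/voice.ogg", "/path/to/model")` would return the recognized text as one string. A real script would likely write the intermediate WAV to a temporary file rather than a fixed name.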

Installation

bash

openclaw install voice-to-text

