Voice To Text

Convert voice messages and audio files to text using Vosk offline speech recognition.

Rating: 5 (394 reviews)
Downloads: 2,558
Version: 1.0.0

Voice to Text

Convert voice messages and audio files to text using Vosk, an offline speech recognition toolkit.

Setup

  • Install dependencies:
bash
# macOS
brew install ffmpeg
pip install vosk

# Linux
apt-get install ffmpeg
pip install vosk
  • Download a Vosk model:
bash
mkdir -p ~/.vosk/models && cd ~/.vosk/models

# Chinese (small, fast)
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-cn-0.22.zip
unzip vosk-model-small-cn-0.22.zip

# English (small)
curl -LO https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

Usage

When the user provides a voice message or audio file path, run the transcription:

bash
python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"

To use a specific model, set the VOSK_MODEL_PATH environment variable to its directory. The model must already be downloaded and unzipped:

bash
VOSK_MODEL_PATH=~/.vosk/models/vosk-model-cn-0.22 python3 ~/skills/voice-to-text/transcribe.py "<audio_file_path>"
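
The source of transcribe.py is not shown here, so the following is only a sketch of how a script could honor VOSK_MODEL_PATH. The function name `resolve_model_path` and the fallback rule (first unpacked model directory, sorted by name) are assumptions for illustration, not the skill's documented behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location taken from the Setup steps.
DEFAULT_MODELS_DIR = Path.home() / ".vosk" / "models"


def resolve_model_path(models_dir: Path = DEFAULT_MODELS_DIR,
                       env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the model directory to load, or None if nothing usable exists."""
    env = os.environ if env is None else env
    override = env.get("VOSK_MODEL_PATH")
    if override:
        # Expand "~" in case the value was quoted and the shell left it as-is.
        candidate = Path(override).expanduser()
        return candidate if candidate.is_dir() else None
    if not models_dir.is_dir():
        return None
    # Hypothetical fallback: the first unpacked model directory, by name.
    candidates = sorted(p for p in models_dir.iterdir() if p.is_dir())
    return candidates[0] if candidates else None
```

Returning None instead of raising lets the caller decide how to report a missing model, for example with the "No model found" hint from Troubleshooting.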

Supported Audio Formats

  • MP3, WAV, M4A, OGG, FLAC, AAC, WEBM
  • Voice messages from WeChat, Telegram, WhatsApp, etc.
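
A script could reject unsupported files early with an extension check like the one below. This is a minimal sketch: `is_supported` and `SUPPORTED_EXTENSIONS` are hypothetical names, and whether the real script filters by extension at all is an assumption.

```python
from pathlib import Path

# Extensions taken from the format list; lowercase, with the leading dot.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm"}


def is_supported(audio_path: str) -> bool:
    """True if the file extension is one of the listed formats (case-insensitive)."""
    return Path(audio_path).suffix.lower() in SUPPORTED_EXTENSIONS
```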

Available Models

| Model | Language | Size | Notes |
| --- | --- | --- | --- |
| vosk-model-small-cn-0.22 | Chinese | 42 MB | Fast, good accuracy |
| vosk-model-cn-0.22 | Chinese | 1.3 GB | High accuracy |
| vosk-model-small-en-us-0.15 | English | 40 MB | Fast, good accuracy |
| vosk-model-en-us-0.22 | English | 1.8 GB | High accuracy |

Download models from: https://alphacephei.com/vosk/models

Example Workflow

  • User sends a voice message via WeChat/Telegram
  • OpenClaw receives the audio file
  • Run: python3 transcribe.py /path/to/voice.ogg
  • Return transcribed text to user
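
The "Run" step can be sketched with the Vosk Python API as follows. This is an illustration rather than the actual contents of transcribe.py: `transcribe_wav` and `join_results` are hypothetical names, and the input is assumed to be a 16 kHz mono 16-bit PCM WAV file (see Notes).

```python
import json
import wave
from typing import Iterable


def join_results(result_jsons: Iterable[str]) -> str:
    """Join the "text" fields of Vosk JSON results, skipping empty segments."""
    texts = (json.loads(r).get("text", "").strip() for r in result_jsons)
    return " ".join(t for t in texts if t)


def transcribe_wav(wav_path: str, model_path: str) -> str:
    """Transcribe a 16 kHz mono 16-bit PCM WAV file with Vosk."""
    # Imported here so join_results stays usable without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)
    results = []
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                results.append(recognizer.Result())
        # Flush whatever audio is still buffered in the recognizer.
        results.append(recognizer.FinalResult())
    return join_results(results)
```

Feeding the audio in chunks keeps memory use flat for long recordings; each completed utterance arrives as a JSON string whose "text" field holds the transcript.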

Troubleshooting

  • No model found: Download and unzip a model into ~/.vosk/models/
  • ffmpeg not found: Install via brew install ffmpeg (macOS) or apt-get install ffmpeg (Linux)
  • Poor accuracy: Switch to a larger model, such as vosk-model-cn-0.22 or vosk-model-en-us-0.22
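
The first two checks can be automated before transcription starts. A minimal sketch, assuming models live as subdirectories of a single models directory; the function name `preflight` and its messages are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(models_dir: Path, ffmpeg_name: str = "ffmpeg") -> List[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(ffmpeg_name) is None:
        problems.append(f"{ffmpeg_name} not found on PATH")
    has_model = models_dir.is_dir() and any(p.is_dir() for p in models_dir.iterdir())
    if not has_model:
        problems.append(f"no model found in {models_dir}")
    return problems
```

Calling `preflight(Path.home() / ".vosk" / "models")` and printing each returned message gives the user every missing prerequisite at once instead of failing on the first.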

Notes

  • Works completely offline after model download
  • Supports other languages when the matching model is downloaded
  • Audio is converted to 16 kHz mono WAV for processing
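
The conversion to 16 kHz mono WAV can be done with a single ffmpeg call. The sketch below shows one way to build that command; the function names are hypothetical, and the 16-bit PCM codec is an assumption based on what Vosk's own examples expect, not something this page states.

```python
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that writes src as 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file in any format ffmpeg can decode
        "-ac", "1",           # one audio channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # assumed: signed 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, which are common in exported voice messages.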

Installation

bash
openclaw install voice-to-text

Tags

#media_and-streaming

Quick Info

Category: Content Creation
Model: Claude 3.5
Complexity: One-Click
Author: vae999
Last Updated: 3/10/2026