Salute Speech
Transcribe audio files using Sber Salute Speech async API.
- Rating: 4.6 (127 reviews)
- Downloads: 537
- Version: 1.0.0
Audio Transcription with Sber Salute Speech
Transcribe audio/video files to text with timestamps via Salute Speech async REST API.
Requirements
- API key: the environment variable `SALUTE_AUTH_DATA` must be set (Base64-encoded `client_id:client_secret`, or the raw authorization key from https://developers.sber.ru/studio/).
- SSL note: the script disables SSL verification by default (`verify_ssl=False`) because Sber's certificate chain is non-standard. This is expected.
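The credential format described above can be sketched as follows. This is a minimal illustration, not code from `salute_transcribe.py`: the helper names `encode_credentials` and `load_auth_data` are hypothetical, and only the variable name `SALUTE_AUTH_DATA` and the `client_id:client_secret` layout come from the requirements.

```python
import base64
import os


def encode_credentials(client_id: str, client_secret: str) -> str:
    """Return Base64("client_id:client_secret") as an ASCII string."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def load_auth_data(env=os.environ) -> str:
    """Read SALUTE_AUTH_DATA and fail early with a clear message if it is unset."""
    value = env.get("SALUTE_AUTH_DATA", "").strip()
    if not value:
        raise RuntimeError("SALUTE_AUTH_DATA is not set")
    return value
```

Checking the variable before any upload starts means a missing key is reported immediately instead of surfacing later as an authorization error.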
Supported formats & encodings
| Audio encoding | Content-Type | Typical extensions |
|---|---|---|
| MP3 | audio/mpeg | .mp3 |
| PCM_S16LE | audio/wav | .wav |
| OPUS | audio/ogg | .ogg, .opus |
| FLAC | audio/flac | .flac |
| ALAW | audio/alaw | .alaw |
| MULAW | audio/mulaw | .mulaw |
Supported languages
ru-RU, en-US, kk-KZ (Kazakh), ky-KG (Kyrgyz), uz-UZ (Uzbek).
Workflow
- Identify input files from the user request.
- Read the API key from the host environment.
- Run transcription: execute `salute_transcribe.py` with `uv` and the appropriate arguments.
- Deliver results: present the human-readable transcript with timestamps to the user and give a direct link to the output files.
Usage
uv run --with requests {baseDir}/salute_transcribe.py \
--file /path/to/audio.mp3 \
--output_dir ~/.openclaw/workspace/transcriptions \
--lang ru-RU
Arguments
| Argument | Required | Default | Description |
|---|---|---|---|
| --file | Yes | — | Path to audio/video file |
| --output_dir | No | ~/.openclaw/workspace/transcribations | Output directory for results |
| --lang | No | ru-RU | Language code: ru-RU, en-US, kk-KZ, ky-KG, uz-UZ |
| --audio-encoding | No | MP3 | Codec: MP3, PCM_S16LE, OPUS, FLAC, ALAW, MULAW |
| --model | No | general | Recognition model: general or callcenter |
| --hyp-count | No | 1 | Number of alternative hypotheses: 1 or 2 |
| --max-wait-time | No | 300 | Max seconds to wait for async result |
| --print | No | off | Also print transcription to stdout |
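For reference, the argument table could be expressed with `argparse` roughly as below. This is a sketch reconstructed from the table alone; the real script's parser may differ, and `build_parser` and the `print_to_stdout` destination name are assumptions made for illustration.

```python
import argparse

LANGS = ["ru-RU", "en-US", "kk-KZ", "ky-KG", "uz-UZ"]
ENCODINGS = ["MP3", "PCM_S16LE", "OPUS", "FLAC", "ALAW", "MULAW"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags and defaults mirror the argument table."""
    p = argparse.ArgumentParser(description="Transcribe audio via Salute Speech")
    p.add_argument("--file", required=True, help="Path to audio/video file")
    p.add_argument("--output_dir", default="~/.openclaw/workspace/transcribations")
    p.add_argument("--lang", default="ru-RU", choices=LANGS)
    p.add_argument("--audio-encoding", default="MP3", choices=ENCODINGS)
    p.add_argument("--model", default="general", choices=["general", "callcenter"])
    p.add_argument("--hyp-count", type=int, default=1, choices=[1, 2])
    p.add_argument("--max-wait-time", type=int, default=300)
    # Hypothetical destination name; the table only documents the flag itself.
    p.add_argument("--print", dest="print_to_stdout", action="store_true")
    return p
```

Declaring `choices` lets the parser reject an unsupported language or codec before any network request is made.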
Content-Type mapping
The script currently sends `audio/mpeg` (MP3) as the Content-Type by default. For any other format, adjust `content_type` in the script (or add mapping logic) so that it matches the file, for example `audio/wav` for `.wav` files.
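One way to add that mapping logic is a lookup keyed on the file extension, filled in from the formats table above. This is a minimal sketch: `FORMATS` and `guess_format` are hypothetical names, and falling back to MP3 for unknown extensions is an assumption that simply mirrors the current default.

```python
from pathlib import Path

# extension -> (audio encoding, Content-Type), copied from the formats table
FORMATS = {
    ".mp3": ("MP3", "audio/mpeg"),
    ".wav": ("PCM_S16LE", "audio/wav"),
    ".ogg": ("OPUS", "audio/ogg"),
    ".opus": ("OPUS", "audio/ogg"),
    ".flac": ("FLAC", "audio/flac"),
    ".alaw": ("ALAW", "audio/alaw"),
    ".mulaw": ("MULAW", "audio/mulaw"),
}


def guess_format(path, default=("MP3", "audio/mpeg")):
    """Return (encoding, content_type) for a file, based on its extension."""
    return FORMATS.get(Path(path).suffix.lower(), default)
```

An extension is only a hint: a `.wav` container can hold something other than 16-bit PCM, so an explicit `--audio-encoding` should still take precedence over the guess.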
Output files
For input file meetingABC.mp3 the script produces:
| File | Description |
|---|---|
| meetingABC_recognition_orig.json | Raw API response (full JSON with all hypotheses, timing, confidence) |
| meetingABC_pretty.txt | Formatted human-readable transcript with timestamps |
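The naming scheme in the table can be sketched as a small helper that derives both output paths from the input file name. `output_paths` is a hypothetical name, and the actual script may build these paths differently.

```python
from pathlib import Path


def output_paths(input_file, output_dir):
    """Return (raw JSON path, pretty text path) for a given input file."""
    stem = Path(input_file).stem  # "meetingABC.mp3" -> "meetingABC"
    out = Path(output_dir).expanduser()  # resolve a leading "~"
    return out / f"{stem}_recognition_orig.json", out / f"{stem}_pretty.txt"
```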
Output text format
[00:01 - 00:20]:
Ну, даже если сосредоточиться на идее узкой щели.
[00:20 - 00:45]:
Следующий фрагмент текста здесь.
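A segment in this layout could be rendered as shown below. This is a sketch under assumptions: segment boundaries are taken to be in seconds, fractions are truncated, and recordings longer than an hour would show minutes above 59, which the sample does not cover. `format_ts` and `format_segment` are hypothetical names.

```python
def format_ts(seconds: float) -> str:
    """Format a time offset in seconds as MM:SS, truncating fractions."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcript segment as a timestamp header plus its text."""
    return f"[{format_ts(start)} - {format_ts(end)}]:\n{text}"
```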
Notes
- The token is valid for ~30 minutes; the script fetches a new one each run.
- Large files (>1 hour) may need `--max-wait-time` increased beyond 300 s.
- The `callcenter` model is optimized for telephony audio (8 kHz, mono).
- The profanity filter is disabled by default (`enable_profanity_filter=False`).
- The script uses normalized text by default (numbers as digits, abbreviations expanded). Raw text is also available in the JSON output.
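The effect of `--max-wait-time` can be pictured as a polling loop with a deadline. The sketch below is not the script's code: `wait_for_result` and its parameters are hypothetical, the status check is passed in as a callable so no real endpoint is assumed, and the terminal status names (`DONE`, `ERROR`, `CANCELED`) are an assumption about the async API that should be checked against the official documentation.

```python
import time

TERMINAL = ("DONE", "ERROR", "CANCELED")  # assumed terminal task statuses


def wait_for_result(get_status, max_wait_time=300, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal status or until the deadline passes."""
    deadline = clock() + max_wait_time
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"no result after {max_wait_time} s")
        sleep(interval)  # back off before the next status check
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. For a long recording, the caller would raise `max_wait_time` rather than shorten `interval`.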
Installation
openclaw install salute-speech