
Salute Speech

Transcribe audio files using the Sber Salute Speech async API.

Rating: 4.6 (127 reviews)
Downloads: 537
Version: 1.0.0

Audio Transcription with Sber Salute Speech

Transcribe audio/video files to text with timestamps via the Salute Speech async REST API.

Requirements

  • API Key: Environment variable SALUTE_AUTH_DATA must be set (Base64-encoded client_id:client_secret or raw authorization key from https://developers.sber.ru/studio/).
  • SSL note: The script disables SSL verification by default (verify_ssl=False) because Sber's certificate chain is non-standard. This is expected.
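
The key handling described above could look roughly like the sketch below. The helper name `auth_header_from_env` is hypothetical and is not taken from `salute_transcribe.py`; the sketch also assumes the key is passed as an HTTP Basic credential, which should be checked against the Salute Speech documentation.

```python
import os


def auth_header_from_env(env=None):
    """Build an Authorization header from SALUTE_AUTH_DATA.

    Hypothetical helper: assumes the variable already holds the
    Base64 authorization key issued by the developer studio.
    """
    env = os.environ if env is None else env
    key = env.get("SALUTE_AUTH_DATA", "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError("SALUTE_AUTH_DATA is not set")
    # Assumption: the token endpoint expects "Basic <key>".
    return {"Authorization": f"Basic {key}"}
```

Failing before any network call keeps a missing key from surfacing later as an opaque authentication error.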

Supported formats & encodings

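| Audio encoding | Content-Type | Typical extensions |
|----------------|--------------|--------------------|
| `MP3` | `audio/mpeg` | `.mp3` |
| `PCM_S16LE` | `audio/wav` | `.wav` |
| `OPUS` | `audio/ogg` | `.ogg`, `.opus` |
| `FLAC` | `audio/flac` | `.flac` |
| `ALAW` | `audio/alaw` | `.alaw` |
| `MULAW` | `audio/mulaw` | `.mulaw` |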

Supported languages

ru-RU, en-US, kk-KZ (Kazakh), ky-KG (Kyrgyz), uz-UZ (Uzbek).

Workflow

  • Identify input files: take them from the user request.
  • Read the API key from the host environment.
  • Run transcription: execute salute_transcribe.py with uv and the appropriate arguments.
  • Deliver results: present a human-readable transcript with timestamps to the user and give a direct link to the output files.

Usage

```bash
uv run --with requests {baseDir}/salute_transcribe.py \
  --file /path/to/audio.mp3 \
  --output_dir ~/.openclaw/workspace/transcriptions \
  --lang ru-RU
```

Arguments

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `--file` | Yes | none | Path to audio/video file |
| `--output_dir` | No | `~/.openclaw/workspace/transcribations` | Output directory for results |
| `--lang` | No | `ru-RU` | Language code: `ru-RU`, `en-US`, `kk-KZ`, `ky-KG`, `uz-UZ` |
| `--audio-encoding` | No | `MP3` | Codec: `MP3`, `PCM_S16LE`, `OPUS`, `FLAC`, `ALAW`, `MULAW` |
| `--model` | No | `general` | Recognition model: `general` or `callcenter` |
| `--hyp-count` | No | `1` | Number of alternative hypotheses: `1` or `2` |
| `--max-wait-time` | No | `300` | Max seconds to wait for async result |
| `--print` | No | off | Also print transcription to stdout |
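
For orientation, the argument list above can be mirrored with `argparse` roughly as follows. This is a reconstruction from the documented flags and defaults, not the actual parser in `salute_transcribe.py`; the function name `build_parser` and the `print_result` attribute are assumptions made for the sketch.

```python
import argparse


def build_parser():
    """Sketch of a CLI matching the documented arguments and defaults."""
    p = argparse.ArgumentParser(description="Transcribe audio with Salute Speech")
    p.add_argument("--file", required=True, help="Path to audio/video file")
    p.add_argument("--output_dir",
                   default="~/.openclaw/workspace/transcribations",
                   help="Output directory for results")
    p.add_argument("--lang", default="ru-RU",
                   choices=["ru-RU", "en-US", "kk-KZ", "ky-KG", "uz-UZ"])
    p.add_argument("--audio-encoding", default="MP3",
                   choices=["MP3", "PCM_S16LE", "OPUS", "FLAC", "ALAW", "MULAW"])
    p.add_argument("--model", default="general",
                   choices=["general", "callcenter"])
    p.add_argument("--hyp-count", type=int, default=1, choices=[1, 2])
    p.add_argument("--max-wait-time", type=int, default=300,
                   help="Max seconds to wait for async result")
    # dest is renamed so the attribute does not shadow the print builtin.
    p.add_argument("--print", dest="print_result", action="store_true",
                   help="Also print transcription to stdout")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Using `choices` makes an unsupported language or codec fail at parse time rather than after the upload.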

Content-Type mapping

The script currently sends audio/mpeg (MP3) as the Content-Type by default. For any other input format, adjust content_type in the script or add mapping logic, and pass the matching --audio-encoding value. For example, use audio/wav for .wav files.
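
One possible shape for such mapping logic is sketched below, built from the formats listed under "Supported formats & encodings". The names `FORMATS` and `guess_format` are hypothetical. Note that an extension is only a hint: a `.wav` file is not guaranteed to hold 16-bit PCM, and an `.ogg` file is not guaranteed to hold Opus.

```python
from pathlib import Path

# Extension -> (audio encoding, Content-Type), per the formats table.
FORMATS = {
    ".mp3": ("MP3", "audio/mpeg"),
    ".wav": ("PCM_S16LE", "audio/wav"),
    ".ogg": ("OPUS", "audio/ogg"),
    ".opus": ("OPUS", "audio/ogg"),
    ".flac": ("FLAC", "audio/flac"),
    ".alaw": ("ALAW", "audio/alaw"),
    ".mulaw": ("MULAW", "audio/mulaw"),
}


def guess_format(path, default=("MP3", "audio/mpeg")):
    """Return (encoding, content_type) guessed from the file extension.

    Falls back to the documented MP3 default for unknown extensions.
    """
    return FORMATS.get(Path(path).suffix.lower(), default)
```

For example, `guess_format("call.wav")` returns `("PCM_S16LE", "audio/wav")`, and the first element can be passed as `--audio-encoding`.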

Output files

For input file meetingABC.mp3 the script produces:

| File | Description |
|------|-------------|
| `meetingABC_recognition_orig.json` | Raw API response (full JSON with all hypotheses, timing, confidence) |
| `meetingABC_pretty.txt` | Formatted human-readable transcript with timestamps |
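
The naming scheme above can be reproduced with `pathlib` when a caller needs to locate the results. This is a sketch under the assumption that the script uses the input file's stem and writes both files directly into the output directory; `output_paths` is a hypothetical helper name.

```python
from pathlib import Path


def output_paths(input_file, output_dir):
    """Return (raw_json_path, pretty_txt_path) for a given input file."""
    stem = Path(input_file).stem  # "meetingABC.mp3" -> "meetingABC"
    out = Path(output_dir).expanduser()  # resolve a leading "~"
    return (out / f"{stem}_recognition_orig.json",
            out / f"{stem}_pretty.txt")
```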

Output text format

```text
[00:01 - 00:20]:
Ну, даже если сосредоточиться на идее узкой щели.

[00:20 - 00:45]:
Следующий фрагмент текста здесь.
```
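
A layout like the one shown can be produced from timed segments with a few lines of Python. The segment shape `(start_seconds, end_seconds, text)` is an assumption made for this sketch and does not describe the structure of the raw API response; how the script renders recordings longer than an hour is also not documented, so minutes are simply allowed to exceed 59 here.

```python
def fmt_ts(seconds):
    """Format a time offset in seconds as MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(segments):
    """Render (start, end, text) tuples as timestamped blocks."""
    blocks = [
        f"[{fmt_ts(start)} - {fmt_ts(end)}]:\n{text}"
        for start, end, text in segments
    ]
    # Blocks are separated by one blank line, as in the sample output.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render([(1.2, 20.0, "First fragment."),
                  (20.0, 45.9, "Second fragment.")]))
```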

Notes

  • Token is valid for ~30 minutes; the script fetches a new one each run.
  • Long recordings (over 1 hour) may need --max-wait-time raised above the 300-second default.
  • The callcenter model is optimized for telephony audio (8kHz, mono).
  • Profanity filter is disabled by default (enable_profanity_filter=False).
  • The script uses normalized text by default (numbers as digits, abbreviations expanded). Raw text is also available in the JSON output.
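
The role of `--max-wait-time` can be illustrated with a generic polling loop. This is a sketch, not the script's code: `wait_for_result` and its parameters are hypothetical, and the status names `DONE`, `ERROR`, and `CANCELED` are assumptions about the async API that should be verified against the Salute Speech documentation.

```python
import time


def wait_for_result(get_status, max_wait_time=300, poll_interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the task finishes or the deadline passes.

    get_status is any callable returning the task status as a string.
    """
    deadline = clock() + max_wait_time
    while True:
        status = get_status()
        if status == "DONE":
            return status
        if status in ("ERROR", "CANCELED"):
            raise RuntimeError(f"recognition failed with status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"no result after {max_wait_time} seconds")
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; a monotonic clock avoids surprises if the system time changes while waiting.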

Installation

```bash
openclaw install salute-speech
```

Tags

#media_and-streaming #api

Quick Info

Category: Content Creation
Model: Claude 3.5
Complexity: One-Click
Author: chorus12
Last Updated: 3/10/2026