Salute Speech

Transcribe audio files using the Sber Salute Speech async API.

Rating: 4.6 (127 reviews)
Downloads: 537
Version: 1.0.0

Audio Transcription with Sber Salute Speech

Transcribe audio/video files to text with timestamps via the Salute Speech async REST API.

Requirements

API Key: The environment variable SALUTE_AUTH_DATA must be set to the Base64-encoded client_id:client_secret pair or to the raw authorization key from https://developers.sber.ru/studio/.
SSL note: The script disables SSL verification by default (verify_ssl=False) because Sber's certificate chain is non-standard. The disabled verification is expected behaviour, not a misconfiguration.
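
The API key requirement can be checked before any network call is made. The following is a minimal sketch, not the script's actual code: the helper name load_auth_data is hypothetical, and the handling of an unencoded client_id:client_secret value is an added convenience that the script itself may not offer.

```python
import base64
import os


def load_auth_data(env=os.environ):
    """Return the authorization key taken from SALUTE_AUTH_DATA.

    Hypothetical helper: the real script may read the variable differently.
    """
    raw = env.get("SALUTE_AUTH_DATA", "").strip()
    if not raw:
        raise RuntimeError("SALUTE_AUTH_DATA is not set")
    # Assumption: ':' never occurs in Base64 output, so a value containing it
    # is treated as an unencoded client_id:client_secret pair and encoded here.
    if ":" in raw:
        return base64.b64encode(raw.encode("utf-8")).decode("ascii")
    return raw
```

Failing early with a clear message is easier to diagnose than an authorization error returned by the API.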

Supported formats & encodings

Audio encoding	Content-Type	Typical extensions
MP3	audio/mpeg	.mp3
PCM_S16LE	audio/wav	.wav
OPUS	audio/ogg	.ogg, .opus
FLAC	audio/flac	.flac
ALAW	audio/alaw	.alaw
MULAW	audio/mulaw	.mulaw

Supported languages

ru-RU, en-US, kk-KZ (Kazakh), ky-KG (Kyrgyz), uz-UZ (Uzbek).

Workflow

Identify input files from the user's request.
Read the API key from the host environment (SALUTE_AUTH_DATA).
Run transcription: execute salute_transcribe.py with uv and the appropriate arguments.
Deliver results: present the human-readable transcript with timestamps to the user and give a direct link to the output files.

Usage

bash

uv run --with requests {baseDir}/salute_transcribe.py \
  --file /path/to/audio.mp3 \
  --output_dir ~/.openclaw/workspace/transcriptions \
  --lang ru-RU

Arguments

Argument	Required	Default	Description
--file	Yes	—	Path to audio/video file
--output_dir	No	~/.openclaw/workspace/transcribations	Output directory for results
--lang	No	ru-RU	Language code: ru-RU, en-US, kk-KZ, ky-KG, uz-UZ
--audio-encoding	No	MP3	Codec: MP3, PCM_S16LE, OPUS, FLAC, ALAW, MULAW
--model	No	general	Recognition model: general or callcenter
--hyp-count	No	1	Number of alternative hypotheses: 1 or 2
--max-wait-time	No	300	Max seconds to wait for async result
--print	No	off	Also print transcription to stdout
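
For readers who want to see how the table above maps onto a command-line parser, here is a sketch using argparse. It mirrors the documented flags, defaults, and allowed values only; the function name build_parser and the destination name print_to_stdout are hypothetical, and the real script's parser may be structured differently.

```python
import argparse

LANGS = ["ru-RU", "en-US", "kk-KZ", "ky-KG", "uz-UZ"]
ENCODINGS = ["MP3", "PCM_S16LE", "OPUS", "FLAC", "ALAW", "MULAW"]


def build_parser():
    """Build a parser that mirrors the documented arguments (illustrative only)."""
    p = argparse.ArgumentParser(prog="salute_transcribe.py")
    p.add_argument("--file", required=True, help="Path to audio/video file")
    p.add_argument("--output_dir",
                   default="~/.openclaw/workspace/transcribations",
                   help="Output directory for results")
    p.add_argument("--lang", default="ru-RU", choices=LANGS)
    p.add_argument("--audio-encoding", default="MP3", choices=ENCODINGS)
    p.add_argument("--model", default="general",
                   choices=["general", "callcenter"])
    p.add_argument("--hyp-count", type=int, default=1, choices=[1, 2])
    p.add_argument("--max-wait-time", type=int, default=300,
                   help="Max seconds to wait for async result")
    # Hypothetical dest name; the flag itself is --print as documented.
    p.add_argument("--print", dest="print_to_stdout", action="store_true",
                   help="Also print transcription to stdout")
    return p
```

With this parser, `build_parser().parse_args(["--file", "audio.mp3"])` yields the defaults listed in the table, and an out-of-range value such as `--hyp-count 3` is rejected by argparse before any request is sent.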

Content-Type mapping

The script sends audio/mpeg (MP3) as the Content-Type by default. For any other format, adjust content_type in the script or add extension-based detection, and pass the matching --audio-encoding. For example, .wav files need audio/wav with PCM_S16LE.
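
One way to add the extension-based logic mentioned above is a lookup table built from the "Supported formats & encodings" section. This is a sketch under that assumption; the names FORMATS and detect_format are hypothetical and do not come from the script.

```python
from pathlib import Path

# Extension -> (audio encoding, Content-Type), copied from the formats table.
FORMATS = {
    ".mp3": ("MP3", "audio/mpeg"),
    ".wav": ("PCM_S16LE", "audio/wav"),
    ".ogg": ("OPUS", "audio/ogg"),
    ".opus": ("OPUS", "audio/ogg"),
    ".flac": ("FLAC", "audio/flac"),
    ".alaw": ("ALAW", "audio/alaw"),
    ".mulaw": ("MULAW", "audio/mulaw"),
}


def detect_format(path):
    """Return (audio_encoding, content_type) for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension: {suffix or '(none)'}") from None
```

The extension is only a hint: a .wav container can hold something other than 16-bit PCM, and an .ogg file is not necessarily Opus, so the detected values may still need overriding for unusual files.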

Output files

For input file meetingABC.mp3 the script produces:

File	Description
meetingABC_recognition_orig.json	Raw API response (full JSON with all hypotheses, timing, confidence)
meetingABC_pretty.txt	Formatted human-readable transcript with timestamps
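
The naming rule in the table can be expressed in a few lines. This is an illustrative sketch only; the helper name output_paths is hypothetical, and the script may build the paths differently.

```python
from pathlib import Path


def output_paths(input_file, output_dir):
    """Return (raw_json_path, pretty_txt_path) for a given input file."""
    stem = Path(input_file).stem          # "meetingABC" for "meetingABC.mp3"
    out = Path(output_dir).expanduser()   # resolves a leading "~"
    return (out / f"{stem}_recognition_orig.json",
            out / f"{stem}_pretty.txt")
```

For example, `output_paths("/tmp/meetingABC.mp3", "/out")` gives `/out/meetingABC_recognition_orig.json` and `/out/meetingABC_pretty.txt`.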

Output text format

text

[00:01 - 00:20]:
Ну, даже если сосредоточиться на идее узкой щели.

[00:20 - 00:45]:
Следующий фрагмент текста здесь.
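
A formatter that reproduces the layout shown above might look like the following. It is a sketch that assumes each segment is available as a (start_seconds, end_seconds, text) tuple; the structure of the real API response is not described here, and the names mmss and format_segments are hypothetical. How the script renders timestamps past one hour is also not documented, so this version simply lets the minutes field grow beyond 59.

```python
def mmss(seconds):
    """Format a number of seconds as MM:SS, dropping any fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(segments):
    """Render (start_seconds, end_seconds, text) tuples as timestamped blocks."""
    blocks = [f"[{mmss(start)} - {mmss(end)}]:\n{text}"
              for start, end, text in segments]
    # Blocks are separated by one blank line, as in the example output.
    return "\n\n".join(blocks) + "\n"
```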

Notes

The access token is valid for about 30 minutes; the script fetches a new one on each run.
Long recordings (over 1 hour) may need --max-wait-time raised above the default of 300 seconds.
The callcenter model is optimized for telephony audio (8 kHz, mono).
The profanity filter is disabled by default (enable_profanity_filter=False).
The script uses normalized text by default (numbers as digits, abbreviations expanded). The raw text is also available in the JSON output.
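
The role of --max-wait-time can be illustrated with a generic polling loop. This is a sketch, not the script's implementation: the function name wait_for_result, the 5-second interval, and the status strings "DONE", "ERROR", and "CANCELED" are assumptions, and get_status stands in for whatever call the script uses to query the async task.

```python
import time


def wait_for_result(get_status, max_wait_time=300, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the task finishes or max_wait_time elapses."""
    deadline = clock() + max_wait_time
    while True:
        status = get_status()
        if status == "DONE":
            return status
        # Assumed terminal failure states; adjust to the real API's values.
        if status in ("ERROR", "CANCELED"):
            raise RuntimeError(f"task failed with status {status}")
        if clock() >= deadline:
            raise TimeoutError(f"no result after {max_wait_time}s")
        sleep(interval)
```

Passing sleep and clock as parameters keeps the loop testable without real delays. A timeout here only means the client stopped waiting, which is why raising the limit helps with long recordings.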

Installation

bash

openclaw install salute-speech

