
Zhipu ASR

Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model.

Rating
4.3 (53 reviews)
Downloads
8,864 downloads
Version
1.0.0

Zhipu AI Automatic Speech Recognition (ASR)

Transcribe Chinese audio files to text using Zhipu AI's GLM-ASR model.

Setup

1. Get an API key from the Zhipu AI Console.

2. Set it in your environment:

bash
export ZHIPU_API_KEY="your-key-here"
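
A missing key is easier to diagnose before any audio is sent. The following is a minimal sketch of such a check; `require_api_key` is a hypothetical helper introduced here for illustration and is not part of the skill.

```shell
# Hypothetical helper: returns non-zero when ZHIPU_API_KEY is unset or empty.
require_api_key() {
    if [ -z "${ZHIPU_API_KEY:-}" ]; then
        echo "ZHIPU_API_KEY is not set; export it before transcribing." >&2
        return 1
    fi
    return 0
}

# Possible usage:
#   require_api_key && bash scripts/speech_to_text.sh recording.wav
```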

Supported Audio Formats

  • WAV - Recommended, best quality
  • MP3 - Widely supported
  • OGG - Auto-converted to MP3
  • M4A - Auto-converted to MP3
  • AAC - Auto-converted to MP3
  • FLAC - Auto-converted to MP3
  • WMA - Auto-converted to MP3
Note: The API itself accepts only WAV and MP3. The script uses ffmpeg to convert any other format to MP3 automatically, so any format that ffmpeg can read will work.
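
The conversion step can be pictured roughly as follows. This is a sketch under assumptions, not the skill's actual code: `needs_conversion` and `to_mp3` are hypothetical names, and the real script may pass different ffmpeg options.

```shell
# Hypothetical helpers sketching the conversion step; the skill's script may differ.

# Succeeds (exit 0) when the extension is anything other than .wav or .mp3.
needs_conversion() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.wav|*.mp3) return 1 ;;
        *) return 0 ;;
    esac
}

# Convert an ffmpeg-readable input to an MP3 next to the original file.
# -vn drops any video or cover-art stream; -y overwrites an existing output.
to_mp3() {
    out="${1%.*}.mp3"
    ffmpeg -loglevel error -y -i "$1" -vn "$out" && printf '%s\n' "$out"
}

# Possible usage:
#   needs_conversion voice.ogg && to_mp3 voice.ogg
```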

File Constraints

  • Maximum file size: 25 MB
  • Maximum duration: 30 seconds
  • Recommended sample rate: 16000 Hz or higher
  • Audio channels: Mono or stereo
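
These limits can be checked locally before a request is made. The sketch below assumes that "25 MB" means 25 × 1024 × 1024 bytes (the API may count differently) and uses ffprobe, which ships with ffmpeg, to read the duration. The function names are illustrative only.

```shell
# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_BYTES=$((25 * 1024 * 1024))
MAX_SECONDS=30

# Succeeds when the file is no larger than the limit.
# An optional second argument overrides the default limit in bytes.
check_size() {
    limit="${2:-$MAX_BYTES}"
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -le "$limit" ]
}

# Succeeds when ffprobe reports a duration within MAX_SECONDS.
check_duration() {
    seconds=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1")
    # Durations are fractional, so compare in awk rather than with integer test.
    awk -v d="$seconds" -v max="$MAX_SECONDS" 'BEGIN { exit !(d <= max) }'
}

# Possible usage:
#   check_size recording.wav && check_duration recording.wav
```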

Usage

Basic Transcription

Transcribe an audio file with default settings:

bash
bash scripts/speech_to_text.sh recording.wav

Transcription with Context

Provide previous transcription or context for better accuracy:

bash
bash scripts/speech_to_text.sh recording.wav "这是之前的转录内容,有助于提高准确性"

Transcription with Hotwords

Use custom vocabulary to improve recognition of specific terms:

bash
bash scripts/speech_to_text.sh recording.mp3 "" "人名,地名,专业术语,公司名称"

Full Options

Combine context and hotwords:

bash
bash scripts/speech_to_text.sh recording.wav "会议记录片段" "张三,李四,项目名称"

Parameters:

  • audio_file (required): Path to audio file (.wav or .mp3; other formats are converted to MP3 with ffmpeg first)
  • prompt (optional): Previous transcription or context text (max 8000 chars)
  • hotwords (optional): Comma-separated list of specific terms (max 100 words)
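
The hotword limit is easy to exceed when a list is generated from another source. A minimal sketch of a pre-check, assuming terms are separated by ASCII commas as in the examples above; `count_hotwords` and `hotwords_ok` are hypothetical helpers.

```shell
# Hypothetical helper: count the comma-separated terms in a hotwords string.
count_hotwords() {
    if [ -z "$1" ]; then
        echo 0
        return 0
    fi
    # Put one term per line, then count the lines.
    printf '%s\n' "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

# Succeeds when the list stays within the documented 100-term limit.
hotwords_ok() {
    [ "$(count_hotwords "$1")" -le 100 ]
}

# Possible usage:
#   hotwords_ok "张三,李四,项目名称" || echo "too many hotwords" >&2
```

The prompt limit is not checked here because character counts for Chinese text depend on the shell's locale settings.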

Features

Context Prompts

Why use context prompts:

  • Improves accuracy in long conversations
  • Helps with domain-specific terminology
  • Maintains consistency across multiple segments
When to use:
  • Multi-part conversations or meetings
  • Technical or specialized content
  • Continuing from previous transcriptions
Example:
bash
bash scripts/speech_to_text.sh part2.wav "第一部分的转录内容:讨论了项目进展和下一步计划"

Hotwords

What are hotwords: Custom vocabulary list that boosts recognition accuracy for specific terms.

Best use cases:

  • Proper names (people, places)
  • Domain-specific terminology
  • Company names and products
  • Technical jargon
  • Industry-specific terms
Examples:
bash
# Medical transcription
bash scripts/speech_to_text.sh medical.wav "" "患者,症状,诊断,治疗方案"

# Business meeting
bash scripts/speech_to_text.sh meeting.wav "" "张经理,李总,项目代号,预算"

# Tech discussion
bash scripts/speech_to_text.sh tech.wav "" "API,数据库,算法,框架"

Workflow Examples

Transcribe a Meeting

bash
# Part 1
bash scripts/speech_to_text.sh meeting_part1.wav

# Part 2 with context
bash scripts/speech_to_text.sh meeting_part2.wav "第一部分讨论了项目进度" "张总,李经理,项目名称"

# Part 3 with context
bash scripts/speech_to_text.sh meeting_part3.wav "前两部分讨论了项目进度和预算" "张总,李经理,项目名称"
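
The context strings above are written by hand. One way to automate the hand-off is to feed each segment's transcript to the next call, sketched below. This assumes jq is installed, that the script prints the JSON described under Output Format on stdout, and that it exits non-zero on failure. `transcribe_chain` and the `ASR_SCRIPT` variable are hypothetical names.

```shell
# Path to the skill's script; set ASR_SCRIPT to point elsewhere if needed.
ASR_SCRIPT="${ASR_SCRIPT:-scripts/speech_to_text.sh}"

# transcribe_chain HOTWORDS FILE...
# Prints one transcript per line, passing each result on as the next prompt.
transcribe_chain() {
    hotwords="$1"
    shift
    context=""
    for segment in "$@"; do
        json=$(bash "$ASR_SCRIPT" "$segment" "$context" "$hotwords") || return 1
        text=$(printf '%s' "$json" | jq -r '.text') || return 1
        printf '%s\n' "$text"
        # Only the immediately preceding transcript is carried forward.
        context="$text"
    done
}

# Possible usage:
#   transcribe_chain "张总,李经理,项目名称" meeting_part1.wav meeting_part2.wav meeting_part3.wav
```

Because each segment is at most 30 seconds long, a single transcript should stay well under the 8000-character prompt limit.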

Transcribe a Lecture

bash
bash scripts/speech_to_text.sh lecture.wav "" "教授,课程名称,专业术语1,专业术语2"

Process Multiple Files

bash
for file in recording_*.wav; do
    bash scripts/speech_to_text.sh "$file"
done
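
The loop above prints every result to the terminal. A variant that saves one JSON file per recording and reports failures might look like the sketch below. It assumes the script exits non-zero on failure; `output_path_for` and `transcribe_all` are hypothetical helpers.

```shell
# Hypothetical helper: map an audio path to the JSON file that will hold its result.
output_path_for() {
    printf '%s\n' "${1%.*}.json"
}

# Transcribe each file given as an argument, saving the result beside it.
transcribe_all() {
    for file in "$@"; do
        # An unmatched glob is passed through literally, so skip missing files.
        [ -e "$file" ] || continue
        if ! bash scripts/speech_to_text.sh "$file" > "$(output_path_for "$file")"; then
            echo "failed: $file" >&2
        fi
    done
}

# Possible usage:
#   transcribe_all recording_*.wav
```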

Audio Quality Tips

Best practices for accurate transcription:

  • Clear audio source
      • Minimize background noise
      • Use good quality microphone
      • Speak clearly and at moderate pace
  • Optimal audio settings
      • Sample rate: 16000 Hz or higher
      • Bit depth: 16-bit or higher
      • Single channel (mono) is sufficient
  • File preparation
      • Remove silence from beginning/end
      • Normalize audio levels
      • Ensure consistent volume
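
Several of these preparation steps can be done in one ffmpeg pass. The sketch below resamples to 16 kHz mono 16-bit PCM, trims leading silence, and normalizes loudness. The -50 dB silence threshold is an assumed starting point rather than a recommendation from the skill, and `prepare_audio` is a hypothetical helper. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
# Print (DRY_RUN=1) or run an ffmpeg command that prepares audio for transcription.
# silenceremove trims leading silence only; loudnorm normalizes loudness.
prepare_audio() {
    filters="silenceremove=start_periods=1:start_threshold=-50dB,loudnorm"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ffmpeg -i $1 -af $filters -ar 16000 -ac 1 -c:a pcm_s16le $2"
    else
        ffmpeg -loglevel error -y -i "$1" -af "$filters" \
            -ar 16000 -ac 1 -c:a pcm_s16le "$2"
    fi
}

# Possible usage:
#   prepare_audio raw.m4a clean.wav
```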

Output Format

The script outputs JSON with:

  • id: Task ID
  • created: Request timestamp (Unix timestamp)
  • request_id: Unique request identifier
  • model: Model name used
  • text: Transcribed text
Example output:
json
{
  "id": "task-12345",
  "created": 1234567890,
  "request_id": "req-abc123",
  "model": "glm-asr-2512",
  "text": "你好,这是转录的文本内容"
}
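
When only the transcript is needed, the `text` field can be pulled out with jq. A minimal sketch, assuming jq is installed and the script writes this JSON to stdout; `extract_text` is a hypothetical helper.

```shell
# Read the script's JSON from stdin and print only the transcribed text.
extract_text() {
    jq -r '.text'
}

# Possible usage:
#   bash scripts/speech_to_text.sh recording.wav | extract_text > recording.txt
```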

Troubleshooting

File Size Issues:

  • Split audio files larger than 25 MB
  • Reduce sample rate or bit depth
  • Use compression (MP3) for smaller files
Duration Issues:
  • Split recordings longer than 30 seconds
  • Process segments separately
  • Use context prompts to maintain continuity
Poor Accuracy:
  • Improve audio quality
  • Use hotwords for specific terms
  • Provide context prompts
  • Ensure clear speech and minimal noise
Format Issues:
  • Ensure file is .wav or .mp3
  • Check file is not corrupted
  • Verify audio can be played by standard players
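
For the duration limit, ffmpeg's segment muxer can split a long recording into numbered parts. The sketch below is one possible approach, not something the skill provides. The 28-second default is an assumption that leaves some headroom under the 30-second limit, and `segment_count` and `split_audio` are hypothetical helpers.

```shell
# Number of parts needed: ceil(duration / segment length), in whole seconds.
segment_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Split INPUT into numbered 16 kHz mono WAV parts of about SECONDS each.
# Default is 28 seconds, an assumed margin below the 30-second limit.
split_audio() {
    input="$1"
    seconds="${2:-28}"
    ffmpeg -loglevel error -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le \
        -f segment -segment_time "$seconds" "${input%.*}_part%03d.wav"
}

# Possible usage:
#   split_audio meeting.wav        # writes meeting_part000.wav, meeting_part001.wav, ...
```

The resulting parts can then be transcribed in order, using a context prompt to maintain continuity between them.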

Limitations

  • Maximum audio duration: 30 seconds per request
  • File size limit: 25 MB
  • Maximum hotwords: 100 terms
  • Context prompt limit: 8000 characters
  • Best performance with Chinese language audio

Performance Notes

  • Typical transcription time: 1-3 seconds
  • Real-time or faster for most audio
  • Processing time scales with audio quality and length

Installation

bash
openclaw install zhipu-asr

Tags

#ai_and-llms

Quick Info

Category Development
Model Claude 3.5
Complexity One-Click
Author franklu0819-lang
Last Updated 3/10/2026