Voiceai Voiceover Creator
Turn scripts into publishable voiceovers with Voice.ai TTS, including segments, chapters, captions,
- Rating
- 4.3 (391 reviews)
- Downloads
- 47,784 downloads
- Version
- 1.0.0
Overview
Turn scripts into publishable voiceovers with Voice.ai TTS, including segments, chapters, captions, and video muxing.
Complete Documentation
View Source →
Voice.ai Creator Voiceover Pipeline
This skill follows the Agent Skills specification.
Turn any script into a publish-ready voiceover — complete with numbered segments, a stitched master, YouTube chapters, SRT captions, and a beautiful review page. Optionally, replace the audio track on an existing video.
Built for creators who want studio-quality voiceovers without the studio. Powered by Voice.ai.
When to use this skill
| Scenario | Why it fits |
|---|---|
| YouTube long-form | Full narration with chapter markers and captions |
| YouTube Shorts | Quick hooks with the shortform template |
| Podcasts | Consistent host voice, intro/outro templates |
| Course content | Professional narration for educational videos |
| Quick iteration | Smart caching — edit one section, only that segment re-renders |
| Video audio replacement | Drop AI voiceover onto screen recordings or B-roll |
The one-command workflow
Have a script and a video? Turn them into a finished video with AI voiceover in one shot:
node voiceai-vo.cjs build \
--input my-script.md \
--voice oliver \
--title "My Video" \
--video ./my-recording.mp4 \
--mux
This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:
out/my-video/muxed.mp4— your video with the new voiceoverout/my-video/master.wav— the standalone audioout/my-video/review.html— listen and review each segmentout/my-video/chapters.txt— YouTube-ready chapter timestampsout/my-video/captions.srt— SRT captions
--sync pad if the audio is shorter than the video, or --sync trim to cut it to match.Requirements
- Node.js 20+ — runtime (no npm install needed — the CLI is a single bundled file)
- VOICE_AI_API_KEY — set as environment variable or in a
.envfile in the skill root. Get a key at voice.ai/dashboard. - ffmpeg (optional) — needed for master stitching, MP3 encoding, loudness normalization, and video muxing. The pipeline still produces individual segments, the review page, chapters, and captions without it.
Configuration
The skill reads VOICE_AI_API_KEY from (in order):
- Environment variable
VOICE_AI_API_KEY - Environment variable
VOICEAI_API_KEY(alternate) .envfile in the skill root
echo 'VOICE_AI_API_KEY=your-key-here' > .env
Use --mock on any command to run the full pipeline without an API key (produces placeholder audio).
Commands
build — Generate a voiceover from a script
node voiceai-vo.cjs build \
--input <script.md or script.txt> \
--voice <voice-alias-or-uuid> \
--title "My Project" \
[--template youtube|podcast|shortform] \
[--language en] \
[--video input.mp4 --mux --sync shortest] \
[--force] [--mock]
What it does:
- Reads the script and splits it into segments (by
##headings for.md, or by sentence boundaries for.txt) - Optionally prepends/appends template intro/outro segments
- Renders each segment via Voice.ai TTS as a numbered WAV file
- Stitches a master audio file (if ffmpeg is available)
- Generates chapters, captions, a review page, and metadata files
- Optionally muxes the voiceover into an existing video
| Option | Description |
|---|---|
| -i, --input | Script file (.txt or .md) — required |
| -v, --voice | Voice alias or UUID — required |
| -t, --title | Project title (defaults to filename) |
| --template | youtube, podcast, or shortform |
| --mode | headings or auto (default: headings for .md) |
| --max-chars | Max characters per auto-chunk (default: 1500) |
--language | Language code (default: en) |
| --video | Input video for muxing |
| --mux | Enable video muxing (requires --video) |
| --sync | shortest, pad, or trim (default: shortest) |
| --force | Re-render all segments (ignore cache) |
| --mock | Mock mode — no API calls, placeholder audio |
| -o, --out | Custom output directory |
replace-audio — Swap the audio track on a video
node voiceai-vo.cjs replace-audio \
--video ./input.mp4 \
--audio ./out/my-project/master.wav \
[--out ./out/my-project/muxed.mp4] \
[--sync shortest|pad|trim]
Requires ffmpeg. If not installed, generates helper shell/PowerShell scripts instead.
| Sync policy | Behavior |
|---|---|
| shortest (default) | Output ends when the shorter track ends |
| pad | Pad audio with silence to match video duration |
| trim | Trim audio to match video duration |
-c:v copy). Audio is encoded as AAC. A mux report is saved alongside the output.Privacy: Video processing is entirely local. Only script text is sent to Voice.ai for TTS.
voices — List available voices
node voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]
Available voices
Use short aliases or full UUIDs with --voice:
| Alias | Voice | Gender | Style |
|---|---|---|---|
| ellie | Ellie | F | Youthful, vibrant vlogger |
| oliver | Oliver | M | Friendly British |
| lilith | Lilith | F | Soft, feminine |
| smooth | Smooth Calm Voice | M | Deep, smooth narrator |
| corpse | Corpse Husband | M | Deep, distinctive |
| skadi | Skadi | F | Anime character |
| zhongli | Zhongli | M | Deep, authoritative |
| flora | Flora | F | Cheerful, high pitch |
| chief | Master Chief | M | Heroic, commanding |
voices command also returns any additional voices available on the API. Voice list is cached for 10 minutes.Build outputs
After a build, the output directory contains:
out/<title-slug>/
segments/ # Numbered WAV files (001-intro.wav, 002-section.wav, …)
master.wav # Stitched audio (requires ffmpeg)
master.mp3 # MP3 encode (requires ffmpeg)
manifest.json # Build metadata: voice, template, segment list, hashes
timeline.json # Segment durations and start times
review.html # Interactive review page with audio players
chapters.txt # YouTube-friendly chapter timestamps
captions.srt # SRT captions using segment boundaries
description.txt # YouTube description with chapters + Voice.ai credit
review.html
A standalone HTML page with:
- Master audio player (if stitched)
- Individual segment players with titles and durations
- Collapsible script text for each segment
- Regeneration command hints
Templates
Templates auto-inject intro/outro segments around the script content:
| Template | Prepends | Appends |
|---|---|---|
| youtube | templates/youtube_intro.txt | templates/youtube_outro.txt |
| podcast | templates/podcast_intro.txt | — |
| shortform | templates/shortform_hook.txt | — |
templates/ to customize the intro/outro text.Caching
Segments are cached by a hash of: text content + voice ID + language.
- Unchanged segments are skipped on rebuild — fast iteration
- Modified segments are re-rendered automatically
- Use
--forceto re-render everything - Cache manifest is stored in
segments/.cache.json
Multilingual support
Voice.ai supports 11 languages. Use --language to switch:
en, es, fr, de, it, pt, pl, ru, nl, sv, ca
The pipeline auto-selects the multilingual TTS model for non-English languages.
Troubleshooting
| Issue | Solution |
|---|---|
| ffmpeg missing | Pipeline still works — you get segments, review page, chapters, captions. Install ffmpeg for master stitching and video muxing. |
| Rate limits (429) | Segments render sequentially, which stays under most limits. Wait and retry. |
| Insufficient credits (402) | Top up at voice.ai/dashboard. Cached segments won't re-use credits on retry. |
| Long scripts | Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls. |
| Windows paths | Wrap paths with spaces in quotes: --input "C:\My Scripts\script.md" |
references/TROUBLESHOOTING.md for more.References
- Agent Skills Specification
- Voice.ai
references/VOICEAI_API.md— API endpoints, audio formats, modelsreferences/TROUBLESHOOTING.md— Common issues and fixes
Installation
openclaw install voiceai-voiceover-creator
💻Code Examples
--mux
This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:
- `out/my-video/muxed.mp4` — your video with the new voiceover
- `out/my-video/master.wav` — the standalone audio
- `out/my-video/review.html` — listen and review each segment
- `out/my-video/chapters.txt` — YouTube-ready chapter timestamps
- `out/my-video/captions.srt` — SRT captions
Use `--sync pad` if the audio is shorter than the video, or `--sync trim` to cut it to match.
---
## Requirements
- **Node.js 20+** — runtime (no npm install needed — the CLI is a single bundled file)
- **VOICE_AI_API_KEY** — set as environment variable or in a `.env` file in the skill root. Get a key at [voice.ai/dashboard](https://voice.ai/dashboard).
- **ffmpeg** (optional) — needed for master stitching, MP3 encoding, loudness normalization, and video muxing. The pipeline still produces individual segments, the review page, chapters, and captions without it.
---
## Configuration
The skill reads `VOICE_AI_API_KEY` from (in order):
1. Environment variable `VOICE_AI_API_KEY`
2. Environment variable `VOICEAI_API_KEY` (alternate)
3. `.env` file in the skill rootecho 'VOICE_AI_API_KEY=your-key-here' > .env
Use `--mock` on any command to run the full pipeline without an API key (produces placeholder audio).
---
## Commands
### `build` — Generate a voiceover from a script[--force] [--mock]
**What it does:**
1. Reads the script and splits it into segments (by `##` headings for `.md`, or by sentence boundaries for `.txt`)
2. Optionally prepends/appends template intro/outro segments
3. Renders each segment via Voice.ai TTS as a numbered WAV file
4. Stitches a master audio file (if ffmpeg is available)
5. Generates chapters, captions, a review page, and metadata files
6. Optionally muxes the voiceover into an existing video
**Full options:**
| Option | Description |
|---|---|
| `-i, --input <path>` | Script file (.txt or .md) — **required** |
| `-v, --voice <id>` | Voice alias or UUID — **required** |
| `-t, --title <title>` | Project title (defaults to filename) |
| `--template <name>` | `youtube`, `podcast`, or `shortform` |
| `--mode <mode>` | `headings` or `auto` (default: headings for .md) |
| `--max-chars <n>` | Max characters per auto-chunk (default: 1500) |
| `--language <code>` | Language code (default: en) |
| `--video <path>` | Input video for muxing |
| `--mux` | Enable video muxing (requires --video) |
| `--sync <policy>` | `shortest`, `pad`, or `trim` (default: shortest) |
| `--force` | Re-render all segments (ignore cache) |
| `--mock` | Mock mode — no API calls, placeholder audio |
| `-o, --out <dir>` | Custom output directory |
### `replace-audio` — Swap the audio track on a video[--sync shortest|pad|trim]
Requires ffmpeg. If not installed, generates helper shell/PowerShell scripts instead.
| Sync policy | Behavior |
|---|---|
| `shortest` (default) | Output ends when the shorter track ends |
| `pad` | Pad audio with silence to match video duration |
| `trim` | Trim audio to match video duration |
Video stream is copied without re-encoding (`-c:v copy`). Audio is encoded as AAC. A mux report is saved alongside the output.
**Privacy:** Video processing is entirely local. Only script text is sent to Voice.ai for TTS.
### `voices` — List available voicesnode voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]
---
## Available voices
Use short aliases or full UUIDs with `--voice`:
| Alias | Voice | Gender | Style |
|----------|----------------------|--------|--------------------------|
| `ellie` | Ellie | F | Youthful, vibrant vlogger|
| `oliver` | Oliver | M | Friendly British |
| `lilith` | Lilith | F | Soft, feminine |
| `smooth` | Smooth Calm Voice | M | Deep, smooth narrator |
| `corpse` | Corpse Husband | M | Deep, distinctive |
| `skadi` | Skadi | F | Anime character |
| `zhongli`| Zhongli | M | Deep, authoritative |
| `flora` | Flora | F | Cheerful, high pitch |
| `chief` | Master Chief | M | Heroic, commanding |
The `voices` command also returns any additional voices available on the API. Voice list is cached for 10 minutes.
---
## Build outputs
After a build, the output directory contains:node voiceai-vo.cjs build \
--input my-script.md \
--voice oliver \
--title "My Video" \
--video ./my-recording.mp4 \
--muxnode voiceai-vo.cjs build \
--input <script.md or script.txt> \
--voice <voice-alias-or-uuid> \
--title "My Project" \
[--template youtube|podcast|shortform] \
[--language en] \
[--video input.mp4 --mux --sync shortest] \
[--force] [--mock]node voiceai-vo.cjs replace-audio \
--video ./input.mp4 \
--audio ./out/my-project/master.wav \
[--out ./out/my-project/muxed.mp4] \
[--sync shortest|pad|trim]out/<title-slug>/
segments/ # Numbered WAV files (001-intro.wav, 002-section.wav, …)
master.wav # Stitched audio (requires ffmpeg)
master.mp3 # MP3 encode (requires ffmpeg)
manifest.json # Build metadata: voice, template, segment list, hashes
timeline.json # Segment durations and start times
review.html # Interactive review page with audio players
chapters.txt # YouTube-friendly chapter timestamps
captions.srt # SRT captions using segment boundaries
description.txt # YouTube description with chapters + Voice.ai credit⚙️Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
VOICE_AI_API_KEY | string | your-key-here | - |
Tags
Quick Info
Ready to Install?
Get started with this skill in seconds
Related Skills
4claw
4claw — a moderated imageboard for AI agents.
Aap Passport
Agent Attestation Protocol - The Reverse Turing Test.
Acestep Lyrics Transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API.
Adaptive Suite
A continuously adaptive skill suite that empowers Clawdbot.