Audio Conductor
- Rating: 4.3 (373 reviews)
- Downloads: 1,456
- Version: 1.0.0
Overview
Intelligently dispatches requests to the appropriate audio generation model (Music, Sound Effects, or TTS).
Complete Documentation
Audio Conductor
This skill acts as an intelligent, unified dispatcher for all audio generation tasks. It analyzes a user's request and routes it to the most appropriate specialized model, whether it's for Music, Sound Effects (SFX), or Text-to-Speech (TTS).
When to Use
Use this skill as the primary entry point for any audio-related request. Instead of deciding which specific model or skill to call, simply describe the desired audio output.
- User says: "Create a background track for my video."
- User says: "I need a sound effect of a door creaking."
- User says: "Generate a voice-over for this script."
Prerequisite: the elevenlabs-mcp-server skill must be loaded in the agent's environment.
Core Principles
- Unified Interface: Provides a single, consistent API for all audio generation, simplifying agent-level logic.
- Intelligent Routing: Automatically determines the audio_type (music, sfx, or tts) from the user's prompt.
- Model Abstraction: Hides the complexity of individual model APIs (e.g., ElevenLabs Music vs. SFX vs. TTS), allowing for easier maintenance and upgrades.
- Structured Input/Output: Works with a standardized AudioRequest input and provides a predictable AudioOutput.
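The Unified Interface and Model Abstraction principles can be illustrated with a small registry that maps each audio_type to a backend behind one common call signature. This is a minimal sketch, not the skill's actual implementation: the Backend type, the register/generate helpers, and the stub backends are hypothetical, and real backends would call the ElevenLabs (or other) model APIs instead of returning placeholder metadata.

```python
from typing import Callable, Dict, Optional

# Hypothetical common signature: every backend takes a prompt plus a dict of
# optional params and returns a dict of metadata about the generated audio.
Backend = Callable[[str, dict], dict]

# Registry keyed by audio_type ("music", "sfx", "tts").
_BACKENDS: Dict[str, Backend] = {}


def register(audio_type: str, backend: Backend) -> None:
    """Attach a backend to an audio_type, replacing any previous one."""
    _BACKENDS[audio_type] = backend


def generate(audio_type: str, prompt: str, params: Optional[dict] = None) -> dict:
    """Single entry point: callers never touch a model-specific API directly."""
    try:
        backend = _BACKENDS[audio_type]
    except KeyError:
        raise ValueError(f"no backend registered for audio_type={audio_type!r}") from None
    return backend(prompt, params or {})


# Stub backends standing in for real model calls (hypothetical model names).
register("music", lambda prompt, params: {"model_used": "stub_music", "prompt": prompt})
register("sfx", lambda prompt, params: {"model_used": "stub_sfx", "prompt": prompt})
register("tts", lambda prompt, params: {"model_used": "stub_tts", "prompt": prompt})
```

With this shape, upgrading or swapping a model means re-registering one backend; code that calls generate() stays unchanged.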
Workflow
- Receive Audio Request: The skill is triggered with a prompt and optional parameters.
- Analyze & Route: It analyzes the prompt to determine the audio_type. See audio-routing-kb.md for detailed classification logic.
  - If the prompt describes melodies, moods, or instruments -> music
  - If the prompt describes an event, action, or ambient noise -> sfx
  - If the prompt contains a clear script to be spoken -> tts
- Delegate to Sub-Skill: Based on the audio_type, it calls the appropriate internal generation skill (an implementation detail hidden from the end user).
  - music -> calls the music-generator skill with a Composition Plan.
  - sfx -> calls the sfx-generator skill.
  - tts -> calls the tts-generator skill (e.g., produce-final-narration).
- Standardize Output: It receives the generated audio file and metadata from the sub-skill and formats them into a standard AudioOutput object, including the URL, duration, and type.
- Return Result: The final, standardized output is returned to the calling agent.
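The Analyze & Route step can be approximated with a simple keyword heuristic. This is only a sketch: the real rules live in audio-routing-kb.md, which is not reproduced here, so the keyword lists, the rule that a quoted passage counts as a script, and the fallback to sfx are all hypothetical choices. The checks below use the three example requests from "When to Use" plus the schema's example prompt.

```python
import re

# Hypothetical keyword lists; the actual classification logic is in audio-routing-kb.md.
MUSIC_KEYWORDS = {"melody", "track", "song", "music", "soundtrack", "mood", "piano",
                  "guitar", "drums", "strings", "instrumental", "beat"}
TTS_KEYWORDS = {"voice-over", "voiceover", "narrate", "narration", "read", "say",
                "speak", "script"}
SFX_KEYWORDS = {"sound effect", "sfx", "creak", "creaking", "explosion", "footsteps",
                "ambient", "noise", "swell"}


def _count(text: str, keywords: set) -> int:
    """Count keywords that appear as whole words (or whole phrases) in text."""
    return sum(1 for kw in keywords if re.search(r"\b" + re.escape(kw) + r"\b", text))


def classify(prompt: str) -> str:
    """Return "music", "sfx", or "tts" for a free-text audio request."""
    # A quoted passage is treated as a script to be spoken.
    if re.search(r'"[^"]+"', prompt):
        return "tts"
    text = prompt.lower()
    scores = {
        "tts": _count(text, TTS_KEYWORDS),
        "music": _count(text, MUSIC_KEYWORDS),
        "sfx": _count(text, SFX_KEYWORDS),
    }
    # max() keeps the first key on ties, so tts > music > sfx in that case.
    best = max(scores, key=scores.get)
    # Fall back to sfx when nothing matches (a hypothetical default).
    return best if scores[best] > 0 else "sfx"
```

Fixed keyword lists are brittle; a production router would more plausibly consult the routing knowledge base or a model-based classifier, but the input/output contract (prompt in, audio_type out) would stay the same.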
Input: AudioRequest Schema
prompt: "A short, tense, cinematic swell, building in intensity."
# Optional parameters to refine the request
params:
  duration_ms: 5000
  # For TTS
  voice_id: "21m00Tcm4TlvDq8ikWAM"
  # For Music
  instrumental: true
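A typed view of the AudioRequest schema makes validation explicit before routing. The sketch below assumes the YAML has already been parsed into a plain dict (for example with a YAML library). The class mirrors the schema's field names, but the validation rules (non-empty prompt, positive integer duration_ms) and the extra field for unrecognized params are assumptions, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class AudioRequest:
    prompt: str
    duration_ms: Optional[int] = None
    voice_id: Optional[str] = None       # only meaningful for tts
    instrumental: Optional[bool] = None  # only meaningful for music
    # Hypothetical catch-all for params not named in the schema.
    extra: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "AudioRequest":
        prompt = data.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError("prompt must be a non-empty string")
        params = dict(data.get("params") or {})
        duration = params.pop("duration_ms", None)
        # bool is a subclass of int in Python, so reject it explicitly.
        if duration is not None and (
            isinstance(duration, bool) or not isinstance(duration, int) or duration <= 0
        ):
            raise ValueError("duration_ms must be a positive integer")
        return cls(
            prompt=prompt,
            duration_ms=duration,
            voice_id=params.pop("voice_id", None),
            instrumental=params.pop("instrumental", None),
            extra=params,
        )


# Mirrors the example request, minus the TTS-only voice_id.
example = AudioRequest.from_dict({
    "prompt": "A short, tense, cinematic swell, building in intensity.",
    "params": {"duration_ms": 5000, "instrumental": True},
})
```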
Output: AudioOutput Schema
audio_output:
  audio_type: "sfx" # or "music", "tts"
  audio_file: "cinematic_swell_tense.wav"
  audio_url: "https://..."
  duration_ms: 4980
  model_used: "elevenlabs_sfx_v1"
  request_details:
    prompt: "A short, tense, cinematic swell, building in intensity."
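The Standardize Output step amounts to wrapping whatever a sub-skill returns in the fixed AudioOutput shape. Below is a minimal sketch: the field names follow the AudioOutput schema, while the AudioOutput class, its to_dict method, and the type check are illustrative assumptions. The URL stays a placeholder, as in the schema example.

```python
from dataclasses import dataclass
from typing import Any, Dict

VALID_TYPES = ("music", "sfx", "tts")


@dataclass(frozen=True)
class AudioOutput:
    audio_type: str
    audio_file: str
    audio_url: str
    duration_ms: int
    model_used: str
    prompt: str

    def __post_init__(self) -> None:
        # Reject anything the router could not have produced.
        if self.audio_type not in VALID_TYPES:
            raise ValueError(f"audio_type must be one of {VALID_TYPES}")

    def to_dict(self) -> Dict[str, Any]:
        """Nest the fields the same way as the AudioOutput schema."""
        return {
            "audio_output": {
                "audio_type": self.audio_type,
                "audio_file": self.audio_file,
                "audio_url": self.audio_url,
                "duration_ms": self.duration_ms,
                "model_used": self.model_used,
                "request_details": {"prompt": self.prompt},
            }
        }


example = AudioOutput(
    audio_type="sfx",
    audio_file="cinematic_swell_tense.wav",
    audio_url="https://...",  # placeholder, as in the schema example
    duration_ms=4980,
    model_used="elevenlabs_sfx_v1",
    prompt="A short, tense, cinematic swell, building in intensity.",
)
```

In the example, duration_ms (4980) differs from the requested 5000, which suggests the field reports the length actually produced rather than echoing the request.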
References
- audio-routing-kb.md: Defines the logic and keywords used to classify an audio request as music, sfx, or tts.
- model-capabilities-kb.md: A knowledge base detailing the specific APIs and parameters of each underlying audio model (e.g., ElevenLabs, Suno).
Installation
openclaw install audio-conductor