Audio Conductor

Rating: 4.3 (373 reviews)
Downloads: 1,456
Version: 1.0.0

Overview

Intelligently dispatches requests to the appropriate audio generation model (Music, Sound Effects, or TTS).

Complete Documentation

This skill acts as an intelligent, unified dispatcher for all audio generation tasks. It analyzes a user's request and routes it to the most appropriate specialized model: Music, Sound Effects (SFX), or Text-to-Speech (TTS).

When to Use

Use this skill as the primary entry point for any audio-related request. Instead of deciding which specific model or skill to call, simply describe the desired audio output.

User says: "Create a background track for my video."
User says: "I need a sound effect of a door creaking."
User says: "Generate a voice-over for this script."

This skill is designed to be called by high-level agent platforms like OpenClaw. It relies on the tools provided by the elevenlabs-mcp-server skill, which must be loaded in the agent's environment.

Core Principles

Unified Interface: Provides a single, consistent API for all audio generation, simplifying agent-level logic.
Intelligent Routing: Automatically determines the audio_type (music, sfx, tts) from the user's prompt.
Model Abstraction: Hides the complexity of individual model APIs (e.g., ElevenLabs Music vs. SFX vs. TTS), allowing for easier maintenance and upgrades.
Structured Input/Output: Works with a standardized AudioRequest input and provides a predictable AudioOutput.
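
The Unified Interface and Model Abstraction principles together amount to a dispatch table. Callers see one entry point, and the mapping from audio_type to a concrete generator can change without touching agent-level code. The following is a minimal Python sketch of that pattern. The `register`/`generate` functions, the generator signature, and the `stub_sfx` placeholder are all hypothetical and are not part of the skill's published interface.

```python
from typing import Callable, Dict, Optional

# Hypothetical generator signature: (prompt, params) -> result dict.
Generator = Callable[[str, dict], dict]

# Registry mapping each audio_type to the generator that currently serves it.
# Upgrading a model means re-registering one entry; callers never change.
_REGISTRY: Dict[str, Generator] = {}


def register(audio_type: str) -> Callable[[Generator], Generator]:
    """Decorator that registers a generator under an audio_type key."""
    def wrap(fn: Generator) -> Generator:
        _REGISTRY[audio_type] = fn
        return fn
    return wrap


@register("sfx")
def stub_sfx(prompt: str, params: dict) -> dict:
    # Placeholder for a real SFX backend; "music" and "tts" generators
    # would be registered the same way.
    return {"model": "stub_sfx", "prompt": prompt, **params}


def generate(audio_type: str, prompt: str, params: Optional[dict] = None) -> dict:
    """Single entry point: look up the registered generator and call it."""
    handler = _REGISTRY.get(audio_type)
    if handler is None:
        raise ValueError(f"no generator registered for {audio_type!r}")
    return handler(prompt, params or {})
```

For example, `generate("sfx", "a door creaking", {"duration_ms": 2000})` returns `{"model": "stub_sfx", "prompt": "a door creaking", "duration_ms": 2000}`. In this sketch no music generator is registered, so `generate("music", ...)` raises `ValueError` rather than silently falling back to another model.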

Workflow

1. Receive Audio Request: The skill is triggered with a prompt and optional parameters.
2. Analyze & Route: It analyzes the prompt to determine the audio_type (see audio-routing-kb.md for the detailed classification logic):
   - If the prompt describes melodies, moods, or instruments -> music
   - If the prompt describes an event, action, or ambient noise -> sfx
   - If the prompt contains a clear script to be spoken -> tts
3. Delegate to Sub-Skill: Based on the audio_type, it calls the appropriate internal generation skill (an implementation detail hidden from the end user):
   - music -> Calls the music-generator skill with a Composition Plan.
   - sfx -> Calls the sfx-generator skill.
   - tts -> Calls the tts-generator skill (e.g., produce-final-narration).
4. Standardize Output: It receives the generated audio file and metadata from the sub-skill and formats them into a standard AudioOutput object, including the URL, duration, and type.
5. Return Result: The final, standardized output is returned to the calling agent.
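
The Analyze & Route and Delegate steps can be approximated with a simple keyword heuristic. The following Python sketch is not the skill's actual classifier; the real rules live in audio-routing-kb.md. The cue lists are illustrative assumptions, and so are two policy choices: an explicit script cue always wins, and a tie or an empty match returns None so the caller can ask for clarification. The sub-skill names come from the Workflow above.

```python
import re
from typing import Iterable, Optional

# Illustrative cue stems only; the authoritative rules live in audio-routing-kb.md.
# Each stem is matched at the start of a word, so "creak" also matches "creaking".
TTS_CUES = ("voice-over", "voiceover", "narrat", "read aloud", "spoken", "script")
MUSIC_CUES = ("music", "melod", "song", "track", "tune", "beat", "upbeat",
              "piano", "guitar", "orchestra", "mood")
SFX_CUES = ("sound effect", "sfx", "creak", "footstep", "explosion", "whoosh",
            "ambien", "swell", "impact")

# Sub-skill names taken from the Workflow above.
SUB_SKILLS = {"music": "music-generator", "sfx": "sfx-generator", "tts": "tts-generator"}


def _score(text: str, cues: Iterable[str]) -> int:
    """Count how many cue stems occur at a word boundary in text."""
    return sum(1 for cue in cues if re.search(r"\b" + re.escape(cue), text))


def classify(prompt: str) -> Optional[str]:
    """Return "music", "sfx", "tts", or None when the prompt is ambiguous."""
    text = prompt.lower()
    # Assumption: any sign of a script to be spoken takes priority.
    if _score(text, TTS_CUES):
        return "tts"
    music, sfx = _score(text, MUSIC_CUES), _score(text, SFX_CUES)
    if music == sfx:
        return None  # no cues, or a tie: let the caller ask for clarification
    return "music" if music > sfx else "sfx"


def route(prompt: str) -> Optional[str]:
    """Map a prompt to the name of the sub-skill that should handle it."""
    audio_type = classify(prompt)
    return SUB_SKILLS.get(audio_type) if audio_type else None
```

With these cues, the three example requests under When to Use classify as "music", "sfx", and "tts" respectively. `route("Generate a voice-over for this script.")` returns "tts-generator". Word-start matching keeps "beat" from firing on "heartbeat", which plain substring search would not.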

Input: `AudioRequest` Schema

```yaml
prompt: "A short, tense, cinematic swell, building in intensity."
# Optional parameters to refine the request
params:
  duration_ms: 5000
  # For TTS
  voice_id: "21m00Tcm4TlvDq8ikWAM"
  # For Music
  instrumental: true
```
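
The following Python sketch models the AudioRequest schema as a typed object. It assumes the YAML has already been parsed into a mapping, for example by a YAML library. The validation rules are assumptions, since the schema does not state which parameters are constrained: the prompt must be non-empty, `duration_ms` must be a positive integer, and `instrumental` must be a boolean. Any other parameters, such as `voice_id`, are passed through unchanged.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping


@dataclass
class AudioRequest:
    prompt: str
    params: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "AudioRequest":
        """Build a request from an already-parsed YAML/JSON mapping."""
        prompt = data.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            raise ValueError("prompt must be a non-empty string")

        raw_params = data.get("params") or {}
        if not isinstance(raw_params, Mapping):
            raise ValueError("params must be a mapping")
        params = dict(raw_params)

        duration = params.get("duration_ms")
        # bool is a subclass of int, so reject it explicitly.
        if duration is not None and (
            isinstance(duration, bool) or not isinstance(duration, int) or duration <= 0
        ):
            raise ValueError("duration_ms must be a positive integer")

        instrumental = params.get("instrumental")
        if instrumental is not None and not isinstance(instrumental, bool):
            raise ValueError("instrumental must be a boolean")

        # Unknown keys (e.g. voice_id) are kept as-is for the sub-skill to interpret.
        return cls(prompt=prompt, params=params)
```

Validating at the entry point means a malformed request fails once, with a clear message, before any model is called.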

Output: `AudioOutput` Schema

```yaml
audio_output:
  audio_type: "sfx" # or "music", "tts"
  audio_file: "cinematic_swell_tense.wav"
  audio_url: "https://..."
  duration_ms: 4980
  model_used: "elevenlabs_sfx_v1"
  request_details:
    prompt: "A short, tense, cinematic swell, building in intensity."
```
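
The Standardize Output step can be sketched as a function that wraps whatever a sub-skill returns in the AudioOutput shape above. The keys read from the sub-skill result (`file`, `url`, `duration_ms`, `model`) are hypothetical placeholders, because the sub-skills' return format is not documented here. The function name `to_audio_output` is likewise illustrative.

```python
from typing import Any, Dict, Mapping

AUDIO_TYPES = {"music", "sfx", "tts"}


def to_audio_output(audio_type: str, prompt: str, result: Mapping[str, Any]) -> Dict[str, Any]:
    """Wrap a sub-skill result in the AudioOutput shape.

    The keys read from `result` (file, url, duration_ms, model) are
    hypothetical; real sub-skills may name them differently.
    """
    if audio_type not in AUDIO_TYPES:
        raise ValueError(f"unknown audio_type: {audio_type!r}")
    return {
        "audio_output": {
            "audio_type": audio_type,
            "audio_file": result["file"],
            "audio_url": result["url"],
            "duration_ms": int(result["duration_ms"]),
            "model_used": result["model"],
            "request_details": {"prompt": prompt},
        }
    }
```

Keeping this mapping in one place lets each sub-skill return its own native format while the calling agent always receives the same fields.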

References

audio-routing-kb.md: Defines the logic and keywords used to classify an audio request into music, sfx, or tts.
model-capabilities-kb.md: A knowledge base detailing the specific APIs and parameters for each underlying audio model (e.g., ElevenLabs, Suno).

Installation

```bash
openclaw install audio-conductor
```
