Video Analyzer Skill
Download, transcribe, and analyze videos from YouTube, X/Twitter, and TikTok with local Whisper proc
- Rating
- 4.5 (444 reviews)
- Downloads
- 1,383 downloads
- Version
- 1.0.0
Overview
Download, transcribe, and analyze videos from YouTube, X/Twitter, and TikTok with local Whisper processing.
Complete Documentation
View Source →
Video Analyzer 🎥
A tool to download, transcribe, and analyze videos from any platform using a smart two-tier system (yt-dlp for fast subtitles, local whisper-cpp for robust fallback).
How to Use
When the user asks you to summarize, transcribe, or download a video/audio from a URL, use the bundled python script:
uv run {baseDir}/scripts/analyze_video.py --action <ACTION> --url "<URL>" [--quality <normal|max>] [--lang <en|it|etc>]
Supported Actions:
transcript: Extracts the text with timestamps. Use this when the user asks for a summary or transcript.download-video: Downloads the video as MP4 to the Desktop.download-audio: Downloads the audio as M4A/MP3 to the Desktop.
Analyzing a Video (IMPORTANT)
If the user asks for a summary, analysis, or key moments:
- Run the script with
--action transcript. - The script will output the path to a
.txtfile containing the timestamped transcript. - Read that file.
- Output your response EXACTLY in this Markdown format:
## 📝 TL;DR
[A punchy 3-sentence summary of the video's core message]
## ⏱️ Key Moments
- [MM:SS] [Brief description of what is discussed]
- [MM:SS] [Brief description of what is discussed]
- [MM:SS] [Brief description of what is discussed]
*(Extract 3 to 7 key moments depending on video length)*
## 💡 Actionable Insights
1. [Practical takeaway 1]
2. [Practical takeaway 2]
3. [Practical takeaway 3]
---
Local Whisper Quality
If the script needs to fall back to Whisper (e.g., for X/Twitter videos), it usesnormal by default:
normal: Fast (~1 min for 30 min video) — Defaultmax: Best quality (~5 min for 30 min video) — use--quality maxwhen accuracy is critical
Multilingual Support
All Whisper models are multilingual by default. The skill can transcribe videos in any language (Italian, Spanish, Japanese, etc.).IMPORTANT: Always respond to the user in THEIR language, not the video's language. If the user speaks Italian but sends an English video, give them the summary in Italian.
Finding specific moments
The transcript includes precise timestamps like[05:53] text.... If the user asks "When do they talk about X?", grep the transcript and return the exact timestamp from the file.
Installation
openclaw install video-analyzer-skill
💻Code Examples
uv run {baseDir}/scripts/analyze_video.py --action <ACTION> --url "<URL>" [--quality <normal|max>] [--lang <en|it|etc>]
### Supported Actions:
- `transcript`: Extracts the text with timestamps. **Use this when the user asks for a summary or transcript.**
- `download-video`: Downloads the video as MP4 to the Desktop.
- `download-audio`: Downloads the audio as M4A/MP3 to the Desktop.
### Analyzing a Video (IMPORTANT)
If the user asks for a **summary, analysis, or key moments**:
1. Run the script with `--action transcript`.
2. The script will output the path to a `.txt` file containing the timestamped transcript.
3. Read that file.
4. Output your response **EXACTLY** in this Markdown format:## 📝 TL;DR
[A punchy 3-sentence summary of the video's core message]
## ⏱️ Key Moments
- [MM:SS] [Brief description of what is discussed]
- [MM:SS] [Brief description of what is discussed]
- [MM:SS] [Brief description of what is discussed]
*(Extract 3 to 7 key moments depending on video length)*
## 💡 Actionable Insights
1. [Practical takeaway 1]
2. [Practical takeaway 2]
3. [Practical takeaway 3]
---Tags
Quick Info
Ready to Install?
Get started with this skill in seconds
Related Skills
4claw
4claw — a moderated imageboard for AI agents.
Aap Passport
Agent Attestation Protocol - The Reverse Turing Test.
Acestep Lyrics Transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API.
Adaptive Suite
A continuously adaptive skill suite that empowers Clawdbot.