Video Understanding
Analyze videos with Google Gemini multimodal AI.
- Rating
- 3.8 (50 reviews)
- Downloads
- 5,588 downloads
- Version
- 1.0.0
Video Understanding (Gemini)
Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.
Requirements
- yt-dlp: `brew install yt-dlp` or `pip install yt-dlp`
- ffmpeg: `brew install ffmpeg` (for merging video and audio streams)
- `GEMINI_API_KEY` environment variable
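The requirements above can be checked before a run. The following is a minimal sketch and not part of the skill itself: the function name `missing_requirements` and its injectable `env` and `which` parameters are hypothetical, chosen so the check can be exercised without touching the real environment.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """Return the names of requirements that appear to be absent.

    `env` and `which` are illustrative hooks (assumptions, not skill API):
    by default the real environment and PATH lookup are used.
    """
    env = os.environ if env is None else env
    missing = []
    # Both command-line tools must be discoverable on PATH.
    for tool in ("yt-dlp", "ffmpeg"):
        if which(tool) is None:
            missing.append(tool)
    # The API key must be set and non-empty.
    if not env.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All requirements found.")
```

An empty list means both tools are on `PATH` and the key is set; it does not verify that the key is valid.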
Default Output
Returns structured JSON:
- transcript: Verbatim transcript with `[MM:SS]` timestamps
- description: Visual description (people, setting, UI, text on screen, flow)
- summary: 2-3 sentence summary
- duration_seconds: Estimated duration
- speakers: Identified speakers
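As a rough illustration of consuming this output, the sketch below checks for the five documented fields and splits a transcript into timestamped segments. It assumes the transcript is a single string with one `[MM:SS]` entry per line, which the listing does not confirm. The helper names and the sample data are hypothetical.

```python
import json
import re

# Field names taken from the documented default output.
EXPECTED_FIELDS = ("transcript", "description", "summary",
                   "duration_seconds", "speakers")

# Matches a leading "[MM:SS]" marker followed by the spoken text.
TIMESTAMP_LINE = re.compile(r"^\[(\d{1,3}):(\d{2})\]\s*(.*)$")


def missing_fields(result):
    """Return the documented fields that are absent from a parsed result."""
    return [name for name in EXPECTED_FIELDS if name not in result]


def transcript_segments(transcript):
    """Return (offset_in_seconds, text) pairs for lines starting with [MM:SS].

    Assumption: one timestamped entry per line; other lines are skipped.
    """
    segments = []
    for line in transcript.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if match:
            minutes, seconds, text = match.groups()
            segments.append((int(minutes) * 60 + int(seconds), text))
    return segments


if __name__ == "__main__":
    # Made-up sample for illustration only; not real output from the skill.
    sample = json.loads(
        '{"transcript": "[00:05] Hello.\\n[01:10] Here is the dashboard.",'
        ' "summary": "A short demo."}'
    )
    print(missing_fields(sample))
    print(transcript_segments(sample["transcript"]))
```

If the model formats timestamps differently, the regular expression would need adjusting.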
Usage
Analyze a video (structured JSON output)
```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>"
```
Ask a question (adds "answer" field)
```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"
```
Override prompt entirely
```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw
```
Download only (no analysis)
```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4
```
Options
| Flag | Description | Default |
|---|---|---|
| -q / --question | Question to answer (added to default fields) | none |
| -p / --prompt | Override entire prompt (ignores -q) | structured JSON |
| -m / --model | Gemini model | gemini-2.5-flash |
| -o / --output | Save output to file | stdout |
| --keep | Keep downloaded video file | false |
| --download-only | Download only, skip analysis | false |
| --max-size | Max file size in MB | 500 |
| --raw | Raw text output instead of JSON | false |
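For readers who want to see how the flags and defaults in this table fit together, here is a hypothetical `argparse` reconstruction. It mirrors the table only; the actual parser inside `analyze_video.py` may be written differently, and the positional argument name `url` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="analyze_video.py")
    parser.add_argument("url", help="Video URL (assumed positional name)")
    parser.add_argument("-q", "--question", default=None,
                        help="Question to answer (added to default fields)")
    parser.add_argument("-p", "--prompt", default=None,
                        help="Override entire prompt (ignores -q)")
    parser.add_argument("-m", "--model", default="gemini-2.5-flash",
                        help="Gemini model")
    parser.add_argument("-o", "--output", default=None,
                        help="Save output to file (default: stdout)")
    parser.add_argument("--keep", action="store_true",
                        help="Keep downloaded video file")
    parser.add_argument("--download-only", action="store_true",
                        help="Download only, skip analysis")
    parser.add_argument("--max-size", type=int, default=500,
                        help="Max file size in MB")
    parser.add_argument("--raw", action="store_true",
                        help="Raw text output instead of JSON")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["https://example.com/video", "-q", "What product is shown?"]
    )
    print(args.model, args.max_size, args.question)
```

Boolean flags default to `False` and become `True` only when passed, matching the "false" defaults in the table.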
How It Works
- YouTube URLs → Passed directly to Gemini (no download needed)
- All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
- Gemini analyzes video with structured prompt → returns JSON
- Temp files and Gemini uploads cleaned up automatically
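The routing described above can be sketched as a small decision function. This is an assumption-laden illustration, not the script's code: the host list in `YOUTUBE_HOSTS` and the helper names are hypothetical, and the real script may detect YouTube URLs differently.

```python
from urllib.parse import urlparse

# Assumed set of hostnames treated as YouTube; the real script may differ.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url):
    """Return True when the URL's hostname is one of the assumed YouTube hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def plan_steps(url):
    """Return the documented processing steps for a URL, in order."""
    if is_youtube_url(url):
        # YouTube URLs skip the download and upload stages.
        return ["pass URL directly to Gemini",
                "analyze with structured prompt"]
    return ["download via yt-dlp",
            "upload to Gemini File API",
            "poll until processed",
            "analyze with structured prompt",
            "clean up temp files and Gemini uploads"]


if __name__ == "__main__":
    for example in ("https://youtu.be/abc", "https://vimeo.com/123"):
        print(example, "->", plan_steps(example))
```

The shorter YouTube path is the reason the listing calls YouTube the fastest source.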
Supported Sources
Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.
Tips
- Use `-q` for targeted questions on top of the full analysis
- YouTube is fastest (no download step)
- Large videos (10+ minutes) work fine: the Gemini File API supports up to 2GB (free) / 20GB (paid)
- The script auto-installs Python dependencies via `uv`
Installation
```bash
openclaw install video-understanding
```
Tags
#coding_agents-and-ides
Quick Info
Category Development
Model Gemini 2.0
Complexity One-Click
Author bill492
Last Updated 3/10/2026