
Video Understanding

Analyze videos with Google Gemini multimodal AI.

Rating: 3.8 (50 reviews)
Downloads: 5,588
Version: 1.0.0

Video Understanding (Gemini)

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.

Requirements

  • yt-dlp: brew install yt-dlp / pip install yt-dlp
  • ffmpeg: brew install ffmpeg (for merging video+audio streams)
  • GEMINI_API_KEY environment variable
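
A quick preflight check for the three requirements above might look like the following. This is a minimal sketch, not part of the skill: the function name `missing_requirements` and its parameters are hypothetical, and it only checks that the tools are on PATH and that the variable is non-empty.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: `env` and `which` are injectable so the check
    can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    # yt-dlp and ffmpeg must both be discoverable on PATH.
    for tool in ("yt-dlp", "ffmpeg"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # The API key is read from the environment.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

Running it before the first analysis prints one line per missing requirement and nothing when everything is in place.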

Default Output

Returns structured JSON:

  • transcript — Verbatim transcript with [MM:SS] timestamps
  • description — Visual description (people, setting, UI, text on screen, flow)
  • summary — 2-3 sentence summary
  • duration_seconds — Estimated duration
  • speakers — Identified speakers
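
To make the field list above concrete, here is a small sketch of consuming the output in Python. It assumes the transcript arrives as a single string and that speakers is a list; the listing does not state the exact types. The helper names and the sample values are invented for illustration.

```python
import json
import re

# Matches "[MM:SS]" markers; assumes minutes may run past two digits.
TIMESTAMP = re.compile(r"\[(\d+):(\d{2})\]")

# The five default fields named in the list above.
EXPECTED_FIELDS = ("transcript", "description", "summary",
                   "duration_seconds", "speakers")


def timestamps_in_seconds(transcript):
    """Convert every [MM:SS] marker in the transcript to a second offset."""
    return [int(m) * 60 + int(s) for m, s in TIMESTAMP.findall(transcript)]


def load_result(raw_json):
    """Parse the JSON and report which default fields are absent."""
    result = json.loads(raw_json)
    missing = [key for key in EXPECTED_FIELDS if key not in result]
    return result, missing


# Hypothetical sample output; the values are made up.
sample = json.dumps({
    "transcript": "[00:00] Welcome to the demo. [01:15] Here is the dashboard.",
    "description": "A presenter walks through a web dashboard.",
    "summary": "A short product demo.",
    "duration_seconds": 90,
    "speakers": ["Presenter"],
})

result, missing = load_result(sample)
print(timestamps_in_seconds(result["transcript"]))  # [0, 75]
print(missing)  # []
```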

Usage

Analyze a video (structured JSON output)

bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>"

Ask a question (adds "answer" field)

bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"

Override prompt entirely

bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw

Download only (no analysis)

bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4
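
The invocations above can also be driven from Python. The sketch below is one hedged way to do it: `script_path` stands in for {baseDir}/scripts/analyze_video.py, the function names are hypothetical, and `analyze` assumes the script writes nothing but the JSON document to stdout.

```python
import json
import subprocess


def build_command(script_path, url, question=None):
    """Assemble the argument list for the documented `uv run` invocation."""
    command = ["uv", "run", script_path, url]
    if question is not None:
        # -q adds an "answer" field on top of the default fields.
        command += ["-q", question]
    return command


def analyze(script_path, url, question=None):
    """Run the script and parse stdout as JSON (assumes stdout is pure JSON)."""
    completed = subprocess.run(
        build_command(script_path, url, question),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.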

Options

Flag              Description                                    Default
-q / --question   Question to answer (added to default fields)   none
-p / --prompt     Override entire prompt (ignores -q)            structured JSON
-m / --model      Gemini model                                   gemini-2.5-flash
-o / --output     Save output to file                            stdout
--keep            Keep downloaded video file                     false
--download-only   Download only, skip analysis                   false
--max-size        Max file size in MB                            500
--raw             Raw text output instead of JSON                false
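
For readers who want to see how these flags and defaults fit together, the following argparse sketch mirrors the table. It is an illustration of the documented interface, not the script's actual parser; `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="analyze_video.py")
    parser.add_argument("url", help="Video URL")
    parser.add_argument("-q", "--question", default=None,
                        help="Question to answer (added to default fields)")
    parser.add_argument("-p", "--prompt", default=None,
                        help="Override entire prompt (ignores -q)")
    parser.add_argument("-m", "--model", default="gemini-2.5-flash",
                        help="Gemini model")
    parser.add_argument("-o", "--output", default=None,
                        help="Save output to file (None means stdout)")
    parser.add_argument("--keep", action="store_true",
                        help="Keep downloaded video file")
    parser.add_argument("--download-only", action="store_true",
                        help="Download only, skip analysis")
    parser.add_argument("--max-size", type=int, default=500,
                        help="Max file size in MB")
    parser.add_argument("--raw", action="store_true",
                        help="Raw text output instead of JSON")
    return parser


args = build_parser().parse_args(
    ["https://example.com/v", "-q", "What product is shown?"]
)
print(args.model, args.max_size, args.question)
```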

How It Works

  • YouTube URLs → Passed directly to Gemini (no download needed)
  • All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
  • Gemini analyzes video with structured prompt → returns JSON
  • Temp files and Gemini uploads cleaned up automatically
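
The routing and polling described above can be sketched as follows. This is an assumption-laden illustration rather than the script's code: the hostname set, the `plan` step labels, and the `get_state` callable are all hypothetical, and the "PROCESSING" state name is assumed to match what the File API reports while a video is being prepared.

```python
import time
from urllib.parse import urlparse

# Assumption: these hostnames count as "YouTube" for the direct path.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube(url):
    """True when the URL's hostname is one of the assumed YouTube hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def plan(url):
    """Return the ordered steps implied by the description for this URL."""
    if is_youtube(url):
        return ["pass URL to Gemini", "analyze"]
    return [
        "download with yt-dlp",
        "upload to Gemini File API",
        "poll until processed",
        "analyze",
        "clean up temp file and upload",
    ]


def wait_until_processed(get_state, interval=2.0, timeout=300.0,
                         sleep=time.sleep):
    """Call get_state() repeatedly until it stops returning "PROCESSING".

    get_state is a hypothetical zero-argument callable that returns the
    uploaded file's current state as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state != "PROCESSING":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("file was still processing at the deadline")
        sleep(interval)
```

Keeping the state lookup behind a callable means the polling loop can be tested with a stub and reused with whichever client library performs the upload.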

Supported Sources

Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.

Tips

  • Use -q for targeted questions on top of the full analysis
  • YouTube is fastest (no download step)
  • Large videos (10min+) work fine: the Gemini File API accepts files up to 2 GB each, with 20 GB of total storage per project. Raise --max-size if a download exceeds the 500 MB default
  • The script auto-installs Python dependencies via uv

Installation

bash
openclaw install video-understanding

Tags

#coding_agents-and-ides

Quick Info

Category Development
Model Gemini 2.0
Complexity One-Click
Author bill492
Last Updated 3/10/2026