Vision Tagger
Tag and annotate images using Apple Vision framework (macOS only)
- Rating: 4.1 (144 reviews)
- Downloads: 874
- Version: 1.0.0
Overview
macOS-native image analysis using Apple's Vision framework. All processing is local — no cloud APIs, no API keys needed.
Requirements
- macOS 12+ (Monterey or later)
- Xcode Command Line Tools
- Python 3 with Pillow
Setup (one-time)
bash
# Install Xcode CLI tools if needed
xcode-select --install
# Install Pillow
pip3 install Pillow
# Compile the Swift binary
cd scripts/
swiftc -O -o image_tagger image_tagger.swift
Usage
Analyze image → JSON
bash
./scripts/image_tagger /path/to/photo.jpg
Output includes:
- `faces`: bounding boxes, roll/yaw/pitch, landmarks (eyes, nose, mouth)
- `bodies`: 18 skeleton joints with confidence scores
- `hands`: 21 joints per hand (left/right)
- `text`: OCR results with bounding boxes
- `labels`: scene classification (desk, outdoor, clothing, etc.)
- `barcodes`: QR codes, UPC, etc.
- `saliency`: attention and objectness regions
Annotate image with boxes
bash
python3 scripts/annotate_image.py photo.jpg output.jpg
Draws colored boxes:
- 🟢 Green: faces
- 🟠 Orange: body skeleton
- 🟣 Magenta: hands
- 🔵 Cyan: text regions
- 🟡 Yellow: rectangles/objects
- Scene labels at bottom
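To draw a box from the JSON yourself, the normalized `bbox` values have to be scaled to pixels. The sketch below assumes the coordinates follow Apple Vision's usual convention (normalized to 0–1, origin at the bottom-left), so it flips the y axis for top-left-origin libraries such as Pillow. Whether `image_tagger` already flips y is not stated in this document, so verify that before relying on it. `bbox_to_pixels` is a hypothetical helper, not part of the skill.

```python
def bbox_to_pixels(bbox, width, height, flip_y=True):
    """Convert a normalized bbox dict to (left, top, right, bottom) pixels.

    flip_y=True assumes a bottom-left origin (Vision's usual convention);
    pass flip_y=False if the coordinates are already top-left based.
    """
    left = bbox["x"] * width
    right = (bbox["x"] + bbox["width"]) * width
    if flip_y:
        top = (1.0 - bbox["y"] - bbox["height"]) * height
        bottom = (1.0 - bbox["y"]) * height
    else:
        top = bbox["y"] * height
        bottom = (bbox["y"] + bbox["height"]) * height
    # Round to whole pixels so the result can be passed to a drawing call
    return tuple(round(v) for v in (left, top, right, bottom))


# Face bbox and image dimensions borrowed from this document's example JSON
face = {"x": 0.3, "y": 0.4, "width": 0.15, "height": 0.2}
box = bbox_to_pixels(face, 1920, 1080)
print(box)
```

The returned tuple is in the order Pillow's `ImageDraw.rectangle` accepts for its box argument.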
Python integration
python
import json
import subprocess

def analyze(path):
    # Run the compiled tagger and capture its stdout as text
    r = subprocess.run(['./scripts/image_tagger', path], capture_output=True, text=True)
    # Skip anything printed before the JSON object starts
    return json.loads(r.stdout[r.stdout.find('{'):])

tags = analyze('photo.jpg')
print(tags['labels'])  # [{'label': 'desk', 'confidence': 0.85}, ...]
print(tags['faces'])   # [{'bbox': {...}, 'confidence': 0.99, 'yaw': 5.2}]
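The slice in `analyze` assumes the binary always prints a JSON object. A slightly more defensive variant, sketched here with hypothetical helper names (`extract_json`, `analyze_checked`), turns a non-zero exit code or missing JSON into an explicit error. It assumes, as the snippet above does, that any non-JSON output comes before the first `{`. Whether `image_tagger` actually signals failure through its exit code is also an assumption.

```python
import json
import subprocess


def extract_json(stdout):
    """Parse the JSON object that starts at the first '{' in stdout."""
    start = stdout.find("{")
    if start == -1:
        raise ValueError("no JSON object found in tagger output")
    return json.loads(stdout[start:])


def analyze_checked(path, binary="./scripts/image_tagger"):
    """Run the tagger and return its parsed output, raising on failure."""
    r = subprocess.run([binary, path], capture_output=True, text=True)
    if r.returncode != 0:
        # Assumption: the binary reports errors via a non-zero exit code
        raise RuntimeError(f"image_tagger failed: {r.stderr.strip()}")
    return extract_json(r.stdout)


# The parser can be exercised without the binary:
sample = 'loading model...\n{"labels": [{"label": "desk", "confidence": 0.85}]}'
parsed = extract_json(sample)
print(parsed["labels"][0]["label"])
```

Keeping the parsing step separate from the subprocess call makes it possible to test the parsing on machines without macOS.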
Example JSON Output
json
{
  "dimensions": {"width": 1920, "height": 1080},
  "faces": [{"bbox": {"x": 0.3, "y": 0.4, "width": 0.15, "height": 0.2}, "confidence": 0.99, "roll": -2, "yaw": 5}],
  "bodies": [{"joints": {"head_joint": {"x": 0.5, "y": 0.7, "confidence": 0.9}, "left_shoulder": {...}}, "confidence": 1}],
  "hands": [{"chirality": "left", "joints": {"VNHLKWRI": {"x": 0.4, "y": 0.3, "confidence": 0.85}}}],
  "text": [{"text": "HELLO", "confidence": 0.95, "bbox": {...}}],
  "labels": [{"label": "outdoor", "confidence": 0.88}, {"label": "sky", "confidence": 0.75}],
  "saliency": {"attentionBased": [{"x": 0.2, "y": 0.1, "width": 0.6, "height": 0.8}]}
}
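As a small illustration of consuming this structure, the sketch below keeps only the labels above a confidence threshold and joins the recognized text. The field names come from the example above. The `0.8` threshold and the helper names are arbitrary choices for illustration.

```python
def confident_labels(tags, threshold=0.8):
    """Return label names whose confidence meets the threshold."""
    return [
        item["label"]
        for item in tags.get("labels", [])
        if item["confidence"] >= threshold
    ]


def recognized_text(tags):
    """Join all OCR strings into one space-separated line."""
    return " ".join(item["text"] for item in tags.get("text", []))


# Trimmed copy of the example output (bbox fields omitted for brevity)
tags = {
    "text": [{"text": "HELLO", "confidence": 0.95}],
    "labels": [
        {"label": "outdoor", "confidence": 0.88},
        {"label": "sky", "confidence": 0.75},
    ],
}
print(confident_labels(tags))
print(recognized_text(tags))
```

Using `tags.get(..., [])` keeps the helpers from raising if a key is absent from a given result.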
Detection Capabilities
| Feature | Details |
|---|---|
| Faces | Bounding box, confidence, roll/yaw/pitch angles, 76-point landmarks |
| Bodies | 18 joints: head, neck, shoulders, elbows, wrists, hips, knees, ankles |
| Hands | 21 joints per hand, left/right chirality |
| Text (OCR) | Recognized text with confidence and bounding boxes |
| Labels | 1000+ scene/object categories (clothing, furniture, outdoor, etc.) |
| Barcodes | QR, UPC, EAN, Code128, PDF417, Aztec, DataMatrix |
| Saliency | Attention-based and objectness-based regions |
Use Cases
- Photo tagging — Auto-tag photos with detected objects/scenes
- Posture monitoring — Track face/body position for ergonomics
- Document scanning — Extract text from images
- Security — Detect people in camera feeds
- Accessibility — Describe image contents
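For the posture-monitoring case, one possible approach is to compare the first detected face's `roll` and `yaw` against tolerances. This is only a sketch. It assumes the angles are reported in degrees, which the example values suggest but the document does not state. The 15-degree limits and the `posture_ok` helper are hypothetical.

```python
def posture_ok(tags, max_roll=15.0, max_yaw=15.0):
    """Return True if the first face is roughly upright and facing forward.

    Returns None when no face was detected. Angles are assumed to be degrees.
    """
    faces = tags.get("faces", [])
    if not faces:
        return None
    face = faces[0]
    # Missing angles are treated as 0 so partial results do not raise
    roll = abs(face.get("roll", 0.0))
    yaw = abs(face.get("yaw", 0.0))
    return roll <= max_roll and yaw <= max_yaw


upright = {"faces": [{"confidence": 0.99, "roll": -2, "yaw": 5}]}
tilted = {"faces": [{"confidence": 0.99, "roll": 25, "yaw": 5}]}
print(posture_ok(upright), posture_ok(tilted), posture_ok({"faces": []}))
```

In practice the dictionaries would come from `analyze()` run on periodic camera frames rather than from hard-coded samples.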
Installation
bash
openclaw install vision-tagger
Tags
#devops_and-cloud
Quick Info
- Category: Development
- Model: Claude 3.5
- Complexity: One-Click
- Author: sagarjhaa
- Last Updated: 3/10/2026