Skillbench
Track skill versions, benchmark performance, compare improvements, and get self-improvement signals.
- Rating
- 4.3 (135 reviews)
- Downloads
- 1,272 downloads
- Version
- 1.0.0
Overview
Track skill versions, benchmark performance, compare improvements, and get self-improvement signals.
Complete Documentation
View Source →
skillbench Skill
Self-improving skill ecosystem for AI agents.
Track skill versions, benchmark performance, compare improvements, and get signals on what to fix next.
Part of the ClawVault ecosystem | tasktime | ClawHub
Installation
npm install -g @versatly/skillbench
The Loop
1. Use a skill → skillbench use [email protected]
2. Do the task → tt start "Create PR" && ... && tt stop
3. Record result → skillbench record "Create PR" --success
4. Check scores → skillbench score github
5. Improve skill → Update skill, bump version
6. Repeat → Compare v1.0.0 vs v1.1.0
Commands
Track Skills
skillbench use [email protected] # Set active skill version
skillbench skills # List tracked skills + signals
Record Benchmarks
# Auto-pulls duration from tasktime
skillbench record "Create PR" --success
# Manual duration
skillbench record "Create PR" --duration 45s --success
# Record failures
skillbench record "Create PR" --fail --error-type "auth-error"
Score & Compare
skillbench score # All skills with grades
skillbench score github # Single skill
skillbench compare [email protected] [email protected]
Export & Dashboard
skillbench export --format markdown
skillbench export --format json
skillbench dashboard # Generate HTML dashboard
skillbench dashboard --open # Generate and open in browser
Automated Testing
skillbench test [email protected] # Run smoke test
skillbench test [email protected] --suite full # Run named suite
skillbench test [email protected] --dry-run # Test without recording
Sync
skillbench sync --clawhub # Import installed skills
skillbench sync --vault # Sync to ClawVault
skillbench sync --all # Everything
Health & Monitoring
skillbench health # Overall health report with alerts
skillbench watch --once # Run all test suites once
skillbench watch --interval 300 # Continuous monitoring every 5 min
Analysis & Improvement
skillbench improve # Get suggestions for weakest skill
skillbench improve github # Improvement plan for specific skill
skillbench trend tasktime --days 30 # Performance trend over time
skillbench leaderboard # Compare agents (multi-agent setups)
skillbench schedule --interval 60 # Generate cron config for auto-testing
Baselines & Regression Detection
skillbench baseline tasktime --set # Set baseline from current performance
skillbench baseline --list # List all baselines
skillbench baseline --check # Check all baselines (CI-friendly, exits 1 if failing)
skillbench baseline tasktime --remove # Remove a baseline
CI/CD Integration
skillbench ci # Run all tests + baseline checks
skillbench ci --json # JSON output for automation
skillbench badge # Generate shields.io badges for README
Copy examples/github-action.yml for ready-to-use GitHub Actions workflow.
Grading System
| Grade | Score | Meaning |
|---|---|---|
| 🏆 A+ | 95-100 | Elite performance |
| ✅ A | 85-94 | Excellent |
| 👍 B | 70-84 | Good |
| ⚠️ C | 50-69 | Needs work |
| ❌ D | <50 | Broken |
tasktime Integration
When you omit --duration, skillbench pulls from tasktime:
tt start "Create PR" -c git
# ... do work ...
tt stop
skillbench record --success # Duration auto-pulled
ClawVault Integration
Benchmarks sync to ClawVault automatically.
Improvement Signals
skillbench skills shows:
- ⚠️ needs work — Success rate below 70%
- 🕐 stale — No benchmarks in 7+ days
- ↘️ declining — Getting worse over time
Related
Installation
openclaw install skillbench
💻Code Examples
6. Repeat → Compare v1.0.0 vs v1.1.0
## Commands
### Track Skillsskillbench badge # Generate shields.io badges for README
Copy `examples/github-action.yml` for ready-to-use GitHub Actions workflow.
## Grading System
| Grade | Score | Meaning |
|-------|-------|---------|
| 🏆 A+ | 95-100 | Elite performance |
| ✅ A | 85-94 | Excellent |
| 👍 B | 70-84 | Good |
| ⚠️ C | 50-69 | Needs work |
| ❌ D | <50 | Broken |
Based on: Success Rate (40%), Avg Duration (30%), Consistency (20%), Trend (10%)
## tasktime Integration
When you omit `--duration`, skillbench pulls from [tasktime](https://clawhub.com/skills/tasktime):1. Use a skill → skillbench use [email protected]
2. Do the task → tt start "Create PR" && ... && tt stop
3. Record result → skillbench record "Create PR" --success
4. Check scores → skillbench score github
5. Improve skill → Update skill, bump version
6. Repeat → Compare v1.0.0 vs v1.1.0# Auto-pulls duration from tasktime
skillbench record "Create PR" --success
# Manual duration
skillbench record "Create PR" --duration 45s --success
# Record failures
skillbench record "Create PR" --fail --error-type "auth-error"skillbench score # All skills with grades
skillbench score github # Single skill
skillbench compare [email protected] [email protected]skillbench export --format markdown
skillbench export --format json
skillbench dashboard # Generate HTML dashboard
skillbench dashboard --open # Generate and open in browserskillbench test [email protected] # Run smoke test
skillbench test [email protected] --suite full # Run named suite
skillbench test [email protected] --dry-run # Test without recordingskillbench sync --clawhub # Import installed skills
skillbench sync --vault # Sync to ClawVault
skillbench sync --all # Everythingskillbench health # Overall health report with alerts
skillbench watch --once # Run all test suites once
skillbench watch --interval 300 # Continuous monitoring every 5 minskillbench improve # Get suggestions for weakest skill
skillbench improve github # Improvement plan for specific skill
skillbench trend tasktime --days 30 # Performance trend over time
skillbench leaderboard # Compare agents (multi-agent setups)
skillbench schedule --interval 60 # Generate cron config for auto-testingTags
Quick Info
Ready to Install?
Get started with this skill in seconds
Related Skills
4claw
4claw — a moderated imageboard for AI agents.
Aap Passport
Agent Attestation Protocol - The Reverse Turing Test.
Acestep Lyrics Transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API.
Adaptive Suite
A continuously adaptive skill suite that empowers Clawdbot.