✓ Verified 💻 Development ✓ Enhanced Data

Skillbench

Track skill versions, benchmark performance, compare improvements, and get self-improvement signals.

Rating
4.3 (135 reviews)
Downloads
1,272 downloads
Version
1.0.0

Overview

Track skill versions, benchmark performance, compare improvements, and get self-improvement signals.

Complete Documentation

View Source →

skillbench Skill

Self-improving skill ecosystem for AI agents.

Track skill versions, benchmark performance, compare improvements, and get signals on what to fix next.

Part of the ClawVault ecosystem | tasktime | ClawHub

Installation

bash
npm install -g @versatly/skillbench

The Loop

text
1. Use a skill    → skillbench use [email protected]
2. Do the task    → tt start "Create PR" && ... && tt stop
3. Record result  → skillbench record "Create PR" --success
4. Check scores   → skillbench score github
5. Improve skill  → Update skill, bump version
6. Repeat         → Compare v1.0.0 vs v1.1.0

Commands

Track Skills

bash
skillbench use [email protected]            # Set active skill version
skillbench skills                       # List tracked skills + signals

Record Benchmarks

bash
# Auto-pulls duration from tasktime
skillbench record "Create PR" --success

# Manual duration
skillbench record "Create PR" --duration 45s --success

# Record failures
skillbench record "Create PR" --fail --error-type "auth-error"

Score & Compare

bash
skillbench score                        # All skills with grades
skillbench score github                 # Single skill
skillbench compare [email protected] [email protected]

Export & Dashboard

bash
skillbench export --format markdown
skillbench export --format json
skillbench dashboard                    # Generate HTML dashboard
skillbench dashboard --open             # Generate and open in browser

Automated Testing

bash
skillbench test [email protected]          # Run smoke test
skillbench test [email protected] --suite full  # Run named suite
skillbench test [email protected] --dry-run     # Test without recording

Sync

bash
skillbench sync --clawhub               # Import installed skills
skillbench sync --vault                 # Sync to ClawVault
skillbench sync --all                   # Everything

Health & Monitoring

bash
skillbench health                       # Overall health report with alerts
skillbench watch --once                 # Run all test suites once
skillbench watch --interval 300         # Continuous monitoring every 5 min

Analysis & Improvement

bash
skillbench improve                      # Get suggestions for weakest skill
skillbench improve github               # Improvement plan for specific skill
skillbench trend tasktime --days 30     # Performance trend over time
skillbench leaderboard                  # Compare agents (multi-agent setups)
skillbench schedule --interval 60       # Generate cron config for auto-testing

Baselines & Regression Detection

bash
skillbench baseline tasktime --set      # Set baseline from current performance
skillbench baseline --list              # List all baselines
skillbench baseline --check             # Check all baselines (CI-friendly, exits 1 if failing)
skillbench baseline tasktime --remove   # Remove a baseline

CI/CD Integration

bash
skillbench ci                           # Run all tests + baseline checks
skillbench ci --json                    # JSON output for automation
skillbench badge                        # Generate shields.io badges for README

Copy examples/github-action.yml for ready-to-use GitHub Actions workflow.

Grading System

GradeScoreMeaning
🏆 A+95-100Elite performance
✅ A85-94Excellent
👍 B70-84Good
⚠️ C50-69Needs work
❌ D<50Broken
Based on: Success Rate (40%), Avg Duration (30%), Consistency (20%), Trend (10%)

tasktime Integration

When you omit --duration, skillbench pulls from tasktime:

bash
tt start "Create PR" -c git
# ... do work ...
tt stop
skillbench record --success   # Duration auto-pulled

ClawVault Integration

Benchmarks sync to ClawVault automatically.

Improvement Signals

skillbench skills shows:

  • ⚠️ needs work — Success rate below 70%
  • 🕐 stale — No benchmarks in 7+ days
  • ↘️ declining — Getting worse over time

Related

Installation

Terminal bash

openclaw install skillbench
    
Copied!

💻Code Examples

6. Repeat → Compare v1.0.0 vs v1.1.0

6-repeat--compare-v100-vs-v110.txt
## Commands

### Track Skills

skillbench badge # Generate shields.io badges for README

skillbench-badge--generate-shieldsio-badges-for-readme.txt
Copy `examples/github-action.yml` for ready-to-use GitHub Actions workflow.

## Grading System

| Grade | Score | Meaning |
|-------|-------|---------|
| 🏆 A+ | 95-100 | Elite performance |
| ✅ A | 85-94 | Excellent |
| 👍 B | 70-84 | Good |
| ⚠️ C | 50-69 | Needs work |
| ❌ D | <50 | Broken |

Based on: Success Rate (40%), Avg Duration (30%), Consistency (20%), Trend (10%)

## tasktime Integration

When you omit `--duration`, skillbench pulls from [tasktime](https://clawhub.com/skills/tasktime):
example.txt
1. Use a skill    → skillbench use [email protected]
2. Do the task    → tt start "Create PR" && ... && tt stop
3. Record result  → skillbench record "Create PR" --success
4. Check scores   → skillbench score github
5. Improve skill  → Update skill, bump version
6. Repeat         → Compare v1.0.0 vs v1.1.0
example.sh
# Auto-pulls duration from tasktime
skillbench record "Create PR" --success

# Manual duration
skillbench record "Create PR" --duration 45s --success

# Record failures
skillbench record "Create PR" --fail --error-type "auth-error"
example.sh
skillbench score                        # All skills with grades
skillbench score github                 # Single skill
skillbench compare [email protected] [email protected]
example.sh
skillbench export --format markdown
skillbench export --format json
skillbench dashboard                    # Generate HTML dashboard
skillbench dashboard --open             # Generate and open in browser
example.sh
skillbench test [email protected]          # Run smoke test
skillbench test [email protected] --suite full  # Run named suite
skillbench test [email protected] --dry-run     # Test without recording
example.sh
skillbench sync --clawhub               # Import installed skills
skillbench sync --vault                 # Sync to ClawVault
skillbench sync --all                   # Everything
example.sh
skillbench health                       # Overall health report with alerts
skillbench watch --once                 # Run all test suites once
skillbench watch --interval 300         # Continuous monitoring every 5 min
example.sh
skillbench improve                      # Get suggestions for weakest skill
skillbench improve github               # Improvement plan for specific skill
skillbench trend tasktime --days 30     # Performance trend over time
skillbench leaderboard                  # Compare agents (multi-agent setups)
skillbench schedule --interval 60       # Generate cron config for auto-testing

Tags

#git_and-github

Quick Info

Category Development
Model Claude 3.5
Complexity One-Click
Author g9pedro
Last Updated 3/10/2026
🚀
Optimized for
Claude 3.5
🧠

Ready to Install?

Get started with this skill in seconds

openclaw install skillbench