Visual RPA Skill
Visual RPA desktop automation skill.
- Rating: 4.3 (464 reviews)
- Downloads: 1,069
- Version: 1.0.0
Visual RPA Desktop Automation
Auto-execute all steps without waiting for user confirmation between steps.
Desktop automation via screen capture + Qwen vision model (Qwen-VL). No DOM or accessibility API needed.
How it works
- Capture the screen and locate the target roughly on a thumbnail
- Crop the full-resolution image and refine the coordinates precisely
- Execute the mouse or keyboard action, then verify the result with a screenshot
- Compound instructions are decomposed into atomic steps automatically
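The coarse-to-fine positioning above comes down to two coordinate conversions: scaling a thumbnail point up to screen pixels, and translating a point found inside a crop back to screen pixels. The following is a minimal sketch of that arithmetic only. The names `Box`, `thumb_to_screen`, `crop_around`, and `crop_to_screen`, the thumbnail size, and the crop size are all assumptions made for illustration; they are not taken from `visual_rpa.py`, and the vision model calls are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned crop rectangle in full-resolution screen pixels."""
    left: int
    top: int
    width: int
    height: int


def thumb_to_screen(pt, thumb_size, screen_size):
    """Scale a point found on the thumbnail up to full-resolution pixels."""
    sx = screen_size[0] / thumb_size[0]
    sy = screen_size[1] / thumb_size[1]
    return round(pt[0] * sx), round(pt[1] * sy)


def crop_around(center, crop_size, screen_size):
    """Build a crop box centered on `center`, clamped to the screen bounds."""
    w = min(crop_size[0], screen_size[0])
    h = min(crop_size[1], screen_size[1])
    left = min(max(center[0] - w // 2, 0), screen_size[0] - w)
    top = min(max(center[1] - h // 2, 0), screen_size[1] - h)
    return Box(left, top, w, h)


def crop_to_screen(pt_in_crop, box):
    """Translate a point refined inside the crop back to screen coordinates."""
    return box.left + pt_in_crop[0], box.top + pt_in_crop[1]
```

For example, with an assumed 2560x1600 screen and a 640x400 thumbnail, a rough hit at (94, 398) scales to (376, 1592). A 512x512 crop around it is clamped to start at (120, 1088), and a refined point at (255, 503) inside that crop maps back to (375, 1591) on screen.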
Usage
Use the exec tool to run commands. Script path: $env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py
The DASHSCOPE_API_KEY environment variable must be set.
Single task
```text
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "click to open WeChat"
```
Compound task (auto-decomposed)
```text
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "open WeChat, open File Transfer chat, type hello in input box, click send"
```
Multi-step task (manually specified)
```text
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "click Chrome browser" "type baidu.com in address bar and press enter" "type weather in search box" "click search button"
```
Skip verification (faster)
```text
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --no-verify --task "click to open Calculator"
```
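When the commands above are launched from another Python program rather than typed into a shell, building the argument list directly avoids quoting problems with multi-word steps. The helper below is a sketch under that assumption. `build_command` is a hypothetical name, and it only assembles the flags and script path shown in the examples above.

```python
import os
import sys
from pathlib import Path


def build_command(tasks, verify=True, root=None):
    """Return the argv list for a batch run of visual_rpa.py."""
    # Falls back to the TAXBOT_ROOT environment variable; raises KeyError if unset.
    root = root or os.environ["TAXBOT_ROOT"]
    script = Path(root) / "skills" / "visual-rpa" / "scripts" / "visual_rpa.py"
    cmd = [sys.executable, str(script), "--mode", "task"]
    if not verify:
        cmd.append("--no-verify")
    # Each list element becomes one step, exactly like one quoted string on the CLI.
    cmd += ["--task", *tasks]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Whether the script signals failed steps through its exit code is not stated here, so it is safer to inspect the printed report than to rely on the return code.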
Parameters
| Parameter | Description |
|---|---|
| --mode task | Batch task mode (must be passed explicitly for batch runs) |
| --mode interactive | Interactive mode (default) |
| --task "step1" "step2" | One or more task instructions; each quoted string is one step |
| --no-verify | Skip post-action verification |
| --model MODEL | Vision model name (default: qwen-vl-max-latest) |
| --api-key KEY | API Key (defaults to DASHSCOPE_API_KEY env var) |
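To make the defaults in the table concrete, here is a hypothetical `argparse` definition that mirrors it. This is a reconstruction from the table alone, not the parser used by `visual_rpa.py`; the real script may validate arguments differently.

```python
import argparse
import os


def make_parser():
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="visual_rpa.py")
    parser.add_argument("--mode", choices=["task", "interactive"],
                        default="interactive")
    # nargs="+" lets --task accept several quoted steps in one invocation.
    parser.add_argument("--task", nargs="+", default=[], metavar="STEP")
    parser.add_argument("--no-verify", action="store_true")
    parser.add_argument("--model", default="qwen-vl-max-latest")
    # The API key falls back to the environment variable named in the table.
    parser.add_argument("--api-key",
                        default=os.environ.get("DASHSCOPE_API_KEY"))
    return parser
```

With this layout, `--task "click Chrome browser" "click search button"` parses into a two-element list, which matches the manually specified multi-step form.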
Supported actions
| Action | Example instructions |
|---|---|
| Click | "click start menu", "click Chrome icon" |
| Double click | "double click Recycle Bin on desktop" |
| Right click | "right click on desktop blank area" |
| Type text | "type weather in search box", "type hello in input box" |
| Hotkey | "press Ctrl+C" |
| Scroll | "scroll down the page" |
| Wait | "wait for page to load" |
Instruction tips
- Be specific: "click WeChat icon on taskbar" works better than "open WeChat"
- Instructions can be written in Chinese or English; the model understands both
- Complex operations can be written as compound instructions; the system decomposes them automatically
- For text input, say "type XXX in YYY"; the system detects this as an input action
Output format
```text
[OK] Step 0: click to open WeChat
click @ (375,1591)
[OK] Step 1: click File Transfer Assistant in WeChat
click @ (154,97)
[FAIL] Step 2: type hello in input box
type @ (300,1364)
2/3 succeeded
```
- OK = the action succeeded and was verified
- FAIL = the action or its verification failed; the step is retried automatically up to 3 times
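If a calling program needs to act on the result, the report can be parsed line by line. The sketch below assumes the output looks exactly like the sample above (a status line, an action line with coordinates, and a final summary). `StepResult` and `parse_report` are illustrative names, and the regular expressions would need adjusting if the real output differs.

```python
import re
from dataclasses import dataclass
from typing import Optional

STEP_RE = re.compile(r"^\[(OK|FAIL)\] Step (\d+): (.+)$")
ACTION_RE = re.compile(r"^(\w+) @ \((\d+),\s*(\d+)\)$")
SUMMARY_RE = re.compile(r"^(\d+)/(\d+) succeeded$")


@dataclass
class StepResult:
    index: int
    ok: bool
    instruction: str
    action: Optional[str] = None
    x: Optional[int] = None
    y: Optional[int] = None


def parse_report(text):
    """Return (steps, summary), where summary is (succeeded, total) or None."""
    steps = []
    summary = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := STEP_RE.match(line):
            steps.append(StepResult(int(m[2]), m[1] == "OK", m[3]))
        elif (m := ACTION_RE.match(line)) and steps:
            # Attach the action and coordinates to the most recent step.
            last = steps[-1]
            last.action, last.x, last.y = m[1], int(m[2]), int(m[3])
        elif m := SUMMARY_RE.match(line):
            summary = (int(m[1]), int(m[2]))
    return steps, summary
```

Run on the sample report, this yields three steps, the last one marked as failed, and a summary of `(2, 3)`.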
Common scenarios
Send WeChat message
```text
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "open WeChat, open File Transfer Assistant chat, type hello in input box, click send"
```
Open app and navigate
```text
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "click Chrome browser" "type https://www.baidu.com in address bar and press enter"
```
Desktop operations
```text
python "$env:TAXBOT_ROOT/skills/visual-rpa/scripts/visual_rpa.py" --mode task --task "right click on desktop blank area" "click New Folder"
```
Notes
- Each step takes 3-8 seconds (screenshot + API calls + verification)
- Chinese text is entered by clipboard paste, which overwrites the current clipboard contents
- Only the primary screen is operated on
- Logs and screenshots are saved in the ./rpa_logs/ directory for debugging
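Because clipboard paste overwrites whatever the user had copied, a caller may want to save and restore the clipboard around a run. The context manager below is one possible sketch. `preserved_clipboard`, `read_clipboard`, and `write_clipboard` are hypothetical names; the two callables must be supplied by the caller (for instance from a clipboard library), and only text content is preserved.

```python
from contextlib import contextmanager


@contextmanager
def preserved_clipboard(read_clipboard, write_clipboard):
    """Restore the clipboard text after the wrapped block finishes."""
    saved = read_clipboard()
    try:
        yield saved
    finally:
        # Runs even if the wrapped block raises, so the text is put back.
        write_clipboard(saved)
```

Wrapping the call that launches the script in `with preserved_clipboard(read, write):` puts the earlier text back once the run ends, even if the run raises an exception.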
Installation
```bash
openclaw install visual-rpa-skill
```
Tags
#web_and-frontend-development
#automation
Quick Info
- Category: Development
- Model: Claude 3.5
- Complexity: Multi-Agent
- Author: neilhexiaoning-alt
- Last Updated: 3/10/2026