
Agentic Browser 0 1 2

Browser automation for AI agents via inference.sh.

Rating: 3.8 (386 reviews)
Downloads: 2,000
Version: 1.0.0


Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.

Quick Start

```bash
# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new
```

Core Workflow

Every browser automation follows this pattern:

1. Open - Navigate to URL, get @e refs for elements
2. Interact - Use refs to click, fill, drag, etc.
3. Re-snapshot - After navigation/changes, get fresh refs
4. Close - End session (returns video if recording)

```bash
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session "$SESSION_ID" --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session "$SESSION_ID" --input '{}'
```
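
The workflow above repeats the same `infsh app run agent-browser` prefix on every call. A minimal sketch of a wrapper, assuming only the flags shown in this document; the function name `ab` is hypothetical and not part of the CLI:

```shell
# ab FUNCTION SESSION [JSON_INPUT]
# Hypothetical convenience wrapper (not part of infsh): forwards to the
# `infsh app run agent-browser` invocation used throughout this document.
ab() {
  local fn session input
  fn="$1"
  session="$2"
  input="${3:-}"
  # Default to an empty JSON object, as the snapshot and close calls use.
  [ -n "$input" ] || input='{}'
  infsh app run agent-browser --function "$fn" --session "$session" --input "$input"
}
```

With this in place, step 3 becomes `ab snapshot "$SESSION_ID"` and step 4 becomes `ab close "$SESSION_ID"`.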

Functions

| Function | Description |
|----------|-------------|
| `open` | Navigate to URL, configure browser (viewport, proxy, video recording) |
| `snapshot` | Re-fetch page state with `@e` refs after DOM changes |
| `interact` | Perform actions using `@e` refs (click, fill, drag, upload, etc.) |
| `screenshot` | Take page screenshot (viewport or full page) |
| `execute` | Run JavaScript code on the page |
| `close` | Close session, returns video if recording was enabled |

Interact Actions

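| Action | Description | Required Fields |
|--------|-------------|-----------------|
| `click` | Click element | `ref` |
| `dblclick` | Double-click element | `ref` |
| `fill` | Clear and type text | `ref`, `text` |
| `type` | Type text (no clear) | `text` |
| `press` | Press key (Enter, Tab, etc.) | `text` |
| `select` | Select dropdown option | `ref`, `text` |
| `hover` | Hover over element | `ref` |
| `check` | Check checkbox | `ref` |
| `uncheck` | Uncheck checkbox | `ref` |
| `drag` | Drag and drop | `ref`, `target_ref` |
| `upload` | Upload file(s) | `ref`, `file_paths` |
| `scroll` | Scroll page | `direction` (up/down/left/right), `scroll_amount` |
| `back` | Go back in history | - |
| `wait` | Wait milliseconds | `wait_ms` |
| `goto` | Navigate to URL | `url` |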
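
The table can be encoded as a lookup so a script can check an `interact` payload before sending it. This is a sketch only: the helper name `required_fields` is hypothetical, and the field lists are copied from the table above rather than from the app's schema.

```shell
# required_fields ACTION
# Prints the space-separated required fields for an interact action,
# as listed in the table above. Returns 1 for an unknown action.
required_fields() {
  case "$1" in
    click|dblclick|hover|check|uncheck) echo "ref" ;;
    fill|select) echo "ref text" ;;
    type|press)  echo "text" ;;
    drag)        echo "ref target_ref" ;;
    upload)      echo "ref file_paths" ;;
    scroll)      echo "direction scroll_amount" ;;
    wait)        echo "wait_ms" ;;
    goto)        echo "url" ;;
    back)        echo "" ;;   # no fields required
    *)           return 1 ;;
  esac
}
```

For example, `required_fields drag` prints `ref target_ref`, which a caller could compare against the keys of the JSON it is about to send.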

Element Refs

Elements are returned with @e refs:

```text
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"
```

Important: Refs are invalidated when the page navigates or its DOM changes. Always re-snapshot after:

  • Clicking links/buttons that navigate
  • Form submissions
  • Dynamic content loading
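
Because refs do not survive a re-snapshot, a script may prefer to look a ref up by its label each time instead of hard-coding something like `@e3`. A minimal sketch, assuming the element listing is available as plain text in the format shown above (how to pull that listing out of the `snapshot` output is not covered here); `find_ref` is a hypothetical helper:

```shell
# find_ref LABEL
# Reads an element listing on stdin and prints the @e ref of the first
# line that contains LABEL (fixed-string match). Prints nothing if no
# line matches.
find_ref() {
  grep -F -- "$1" \
    | sed -n 's/^[[:space:]]*\(@e[0-9][0-9]*\)[[:space:]].*/\1/p' \
    | head -n 1
}
```

Piping the sample listing above into `find_ref '"Submit"'` prints `@e3`. Matching on the quoted label keeps a search for `Submit` from also hitting unrelated attributes.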

Features

Video Recording

Record browser sessions for debugging or documentation:

```bash
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
# Returns: {"success": true, "video": <File>}
```

Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

```bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'
```

The cursor appears as a red dot that follows mouse movements and shows click feedback.

Proxy Support

Route traffic through a proxy server:

```bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'
```

File Upload

Upload files to file inputs:

```bash
infsh app run agent-browser --function interact --session "$SESSION" --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'
```

Drag and Drop

Drag elements to targets:

```bash
infsh app run agent-browser --function interact --session "$SESSION" --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'
```

JavaScript Execution

Run custom JavaScript:

```bash
infsh app run agent-browser --function execute --session "$SESSION" --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}
```
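
The `code` value above needs its inner double quotes escaped by hand, which gets error-prone for longer snippets. A small sketch of doing that escaping in the shell: `json_escape` and `execute_payload` are hypothetical helpers, and the escaper only handles backslashes and double quotes, so it is not suitable for code containing newlines or other control characters.

```shell
# json_escape STRING
# Escapes backslashes and double quotes so STRING can sit inside a JSON
# string literal. Does NOT handle newlines or control characters.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# execute_payload JS_CODE
# Prints a JSON object with the single "code" key used by the execute
# function in the example above.
execute_payload() {
  printf '{"code": "%s"}\n' "$(json_escape "$1")"
}
```

The result can be passed straight to the CLI, for example `--input "$(execute_payload 'document.querySelectorAll("h2").length')"`. Since `jq` is already used in this document, `jq -n --arg code "$js" '{code: $code}'` is a more robust way to build an equivalent object, and it also copes with newlines.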

Deep-Dive Documentation

| Reference | Description |
|-----------|-------------|
| references/commands.md | Full function reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Session persistence, parallel sessions |
| references/authentication.md | Login flows, OAuth, 2FA handling |
| references/video-recording.md | Recording workflows for debugging |
| references/proxy-support.md | Proxy configuration, geo-testing |

Ready-to-Use Templates

| Template | Description |
|----------|-------------|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse session |
| templates/capture-workflow.sh | Content extraction with screenshots |

Examples

Form Submission

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "click", "ref": "@e4"}'

# Re-snapshot after the form submits, then close
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
```

Search and Extract

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "wait", "wait_ms": 2000}'

infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
```
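
The fixed 2000 ms wait above may be too short on a slow page and wasteful on a fast one. One alternative is to poll until a check succeeds. The sketch below is a generic retry loop in plain shell; `retry` is a hypothetical helper, and what to check for (for example, searching the `snapshot` output for expected text) depends on output that this document does not specify.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS runs are used up, sleeping
# DELAY_SECONDS between runs. Returns 0 on success, 1 otherwise.
retry() {
  local attempts delay i
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

A call such as `retry 5 1 page_ready` would run a user-defined `page_ready` check up to five times, one second apart, and stop at the first success.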

Screenshot with Video

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session "$SESSION" --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session "$SESSION" --input '{}')
echo "$RESULT" | jq '.video'
```

Sessions

Browser state persists within a session. Always:

1. Start with --session new on first call
2. Use returned session_id for subsequent calls
3. Close session when done
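
If a script exits early, the closing step is easy to miss. A sketch of closing the session from an `EXIT` trap so it also runs when the script stops on an error; `close_on_exit` is a hypothetical helper, and it assumes the session id contains no single quotes:

```shell
# close_on_exit SESSION_ID
# Registers an EXIT trap that closes the given session when the current
# shell exits. Note: this replaces any EXIT trap that was already set.
close_on_exit() {
  trap "infsh app run agent-browser --function close --session '$1' --input '{}'" EXIT
}
```

Calling `close_on_exit "$SESSION"` right after the `open` call means later steps can fail without leaving the session open.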

Related Skills

```bash
# Web search (for research + browse)
npx skills add inferencesh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inferencesh/skills@llm-models
```

Installation

```bash
openclaw install agentic-browser-0-1-2
```


Tags

#web_and-frontend-development #automation

Quick Info

Category: Development
Model: Claude 3.5
Complexity: Multi-Agent
Author: xyny89
Last Updated: 3/10/2026