
Atl Mobile

Mobile browser and native app automation via ATL (iOS Simulator).

Version
1.0.0


ATL — Agent Touch Layer

The automation layer between AI agents and iOS

ATL provides HTTP-based automation for iOS Simulator — both browser (mobile Safari) and native apps. Think Playwright, but for mobile.

🔀 Two Servers: Browser & Native

ATL uses two separate servers for browser and native app automation:

| Server | Port | Use Case | Key Commands |
|--------|------|----------|--------------|
| Browser | 9222 | Web automation in mobile Safari | goto, markElements, clickMark, evaluate |
| Native | 9223 | iOS app automation (Settings, Contacts, any app) | openApp, snapshot, tapRef, find |

text
┌─────────────────────────────────────────────────────────────┐
│  BROWSER SERVER (9222)     │     NATIVE SERVER (9223)      │
│  (mobile Safari/WebView)   │     (iOS apps via XCTest)     │
│                            │                                │
│  markElements + clickMark  │     snapshot + tapRef         │
│  CSS selectors             │     accessibility tree        │
│  DOM evaluation            │     element references        │
│  tap, swipe, screenshot    │     tap, swipe, screenshot    │
└─────────────────────────────────────────────────────────────┘

Why two ports? Native app automation requires XCTest APIs (XCUIApplication, XCUIElement) which are only available in UI Test bundles. The native server runs as a UI Test that exposes an HTTP API.

Starting the Servers

bash
# Browser server (starts automatically with AtlBrowser app)
xcrun simctl launch booted com.atl.browser
curl http://localhost:9222/ping  # → {"status":"ok"}

# Native server (run as UI Test)
cd ~/Atl/core/AtlBrowser
xcodebuild test -workspace AtlBrowser.xcworkspace \
  -scheme AtlBrowser \
  -destination 'id=<SIMULATOR_UDID>' \
  -only-testing:AtlBrowserUITests/NativeServer/testNativeServer &
  
# Wait for it to start, then:
curl http://localhost:9223/ping  # → {"status":"ok","mode":"native"}
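
The "wait for it to start" step can be made explicit by polling /ping instead of guessing a delay. The sketch below is one way to do that; `wait_until` and `native_ready` are hypothetical helper names (not part of ATL), and the attempt count and delay are arbitrary.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    # Success: stop polling right away.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    # Only sleep when another attempt will follow.
    if [ "$attempts" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical probe: succeeds once the native server answers /ping.
native_ready() {
  curl -sf http://localhost:9223/ping | grep -q '"status":"ok"'
}

# Example (not run here): wait up to roughly 30 seconds for the native server.
# wait_until 30 1 native_ready && echo "native server is up"
```

The same loop works for the browser server by probing port 9222 instead.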

Quick Port Reference

| Task | Port | Example |
|------|------|---------|
| Browse websites | 9222 | `curl localhost:9222/command -d '{"method":"goto",...}'` |
| Open native app | 9223 | `curl localhost:9223/command -d '{"method":"openApp",...}'` |
| Screenshot (browser) | 9222 | `curl localhost:9222/command -d '{"method":"screenshot"}'` |
| Screenshot (native) | 9223 | `curl localhost:9223/command -d '{"method":"screenshot"}'` |

📱 Native App Automation (Port 9223)

Native automation uses port 9223 and automates any iOS app using the accessibility tree — no DOM, no JavaScript, just direct element interaction.

Opening & Closing Apps

bash
# Open an app by bundle ID
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'
# → {"success":true,"result":{"bundleId":"com.apple.Preferences","mode":"native","state":"running"}}

# Check current app state
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"appState"}'
# → {"success":true,"result":{"mode":"native","bundleId":"com.apple.Preferences","state":"running"}}

# Close current app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"closeApp"}'
# → {"success":true,"result":{"closed":true}}
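
Since every call has the same shape (POST a JSON body with `method` and `params` to `/command` on one of two ports), a thin wrapper can cut down on repetition. This is a minimal sketch, not the bundled atl-helper.sh: `atl_port`, `atl_payload`, and `atl_cmd` are hypothetical names, and sending an empty `params` object for parameterless methods is an assumption based on the `markElements` examples.

```shell
# Map a mode name to the port documented above.
atl_port() {
  case "$1" in
    browser) echo 9222 ;;
    native)  echo 9223 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Build the JSON body for /command. The params argument must already be
# valid JSON; it defaults to an empty object.
atl_payload() {
  params=$2
  [ -n "$params" ] || params='{}'
  printf '{"method":"%s","params":%s}\n' "$1" "$params"
}

# Send one command to the chosen server (requires a running ATL server).
# Usage: atl_cmd <browser|native> <method> [params-json]
atl_cmd() {
  port=$(atl_port "$1") || return 1
  curl -s -X POST "http://localhost:$port/command" \
    -H "Content-Type: application/json" \
    -d "$(atl_payload "$2" "$3")"
}

# Example (not run here):
# atl_cmd native openApp '{"bundleId":"com.apple.Preferences"}'
# atl_cmd native closeApp
```

Keeping the port lookup in one place makes it harder to send a native command to the browser server by mistake.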

Common Bundle IDs

| App | Bundle ID |
|-----|-----------|
| Settings | `com.apple.Preferences` |
| Contacts | `com.apple.MobileAddressBook` |
| Calculator | `com.apple.calculator` |
| Calendar | `com.apple.mobilecal` |
| Photos | `com.apple.mobileslideshow` |
| Notes | `com.apple.mobilenotes` |
| Reminders | `com.apple.reminders` |
| Clock | `com.apple.mobiletimer` |
| Maps | `com.apple.Maps` |
| Safari | `com.apple.mobilesafari` |

The snapshot Command

snapshot returns the accessibility tree — all visible elements with their properties and tap-able references.

bash
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result'

Example output:

json
{
  "count": 12,
  "elements": [
    {
      "ref": "e0",
      "type": "cell",
      "label": "Wi-Fi",
      "value": "MyNetwork",
      "identifier": "",
      "x": 0,
      "y": 142,
      "width": 393,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    },
    {
      "ref": "e1",
      "type": "cell",
      "label": "Bluetooth",
      "value": "On",
      "identifier": "",
      "x": 0,
      "y": 186,
      "width": 393,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    },
    {
      "ref": "e2",
      "type": "button",
      "label": "Back",
      "value": null,
      "identifier": "Back",
      "x": 0,
      "y": 44,
      "width": 80,
      "height": 44,
      "isHittable": true,
      "isEnabled": true
    }
  ]
}

Parameters:

  • interactiveOnly (bool, default: false) — Only return hittable elements
  • maxDepth (int, optional) — Limit tree traversal depth
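
Each snapshot element carries its geometry, so a coordinate tap can be derived from it when a ref-based tap is not an option. The sketch below computes the center of an element; it assumes `x`/`y` are the top-left corner, which the example output suggests but the source does not state. `element_center` is a hypothetical helper name.

```shell
# Compute the center point of an element from its snapshot geometry.
# Assumes x/y are the top-left corner (not confirmed by the source).
# Usage: element_center <x> <y> <width> <height>  -> prints "cx cy"
element_center() {
  echo "$(( $1 + $3 / 2 )) $(( $2 + $4 / 2 ))"
}

# Wi-Fi cell from the example output: x=0, y=142, width=393, height=44
element_center 0 142 393 44   # prints: 196 164
```

The result could then be passed to the `tap` method as `{"x":196,"y":164}`.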

The tapRef Command

Tap an element by its reference from the last snapshot:

bash
# Take snapshot first
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}'

# Tap element e0 (Wi-Fi cell from example above)
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"tapRef","params":{"ref":"e0"}}'
# → {"success":true}

The find Command

Find and interact with elements by text — no need to parse snapshot manually:

bash
# Find and tap "Wi-Fi"
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'
# → {"success":true,"result":{"found":true,"ref":"e0"}}

# Check if an element exists
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Bluetooth","action":"exists"}}'
# → {"success":true,"result":{"found":true,"ref":"e1"}}

# Find and fill a text field
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"First name","action":"fill","value":"John"}}'

# Get element info without interacting
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Cancel","action":"get"}}'
# → {"success":true,"result":{"found":true,"ref":"e5","element":{...}}}

Parameters:

  • text (string) — Text to search for (matches label, value, or identifier)
  • action (string) — One of: tap, fill, exists, get
  • value (string, optional) — Text to fill (required for action:"fill")
  • by (string, optional) — Narrow search: label, value, identifier, type, or any (default)
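
When the search text comes from a variable, it needs escaping before it is placed inside the JSON body. The following sketch builds a `find` payload from the parameters listed above; `json_escape` and `find_payload` are hypothetical helpers, and only backslashes and double quotes are escaped.

```shell
# Escape backslashes and double quotes so a string can sit inside JSON quotes.
# (Control characters such as newlines are not handled in this sketch.)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a find payload. Usage: find_payload <text> <action> [by] [value]
find_payload() {
  body=$(printf '"text":"%s","action":"%s"' "$(json_escape "$1")" "$2")
  # Optional: narrow the search to one field (label, value, identifier, type).
  if [ -n "$3" ]; then
    body="$body,\"by\":\"$3\""
  fi
  # Optional: text to type, needed for action "fill".
  if [ -n "$4" ]; then
    body="$body,\"value\":\"$(json_escape "$4")\""
  fi
  printf '{"method":"find","params":{%s}}\n' "$body"
}

# Example (not run here):
# curl -s -X POST http://localhost:9223/command \
#   -d "$(find_payload 'First name' fill label John)"
```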

🔄 Native App Workflow Example

Here's a complete flow: open Settings, navigate to Wi-Fi, take a screenshot:

bash
# 1. Open Settings app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}'

# 2. Wait for app to launch
sleep 1

# 3. Take snapshot to see available elements
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"snapshot","params":{"interactiveOnly":true}}' | jq '.result.elements[:5]'

# 4. Find and tap Wi-Fi
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}'

# 5. Wait for navigation
sleep 0.5

# 6. Take screenshot of Wi-Fi settings
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"screenshot"}' | jq -r '.result.data' | base64 -d > /tmp/wifi-settings.png

# 7. Navigate back (swipe right from left edge)
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"swipe","params":{"direction":"right"}}'

# 8. Close the app
curl -s -X POST http://localhost:9223/command \
  -d '{"method":"closeApp"}'
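
The fixed `sleep` calls in this flow can be replaced by polling `find` with `action: "exists"` until the expected text shows up. This is a rough sketch under assumptions: `atl_found` and `wait_for_text` are hypothetical names, the check is a plain substring match on the compact JSON shown earlier, and the half-second interval is arbitrary.

```shell
# True when a find response reports the element as present.
# Plain substring match; a real JSON parser such as jq is more robust.
atl_found() {
  case "$1" in
    *'"found":true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll find/exists on the native server until the text appears.
# The text is inserted into the JSON unescaped, so keep it simple.
# Usage: wait_for_text <text> <attempts>
wait_for_text() {
  tries=$2
  while [ "$tries" -gt 0 ]; do
    resp=$(curl -s -X POST http://localhost:9223/command \
      -d "{\"method\":\"find\",\"params\":{\"text\":\"$1\",\"action\":\"exists\"}}")
    atl_found "$resp" && return 0
    tries=$((tries - 1))
    sleep 0.5
  done
  return 1
}

# Example (not run here): wait up to ~5 s for Settings to show "Wi-Fi".
# wait_for_text "Wi-Fi" 10 || echo "Settings did not load in time"
```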

Helper Script Version

bash
source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh

atl_openapp "com.apple.Preferences"
sleep 1
atl_find "Wi-Fi" tap
sleep 0.5
atl_screenshot /tmp/wifi-settings.png
atl_swipe right
atl_closeapp


💡 Core Insight: Vision-Free Automation

ATL's killer feature is spatial understanding without vision models:

text
┌─────────────────────────────────────────────────────────────┐
│  markElements + captureForVision = COMPLETE PAGE KNOWLEDGE  │
└─────────────────────────────────────────────────────────────┘

1. markElements  → Numbers every interactive element [1] [2] [3]
2. captureForVision → PDF with text layer + element coordinates
3. tap x=234 y=567 → Pixel-perfect touch at exact position

Why this matters:

  • No vision API calls — zero token cost for "seeing" the page
  • Faster — no round-trip to GPT-4V/Claude Vision
  • Deterministic — same page = same coordinates, every time
  • Reliable — pixel-perfect coordinates vs. vision interpretation

The Vision-Free Workflow

bash
# 1. Mark elements (adds numbered labels + stores coordinates)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"markElements","params":{}}'

# 2. Capture PDF with text layer (machine-readable, has coordinates)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"captureForVision","params":{"savePath":"/tmp","name":"page"}}' \
  | jq -r '.result.path'
# → /tmp/page.pdf (text-selectable, contains element positions)

# 3. Get specific element's position by mark label
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"getMarkInfo","params":{"label":5}}' | jq '.result'
# → {"label":5, "tag":"button", "text":"Add to Cart", "x":187, "y":432, "width":120, "height":44}

# 4. Tap at exact coordinates
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"tap","params":{"x":187,"y":432}}'

The marks tell you WHERE everything is. The PDF tells you WHAT everything says. Together = full page understanding.
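
Steps 3 and 4 can be chained so the coordinates never have to be copied by hand. The sketch below pulls `x` and `y` out of a `getMarkInfo` result with `sed` (for environments without `jq`) and builds the matching `tap` payload. `json_int` and `tap_payload_from_mark` are hypothetical helpers, and the extraction only suits flat responses like the example above.

```shell
# Extract an integer field from a flat JSON object without jq.
# Only suitable for simple responses like the getMarkInfo example above.
# Usage: json_int <key> <json>
json_int() {
  printf '%s\n' "$2" | sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# Turn a getMarkInfo result into a tap payload using the x/y it reports.
tap_payload_from_mark() {
  printf '{"method":"tap","params":{"x":%s,"y":%s}}\n' \
    "$(json_int x "$1")" "$(json_int y "$1")"
}

mark='{"label":5, "tag":"button", "text":"Add to Cart", "x":187, "y":432, "width":120, "height":44}'
tap_payload_from_mark "$mark"
# prints: {"method":"tap","params":{"x":187,"y":432}}
```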

🎯 The Escalation Ladder

When automation gets stuck, escalate through these levels:

text
┌─────────────────────────────────────────────────────────────┐
│  Level 1: COORDINATES (fast, cheap, no API calls)          │
│  markElements → getMarkInfo → tap x,y                      │
│                                                             │
│  ↓ If stuck after 2-3 tries...                             │
│                                                             │
│  Level 2: VISION FALLBACK (screenshot to understand state) │
│  screenshot → analyze UI → identify blockers (modals, etc) │
│                                                             │
│  ↓ If still stuck...                                       │
│                                                             │
│  Level 3: JS INJECTION (direct DOM manipulation)           │
│  evaluate → dispatchEvent → force interactions             │
└─────────────────────────────────────────────────────────────┘

When to Escalate

| Symptom | Likely Cause | Action |
|---------|--------------|--------|
| Tap succeeds but nothing changes | Modal/overlay opened | Screenshot → find new button |
| Cart count doesn't update | Site needs login or has bot detection | Try JS click with events |
| Element not found after scroll | Marks are page-relative, not viewport | Use `getBoundingClientRect` via evaluate |
| Same error 3+ times | UI state changed unexpectedly | Screenshot to see actual state |
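
In a script, the ladder can be reduced to a failure counter that decides which strategy to try next. The sketch below is illustrative only: `escalation_level` is a hypothetical helper, the first threshold follows the "2-3 tries" guidance, and the cut-off for level 3 is an assumption.

```shell
# Pick an escalation level from the number of consecutive failures.
# Usage: escalation_level <failure_count>
escalation_level() {
  if [ "$1" -lt 3 ]; then
    echo coordinates   # Level 1: markElements -> getMarkInfo -> tap
  elif [ "$1" -lt 5 ]; then
    echo vision        # Level 2: screenshot and inspect the UI state
  else
    echo js            # Level 3: evaluate / dispatchEvent
  fi
}

# Example: after three failed taps, switch to the screenshot fallback.
escalation_level 3   # prints: vision
```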

Real-World Pattern: E-commerce Checkout

bash
# 1. Search and find product
atl_goto "https://store.com/search?q=headphones"
atl_mark

# 2. First, dismiss any modals/banners (ALWAYS DO THIS)
# Look for: close, dismiss, continue, accept, no thanks, got it
CLOSE=$(atl_find "close")
[ -n "$CLOSE" ] && atl_click "$CLOSE"

# 3. Find and click Add to Cart (skip the click if nothing was found)
ATC=$(atl_find "Add to cart")
[ -n "$ATC" ] && atl_click "$ATC"

# 4. Wait, then CHECK if it worked
sleep 2
atl_screenshot /tmp/after-click.png

# 5. If cart didn't update, LOOK at the screenshot
# Maybe a "Choose options" modal opened - find the NEW Add to Cart button
# This is the vision fallback - you need to SEE what happened

Key Insight: Modals Change Everything

When you click "Add to cart" on sites like Target, Amazon, etc., they often:

  • Open a "Choose options" modal (size, color, quantity)
  • Show an upsell (protection plans, accessories)
  • Display a confirmation with "View cart" or "Continue shopping"

Your original tap WORKED; you just can't see the result without a screenshot.

🚀 Quick Start (30 seconds)

bash
# 1. Setup (boots sim, installs ATL)
~/.openclaw/skills/atl-browser/scripts/setup.sh

# 2. Navigate somewhere
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"goto","params":{"url":"https://example.com"}}'

# 3. Mark elements (shows [1], [2], [3] labels)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"markElements","params":{}}'

# 4. Take screenshot
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > /tmp/page.png

# 5. Click element [1]
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"clickMark","params":{"label":1}}'

Or use the helper functions:

bash
source ~/.openclaw/skills/atl-browser/scripts/atl-helper.sh
atl_goto "https://example.com"
atl_mark
atl_screenshot /tmp/page.png
atl_click 1

Quick Reference

Base URL: http://localhost:9222 (browser server; native commands go to http://localhost:9223)

Common Commands

bash
# Check if ATL is running
curl -s http://localhost:9222/ping

# Navigate to URL
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"1","method":"goto","params":{"url":"https://example.com"}}'

# Wait for page ready
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'

# Take screenshot (returns base64 PNG)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"3","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > screenshot.png

# Mark interactive elements (shows numbered labels)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"4","method":"markElements","params":{}}'

# Click by mark label
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"5","method":"clickMark","params":{"label":3}}'

# Scroll page
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"6","method":"evaluate","params":{"script":"window.scrollBy(0, 500)"}}'

# Type text
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"7","method":"type","params":{"text":"Hello world"}}'

# Click by CSS selector
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"8","method":"click","params":{"selector":"button.submit"}}'

Setup (First Time)

1. Start Simulator

bash
# Boot iPhone 17 simulator (or another device)
xcrun simctl boot "iPhone 17"

# Open Simulator app
open -a Simulator

2. Build & Install AtlBrowser

bash
cd ~/Atl/core/AtlBrowser

# Build for simulator (RECOMMENDED: target by UDID)
# Why: name-based destinations can cause Xcode to pick an older iOS runtime (15/16)
# and fail if AtlBrowser has an iOS 17+ deployment target.
#
# 1) Find a suitable simulator UDID (iOS 17+):
#   xcrun simctl list devices available
#
# 2) Build targeting that UDID:
xcodebuild -workspace AtlBrowser.xcworkspace \
  -scheme AtlBrowser \
  -destination 'id=<SIM_UDID>' \
  -derivedDataPath /tmp/atl-dd \
  build

# Install to a specific simulator (preferred)
xcrun simctl install <SIM_UDID> \
  /tmp/atl-dd/Build/Products/Debug-iphonesimulator/AtlBrowser.app

# Launch the app
xcrun simctl launch <SIM_UDID> com.atl.browser

3. Verify Server

bash
curl -s http://localhost:9222/ping
# Should return: {"status":"ok"}

All Available Methods

App Control (Native Mode)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| openApp | {bundleId} | Any→Native | Open app, switch to native mode |
| closeApp | - | Native | Close current app, return to browser mode |
| appState | - | Any | Get current mode and bundleId |
| openBrowser | - | Native→Browser | Switch back to browser mode |

Native Accessibility

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| snapshot | {interactiveOnly?, maxDepth?} | Native | Get accessibility tree |
| tapRef | {ref} | Native | Tap element by ref (e.g., "e0") |
| find | {text, action, value?, by?} | Native | Find element and interact |
| fillRef | {ref, text} | Native | Tap element and type text |
| focusRef | {ref} | Native | Focus element without typing |

Navigation (Browser)

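| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| goto | {url} | Browser | Navigate to URL |
| reload | - | Browser | Reload page |
| goBack | - | Browser | Go back |
| goForward | - | Browser | Go forward |
| getURL | - | Browser | Get current URL |
| getTitle | - | Browser | Get page title |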

Interactions (Browser)

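| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| click | {selector} | Browser | Click element |
| doubleClick | {selector} | Browser | Double-click |
| type | {text} | Both | Type text |
| fill | {selector, value} | Browser | Fill input field |
| press | {key} | Both | Press key |
| hover | {selector} | Browser | Hover over element |
| scrollIntoView | {selector} | Browser | Scroll to element |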

Mark System (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| markElements | - | Browser | Mark visible interactive elements |
| markAll | - | Browser | Mark ALL interactive elements |
| unmarkElements | - | Browser | Remove marks |
| clickMark | {label} | Browser | Click by label number |
| getMarkInfo | {label} | Browser | Get element info by label |

Screenshots & Capture

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| screenshot | {fullPage?, selector?} | Both | Take screenshot |
| captureForVision | {savePath?, name?} | Browser | Full page PDF |
| captureJPEG | {quality?, fullPage?} | Both | JPEG capture |
| captureLight | - | Browser | Text + interactives only |

Waiting (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| waitForSelector | {selector, timeout?} | Browser | Wait for element |
| waitForNavigation | - | Browser | Wait for navigation |
| waitForReady | {timeout?, stabilityMs?} | Browser | Wait for page ready |
| waitForAny | {selectors, timeout?} | Browser | Wait for any selector |

JavaScript (Browser)

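| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| evaluate | {script} | Browser | Run JavaScript |
| querySelector | {selector} | Browser | Find element |
| querySelectorAll | {selector} | Browser | Find all elements |
| getDOMSnapshot | - | Browser | Get page HTML |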

Cookies (Browser)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| getCookies | - | Browser | Get all cookies |
| setCookies | {cookies} | Browser | Set cookies |
| deleteCookies | - | Browser | Delete all cookies |

Touch Gestures (Both Modes)

| Method | Params | Mode | Description |
|--------|--------|------|-------------|
| tap | {x, y} | Both | Tap at coordinates |
| longPress | {x, y, duration?} | Both | Long press (default 0.5s) |
| swipe | {direction, distance?} | Both | Swipe up/down/left/right |
| swipe | {fromX, fromY, toX, toY} | Both | Swipe between points |
| pinch | {scale, duration?} | Both | Pinch zoom (scale > 1 = zoom in) |

Swipe Examples

bash
# Swipe up (scroll down)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"1","method":"swipe","params":{"direction":"up"}}'

# Swipe left (next page in carousel)
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"2","method":"swipe","params":{"direction":"left","distance":400}}'

# Custom swipe path
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"3","method":"swipe","params":{"fromX":200,"fromY":600,"toX":200,"toY":200}}'

# Long press for context menu
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"4","method":"longPress","params":{"x":150,"y":300,"duration":1.0}}'

# Pinch to zoom in
curl -s -X POST http://localhost:9222/command \
  -d '{"id":"5","method":"pinch","params":{"scale":2.0}}'
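
The directional and point-to-point forms of `swipe` describe the same gesture, so one can be derived from the other. The sketch below converts a start point, direction, and distance into the `fromX/fromY/toX/toY` form; it assumes screen coordinates with y increasing downward, and `swipe_path` is a hypothetical helper name.

```shell
# Build a point-to-point swipe payload from a start point and a direction.
# Assumes y grows downward, so swiping "up" decreases y.
# Usage: swipe_path <fromX> <fromY> <up|down|left|right> <distance>
swipe_path() {
  to_x=$1
  to_y=$2
  case "$3" in
    up)    to_y=$(( $2 - $4 )) ;;
    down)  to_y=$(( $2 + $4 )) ;;
    left)  to_x=$(( $1 - $4 )) ;;
    right) to_x=$(( $1 + $4 )) ;;
    *) echo "invalid direction: $3" >&2; return 1 ;;
  esac
  printf '{"method":"swipe","params":{"fromX":%s,"fromY":%s,"toX":%s,"toY":%s}}\n' \
    "$1" "$2" "$to_x" "$to_y"
}

# Reproduces the "custom swipe path" example: 400 points up from (200, 600).
swipe_path 200 600 up 400
# prints: {"method":"swipe","params":{"fromX":200,"fromY":600,"toX":200,"toY":200}}
```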

Typical Workflow

bash
# 1. Navigate to site
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"1","method":"goto","params":{"url":"https://www.apple.com/shop"}}'

# 2. Wait for page to load
sleep 2
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'

# 3. Mark elements to see what's clickable
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"3","method":"markElements","params":{}}'

# 4. Take screenshot to see the marks
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"4","method":"screenshot","params":{}}' | jq -r '.result.data' | base64 -d > /tmp/page.png

# 5. Click a marked element (e.g., label 14)
curl -s -X POST http://localhost:9222/command \
  -H "Content-Type: application/json" \
  -d '{"id":"5","method":"clickMark","params":{"label":14}}'

# 6. Repeat as needed

Troubleshooting

Navigation not working (goto returns success but page doesn't change)

Known issue: goto may return success without navigating. As a workaround, navigate with JavaScript through evaluate:

bash
# Instead of goto, use evaluate to navigate
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"1","method":"evaluate","params":{"script":"location.href = \"https://example.com\"; true"}}'

# Wait for page load
sleep 3
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"2","method":"waitForReady","params":{"timeout":10}}'

Server not responding

bash
# Check that the app is installed on the booted simulator
xcrun simctl listapps booted | grep atl

# Restart the app
xcrun simctl terminate booted com.atl.browser
xcrun simctl launch booted com.atl.browser

# Check logs
xcrun simctl spawn booted log show --predicate 'process == "AtlBrowser"' --last 1m

Need to rebuild (iOS version changes)

bash
cd ~/Atl/core/AtlBrowser
xcodebuild -workspace AtlBrowser.xcworkspace -scheme AtlBrowser -sdk iphonesimulator build
xcrun simctl install booted ~/Library/Developer/Xcode/DerivedData/AtlBrowser-*/Build/Products/Debug-iphonesimulator/AtlBrowser.app
xcrun simctl launch booted com.atl.browser

Port 9222 in use

The ATL server runs inside the simulator app. If port 9222 is blocked, check for other processes:

bash
lsof -i :9222

Best Practices

1. Clean UI Before Acting

Real users dismiss popups. You should too.

bash
# Before any workflow, check for and dismiss:
# - Cookie consent banners
# - Newsletter popups  
# - Health/privacy consent modals
# - "Download our app" prompts
atl_mark
for KEYWORD in "close" "dismiss" "no thanks" "accept" "got it" "continue"; do
  LABEL=$(atl_find "$KEYWORD")
  [ -n "$LABEL" ] && atl_click "$LABEL" && sleep 1
done

2. Verify State After Actions

Don't assume; confirm.

bash
atl_click "$ADD_TO_CART"
sleep 2
# Check if cart updated
CART=$(atl_find "cart [1-9]")
if [ -z "$CART" ]; then
  # Didn't work - take screenshot to see why
  atl_screenshot /tmp/debug.png
  echo "Action may have opened a modal - check screenshot"
fi

3. Use Viewport Coordinates for Taps

Marks give page-relative coordinates, while tap acts on what is currently on screen, so the element must be scrolled into the viewport before you tap it.

bash
# Option A: Scroll element into view first
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"1","method":"evaluate","params":{"script":"document.querySelector(\"#my-button\").scrollIntoView()"}}'

# Option B: Get viewport-relative coords via JS
curl -s -X POST http://localhost:9222/command -H "Content-Type: application/json" \
  -d '{"id":"2","method":"evaluate","params":{"script":"var r = document.querySelector(\"#my-button\").getBoundingClientRect(); JSON.stringify({x: r.x + r.width/2, y: r.y + r.height/2})"}}'
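
If you already hold a page-relative coordinate from a mark, subtracting the current scroll offset gives the viewport coordinate. The sketch below shows that arithmetic. It assumes the scroll offset and viewport height were read as integers (for example from `window.scrollY` and `window.innerHeight` via `evaluate`) and ignores zoom; `viewport_y` and `in_viewport` are hypothetical helper names.

```shell
# Convert a page-relative y coordinate to a viewport-relative one.
# Usage: viewport_y <page_y> <scroll_y>
viewport_y() {
  echo $(( $1 - $2 ))
}

# True when a viewport y coordinate lies inside a viewport of the given height.
# Usage: in_viewport <viewport_y> <viewport_height>
in_viewport() {
  [ "$1" -ge 0 ] && [ "$1" -lt "$2" ]
}

# Example: element at page y=1432 with the page scrolled down by 1000.
vy=$(viewport_y 1432 1000)
if in_viewport "$vy" 852; then
  echo "tap at y=$vy"          # prints: tap at y=432
else
  echo "scroll the element into view first"
fi
```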

4. Screenshot is Your Debugging Superpower

When in doubt, look.

bash
atl_screenshot /tmp/current-state.png
# Then analyze with vision or just open the file

Notes

  • ATL runs inside the iOS Simulator, sharing the host's network
  • Port 9222 is the default (matches Chrome DevTools Protocol convention)
  • The mark system shows red numbered labels on interactive elements
  • Screenshots are base64-encoded PNGs; decode with base64 -d (older macOS releases may need base64 -D or --decode)
  • iOS 26+ compatible (fixed NWListener binding issue)

Requirements

  • macOS with Xcode installed
  • iOS Simulator (comes with Xcode)
  • That's it!

Examples

See examples/ folder:

  • test-browse.sh - Quick bash test workflow

API Reference

For machine-readable API spec, see openapi.yaml — includes all commands, parameters, and response schemas.

Source

  • GitHub: https://github.com/JordanCoin/Atl
  • Author: @JordanCoin

Installation

bash
openclaw install atl-mobile

Last Updated 3/10/2026