X Extract
Extract tweet content from x.com URLs without credentials using browser automation.
- Rating: 4.7 (287 reviews)
- Downloads: 14,249
- Version: 1.0.0
X.com Tweet Extraction
Extract tweet content (text, media, author, metadata) from x.com URLs without requiring Twitter/X credentials.
How It Works
Uses OpenClaw's browser tool to load the tweet page, then extracts content from the rendered HTML.
Workflow
1. Validate URL
Check that the URL is a valid x.com/twitter.com tweet:
- Must contain x.com/<username>/status/ or twitter.com/<username>/status/
- Extract tweet ID from URL pattern: /status/(\d+)
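The validation step can be sketched in Python as follows. This is a minimal illustration, not the skill's actual code: the function name `extract_tweet_id` and the choice to accept optional `www.` and `mobile.` prefixes are assumptions.

```python
import re
from typing import Optional

# Assumed shape: http(s) scheme, x.com or twitter.com host (optionally with a
# www. or mobile. prefix), one username segment, then /status/<digits>.
TWEET_URL_RE = re.compile(
    r"^https?://(?:www\.|mobile\.)?(?:x|twitter)\.com/[^/?#]+/status/(\d+)"
)


def extract_tweet_id(url: str) -> Optional[str]:
    """Return the numeric tweet ID, or None if the URL is not a tweet URL."""
    match = TWEET_URL_RE.match(url.strip())
    return match.group(1) if match else None
```

Returning `None` for anything that does not match lets the caller stop before opening the browser.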
2. Open in Browser
browser action=open profile=openclaw targetUrl=<x.com-url>
Wait for the page to load; the open action returns a targetId, which the snapshot step needs.
3. Capture Snapshot
browser action=snapshot targetId=<TARGET_ID> snapshotFormat=aria
4. Extract Content
From the snapshot, extract:
Required fields:
- Tweet text: Look for role=article containing the main tweet content
- Author: role=link with author name/handle (usually @username format)
- Timestamp: role=time element
Optional fields:
- Media: role=img or role=link containing /photo/, /video/
- Engagement: Like count, retweet count, reply count (in role=group or role=button)
- Thread context: If tweet is part of thread, note previous/next tweet references
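The field lookup above might be sketched as below. The snapshot format is not specified here, so this assumes the ARIA snapshot has already been parsed into a flat list of dictionaries with hypothetical `role`, `name`, and `url` keys; the real snapshot structure may differ.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]  # hypothetical parsed snapshot node: role, name, url


def extract_fields(nodes: List[Node]) -> Dict[str, Any]:
    """Pick tweet fields out of parsed snapshot nodes by ARIA role."""
    result: Dict[str, Any] = {
        "text": None, "author": None, "timestamp": None, "media": [],
    }
    for node in nodes:
        role = node.get("role")
        name = (node.get("name") or "").strip()
        url = node.get("url") or ""
        if role == "article" and result["text"] is None:
            result["text"] = name  # first article is taken as the main tweet
        elif role == "link" and ("/photo/" in url or "/video/" in url):
            result["media"].append(url)
        elif role == "link" and "@" in name and result["author"] is None:
            result["author"] = name
        elif role == "time" and result["timestamp"] is None:
            result["timestamp"] = name
        elif role == "img" and url:
            result["media"].append(url)
    # Record which required fields could not be found, for error reporting.
    result["missing"] = [
        key for key in ("text", "author", "timestamp") if not result[key]
    ]
    return result
```

The `missing` list makes it straightforward to report which required fields were not extracted.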
5. Format Output
Output as structured markdown:
# Tweet by @username
**Author:** Full Name (@handle)
**Posted:** YYYY-MM-DD HH:MM
**Source:** <original-url>
---
<Tweet text content here>
---
**Media:**
- <media-url-1>
- <media-url-2>
**Engagement:**
- 👍 Likes: 1,234
- 🔄 Retweets: 567
- 💬 Replies: 89
**Thread:** [Part 2/5] | [View full thread](<thread-url>)
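A rough Python sketch of rendering this template is shown below. The function name `format_tweet`, its parameters, and the engagement dictionary keys are illustrative assumptions; the thread line is omitted for brevity.

```python
from typing import Dict, List, Optional


def format_tweet(
    author_name: str,
    handle: str,
    posted: str,
    source_url: str,
    text: str,
    media: Optional[List[str]] = None,
    engagement: Optional[Dict[str, int]] = None,
) -> str:
    """Render extracted tweet fields as the markdown layout shown above."""
    lines = [
        f"# Tweet by @{handle}",
        f"**Author:** {author_name} (@{handle})",
        f"**Posted:** {posted}",
        f"**Source:** {source_url}",
        "---",
        text,
        "---",
    ]
    if media:  # optional section, skipped when no media was found
        lines.append("**Media:**")
        lines.extend(f"- {url}" for url in media)
    if engagement:  # optional section, only known counts are printed
        lines.append("**Engagement:**")
        labels = (
            ("likes", "👍 Likes"),
            ("retweets", "🔄 Retweets"),
            ("replies", "💬 Replies"),
        )
        for key, label in labels:
            if key in engagement:
                lines.append(f"- {label}: {engagement[key]:,}")
    return "\n".join(lines)
```

Optional sections are left out entirely when their data is missing, so the output never contains empty headings.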
6. Download Media (Optional)
If user requests --download-media or "download images":
- Extract all media URLs from snapshot
- Use exec with curl or wget to download:
curl -L -o "tweet-{tweetId}-image-{n}.jpg" "<media-url>"
- Report downloaded files with paths
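As an illustration, the download commands could be assembled like this before being handed to exec. The helper name `build_curl_commands` and the `./downloads` default directory are assumptions; the file naming follows the curl line above, including its fixed `.jpg` extension, which would not suit video files.

```python
import shlex
from typing import List


def build_curl_commands(
    tweet_id: str, media_urls: List[str], out_dir: str = "./downloads"
) -> List[str]:
    """Build one curl command per media URL, numbering files from 1."""
    commands = []
    for n, url in enumerate(media_urls, start=1):
        out_path = f"{out_dir}/tweet-{tweet_id}-image-{n}.jpg"
        # shlex.quote guards against shell metacharacters such as ? and &.
        commands.append(
            f"curl -L -o {shlex.quote(out_path)} {shlex.quote(url)}"
        )
    return commands
```

Building the commands as strings first, without running them, also gives a list of output paths to report back to the user.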
Error Handling
If page fails to load:
- Check if URL is valid
- Try alternative: replace x.com with twitter.com (still works)
- Some tweets may require login (controversial, age-restricted) - report to user
If content extraction fails:
- X.com layout may have changed - check references/selectors.md
- Provide raw snapshot to user for manual review
- Report which fields were successfully extracted
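The host swap suggested for load failures can be sketched as follows. The function name `twitter_fallback_url` is hypothetical, and whether the twitter.com host keeps serving tweets is outside the control of this code.

```python
from urllib.parse import urlsplit, urlunsplit


def twitter_fallback_url(url: str) -> str:
    """Swap an x.com host for twitter.com, leaving path and query untouched."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "x.com" or host.endswith(".x.com"):
        # Keep any subdomain prefix (for example "www.") and replace the rest.
        host = host[: -len("x.com")] + "twitter.com"
        return urlunsplit(
            (parts.scheme, host, parts.path, parts.query, parts.fragment)
        )
    return url  # not an x.com URL: return it unchanged
```

Rewriting only the host avoids touching an `x.com` substring that happens to appear in the path or query.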
Common Selectors
See references/selectors.md for detailed CSS/ARIA selectors used by x.com (updated as layout changes).
Limitations
- No credentials: Cannot access protected tweets, DMs, or login-required content
- Rate limiting: X.com may block excessive automated requests
- Layout changes: Selectors may break if X updates their HTML structure
- Dynamic content: Some content (comments, threads) may load lazily
Examples
Extract single tweet:
User: "Extract this tweet: https://x.com/vista8/status/2019651804062241077"
Agent: [Opens browser, captures snapshot, formats markdown output]
Extract with media download:
User: "Get the tweet text and download all images from https://x.com/user/status/123"
Agent: [Extracts content, downloads images to ./downloads/, reports paths]
Thread extraction:
User: "Extract this thread: https://x.com/user/status/456"
Agent: [Detects thread, extracts all tweets in sequence, formats as numbered list]
Installation
openclaw install x-extract