Long Research
[BETA] Deep research that actually reads pages instead of summarizing search results.
- Rating
- 4.7 (191 reviews)
- Downloads
- 3,082 downloads
- Version
- 1.0.0
Overview
[BETA] Deep research that actually reads pages instead of summarizing search results.
Complete Documentation
View Source →
Long Research
⚠️ BETA — This skill works as described but is under active development. Review the notes below before installing.
Dependencies
- web_search — any search provider (Perplexity Sonar, Brave, etc.)
- web_fetch — built into most agent frameworks
- browser-use (REQUIRED) — install via
pip install browser-use && browser-use install. Used for JS-heavy sites, login-gated forums, and retailer pricing. The skill will not function fully without it.
Privacy & Trust Notes
browser-use remote mode:
The browser-use cascade tries 3 modes in order: chromium (local, free) → real (local, free) → remote (cloud-hosted, burns API credits). Remote mode sends page content to browser-use.com's cloud infrastructure. If you care about privacy, you can:
- Remove
remotefrom the cascade and only use local modes - Run in Interactive mode so the agent asks before escalating to remote
- The skill works fine with local-only browser modes for most sites
--profile flags. Profiles persist cookies locally on your machine in browser-use's default profile directory. The agent does NOT attempt automated login — it uses browser-use's cookie persistence from previous manual sessions. If you haven't logged into a forum manually via browser-use before, the agent won't have access. No credentials are stored or transmitted by this skill.Filesystem writes:
Research output is written to research/[topic]-[date].md in your agent's working directory. Progressive writes happen every 3-5 tool calls as crash recovery. Files stay on your local machine — nothing is uploaded. If running in a shared environment, configure your agent's working directory to a safe location.
Sub-agent prompt injection surface: The skill mandates pasting full instructions into sub-agent task prompts. This means the entire SKILL.md (including your research query) is sent to whatever model provider your sub-agent uses. If you use external/remote model providers, be aware that your research queries and the full skill text are transmitted to those services. This is standard for any agent skill that delegates to sub-agents — but worth noting if your research topics are sensitive.
Recommended for first-time users:
- Run in Interactive mode (not Autonomous) until you're comfortable with the skill's behavior
- Start with a short duration (10 min) to see how it works
- Review the research file output before trusting findings
Activation
When the user invokes this skill (e.g. "do long research", "research X", "pull up long research"), immediately start the pre-flight checklist. Do NOT ask "what topic?" separately — go straight into the pre-flight questions. If the user already provided a topic in their message, pre-fill it and ask the remaining questions.
Pre-Flight Checklist (MANDATORY — NO EXCEPTIONS)
Before starting ANY research, confirm all of these with the user:
- Topic — What exactly to research. Get specific. If not provided yet, ask now.
- Duration — How long to spend. Minimum 10 minutes, can be hours.
- Autonomy — Choose one. Default: Autonomous.
- Autonomous — No questions, log everything, report at end.
- Interactive — May ask clarifying questions during research. Pauses for answers.
- Interactive-continue — May ask clarifying questions, but if user doesn't reply within ~2 minutes, continue research with best judgment. Never block on silence.
- Tools — List in priority order (highest priority first). The plan block MUST reflect this ordering. Default priority:
- web_search — discovery, overviews, forum consensus
- web_fetch — reading specific review/forum/article pages
- browser-use (REQUIRED dependency) — retailer pricing (Amazon, etc.), JS-heavy sites, anything web_fetch can't reach. See
references/browser-use-patterns.md
- Scheduling — Now or delayed? If delayed, note time + timezone.
- Output — Where to deliver (this chat, specific topic, file only) and format (summary, full report, comparison table).
- Clarifying questions — Ask anything about the user's situation BEFORE starting. Don't assume hardware, location, budget, preferences, etc.
Post this and wait for approval:
📋 Research Plan
• Topic: [what]
• Duration: [X] minutes/hours
• Mode: Autonomous / Interactive / Interactive-continue
• Tools: [list in priority order, e.g. web_search → web_fetch → browser-use]
• Start: Now / [scheduled time]
• Output: [where and format]
• Questions: [anything unclear about user's situation]
Proceed? (yes to start)
If the user answers clarifying questions but doesn't say "proceed/yes/go", re-post the updated plan and ask again.
Spawning (CRITICAL — read this before delegating to a sub-agent)
⛔ The sub-agent MUST have the full skill instructions in its task prompt. The root cause of most research failures is the orchestrator paraphrasing the skill instead of injecting it. A sub-agent that doesn't read this file will ignore every gate, every ratio, every enforcement rule.
How to spawn a research sub-agent:
- Read this entire SKILL.md into context (you're already doing this if you're reading this).
- Construct the task prompt with these MANDATORY sections in this order:
TASK PROMPT TEMPLATE (paste in this order — critical rules at TOP and BOTTOM):
─────────────────────────────────────────────────────────────────────────────
## ⛔ CRITICAL RULES (read first, enforce always)
1. READ > SEARCH. After Seed, you cannot search unless you've read something since your last search.
2. NEVER start synthesis while time remains. Run `date +%s` and check END_TS.
3. browser-use cascade: chromium → real → remote. Try ALL 3 before giving up on any source.
4. web_search returns URLS, not answers. Ignore Perplexity's synthesis text. Extract only the citation URLs.
5. "Not found" IS a valid finding. If you can't find what was asked, say so honestly. Do NOT answer an adjacent question.
## Research Task
[The user's EXACT question, quoted verbatim. Do not paraphrase, soften, or broaden it.]
## Task Anchor (re-read every 5 tool calls)
Your job is to answer THIS question: "[exact question again]"
Every 5 tool calls, ask yourself: "Does my last action directly serve THIS question?"
If no → STOP current branch → re-orient.
## User Context
[Any relevant context — location, timeline, constraints. Keep separate from the task.]
## Research Rules
[Paste the FULL Execution section from SKILL.md — the complete Read-Driven Loop,
browser-use Cascade, Forum-Cracking Playbook, Progressive Writing, Time Enforcement,
Negative Results, Self-Audit, etc.]
## ⛔ REMINDER (read last, enforce always)
- You MUST read more pages than you search. Track: reads vs searches.
- You MUST try all 3 browser modes before giving up.
- You MUST NOT start synthesis while time remains.
- You MUST deliver honest negative results, not drift to adjacent questions.
- You MUST run the self-audit before delivering.
─────────────────────────────────────────────────────────────────────────────
- Never summarize the rules. Paste them. The sub-agent needs the actual enforcement language.
- Quote the user's question verbatim in both Task and Task Anchor. Do not rephrase "find real life experience where X happened" into "research whether X is common."
- Put critical rules at both TOP and BOTTOM of the prompt — primacy and recency effects in attention mean middle sections get ignored.
Common spawning mistakes:
- ❌ "Research X for 10 minutes following the long-research skill" — the sub-agent doesn't have the skill
- ❌ Paraphrasing the question — changes what the agent optimizes for
- ❌ Adding your own interpretation — "Focus on category A" when user said "any category"
- ❌ Omitting browser-use cascade details — agent won't know to retry
- ❌ Rules only in the middle of a long prompt — agent attention drops off
- ✅ Critical rules at top AND bottom + full rules in middle + exact question quoted
Execution
Where to Run
- Dedicated topic or sub-agent — never in main session
- Sub-agent for fully autonomous tasks
- Dedicated topic if interactive (so user can engage)
The Read-Driven Loop (replaces the old "Execution Flow")
The fundamental architecture: reading pages is the default action. Searching is only allowed when the URL queue is empty. This prevents the search-addiction pattern where agents fire 15 searches and read 3 pages.
1. SETUP
├─ START_TS=$(date +%s)
├─ END_TS=$((START_TS + DURATION_SECONDS))
├─ SEARCH_COUNT=0, READ_COUNT=0, WRITE_COUNT=0
├─ Create URL_QUEUE (empty list)
└─ Post: "🚀 Research started. Deadline: $(date -d @$END_TS '+%H:%M:%S')"
2. SURVEY PHASE (first 30% of committed time — BREADTH before depth)
├─ Goal: IDENTIFY as many candidates/angles as possible. Do NOT deep-dive yet.
├─ Run 3-5 broad searches with different angles/terminology
├─ For each search: extract citation URLs + candidate names/items mentioned
├─ Build a CANDIDATE LIST: every distinct option/entity mentioned across all sources
├─ ⛔ GATE: Write #1 to file (Task Anchor + FULL candidate list + URL queue)
├─ CANDIDATE GATE: You MUST have identified ≥10 candidates before deep-diving ANY of them.
│ If topic naturally has fewer (e.g., "best 3 X"), adjust to ≥2x the expected answer count.
├─ WRITE_COUNT += 1
└─ Post: "📋 Survey complete. [N] candidates identified: [list]. Now deep-diving."
Example: Don't stop at the first 3 options you find. If researching [Topic], survey should
uncover at least 10-20 candidates across different categories, brands, or approaches.
THEN prioritize which to deep-dive based on relevance to the question.
3. MAIN LOOP (repeat until time runs out)
│
├─ IF URL_QUEUE is not empty:
│ ├─ Pop next URL from queue
│ ├─ RELEVANCE CHECK: Is this URL likely to contain what the Task Anchor asks for?
│ │ If clearly irrelevant (e.g., beginner guide when searching for expert anecdotes): SKIP, log as "pruned: [reason]"
│ ├─ READ it (web_fetch first, browser-use cascade if fails)
│ ├─ After reading, assess: "Did this page help answer the Task Anchor?"
│ │ If yes: READ_COUNT += 1 (productive read)
│ │ If no: DEAD_END_COUNT += 1 (log as dead end, don't count as productive)
│ ├─ Extract: findings + new URLs → add new URLs to queue
│ ├─ If read spawns a question that needs a NEW search:
│ │ └─ ALLOWED (because you just read something)
│ │ ├─ web_search → extract citation URLs → add to queue
│ │ └─ SEARCH_COUNT += 1
│ └─ Continue to next iteration
│
├─ IF URL_QUEUE is empty AND you need more:
│ ├─ web_search with NEW terms (not repeating old queries)
│ ├─ SEARCH_COUNT += 1
│ ├─ Extract URLs → add to queue
│ └─ Go back to reading from queue
│
├─ Every 3-5 tool calls: write findings to file, WRITE_COUNT += 1
│
├─ Every 5 tool calls:
│ ├─ Re-read Task Anchor — am I still on-target?
│ ├─ Run: NOW=$(date +%s); REMAINING=$((END_TS - NOW))
│ ├─ Post: ⏱️ elapsed/committed (remaining) | 🔍 S | 📄 R | 📝 W | 🪦 D (dead ends) | R>S? yes/no
│ └─ Verify: READ_COUNT > SEARCH_COUNT? If not, STOP searching, READ.
│
├─ ⛔ BROWSER-USE DEADLINE: By 50% of committed time, you MUST have attempted
│ at least 1 browser-use call. If you reach the halfway mark without one,
│ STOP everything and run a browser-use cascade on your most promising blocked URL.
│
├─ ⛔ NO IDLE WAITING — NEVER USE `sleep` EXCEPT FOR browser-use PAGE LOADS (max 3s):
│ If you have time remaining, you MUST be making productive tool calls.
│ Running `sleep 60` or any sleep > 3s to fill time is a HARD FAIL.
│
│ Before you even think about sleeping, run this MANDATORY checklist:
│ □ Have I tried the top 3 forums for this domain?
│ □ Have I tried old.reddit.com web_fetch on at least 1 subreddit?
│ □ Have I tried Google cache on at least 1 blocked URL?
│ □ Have I searched YouTube comments?
│ □ Have I searched in non-English languages (if relevant)?
│ □ Have I tried older threads (2020, 2021, 2022)?
│ □ Have I tried alternative community forums beyond the obvious ones?
│ □ Have I cross-verified my existing findings with a second source?
│ If ANY box is unchecked → DO THAT instead of sleeping.
│
└─ TIME CHECK: if $(date +%s) >= END_TS → exit loop, go to SYNTHESIS
4. SYNTHESIS (only after time is up)
├─ ⛔ GATE: Run date +%s. Print result. If NOW < END_TS, go back to step 3.
├─ Write final version of research file
├─ Include: Task Anchor, Executive Summary, Research Tree, Findings, Sources, Verification
└─ WRITE_COUNT += 1
5. SELF-AUDIT (hard gate — blocks delivery)
├─ Run every checklist item (see Self-Audit section below)
├─ Fix any failures
└─ Write Process Compliance section
6. DELIVER
├─ Chat summary (<500 words)
├─ Link to research file
└─ Final elapsed time from date +%s
Why this works better than a ratio rule:
- The ratio is architecturally enforced: you can only search when the queue is empty OR after completing a read.
- Searching is the exception, reading is the default.
- The agent can't chain 5 searches in a row because the loop structure doesn't allow it.
- The READ_COUNT > SEARCH_COUNT check every 5 calls catches any drift.
The Search Discipline
web_search returns URLs, not answers.
When web_search returns a response from Perplexity/Sonar:
- Extract ONLY the citation URLs from the response.
- Add those URLs to your queue.
- Ignore the synthesized text. Do not read it. Do not cite it. Do not base any finding on it.
After Seed phase, you earn each search by reading:
- ✅ READ → SEARCH → READ → SEARCH (alternating)
- ✅ READ → READ → READ → SEARCH (reading more is always fine)
- ❌ SEARCH → SEARCH (never, unless in Seed)
- ❌ SEARCH → SEARCH → READ (too late, you already chained)
PDF URLs: If a URL ends in .pdf, use browser-use to render it. web_fetch returns binary garbage for PDFs. Skip PDFs entirely if browser-use is unavailable.
Source Balance: User Reports vs Official Research (MANDATORY)
For grey-area topics (supplements, nootropics, off-label treatments, emerging tech, anything without large studies or established consensus):
- User reports ARE the data. The person on a forum who's been using a product for 6 months IS a primary source.
- Hard rule: ≥40% of productive reads must be forum/user-report sources (not academic papers, not marketing sites, not review articles).
- Track: FORUM_READ_COUNT separately from STUDY_READ_COUNT.
- At every 5-call checkpoint:
Forum reads / Total reads >= 0.4?If not → next reads MUST be forums. - Forums to prioritize: Reddit (relevant subreddits), StackExchange, specialized community forums for the topic, niche discussion boards, practitioner blogs with comment sections
- Official research takes priority. Forum reads still valuable but ratio can be 30/70.
Adversarial Thinking (MANDATORY — prevents confirmation bias)
For EVERY candidate/option the research identifies as "good" or "recommended":
- Search for the negative case. Literally search:
"[candidate] dangers" OR "[candidate] problems" OR "[candidate] risks" OR "[candidate] didn't work" - Follow the mechanism chain. If something works by mechanism X, ask: "What ELSE does mechanism X do? Could it cause harm?"
- Example: If X promotes growth/healing via mechanism Y → Does Y also have unwanted side effects? Search it.
- Example: If X improves metric A → Does it also worsen metric B? Search it.
- Example: If X suppresses process P → Does that suppression cause downstream problems? Search it.
- Devil's advocate search quota: At least 1 adversarial search per 3 candidates identified. If you found 9 options, you need ≥3 "what could go wrong" searches.
- Write a "Risks & Concerns" section that is at LEAST 25% the length of the "Benefits" section. If your benefits section is 2000 words and risks is 200 words, you're not being honest.
Task Anchoring (MANDATORY — prevents drift)
The #1 failure mode is task drift — answering an adjacent, easier question.
Mechanism:
- Write the user's exact question (verbatim) at the top of the research file under
## Task Anchor. - Every 5 tool calls, re-read it and ask: "Does my last action directly help answer THIS question?"
- If no → STOP the current branch → re-orient.
- User asks: "find real life experience where [specific event] happened"
- ❌ Drift: researching general requirements (answers "what's needed" not "who experienced it")
- ❌ Drift: reading official policy documents (answers "what's the rule" not "what happened")
- ✅ On-target: finding a forum thread where someone says "this happened to me: [specific event]"
Negative Results (MANDATORY — don't panic-pivot)
Sometimes the answer is: "I looked hard and didn't find it."
This is a VALID research outcome. Do NOT:
- ❌ Pivot to answering an adjacent question to have "something to show"
- ❌ Present policy/official guidance as a substitute for the anecdotes you couldn't find
- ❌ Inflate weak leads into findings
- ❌ Quote Perplexity synthesis as if it were a real source
- ✅ Document exactly what you searched for and where (queries + forums + pages read)
- ✅ State clearly: "After [N] searches and [M] page reads across [forums], I found zero anecdotes matching [exact question]"
- ✅ Report adjacent findings separately: "While I didn't find X, I did find Y which is related but distinct"
- ✅ Assess WHY you might not have found it: rare event? wrong search terms? content behind paywalls? topic not discussed publicly?
- ✅ Suggest what would need to happen to find it: "This might require searching [specific forum] with a logged-in account" or "This may be too rare for public discussion"
Tool Rules
General
- Retailer sites (Amazon, Best Buy, etc.): ALWAYS browser-use with
--profile. web_fetch cannot scrape Amazon. - Forums/reviews: web_fetch first, browser-use cascade if blocked.
- Discovery: web_search (for URLs, not answers).
- browser-use is a hard requirement. If not installed:
pip install browser-use && browser-use install.
browser-use Cascade (try ALL 3 — no exceptions)
browser-use has 3 browser modes. You MUST try them in order. Do NOT give up after one failure.
Use this exact bash pattern:
URL="https://example.com"
SESSION="research"
# Attempt 1: chromium (free)
echo ">>> Trying chromium..."
browser-use --session $SESSION --browser chromium open "$URL" 2>&1 | tail -5
# If "url:" appears in output → SUCCESS
# Attempt 2: real (free)
echo ">>> Trying real..."
browser-use --session $SESSION --browser real open "$URL" 2>&1 | tail -5
# Attempt 3: remote (paid, last resort)
echo ">>> Trying remote..."
browser-use --session $SESSION --browser remote open "$URL" 2>&1 | tail -5
Run all 3 in sequence. Stop at the first success. Log all attempts:
[BROWSER CASCADE: {url}]
chromium: SUCCESS/FAIL (reason)
real: SUCCESS/FAIL (reason)
remote: SUCCESS/FAIL (reason)
⛔ If you get an import error (e.g., BrowserConfig not found), that's a code bug, not a site block. Fix the code (use BrowserProfile) and retry. Do NOT count code bugs as cascade failures.
⛔ NEVER give up after 1 failure. NEVER skip to "find alternative source" without trying all 3 modes. If you log only 1 attempt, the self-audit will fail you.
web_fetch Failure Escalation
When web_fetch returns 403, empty content, Cloudflare, or broken HTML:- Run the full browser-use cascade (all 3 modes)
- Log:
[web_fetch BLOCKED: {url} → cascade: chromium=X, real=Y, remote=Z] - Only after all 3 fail: find an alternative source (but log that you tried)
Forum-Cracking Playbook
Forums are where real experiences live. They're also the hardest to access. Use these strategies IN ORDER:
Reddit (NOTE: as of 2026, Reddit blocks most automated access including old.reddit.com):
old.reddit.comvia web_fetch (try first but expect 403)- Google cache: web_search
cache:reddit.com/r/[sub]/comments/[id] - Reddit aggregation sites that compile Reddit reviews by topic
- browser-use cascade on
reddit.com(last resort) - web_search
site:reddit.com "[exact phrase]"— use citation URLs, NOT Perplexity synthesis - Fallback forums when Reddit fails: StackExchange, specialized community forums for the topic, niche discussion boards
- web_fetch (sometimes works for newer threads)
- browser-use chromium + JavaScript eval to extract post content:
browser-use --session forum --browser chromium open "https://forum.example.com/thread/[ID]"
# Wait 2s for page load
browser-use --session forum --browser chromium eval "Array.from(document.querySelectorAll('[data-role=commentContent], .post-body, .message-content')).map((e,i) => 'POST ' + i + ': ' + e.innerText.substring(0,600)).join('\n===\n')"
- This eval pattern extracts post content from login-gated pages — adapt selectors per forum.
- web_fetch (may fail — if so, skip to browser-use)
- browser-use cascade
- Google cache
- web_fetch on the video page (gets description + some comments)
- browser-use for full comment sections
- Identify the top 3-5 forums for the research topic via web_search
- web_fetch with /threads/ or /topic/ URLs, search via
site:[forum-domain] - browser-use chromium for search, eval to extract post content
- Look for practitioner blogs, review aggregation sites, and niche wikis
- Academic databases for study abstracts when relevant (web_fetch works for most)
- Use browser-use eval to extract product catalogs with prices
- Example:
document.querySelectorAll('.product-miniature')→ map name + price - Always capture full catalog, not just target products
- Google
cache:[url]via web_fetch - Wayback Machine:
https://web.archive.org/web/[url] - Google
site:[domain] "[exact phrase]"to find alternative pages on the same site
Anecdote-Hunting Searches (for "find real experiences" tasks)
When the user asks for real-life experiences, first-person accounts, or anecdotes:
Use personal-language queries:
"my experience with" [topic] site:[relevant-forum]
"I tried" OR "I switched to" OR "after using" [topic] forum
"my experience" [topic] "what happened" forum
"I had to" OR "they told me" OR "turns out" [topic keywords]
DO NOT use policy-language queries:
"[topic] requirements overview" ← returns official pages, not stories
"what is needed for [topic]" ← returns guides, not experiences
Source priority for anecdotes:
- Specialized forums for the topic (use browser-use eval pattern if login-gated)
- Reddit (old.reddit.com → Google cache → browser-use)
- Niche community forums and discussion boards
- Q&A sites (real expert answers to real questions)
- YouTube comments
- Google
site:searches with first-person language
Source Verification (MANDATORY)
- Never trust Perplexity/Sonar synthesis. Extract URLs only. Ignore the text.
- When Sonar claims "Reddit users say X": treat it as a LEAD to find URLs, not a source.
- Real user quotes must come from pages you actually fetched.
- If you can't verify a forum claim: mark it
[⚠️ unverified forum consensus] - ⛔ FIREWALL: Unverified claims CANNOT appear in Recommendation or Executive Summary. They can appear in Detailed Findings with ⚠️ tag only.
Progressive Writing (MANDATORY — even for short sessions)
⛔ GATE: First write MUST happen after Seed phase. You CANNOT proceed to tree traversal until you've written initial findings, URL queue, and Task Anchor to the research file.
- Write #1 (GATE): After Seed — Task Anchor, URL queue, search terms. No more tool calls until this write lands.
- Subsequent writes: Every 3-5 tool calls — append new findings in batches.
- Final write: Synthesis phase — executive summary, tree, recommendation.
- ⛔ NEVER accumulate all findings and write once at the end. If session crashes, all data is lost.
- ≤15 min sessions: minimum 3 writes. >15 min: minimum 1 write per 5 minutes.
- Count writes. Every status update:
📝 [N] file writes. If N=0 after 4+ tool calls → STOP and write NOW.
Time Enforcement (WALL-CLOCK — NO GUESSING)
Setup:
START_TS=$(date +%s)
END_TS=$((START_TS + DURATION_SECONDS))
echo "START_TS=$START_TS" > /tmp/research_time
echo "END_TS=$END_TS" >> /tmp/research_time
echo "Deadline: $(date -d @$END_TS '+%H:%M:%S UTC')"
Every status check:
source /tmp/research_time
NOW=$(date +%s)
echo "Elapsed: $(( (NOW - START_TS) / 60 )) min | Remaining: $(( (END_TS - NOW) / 60 )) min"
Status format:
⏱️ [elapsed] / [committed] ([remaining] left) | 🔍 [S] searches | 📄 [R] reads | 📝 [W] writes | R>S? [yes/no]
The R>S? field: reads must exceed searches. If it says "no", stop searching and read.
Synthesis Gate (HARD — prevents finishing early)
⛔ Before starting synthesis, you MUST run this check:
source /tmp/research_time
NOW=$(date +%s)
if [ $NOW -lt $END_TS ]; then
echo "BLOCKED: $((( END_TS - NOW ) / 60)) minutes remaining. Go back to research."
else
echo "TIME UP: Proceed to synthesis."
fi
If the check prints "BLOCKED": You CANNOT synthesize. Go back to the Main Loop. Use the remaining time for:
- New search terms (synonyms, adjacent topics, different languages)
- Older threads (add year ranges: "2020", "2021", etc.)
- Adjacent communities (different forums)
- Cross-verifying existing findings with additional sources
- Following up on leads you deprioritized earlier
- YouTube comments on relevant videos
Context Management (see references/context-management.md)
- Write findings to
research/[topic]-[date].mdprogressively. - Chat context gets summaries only — detail goes to file.
- Write checkpoints every 5-10 minutes (recovery anchors after compaction).
- Long sessions WILL compact — that's expected and fine.
- Never stop research due to context size.
Pricing Data (when applicable)
If the research involves products/purchases:- MUST hit at least one retailer via browser-use for real pricing.
- web_search price summaries are estimates, NOT verified.
- Note currency, tax inclusion (e.g., regional pricing differences such as VAT inclusion vs exclusion), date of check.
Output
Research File (research/[topic]-[date].md)
# [Topic] Research
**Date:** [timestamp]
**Duration:** [actual time spent]
**Tools used:** [list]
**Searches:** [count] | **Deep reads:** [count] | **Ratio R:S:** [ratio]
## Task Anchor
> [User's exact question, quoted verbatim]
## Executive Summary
[3-5 bullet points — ONLY ✅ verified findings]
[If negative result: lead with "After X searches and Y page reads, no [specific thing] was found."]
## Research Tree
[SEARCH → READ → spawned → READ → etc. Shows traversal.]
## Detailed Findings
### [Category 1]
[findings with source URLs, ✅/⚠️ markers]
## Comparison Table (if applicable)
## Sources
[numbered list of URLs actually fetched — NOT search result URLs]
## Source Verification
[✅ verified (fetched actual page) | ⚠️ unverified (search synthesis) | 🔗 direct quote]
## Confidence Notes
[what's solid vs uncertain]
## Recommendation
[ONLY ✅ verified sources. If negative result, recommend next steps to find the answer.]
## Process Compliance
[Self-audit pass/fail for each gate + remediation taken]
Chat Summary
Post concise summary (<500 words) when done. Link to full research file.Post-Research Self-Audit (HARD GATE — blocks delivery)
⛔ You CANNOT post the chat summary until every box is checked. Run the audit, fix failures, THEN deliver.
Process Gates
- [ ] START_TS and END_TS set with
date +%s - [ ] All time reports used real
date +%s(no guessing) - [ ] Time commitment honored — verify: final
date +%s≥ END_TS - [ ] Synthesis Gate bash check was run and printed "TIME UP"
- [ ] Task Anchor written to file and re-checked every 5 tool calls
- [ ] No idle waiting — no sleep loops or timer countdowns. Every minute had active tool calls.
Read/Search Balance
- [ ] READ_COUNT > SEARCH_COUNT — post actual numbers:
reads: [R] / searches: [S] - [ ] No SEARCH→SEARCH chains outside Seed phase
- [ ] Every search after Seed was preceded by at least 1 read
- [ ] Dead ends tracked — reads on irrelevant pages logged as dead ends, not counted as productive reads. Post:
🪦 [D] dead ends
Writing Gates
- [ ] Write #1 happened after Seed (before tree traversal)
- [ ] ≥3 writes for ≤15 min sessions, or 1/5min for longer
- [ ] Research tree logged in file
- [ ] Every status update included
📝 [N]andR>S? [yes/no]
Tool Usage Gates (EVIDENCE REQUIRED — not honor system)
⛔ For each browser-use gate, you must paste the actual command and its output. Checking a box without evidence is audit fraud.- [ ] browser-use cascade attempted — PASTE the 3 commands and outputs below:
chromium attempt: [paste command + result]
real attempt: [paste command + result]
remote attempt: [paste command + result]
- [ ] browser-use attempted before 50% of committed time (paste timestamp proof)
- [ ] Every web_fetch failure escalated to full cascade — list each 403 and what you did:
[url] → 403 → cascade: chromium=[result], real=[result], remote=[result]
- [ ] Reddit 403 → tried old.reddit.com first (paste the web_fetch attempt), then cascade
- [ ] At least 1 successful browser-use interaction in the session
Source Quality Gates
- [ ] Zero Perplexity synthesis text in Executive Summary or Recommendation
- [ ] All sources have URLs from pages actually fetched
- [ ] Claims marked ✅ verified / ⚠️ unverified — none unmarked
- [ ] Recommendations backed ONLY by ✅ verified sources
Task Fidelity Gate
- [ ] Re-read Task Anchor. Does the Executive Summary directly answer the exact question?
- [ ] If user asked for anecdotes: report contains direct quotes with usernames + thread URLs
- [ ] If mostly policy/guidance when anecdotes were asked: FAIL → go find stories
- [ ] If negative result: summary LEADS with the negative, doesn't bury it under adjacent findings
Read Quality Gate
- [ ] Count reads that actually helped answer the Task Anchor vs dead-end reads
- [ ] At least 50% of reads must be on-topic (relevant to the actual question)
- [ ] Dead ends logged with reason:
🪦 [url] — [why irrelevant] - [ ] URL queue was filtered for relevance before reading (didn't blindly read every Perplexity citation)
Survey Breadth Gate
- [ ] Survey Phase completed in first 30% of time
- [ ] Candidate list has ≥10 items (or ≥2x expected answer count for smaller topics)
- [ ] Candidates written to file BEFORE deep-diving any single one
- [ ] Deep-dive prioritization justified (why these candidates over others)
Source Balance Gate (for grey-area topics)
- [ ] FORUM_READ_COUNT / total reads ≥ 0.4 (40% user reports)
- [ ] Post actual counts:
Forum reads: [F] / Study reads: [S] / Ratio: [F/(F+S)] - [ ] Real user quotes with usernames included in findings
- [ ] If ratio < 0.4: explain why (e.g., "no forums exist for this topic")
Adversarial Thinking Gate
- [ ] At least 1 adversarial search per 3 candidates (e.g., 9 candidates = ≥3 "dangers/risks" searches)
- [ ] Each recommended candidate has a "Risks & Concerns" subsection
- [ ] Mechanism chains explored (if X promotes Y for benefit, does Y also cause Z harm?)
- [ ] Risks section is ≥25% the length of Benefits section
- [ ] Post:
Devil's advocate searches: [N] / Candidates: [M] / Ratio: [N/ceil(M/3)]
Remediation (fix before delivering)
| Failure | Fix |
|---|---|
| READ_COUNT ≤ SEARCH_COUNT | Read more pages now |
| Missing writes | Write current findings immediately |
| Unverified claims in recommendation | Remove or verify them |
| Time not honored | Continue researching |
| browser-use cascade incomplete | Run remaining modes now |
| Task drift | Re-search with original question terms |
| No anecdotes when requested | Forum searches with first-person language |
| Only 1-2 browser modes tried | Try the remaining modes on any pending source |
Anti-Patterns
Architecture violations
- ❌ Search addiction — firing web_search to "see what comes up" instead of reading pages in the queue. The queue is your job. Search only when it's empty or a read spawns a new question.
- ❌ Reading Perplexity synthesis — the synthesized text from web_search is noise. Extract URLs. Ignore text. If your findings cite "according to Perplexity..." you've failed.
- ❌ SEARCH→SEARCH chains after Seed — architecturally forbidden. Each search must be earned by a read.
- ❌ Finishing early — the Synthesis Gate bash check prevents this. Run it. TIME IS A HARD CONTRACT. If committed to 10 minutes, you work for 10 minutes. Finishing at 4 min on a 10-min commitment = CRITICAL FAILURE. If you think you're "done" early, you're NOT — run the mandatory checklist, search deeper, cross-verify, find more threads. Check
date +%sagainst END_TS before ANY final output. - ❌ Idle waiting / sleep commands — NEVER use
sleepfor more than 3 seconds (browser-use page loads only). Runningsleep 60to fill remaining time is a HARD FAIL that the audit WILL catch. The checklist of "things to do when done" is mandatory — run through it before claiming you're out of options. - ❌ Garbage reads — reading irrelevant pages just to inflate READ_COUNT. A beginner tutorial doesn't count as a "read" when researching expert-level anecdotes. Filter the URL queue for relevance BEFORE reading. Dead-end reads must be logged as dead ends, not productive reads.
- ❌ Audit fraud — checking self-audit boxes for things you didn't do. If the audit says "browser-use cascade attempted" but you made zero browser-use calls, that's fabrication. The audit requires EVIDENCE (paste commands + outputs), not just checkboxes.
Content violations
- ❌ Answering a different question — if user wants anecdotes, don't deliver policy summaries. Re-read the Task Anchor.
- ❌ Panic-pivoting on negative results — "not found" is valid. Don't inflate weak leads.
- ❌ Unverified claims in recommendations — if you didn't fetch the page, it can't go in the recommendation.
- ❌ Marketing claims as benchmarks — distinguish clearly.
- ❌ Premature narrowing — picking 3-5 candidates in Seed and ignoring everything else. The Survey Phase exists to prevent this. You MUST cast a wide net before deep-diving. If you're deep-diving a candidate before identifying ≥10 options, you've narrowed too early.
- ❌ All-research, no-users — citing 10 academic papers and 0 forum threads. For grey-area topics, user reports are EQUALLY valuable. The 40% forum-read quota exists for a reason.
- ❌ Confirmation bias — only searching for why something works, never for why it might harm. The Adversarial Thinking section is mandatory. Every "this is great" needs a "but here's what could go wrong" search.
Tool violations
- ❌ Giving up after 1 browser-use failure — try ALL 3 modes. Log all attempts.
- ❌ Silently skipping blocked sources — web_fetch 403 → full cascade, always.
- ❌ web_fetch on Amazon — use browser-use with
--profile. - ❌ Zero browser-use calls — must attempt at least once per session.
Process violations
- ❌ Running in main session — always dedicated topic or sub-agent.
- ❌ Spawning sub-agent without full rules — paraphrasing = agent ignores rules. Paste them.
- ❌ Paraphrasing user's question — quote verbatim. "Find real experiences" ≠ "research whether common".
- ❌ Single write at end — progressive writes mandatory. Session crashes lose everything.
- ❌ Fabricating elapsed time — every timestamp from
date +%s. No guessing. - ❌ Skipping self-audit — hard gate. No exceptions.
- ❌ Skipping pre-flight — plan block must be posted and approved.
Installation
openclaw install long-research
💻Code Examples
Post this and wait for approval:
📋 Research Plan
• Topic: [what]
• Duration: [X] minutes/hours
• Mode: Autonomous / Interactive / Interactive-continue
• Tools: [list in priority order, e.g. web_search → web_fetch → browser-use]
• Start: Now / [scheduled time]
• Output: [where and format]
• Questions: [anything unclear about user's situation]
Proceed? (yes to start)─────────────────────────────────────────────────────────────────────────────
3. **Never summarize the rules.** Paste them. The sub-agent needs the actual enforcement language.
4. **Quote the user's question verbatim** in both Task and Task Anchor. Do not rephrase "find real life experience where X happened" into "research whether X is common."
5. **Put critical rules at both TOP and BOTTOM** of the prompt — primacy and recency effects in attention mean middle sections get ignored.
### Common spawning mistakes:
- ❌ "Research X for 10 minutes following the long-research skill" — the sub-agent doesn't have the skill
- ❌ Paraphrasing the question — changes what the agent optimizes for
- ❌ Adding your own interpretation — "Focus on category A" when user said "any category"
- ❌ Omitting browser-use cascade details — agent won't know to retry
- ❌ Rules only in the middle of a long prompt — agent attention drops off
- ✅ Critical rules at top AND bottom + full rules in middle + exact question quoted
---
## Execution
### Where to Run
- **Dedicated topic or sub-agent** — never in main session
- Sub-agent for fully autonomous tasks
- Dedicated topic if interactive (so user can engage)
### The Read-Driven Loop (replaces the old "Execution Flow")
The fundamental architecture: **reading pages is the default action. Searching is only allowed when the URL queue is empty.** This prevents the search-addiction pattern where agents fire 15 searches and read 3 pages.└─ Final elapsed time from date +%s
**Why this works better than a ratio rule:**
- The ratio is *architecturally enforced*: you can only search when the queue is empty OR after completing a read.
- Searching is the exception, reading is the default.
- The agent can't chain 5 searches in a row because the loop structure doesn't allow it.
- The READ_COUNT > SEARCH_COUNT check every 5 calls catches any drift.
### The Search Discipline
**web_search returns URLs, not answers.**
When web_search returns a response from Perplexity/Sonar:
1. **Extract ONLY the citation URLs** from the response.
2. **Add those URLs to your queue.**
3. **Ignore the synthesized text.** Do not read it. Do not cite it. Do not base any finding on it.
Why: Perplexity fabricates quotes, hallucinates thread URLs, and presents unverified claims as consensus. If you read its synthesis, you'll unconsciously treat it as a finding. Don't read it. URLs only.
**After Seed phase, you earn each search by reading:**
- ✅ READ → SEARCH → READ → SEARCH (alternating)
- ✅ READ → READ → READ → SEARCH (reading more is always fine)
- ❌ SEARCH → SEARCH (never, unless in Seed)
- ❌ SEARCH → SEARCH → READ (too late, you already chained)
**Hard search cap:** Maximum searches = 1.5x committed minutes. For 10 min = max 15 searches.
If you hit the cap, you are BLOCKED from searching for the rest of the session. Read only.
At any checkpoint: if SEARCH_COUNT > 2x READ_COUNT → BLOCKED from searching until reads catch up.
**PDF URLs:** If a URL ends in `.pdf`, use browser-use to render it. web_fetch returns binary garbage for PDFs. Skip PDFs entirely if browser-use is unavailable.
### Source Balance: User Reports vs Official Research (MANDATORY)
**For grey-area topics** (supplements, nootropics, off-label treatments, emerging tech, anything without large studies or established consensus):
- User reports ARE the data. The person on a forum who's been using a product for 6 months IS a primary source.
- **Hard rule: ≥40% of productive reads must be forum/user-report sources** (not academic papers, not marketing sites, not review articles).
- Track: FORUM_READ_COUNT separately from STUDY_READ_COUNT.
- At every 5-call checkpoint: `Forum reads / Total reads >= 0.4?` If not → next reads MUST be forums.
- Forums to prioritize: Reddit (relevant subreddits), StackExchange, specialized community forums for the topic, niche discussion boards, practitioner blogs with comment sections
**For well-studied topics** (established science, mainstream products):
- Official research takes priority. Forum reads still valuable but ratio can be 30/70.
### Adversarial Thinking (MANDATORY — prevents confirmation bias)
For EVERY candidate/option the research identifies as "good" or "recommended":
1. **Search for the negative case.** Literally search: `"[candidate] dangers" OR "[candidate] problems" OR "[candidate] risks" OR "[candidate] didn't work"`
2. **Follow the mechanism chain.** If something works by mechanism X, ask: "What ELSE does mechanism X do? Could it cause harm?"
- Example: If X promotes growth/healing via mechanism Y → Does Y also have unwanted side effects? Search it.
- Example: If X improves metric A → Does it also worsen metric B? Search it.
- Example: If X suppresses process P → Does that suppression cause downstream problems? Search it.
3. **Devil's advocate search quota:** At least 1 adversarial search per 3 candidates identified. If you found 9 options, you need ≥3 "what could go wrong" searches.
4. **Write a "Risks & Concerns" section** that is at LEAST 25% the length of the "Benefits" section. If your benefits section is 2000 words and risks is 200 words, you're not being honest.
### Task Anchoring (MANDATORY — prevents drift)
The #1 failure mode is **task drift** — answering an adjacent, easier question.
**Mechanism:**
1. Write the user's **exact question** (verbatim) at the top of the research file under `## Task Anchor`.
2. Every 5 tool calls, re-read it and ask: "Does my last action directly help answer THIS question?"
3. If no → STOP the current branch → re-orient.
**Examples of drift:**
- User asks: "find real life experience where [specific event] happened"
- ❌ Drift: researching general requirements (answers "what's needed" not "who experienced it")
- ❌ Drift: reading official policy documents (answers "what's the rule" not "what happened")
- ✅ On-target: finding a forum thread where someone says "this happened to me: [specific event]"
**The test:** If your findings say "according to official policy..." instead of "user X on forum Y reported that..." — you've drifted.
### Negative Results (MANDATORY — don't panic-pivot)
Sometimes the answer is: **"I looked hard and didn't find it."**
This is a VALID research outcome. Do NOT:
- ❌ Pivot to answering an adjacent question to have "something to show"
- ❌ Present policy/official guidance as a substitute for the anecdotes you couldn't find
- ❌ Inflate weak leads into findings
- ❌ Quote Perplexity synthesis as if it were a real source
Instead:
- ✅ Document exactly what you searched for and where (queries + forums + pages read)
- ✅ State clearly: "After [N] searches and [M] page reads across [forums], I found zero anecdotes matching [exact question]"
- ✅ Report adjacent findings separately: "While I didn't find X, I did find Y which is related but distinct"
- ✅ Assess WHY you might not have found it: rare event? wrong search terms? content behind paywalls? topic not discussed publicly?
- ✅ Suggest what would need to happen to find it: "This might require searching [specific forum] with a logged-in account" or "This may be too rare for public discussion"
**The summary for a negative result should lead with the negative.** Don't bury it under adjacent findings.
---
## Tool Rules
### General
- **Retailer sites (Amazon, Best Buy, etc.)**: ALWAYS browser-use with `--profile`. web_fetch cannot scrape Amazon.
- **Forums/reviews**: web_fetch first, browser-use cascade if blocked.
- **Discovery**: web_search (for URLs, not answers).
- **browser-use is a hard requirement.** If not installed: `pip install browser-use && browser-use install`.
### browser-use Cascade (try ALL 3 — no exceptions)
browser-use has 3 browser modes. You MUST try them **in order**. Do NOT give up after one failure.
**Use this exact bash pattern:**remote: SUCCESS/FAIL (reason)
⛔ If you get an **import error** (e.g., `BrowserConfig` not found), that's a code bug, not a site block. Fix the code (use `BrowserProfile`) and retry. Do NOT count code bugs as cascade failures.
⛔ NEVER give up after 1 failure. NEVER skip to "find alternative source" without trying all 3 modes. If you log only 1 attempt, the self-audit will fail you.
### web_fetch Failure Escalation
When web_fetch returns 403, empty content, Cloudflare, or broken HTML:
1. Run the **full browser-use cascade** (all 3 modes)
2. Log: `[web_fetch BLOCKED: {url} → cascade: chromium=X, real=Y, remote=Z]`
3. Only after all 3 fail: find an alternative source (but log that you tried)
### Forum-Cracking Playbook
Forums are where real experiences live. They're also the hardest to access. Use these strategies IN ORDER:
**Reddit (NOTE: as of 2026, Reddit blocks most automated access including old.reddit.com):**
1. `old.reddit.com` via web_fetch (try first but expect 403)
2. Google cache: web_search `cache:reddit.com/r/[sub]/comments/[id]`
3. Reddit aggregation sites that compile Reddit reviews by topic
4. browser-use cascade on `reddit.com` (last resort)
5. web_search `site:reddit.com "[exact phrase]"` — use citation URLs, NOT Perplexity synthesis
6. **Fallback forums when Reddit fails:** StackExchange, specialized community forums for the topic, niche discussion boards
**Login-gated forums:**
1. web_fetch (sometimes works for newer threads)
2. browser-use chromium + JavaScript eval to extract post content:**Use personal-language queries:**
"my experience with" [topic] site:[relevant-forum]
"I tried" OR "I switched to" OR "after using" [topic] forum
"my experience" [topic] "what happened" forum
"I had to" OR "they told me" OR "turns out" [topic keywords]**DO NOT use policy-language queries:**
"[topic] requirements overview" ← returns official pages, not stories
"what is needed for [topic]" ← returns guides, not experiences**Setup:**
START_TS=$(date +%s)
END_TS=$((START_TS + DURATION_SECONDS))
echo "START_TS=$START_TS" > /tmp/research_time
echo "END_TS=$END_TS" >> /tmp/research_time
echo "Deadline: $(date -d @$END_TS '+%H:%M:%S UTC')"**Every status check:**
source /tmp/research_time
NOW=$(date +%s)
echo "Elapsed: $(( (NOW - START_TS) / 60 )) min | Remaining: $(( (END_TS - NOW) / 60 )) min"⛔ **Before starting synthesis, you MUST run this check:**
source /tmp/research_time
NOW=$(date +%s)
if [ $NOW -lt $END_TS ]; then
echo "BLOCKED: $((( END_TS - NOW ) / 60)) minutes remaining. Go back to research."
else
echo "TIME UP: Proceed to synthesis."
fi### Research File (`research/[topic]-[date].md`)
# [Topic] Research
**Date:** [timestamp]
**Duration:** [actual time spent]
**Tools used:** [list]
**Searches:** [count] | **Deep reads:** [count] | **Ratio R:S:** [ratio]
## Task Anchor
> [User's exact question, quoted verbatim]
## Executive Summary
[3-5 bullet points — ONLY ✅ verified findings]
[If negative result: lead with "After X searches and Y page reads, no [specific thing] was found."]
## Research Tree
[SEARCH → READ → spawned → READ → etc. Shows traversal.]
## Detailed Findings
### [Category 1]
[findings with source URLs, ✅/⚠️ markers]
## Comparison Table (if applicable)
## Sources
[numbered list of URLs actually fetched — NOT search result URLs]
## Source Verification
[✅ verified (fetched actual page) | ⚠️ unverified (search synthesis) | 🔗 direct quote]
## Confidence Notes
[what's solid vs uncertain]
## Recommendation
[ONLY ✅ verified sources. If negative result, recommend next steps to find the answer.]
## Process Compliance
[Self-audit pass/fail for each gate + remediation taken]Tags
Quick Info
Ready to Install?
Get started with this skill in seconds
Related Skills
4claw
4claw — a moderated imageboard for AI agents.
Aap Passport
Agent Attestation Protocol - The Reverse Turing Test.
Acestep Lyrics Transcription
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API.
Adaptive Suite
A continuously adaptive skill suite that empowers Clawdbot.