
Arxiv Summarizer Orchestrator

End-to-end orchestration skill for periodic arXiv collection and reporting using three sub-skills.

Version: 1.0.0


name: arxiv-summarizer-orchestrator
description: "End-to-end orchestration skill for periodic arXiv collection and reporting using three sub-skills: arxiv-search-collector, arxiv-paper-processor, and arxiv-batch-reporter. Supports manual language control across all markdown outputs and Stage-B processing strategy (subagent_parallel default max 5, or serial)."

ArXiv Summarizer Orchestrator

Run the full pipeline by composing three sub-skills.

Sub-skill Order

arxiv-search-collector

arxiv-paper-processor

arxiv-batch-reporter

Workflow Parameters

language: manual language parameter used by all stages. Default is English when omitted.

paper_processing_mode: subagent_parallel or serial.

max_parallel_papers: default 5 when paper_processing_mode=subagent_parallel.
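The three parameters and their documented defaults can be pictured as one small validated object. This is only a sketch: the class name `WorkflowParams` and the `effective_concurrency` helper are hypothetical and are not part of any sub-skill.

```python
from dataclasses import dataclass

VALID_MODES = ("subagent_parallel", "serial")


@dataclass(frozen=True)
class WorkflowParams:
    """Hypothetical container for the three documented workflow parameters."""

    language: str = "English"  # documented default when omitted
    paper_processing_mode: str = "subagent_parallel"  # documented default mode
    max_parallel_papers: int = 5  # only meaningful in parallel mode

    def __post_init__(self):
        if self.paper_processing_mode not in VALID_MODES:
            raise ValueError(
                f"unknown paper_processing_mode: {self.paper_processing_mode!r}"
            )
        if self.max_parallel_papers < 1:
            raise ValueError("max_parallel_papers must be at least 1")

    @property
    def effective_concurrency(self) -> int:
        """Serial mode always means one paper at a time."""
        if self.paper_processing_mode == "serial":
            return 1
        return self.max_parallel_papers
```

Validating once at the start of a run keeps a mistyped mode from surfacing only when Stage B begins.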

Workflow

Stage A: Collection Setup + Query Retrieval

Initialize one run with arxiv-search-collector/scripts/init_collection_run.py.

The model generates multiple focused queries from the original topic and writes a minimal query_plan.json (label + query only).

Run arxiv-search-collector/scripts/fetch_queries_batch.py with the plan file (recommended).

(Optional fallback) call arxiv-search-collector/scripts/fetch_query_metadata.py manually for one-by-one fetch.

The model reads each indexed query list and decides which indexes to keep.

Merge selected items with arxiv-search-collector/scripts/merge_selected_papers.py.
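As an illustration of the "label + query only" plan mentioned in the steps above, the sketch below writes such a file. The top-level shape (a JSON list of objects) is an assumption, since the text only says each entry carries a label and a query; the helper name `write_query_plan` and the example queries are hypothetical.

```python
import json
from pathlib import Path


def write_query_plan(path, queries):
    """Write (label, query) pairs as a minimal plan file.

    Assumed layout: a JSON list of {"label": ..., "query": ...} objects.
    """
    seen = set()
    plan = []
    for label, query in queries:
        label, query = label.strip(), query.strip()
        if not label or not query:
            raise ValueError("label and query must both be non-empty")
        if label in seen:
            raise ValueError(f"duplicate label: {label}")
        seen.add(label)
        plan.append({"label": label, "query": query})
    Path(path).write_text(
        json.dumps(plan, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return plan


if __name__ == "__main__":
    write_query_plan(
        "query_plan.json",
        [
            ("core-topic", "diffusion models image editing"),
            ("evaluation", "diffusion models evaluation benchmark"),
        ],
    )
```

Rejecting duplicate labels matters because later iterations select and drop results by label.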

If relevance or coverage is still insufficient, iterate Stage A:

generate another query plan with new labels,

fetch again,

re-merge with --incremental and an updated selection-json,

set weak labels to an empty keep list ([]) to explicitly drop them.

Pass --language to collector scripts so all generated markdown files in Stage A follow the selected language.

Use serial query fetch in Stage A with conservative controls (for example --min-interval-sec 5, --retry-max 4). Default collector settings already include retries/backoff and run-local throttle state (/.runtime/arxiv_api_state.json), so manual tuning is usually unnecessary.

Prefer cache reuse (no --force) unless query parameters changed or data refresh is required.

Output: one run directory with per-paper metadata subdirectories.
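To make the serial fetch controls above concrete, here is a minimal sketch of what a minimum interval plus bounded retries with backoff generally looks like. It is not the collector's implementation: the function name, the exponential backoff formula, and reading `retry_max` as "retries after the first attempt" are all assumptions.

```python
import time


def fetch_with_throttle(fetch, queries, min_interval_sec=5.0, retry_max=4,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch(query) one query at a time, spaced out and retried on failure.

    `sleep` and `clock` are injectable so the timing can be tested without
    actually waiting.
    """
    results = {}
    last_call = None
    for query in queries:
        for attempt in range(retry_max + 1):
            # Keep at least min_interval_sec between consecutive requests.
            if last_call is not None:
                wait = min_interval_sec - (clock() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = clock()
            try:
                results[query] = fetch(query)
                break
            except Exception:
                if attempt == retry_max:
                    raise
                # Assumed policy: exponential backoff based on the interval.
                sleep(min_interval_sec * (2 ** attempt))
    return results
```

Keeping the fetch serial means a single interval setting bounds the request rate, which is harder to guarantee once several workers share one API.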

Stage B: Per-paper Artifact Download + Manual Summary

For each paper directory, invoke sub-skill arxiv-paper-processor once and let that skill produce /summary.md.

Recommended pre-step for many papers: run one batch artifact download before per-paper reading:

bash
python3 arxiv-paper-processor/scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language



Per-paper execution steps (inside arxiv-paper-processor):

If /summary.md already exists and is complete, skip this paper.

If usable source (source/source_extract/*.tex) or PDF (source/paper.pdf) already exists, skip download.

If artifacts are missing, download source with arxiv-paper-processor/scripts/download_arxiv_source.py.

If source is unusable, download PDF with arxiv-paper-processor/scripts/download_arxiv_pdf.py.

The model reads the content and manually writes /summary.md following the reference format, in the language set by the language parameter.
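The skip and download checks in the steps above amount to a small decision function. The sketch below assumes the listed paths are relative to each paper directory and approximates "complete" as "exists and is non-empty"; the function name and the returned labels are hypothetical. It only decides what to do next and never writes summary text.

```python
from pathlib import Path


def next_action(paper_dir):
    """Return which documented step applies to one paper directory."""
    paper_dir = Path(paper_dir)

    summary = paper_dir / "summary.md"
    # Assumption: a non-empty summary.md counts as complete.
    if summary.is_file() and summary.read_text(encoding="utf-8").strip():
        return "skip_paper"

    extract_dir = paper_dir / "source" / "source_extract"
    has_tex = extract_dir.is_dir() and any(extract_dir.glob("*.tex"))
    has_pdf = (paper_dir / "source" / "paper.pdf").is_file()
    if has_tex or has_pdf:
        # Artifacts are already present, so no download is needed.
        return "read_and_summarize"

    # Nothing usable yet: try the source first; PDF is the fallback.
    return "download_source"
```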



Parallel strategy for many papers:

Default: paper_processing_mode=subagent_parallel with max_parallel_papers=5.

Optional: paper_processing_mode=serial to process one paper at a time.

In parallel mode, run multiple arxiv-paper-processor instances in batches; concurrent papers must not exceed max_parallel_papers.


Wait for one batch to finish before starting the next batch.

In serial mode, run exactly one arxiv-paper-processor instance at a time.


Subagent workers should only own one paper directory each to avoid file conflicts.
Do not use scripts to auto-compose summary text; scripts are download-only tools.
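The batching rule above (at most max_parallel_papers at once, and each batch finishes before the next starts) can be sketched as follows. Threads stand in for subagents here purely for illustration; `process_one` is a hypothetical callable representing one arxiv-paper-processor invocation on one paper directory.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_batches(paper_dirs, process_one,
                       mode="subagent_parallel", max_parallel_papers=5):
    """Run process_one on every paper directory, one finished batch at a time."""
    if mode not in ("subagent_parallel", "serial"):
        raise ValueError(f"unknown mode: {mode!r}")
    batch_size = 1 if mode == "serial" else max_parallel_papers
    results = {}
    for start in range(0, len(paper_dirs), batch_size):
        batch = paper_dirs[start:start + batch_size]
        # A fresh pool per batch: leaving the with-block waits for the
        # whole batch before the next one is started.
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            for paper_dir, outcome in zip(batch, pool.map(process_one, batch)):
                results[paper_dir] = outcome
    return results
```

Because each worker receives exactly one directory from the batch, no two workers write into the same paper directory.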

Output: all paper directories contain summary.md.

Stage C: Bundle + Final Hierarchical Report

Run arxiv-batch-reporter/scripts/collect_summaries_bundle.py --language .

Model reads summaries_bundle.md and writes collection_report_template.md in base dir.

In template, each paper leaf entry must include one standalone placeholder line: {{ARXIV_BRIEF:}}.

Run arxiv-batch-reporter/scripts/render_collection_report.py to generate final collection_report.md.

Do not manually paraphrase per-paper conclusion lines in final report; they must come from per-paper summary.md section 10 via script injection.
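The idea of script injection into standalone placeholder lines can be illustrated with a short sketch. It is not render_collection_report.py. The placeholder is shown above with nothing after the colon, so the sketch assumes some per-paper key goes there, and it takes the brief lines as a ready-made dictionary instead of extracting them from summary.md; both choices, and the function name, are assumptions.

```python
import re

# Matches a line that contains only a placeholder (optional surrounding spaces).
PLACEHOLDER = re.compile(r"^\s*\{\{ARXIV_BRIEF:(?P<key>[^}]*)\}\}\s*$")


def render_template(template_text, briefs):
    """Replace each standalone placeholder line with the brief for its key."""
    rendered = []
    for line in template_text.splitlines():
        match = PLACEHOLDER.match(line)
        if match is None:
            rendered.append(line)  # ordinary text passes through unchanged
            continue
        key = match.group("key").strip()
        if key not in briefs:
            raise KeyError(f"no brief available for placeholder key {key!r}")
        rendered.append(briefs[key])
    return "\n".join(rendered)
```

Matching whole lines only is why the template requires the placeholder to stand alone: text that merely mentions a placeholder inline is left untouched.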

If language is non-English (for example Chinese), all intermediate markdown files and final reports should follow that language.

Periodic Scheduling

This orchestrator is suitable for cron/scheduled execution in OpenClaw:
Frequency examples: daily, weekly, monthly.

For rolling windows, use lookback (1d, 7d, 30d) when initializing runs.
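For a sense of what a rolling lookback means in practice, the sketch below converts values such as 1d, 7d, or 30d into a date window. It is an illustration only: how init_collection_run.py actually interprets lookback is not described here, and the function name and use of UTC are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone


def lookback_window(lookback, now=None):
    """Turn a value such as '7d' into a (start, end) pair of datetimes."""
    match = re.fullmatch(r"(\d+)d", lookback.strip())
    if match is None:
        raise ValueError(f"unsupported lookback value: {lookback!r}")
    end = now or datetime.now(timezone.utc)  # assumption: windows end "now" in UTC
    start = end - timedelta(days=int(match.group(1)))
    return start, end
```

Pairing the schedule frequency with the same lookback (daily with 1d, weekly with 7d) gives windows that neither overlap nor leave gaps.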



Output Layout

/--/

task_meta.json, task_meta.md

query_results/, query_selection/

/metadata.md + downloaded source/pdf + summary.md

summaries_bundle.md

collection_report_template.md

final rendered collection report (e.g. collection_report.md)

Use references/workflow-checklist.md as execution checklist.

Related Skills

This is the top-level orchestration skill.

Before using it, install and enable these three sub-skills:

arxiv-search-collector

arxiv-paper-processor

arxiv-batch-reporter



Execution order inside this orchestrator:

arxiv-search-collector (Stage A)

arxiv-paper-processor (Stage B)

arxiv-batch-reporter (Stage C)

Installation

bash
openclaw install arxiv-summarizer-orchestrator
