
Arxiv Paper Processor

Tool-only paper processing skill with a manual language parameter: supports batch artifact download

Rating: 4 (145 reviews)
Downloads: 3,744
Version: 1.0.0

Overview

Tool-only paper processing skill with a manual language parameter: supports batch artifact download for many papers.

Complete Documentation


name: arxiv-paper-processor
description: "Tool-only paper processing skill with a manual language parameter: supports batch artifact download for many papers or single-paper download, then the model manually reads source/PDF and writes summary.md in the selected language. Use when per-paper comprehension should be model-driven instead of script-generated."

ArXiv Paper Processor

Use this skill for per-paper manual summarization, with optional batch artifact download.

Single-paper mode: process one paper directory (e.g. //).

Batch predownload mode: process many paper directories under one run dir before writing summaries.

Language Parameter

Use a workflow language parameter (for example English or Chinese) and apply it manually.

The per-paper summary.md must be written in the selected language.

If download scripts are called directly, pass --language for traceability.

Core Principle

Scripts only fetch artifacts. The model performs reading and writing.

Non-negotiable Constraint

Do not generate summary.md by script-based snippet extraction, regex harvesting, or template autofill.

Do not use Python/shell scripts to auto-compose section text from abstract/introduction fragments.

Scripts in this skill are only for artifact download (source/pdf) and trace logs.

The final summary.md must come from model-side reading and synthesis of the paper content.

Optional Batch Artifact Download (Many Papers)

Use this first when Stage B has many papers:

```bash
python3 scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language English
```



Key behavior:

Supports --artifact source, --artifact pdf, or --artifact source_then_pdf (default).

Supports concurrency (--max-workers) and safe throttling/retry (--min-interval-sec, retry args).

Uses run-local throttle state by default (/.runtime/arxiv_download_state.json) to reduce 429 risk.

Skips papers that already have usable source/source_extract/*.tex or existing source/paper.pdf (unless --force).

Resume-friendly: if a paper already has a completed summary.md, you can skip that paper's summary-writing step.

Writes batch log to /download_batch_log.json by default.
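The run-local throttle described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the skill's actual implementation: the function name `wait_for_slot`, the `last_request_ts` field, and the injected `now`/`sleep` callables are all hypothetical, and the real `arxiv_download_state.json` layout may differ. It also ignores concurrency; code shared across `--max-workers` workers would additionally need a lock around the read-wait-write sequence.

```python
import json
import time
from pathlib import Path


def wait_for_slot(state_path, min_interval_sec, now=time.time, sleep=time.sleep):
    """Wait until min_interval_sec has passed since the last recorded request.

    Assumed state file layout: {"last_request_ts": <float seconds>}.
    Returns the number of seconds that were actually waited.
    """
    state_path = Path(state_path)
    last_ts = 0.0
    if state_path.exists():
        try:
            state = json.loads(state_path.read_text())
            last_ts = float(state.get("last_request_ts", 0.0))
        except (ValueError, TypeError, AttributeError):
            # Unreadable or unexpected state: behave as if no request was made.
            last_ts = 0.0

    # Time still owed before the next request is allowed.
    waited = max(0.0, last_ts + min_interval_sec - now())
    if waited > 0:
        sleep(waited)

    # Record the moment this request is released.
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps({"last_request_ts": now()}))
    return waited
```

Keeping the timestamp on disk rather than in memory is what lets a resumed run respect the interval left over from the previous run, which is presumably why the state lives under the run directory.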



Step 1: Download Source (Preferred)

```bash
python3 scripts/download_arxiv_source.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```



This writes:

source/source_bundle.bin

source/source_extract/

source/download_source_log.json



If usable source already exists and --force is not set, the script reuses local artifacts.
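The reuse rule can be sketched as a small check. The helper names `has_usable_source` and `should_download_source` are hypothetical, and treating "usable" as "at least one non-empty `.tex` file anywhere under `source/source_extract/`" is an assumption; the actual script may apply a stricter test.

```python
from pathlib import Path


def has_usable_source(paper_dir):
    """True if source/source_extract/ holds at least one non-empty .tex file."""
    extract_dir = Path(paper_dir) / "source" / "source_extract"
    if not extract_dir.is_dir():
        return False
    return any(
        p.is_file() and p.stat().st_size > 0
        for p in extract_dir.rglob("*.tex")
    )


def should_download_source(paper_dir, force=False):
    """Download only when forced or when nothing usable is on disk."""
    return force or not has_usable_source(paper_dir)
```

For example, `should_download_source("/path/to/run/2602.00528")` would return `False` once the extracted `.tex` files are in place, and `True` again if `force=True` is passed.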

Step 2: If Needed, Download PDF

```bash
python3 scripts/download_arxiv_pdf.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```



This writes:

source/paper.pdf

source/download_pdf_log.json



If PDF already exists and --force is not set, the script reuses local artifacts.

Step 3: Model Reads and Summarizes

If summary.md already exists and follows the required format, skip this paper and mark it complete.

Read metadata.md first.

If source/source_extract/ already exists with readable .tex files, use it directly.

Otherwise, if source/paper.pdf already exists, use PDF directly.


If neither exists, run download scripts (single-paper scripts or batch script) first.

Manually write summary.md in the same paper directory, in the selected language.



Do not rely on rule-based auto summarization.
Do not rely on auto-extracted snippets as the primary writing basis.
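The decision order in this step can be summarized in a short sketch. It only inspects which files exist and picks the reading basis; it does not read the paper or produce any summary text, which stays with the model. The function name `choose_next_action` and its return labels are hypothetical, and whether an existing summary.md follows the required format still has to be judged separately.

```python
from pathlib import Path


def choose_next_action(paper_dir):
    """Return "summary_exists", "read_source", "read_pdf", or "download"."""
    paper_dir = Path(paper_dir)
    source_dir = paper_dir / "source"
    extract_dir = source_dir / "source_extract"

    # An existing summary still needs a format check before the paper is skipped.
    if (paper_dir / "summary.md").is_file():
        return "summary_exists"

    # Prefer extracted LaTeX source when any .tex file is present.
    if extract_dir.is_dir() and any(extract_dir.rglob("*.tex")):
        return "read_source"

    # Fall back to the PDF if it has already been downloaded.
    if (source_dir / "paper.pdf").is_file():
        return "read_pdf"

    # Nothing on disk yet: run the single-paper or batch download scripts first.
    return "download"
```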

Quality Requirement
Every section should include paper-specific details that are traceable to full-text reading.
Sections 4, 5, and 10 should reflect concrete method and evaluation details, not generic wording.
If key details are unclear in the source, explicitly note uncertainty instead of guessing.

Match the detail level shown in references/summary-example-en.md and references/summary-example-zh.md.


If your draft is clearly shorter or less specific than the examples, expand it before finishing.

Required Output

/summary.md in fixed section format.

Pay special attention to section ## 10. Brief Conclusion: write a 3-4 sentence mini-conclusion that covers contribution, method, evaluation setup, and results with paper-specific details.

In section ## 1. Paper Snapshot, use exact keys: ArXiv ID, Title, Authors, Publish date, Primary category, Reading basis.

Do not use key variants such as Reading source, Author list, Published on, or lowercase key names.

See references/summary-format.md for exact section requirements.
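A finished summary.md can be checked for the exact snapshot keys after the model has written it. The sketch below only validates key names and never generates summary text. The helper name `check_snapshot_keys` is hypothetical, and the assumed line shape (an optional bullet, optional bold markers, then `Key: value`) is a guess; references/summary-format.md remains the authority on the real layout.

```python
import re

REQUIRED_KEYS = [
    "ArXiv ID",
    "Title",
    "Authors",
    "Publish date",
    "Primary category",
    "Reading basis",
]

# Assumed line shape: optional bullet, optional bold markers, then "Key: value".
KEY_LINE = re.compile(r"^\s*(?:[-*]\s+)?(?:\*\*)?([^:*]+?)(?:\*\*)?\s*:")


def check_snapshot_keys(summary_text):
    """Return (missing, unexpected) key lists for the '## 1. Paper Snapshot' section."""
    found = []
    in_snapshot = False
    for line in summary_text.splitlines():
        if line.startswith("## "):
            # Track whether we are inside the snapshot section.
            in_snapshot = line.strip() == "## 1. Paper Snapshot"
            continue
        if in_snapshot:
            match = KEY_LINE.match(line)
            if match:
                found.append(match.group(1).strip())
    missing = [key for key in REQUIRED_KEYS if key not in found]
    unexpected = [key for key in found if key not in REQUIRED_KEYS]
    return missing, unexpected
```

Because the comparison is case-sensitive, a variant such as `authors` or `Reading source` is reported as unexpected and the corresponding required key as missing.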

Related Skills

This skill is a sub-skill of arxiv-summarizer-orchestrator.

Pipeline position:

Step 1 (upstream): arxiv-search-collector produces the selected paper directories and metadata.

Step 2 (this skill): arxiv-paper-processor downloads artifacts and writes one summary.md per paper.

Step 3 (downstream): arxiv-batch-reporter uses these per-paper summaries to generate the final collection report.

Use this skill together with Step 1 and Step 3 for full end-to-end execution.

Installation

```bash
openclaw install arxiv-paper-processor
```


