Agentic Paper Digest Skill
Fetches and summarizes recent arXiv papers.
- Rating: 4.3 (68 reviews)
- Downloads: 1,106
- Version: 1.0.0
Agentic Paper Digest
When to use
- Fetch a recent paper digest from arXiv and Hugging Face.
- Produce JSON output for downstream agents.
- Run a local API server when a polling workflow is needed.
Prereqs
- Python 3 and network access.
- LLM access via `OPENAI_API_KEY`, or an OpenAI-compatible provider via `LITELLM_API_BASE` + `LITELLM_API_KEY`.
- `git` is optional for bootstrap; otherwise `curl`/`wget` (or Python) is used to download the repo.
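These variables can also live in a `.env` file under `PROJECT_DIR` (auto-loaded by the wrapper scripts). A minimal sketch; the values are placeholders, not real keys:

```shell
# Create the project dir and a minimal .env (placeholder values, not real keys)
PROJECT_DIR="${PROJECT_DIR:-$HOME/agentic_paper_digest}"
mkdir -p "$PROJECT_DIR"
cat > "$PROJECT_DIR/.env" <<'EOF'
OPENAI_API_KEY=sk-replace-me
# Or, for an OpenAI-compatible proxy/provider instead:
# LITELLM_API_BASE=https://llm-proxy.example.com/v1
# LITELLM_API_KEY=replace-me
EOF
```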
Get the code and install
- Preferred: run the bootstrap helper script. It uses git when available or falls back to a zip download.
```bash
bash "{baseDir}/scripts/bootstrap.sh"
```
- Override the clone location by setting `PROJECT_DIR`:

```bash
PROJECT_DIR="$HOME/agentic_paper_digest" bash "{baseDir}/scripts/bootstrap.sh"
```
Run (CLI preferred)
```bash
bash "{baseDir}/scripts/run_cli.sh"
```

- Pass through CLI flags as needed:

```bash
bash "{baseDir}/scripts/run_cli.sh" --window-hours 24 --sources arxiv,hf
```
Run (API optional)
```bash
bash "{baseDir}/scripts/run_api.sh"
```

- Trigger runs and read results:

```bash
curl -X POST http://127.0.0.1:8000/api/run
curl http://127.0.0.1:8000/api/status
curl http://127.0.0.1:8000/api/papers
```

- Stop the API server if needed:

```bash
bash "{baseDir}/scripts/stop_api.sh"
```
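For a polling workflow, the endpoints above can be wrapped in a small loop. A sketch, not part of the skill's scripts; the status payload's schema isn't documented here, so this just prints whatever the server returns:

```shell
API_BASE="${API_BASE:-http://127.0.0.1:8000}"

# Trigger a run; ignore the error if the server is not up yet
curl -sf -X POST "$API_BASE/api/run" || true

# Poll /api/status up to $1 times, $2 seconds apart, printing each response
poll_status() {
  local tries="${1:-12}" interval="${2:-5}" body
  for _ in $(seq 1 "$tries"); do
    body=$(curl -sf "$API_BASE/api/status") || body="(server unreachable)"
    echo "$body"
    sleep "$interval"
  done
}
```

Usage: `poll_status 12 5` checks the server every 5 seconds for about a minute.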
Outputs
- CLI: `--json` prints `run_id`, `seen`, `kept`, `window_start`, and `window_end`.
- Data store: `data/papers.sqlite3` (under `PROJECT_DIR`).
- API: `POST /api/run`, `GET /api/status`, `GET /api/papers`, `GET/POST /api/topics`, `GET/POST /api/settings`.
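A downstream agent can read the `--json` summary with plain shell tools. A sketch; the field names come from the list above, but the values here are invented:

```shell
# Hypothetical --json summary (field names from the docs; values invented)
summary='{"run_id": "r-123", "seen": 240, "kept": 12, "window_start": "2026-03-09T00:00:00Z", "window_end": "2026-03-10T00:00:00Z"}'

# Pull out "kept" without requiring jq
kept=$(printf '%s' "$summary" | sed -n 's/.*"kept"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p')
echo "kept=$kept"   # → kept=12
```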
Configuration
Config files live in `PROJECT_DIR/config`. Environment variables can be set in the shell or via a `.env` file. The wrappers here auto-load `.env` from `PROJECT_DIR` (override with `ENV_FILE=/path/to/.env`).

Environment (.env or exported vars)
- `OPENAI_API_KEY`: required for OpenAI models (litellm reads this).
- `LITELLM_API_BASE`, `LITELLM_API_KEY`: use an OpenAI-compatible proxy/provider.
- `LITELLM_MODEL_RELEVANCE`, `LITELLM_MODEL_SUMMARY`: models for relevance and summarization (summary defaults to the relevance model if unset).
- `LITELLM_TEMPERATURE_RELEVANCE`, `LITELLM_TEMPERATURE_SUMMARY`: lower for more deterministic output.
- `LITELLM_MAX_RETRIES`: retry count for LLM calls.
- `LITELLM_DROP_PARAMS=1`: drop unsupported params to avoid provider errors.
- `WINDOW_HOURS`, `APP_TZ`: recency window and timezone.
- `ARXIV_CATEGORIES`: comma-separated categories (default includes `cs.CL,cs.AI,cs.LG,stat.ML,cs.CR`).
- `ARXIV_API_BASE`, `HF_API_BASE`: override source endpoints if needed.
- `ARXIV_MAX_RESULTS`, `ARXIV_PAGE_SIZE`: arXiv paging limits.
- `MAX_CANDIDATES_PER_SOURCE`: cap candidates per source before LLM filtering.
- `FETCH_TIMEOUT_S`, `REQUEST_TIMEOUT_S`: source fetch and per-request timeouts.
- `ENABLE_PDF_TEXT=1`: include first-page PDF text in summaries; requires PyMuPDF (`pip install pymupdf`).
- `DATA_DIR`: location for `papers.sqlite3`.
- `CORS_ORIGINS`: comma-separated origins allowed by the API server (UI use).
- Path overrides: `TOPICS_PATH`, `SETTINGS_PATH`, `AFFILIATION_BOOSTS_PATH`.

Config files
- `config/topics.json`: list of topics with `id`, `label`, `description`, `max_per_topic`, and `keywords`. The relevance classifier must output topic IDs exactly as defined here. `max_per_topic` also caps results in `GET /api/papers` when `apply_topic_caps=1`.
- `config/settings.json`: overrides fetch limits (`arxiv_max_results`, `arxiv_page_size`, `fetch_timeout_s`, `max_candidates_per_source`). Updated via `POST /api/settings`.
- `config/affiliations.json`: list of `{pattern, weight}` boosts applied by substring match over affiliations. Weights add up and are capped at 1.0. Invalid JSON disables boosts, so keep the file strict JSON (no trailing commas).
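A one-topic `config/topics.json` using the documented fields might look like the following sketch; the topic itself ("llm-safety") is a made-up example, not a default:

```shell
# Write a single-topic config/topics.json (the topic is illustrative only)
PROJECT_DIR="${PROJECT_DIR:-$HOME/agentic_paper_digest}"
mkdir -p "$PROJECT_DIR/config"
cat > "$PROJECT_DIR/config/topics.json" <<'EOF'
[
  {
    "id": "llm-safety",
    "label": "LLM Safety",
    "description": "Alignment, jailbreaks, and safety evaluation of large language models.",
    "max_per_topic": 5,
    "keywords": ["alignment", "jailbreak", "red teaming"]
  }
]
EOF
```

Keep the file strict JSON (no trailing commas), since invalid JSON silently disables the affected config.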
Mandatory workflow (follow step-by-step)
- First, open and read the configuration from the GitHub repo you downloaded (https://github.com/matanle51/agentic_paper_digest):
  - Load `config/topics.json`, `config/settings.json`, and `config/affiliations.json` (if present).
  - Note current topic IDs, caps, and fetch limits before asking the user to change them.
- Ask the user for their preferences on the following (and help them decide):
  - Topics of interest → update `config/topics.json` (`topics[].id/label/description/keywords`, `max_per_topic`).
  - Time window (hours) → set `WINDOW_HOURS` (or pass `--window-hours` to the CLI) only if the user cares; otherwise keep the 24h default.
  - Fetch parameters → ask the user to fill in `ARXIV_CATEGORIES`, `ARXIV_MAX_RESULTS`, `ARXIV_PAGE_SIZE`, and `MAX_CANDIDATES_PER_SOURCE`, explaining what each controls.
  - Model/provider → set `OPENAI_API_KEY` or `LITELLM_API_KEY` (+ `LITELLM_API_BASE` if using a proxy), and set `LITELLM_MODEL_RELEVANCE`/`LITELLM_MODEL_SUMMARY`.
  - Do NOT ask by default: timezone, quality vs cost, timeouts, PDF text, affiliation biasing, sources list. Use defaults unless the user requests changes.
- Confirm the workspace path: ask where to clone/run. Default to `PROJECT_DIR="$HOME/agentic_paper_digest"` if the user doesn't care. Never hardcode `/Users/...` paths.
- Bootstrap the repo: run the bootstrap script (unless the repo already exists and the user says to skip).
- Create or verify `.env`:
  - If `.env` is missing, create it from `.env.example` (in the repo), then ask the user to fill in keys and any requested preferences.
  - Ensure at least one of `OPENAI_API_KEY` or `LITELLM_API_KEY` is set before running.
- Apply config changes: edit the JSON files directly (or use `POST /api/topics` and `POST /api/settings` if running the API).
- Run the pipeline:
  - Prefer `scripts/run_cli.sh` for one-off JSON output.
  - Use `scripts/run_api.sh` only if the user explicitly asks for UI/API access or polling.
- Report results: if results are sparse, suggest increasing `WINDOW_HOURS`, `ARXIV_MAX_RESULTS`, or broadening topics.
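The "ensure at least one key is set" step in the workflow above can be checked mechanically before launching the pipeline. A sketch:

```shell
# Fail fast if neither LLM credential is set
check_llm_key() {
  if [ -n "${OPENAI_API_KEY:-}" ] || [ -n "${LITELLM_API_KEY:-}" ]; then
    return 0
  fi
  echo "Set OPENAI_API_KEY or LITELLM_API_KEY before running" >&2
  return 1
}

# Example gate before launching the pipeline:
# check_llm_key && bash "{baseDir}/scripts/run_cli.sh" --json
```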
Getting good results
- Help the user define focused, mutually exclusive topics so the classifier can choose the right IDs.
- Use a stronger model for summaries than for relevance if quality matters.
- If using OpenAI models, default to gpt-5-mini for a good quality/cost tradeoff.
- Increase `WINDOW_HOURS` or `ARXIV_MAX_RESULTS` when results are sparse, or lower them if results are too noisy.
- Tune `ARXIV_CATEGORIES` to your research domains.
- Enable PDF text (`ENABLE_PDF_TEXT=1`) when abstracts are too thin.
- Use modest affiliation weights to bias ranking without swamping relevance.
- Be proactive and help the user tune the skill for good results.
Troubleshooting
- Port 8000 busy: run `bash "{baseDir}/scripts/stop_api.sh"` or pass `--port` to the API command.
- Empty results: increase `WINDOW_HOURS` or verify the API key in `.env`.
- Missing API key errors: export `OPENAI_API_KEY` or `LITELLM_API_KEY` in the shell before running.
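For the port-8000 case, a dependency-free probe can confirm whether something is already listening before you decide to stop the server. This sketch assumes bash (it relies on bash's `/dev/tcp` feature):

```shell
# Succeeds only if something accepts a TCP connection on 127.0.0.1:$1
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_in_use 8000; then
  echo "Port 8000 is busy"
  # bash "{baseDir}/scripts/stop_api.sh"   # or pass --port to the API command
fi
```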
Installation
```bash
openclaw install agentic-paper-digest-skill
```
💻 Code Examples

example.sh

```bash
curl -X POST http://127.0.0.1:8000/api/run
curl http://127.0.0.1:8000/api/status
curl http://127.0.0.1:8000/api/papers
```

⚙️ Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| PROJECT_DIR | string | `$HOME/agentic_paper_digest` | Clone/run location; `.env` is auto-loaded from here (override with `ENV_FILE=/path/to/.env`). |
| OPENAI_API_KEY | string | - | Required for OpenAI models (litellm reads this). |
| LITELLM_API_BASE | string | - | Base URL of an OpenAI-compatible proxy/provider (used with `LITELLM_API_KEY`). |
| LITELLM_API_KEY | string | - | API key for an OpenAI-compatible proxy/provider. |
| LITELLM_MODEL_RELEVANCE | string | - | Model for relevance classification. |
| LITELLM_MODEL_SUMMARY | string | - | Model for summarization (defaults to the relevance model if unset). |
| LITELLM_TEMPERATURE_RELEVANCE | string | - | Relevance temperature; lower for more deterministic output. |
| LITELLM_TEMPERATURE_SUMMARY | string | - | Summary temperature; lower for more deterministic output. |
| LITELLM_MAX_RETRIES | string | - | Retry count for LLM calls. |
| WINDOW_HOURS | string | 24 | Recency window in hours. |
| APP_TZ | string | - | Timezone for the recency window. |
| ARXIV_CATEGORIES | string | `cs.CL,cs.AI,cs.LG,stat.ML,cs.CR` | Comma-separated arXiv categories. |
| ARXIV_API_BASE | string | - | Override the arXiv endpoint if needed. |
| HF_API_BASE | string | - | Override the Hugging Face endpoint if needed. |
| ARXIV_MAX_RESULTS | string | - | arXiv paging limit: maximum results. |
| ARXIV_PAGE_SIZE | string | - | arXiv paging limit: page size. |
| MAX_CANDIDATES_PER_SOURCE | string | - | Cap candidates per source before LLM filtering. |
| FETCH_TIMEOUT_S | string | - | Source fetch timeout. |
| REQUEST_TIMEOUT_S | string | - | Per-request timeout. |
| DATA_DIR | string | - | Location for `papers.sqlite3`. |
| CORS_ORIGINS | string | - | Comma-separated origins allowed by the API server (UI use). |
| TOPICS_PATH | string | - | Path override for `config/topics.json`. |
| SETTINGS_PATH | string | - | Path override for `config/settings.json`. |
| AFFILIATION_BOOSTS_PATH | string | - | Path override for `config/affiliations.json`. |
Tags
#search-and-research
Quick Info
Category Web Scrapers
Model Claude 3.5
Complexity Multi-Agent
Author matanle51
Last Updated 3/10/2026
Related Skills
- 4claw — a moderated imageboard for AI agents. ★ 4.4 (118), ↓ 4,990, v1.0.0
- Aap Passport — Agent Attestation Protocol, the Reverse Turing Test. ★ 4.3 (89), ↓ 4,621, v1.0.0
- Adaptive Suite — a continuously adaptive skill suite that empowers Clawdbot. ★ 4.7 (88), ↓ 1,625, v1.0.0
- Adversarial Prompting — adversarial analysis to critique and fix. ★ 4.6 (372), ↓ 28,222, v1.0.0