
Veille

RSS feed aggregator, deduplication engine, and output dispatcher for OpenClaw agents.

Rating
4.9 (177 reviews)
Downloads
22,906 downloads
Version
1.0.0

Overview

RSS feed aggregator, deduplication engine, and output dispatcher for OpenClaw agents.

Complete Documentation

View Source →

Skill Veille - RSS Aggregator

RSS feed aggregator with URL deduplication and topic-based deduplication for OpenClaw agents. Fetches articles from 20+ configured sources, filters already-seen URLs (TTL 14 days), and deduplicates articles covering the same story using Jaccard similarity + named entities.

No external dependencies: stdlib Python only (urllib, xml.etree, email.utils).
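The topic deduplication described above can be sketched with stdlib Python only. This is a simplified illustration (the actual skill also weighs named entities; `title_tokens` and `jaccard` are hypothetical names, not the skill's API):

```python
import re

def title_tokens(title: str) -> set:
    """Lowercase word tokens from a title, dropping very short words."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) > 2}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

t1 = title_tokens("Critical OpenSSL vulnerability patched in 3.0.13")
t2 = title_tokens("OpenSSL 3.0.13 fixes critical vulnerability")
t3 = title_tokens("New Rust release improves compile times")

print(jaccard(t1, t2))  # 0.6 (same story)
print(jaccard(t1, t3))  # 0.0 (unrelated)
```

Two titles covering the same story share most tokens and score high; unrelated titles score near zero, which is the basis for `--filter-topic`.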


Trigger phrases

  • "fais une veille" (run a news watch)
  • "quoi de neuf en securite / tech / crypto / IA ?" (what's new in security / tech / crypto / AI?)
  • "donne-moi les news du jour" (give me today's news)
  • "articles recents sur [sujet]" (recent articles on [topic])
  • "veille RSS" (RSS watch)
  • "digest du matin" (morning digest)
  • "nouvelles non vues" (unseen news)

Quick Start

bash
# 1. Setup
python3 scripts/setup.py

# 2. Validate
python3 scripts/init.py

# 3. Fetch + Score + Send (full pipeline)
python3 scripts/veille.py fetch --filter-seen --filter-topic \
  | python3 scripts/veille.py score \
  | python3 scripts/veille.py send


Setup

Requirements

  • Python 3.9+
  • Network access to RSS feeds (public, no auth required)
  • No pip installs needed

Installation

bash
# From the skill directory
python3 scripts/setup.py

# Validate
python3 scripts/init.py

The wizard creates:

  • ~/.openclaw/config/veille/config.json (from config.example.json)
  • ~/.openclaw/data/veille/ (data directory)

Customizing sources

Edit ~/.openclaw/config/veille/config.json and add/remove entries in the "sources" dict:

json
{
  "sources": {
    "My Blog": "https://example.com/feed.xml",
    "BleepingComputer": "https://www.bleepingcomputer.com/feed/"
  }
}
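Before saving an edited config, the source URLs can be sanity-checked with the stdlib. This helper is hypothetical (not part of the skill), shown only to illustrate the expected shape of the `"sources"` dict:

```python
from urllib.parse import urlparse

def validate_sources(sources: dict) -> list:
    """Return (name, problem) pairs; an empty list means all URLs look sane."""
    problems = []
    for name, url in sources.items():
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            problems.append((name, "URL must use http(s)"))
        elif not parsed.netloc:
            problems.append((name, "URL has no host"))
    return problems

sources = {
    "My Blog": "https://example.com/feed.xml",
    "Broken": "ftp://example.org/feed",
}
print(validate_sources(sources))  # [('Broken', 'URL must use http(s)')]
```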


Storage and credentials

Files written by this skill

| Path | Written by | Purpose | Contains secrets |
|------|-----------|---------|------------------|
| `~/.openclaw/config/veille/config.json` | `setup.py` | Sources, outputs, options | NO |
| `~/.openclaw/data/veille/seen_urls.json` | `veille.py` | URL dedup store (TTL 14d) | NO |
| `~/.openclaw/data/veille/topic_seen.json` | `veille.py` | Topic dedup store (TTL 5d) | NO |

Files read from outside the skill

| Path | Read by | Key accessed | When |
|------|---------|--------------|------|
| `~/.openclaw/openclaw.json` | `dispatch.py` | `channels.telegram.botToken` (read-only) | Only when `telegram_bot` output is enabled and no `bot_token` is set in the output config |

This is the only cross-config read. To avoid it entirely, set `bot_token` explicitly in your output config:

json
{ "type": "telegram_bot", "bot_token": "YOUR_BOT_TOKEN", "chat_id": "...", "enabled": true }

Output credentials (optional)

Credentials are only used if you enable the corresponding output. None are required for core functionality (RSS fetch + dedup).

| Output | Credential source | What is used |
|--------|-------------------|--------------|
| `telegram_bot` | `~/.openclaw/openclaw.json` or `bot_token` in output config | Bot token (read-only) |
| `mail-client` | Delegated to mail-client skill (its own creds) | Nothing read directly |
| `mail-client` (SMTP fallback) | `smtp_user` / `smtp_pass` in output config | SMTP login |
| `nextcloud` | Delegated to nextcloud-files skill (its own creds) | Nothing read directly |

Cleanup on uninstall

bash
python3 scripts/setup.py --cleanup


Security model

Credential isolation

  • API keys are read from dedicated files (default ~/.openclaw/secrets/), never from config.json. The scorer warns at runtime if a key file has overly permissive filesystem permissions.
  • SMTP credentials (fallback only) are stored in the output config block — use the mail-client skill delegation to avoid storing SMTP passwords.

Subprocess boundaries

  • Dispatch delegates to other OpenClaw skills via subprocess.run() (never shell=True). Script paths are validated to reside under ~/.openclaw/workspace/skills/ before execution, preventing path traversal.
  • No credentials are passed as subprocess arguments — each skill manages its own authentication.

File output safety

  • The file output type validates the target path before writing: only ~/.openclaw/ is allowed by default. Additional directories can be whitelisted via config.security.allowed_output_dirs. Sensitive paths (.ssh, .gnupg, /etc/, .bashrc, etc.) are always blocked regardless of allowlist.
  • Written content is checked for suspicious patterns (shell shebangs, SSH keys, PGP blocks, code injection) and size-limited to 1 MB.
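The path check described above can be sketched as follows. This is an illustration only, not the skill's implementation: the real check also expands `~`, resolves symlinks, and blocks additional paths (`crontab`, `/etc/`, systemd units); the names here are assumptions:

```python
from pathlib import PurePosixPath

# Illustrative subset of the documented blocklist
BLOCKED_PARTS = {".ssh", ".gnupg", ".bashrc", ".profile", ".bash_profile", ".zshrc", ".env"}

def is_path_allowed(target: str, allowed_dirs: list) -> bool:
    """True if target sits under an allowed directory and contains no blocked segment."""
    path = PurePosixPath(target)
    # Blocked segments are rejected even inside an allowed directory
    if any(part in BLOCKED_PARTS for part in path.parts):
        return False
    return any(PurePosixPath(root) in path.parents for root in allowed_dirs)

print(is_path_allowed("/home/u/.openclaw/digest.md", ["/home/u/.openclaw"]))  # True
print(is_path_allowed("/home/u/.ssh/authorized_keys", ["/home/u"]))           # False
print(is_path_allowed("/tmp/digest.md", ["/home/u/.openclaw"]))               # False
```

Note that the blocklist wins over the allowlist: `.ssh` is rejected even though it sits under the allowed `/home/u`.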

Cross-config reads

  • The only cross-config file read is ~/.openclaw/openclaw.json for the Telegram bot token, and only when telegram_bot output is enabled without an explicit bot_token. This read is logged to stderr. Set bot_token in the output config to eliminate this read entirely.

Autonomous dispatch

  • When scheduled (cron), the skill can send messages/files to configured outputs without user interaction. All dispatch actions are logged to stderr with an audit summary. Use enabled: false on any output to disable it without removing its config.

CLI reference

fetch

text
python3 veille.py fetch [--hours N] [--filter-seen] [--filter-topic] [--sources FILE]

Options:

  • --hours N : lookback window in hours (default: from config, usually 24)
  • --filter-seen : filter already-seen URLs (uses seen_urls.json TTL store)
  • --filter-topic : deduplicate by topic (uses topic_seen.json + Jaccard similarity)
  • --sources FILE : path to custom JSON sources file

Output (JSON on stdout):

json
{
  "hours": 24,
  "count": 42,
  "skipped_url": 5,
  "skipped_topic": 3,
  "articles": [...],
  "wrapped_listing": "=== UNTRUSTED EXTERNAL CONTENT ..."
}
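The `--filter-seen` behavior amounts to a TTL-pruned URL store. A minimal sketch, assuming a JSON file mapping URL to first-seen timestamp (function names are hypothetical, not the skill's internals):

```python
import json
import time
from pathlib import Path

TTL_SECONDS = 14 * 24 * 3600  # matches the documented 14-day URL TTL

def load_seen(path: Path, now: float = None) -> dict:
    """Load url -> first-seen timestamp, dropping entries past the TTL."""
    now = time.time() if now is None else now
    try:
        store = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        store = {}
    return {url: ts for url, ts in store.items() if now - ts < TTL_SECONDS}

def mark_and_filter(articles: list, store: dict, now: float = None) -> list:
    """Return only unseen articles, recording them in the store."""
    now = time.time() if now is None else now
    fresh = [a for a in articles if a["link"] not in store]
    for a in fresh:
        store[a["link"]] = now
    return fresh
```

On a second run within the TTL window the same links are filtered out, which is why `skipped_url` grows while `count` shrinks.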

seen-stats

text
python3 veille.py seen-stats

Shows URL seen store statistics (count, TTL, file path).

topic-stats

text
python3 veille.py topic-stats

Shows topic deduplication store statistics.

mark-seen

text
python3 veille.py mark-seen URL [URL ...]

Marks one or more URLs as already seen (prevents them from appearing in future fetches with --filter-seen).

score

text
python3 veille.py score [--dry-run]

Reads a digest JSON from stdin (output of fetch) and scores articles using an OpenAI-compatible LLM. Returns enriched JSON with scored, ghost_picks, and per-article score/reason fields.

Options:

  • --dry-run : print summary on stderr without calling the LLM API

When llm.enabled is false (default), articles pass through unchanged ("scored": false).

Pipeline usage:

bash
python3 veille.py fetch --filter-seen --filter-topic | python3 veille.py score | python3 veille.py send

send

text
python3 veille.py send [--profile NAME]

Reads a digest JSON from stdin and dispatches to all enabled outputs configured in config.json. Accepts both raw fetch output (articles key) and LLM-processed digests (categories key).

Output types: telegram_bot, mail-client, nextcloud, file.

  • telegram_bot: bot token auto-read from OpenClaw config - no extra setup if Telegram already configured.
  • mail-client: delegates to mail-client skill if installed, falls back to raw SMTP config.
  • nextcloud: delegates to nextcloud-files skill if installed (append mode by default with date separator).
  • file: writes digest to a local file. Path must be under ~/.openclaw/ (default) or a directory listed in config.security.allowed_output_dirs. Sensitive paths and suspicious content are blocked (see Security model).

Configure outputs interactively:

bash
python3 scripts/setup.py --manage-outputs

config

text
python3 veille.py config

Prints the active configuration (no secrets).


LLM scoring configuration

The llm key in config.json controls the optional LLM-based article scoring:

json
{
  "llm": {
    "enabled": false,
    "base_url": "https://api.openai.com/v1",
    "api_key_file": "~/.openclaw/secrets/openai_api_key",
    "model": "gpt-4o-mini",
    "top_n": 10,
    "ghost_threshold": 5
  }
}

| Key | Default | Description |
|-----|---------|-------------|
| `enabled` | `false` | Enable LLM scoring (requires API key) |
| `base_url` | `https://api.openai.com/v1` | OpenAI-compatible API endpoint |
| `api_key_file` | `~/.openclaw/secrets/openai_api_key` | Path to file containing the API key |
| `model` | `gpt-4o-mini` | Model to use for scoring |
| `top_n` | `10` | Max articles to send to LLM per batch |
| `ghost_threshold` | `5` | Score threshold for ghost_picks (blog-worthy articles) |
Scoring rules:

  • Only the first top_n articles are sent to the LLM. Articles beyond top_n are excluded from the digest entirely. fetch returns articles sorted by date desc, so top_n selects the most recent ones. Increase top_n to evaluate more articles per run (higher token cost).
  • Score >= ghost_threshold : added to ghost_picks list
  • Score >= 3 : kept in articles list
  • Score <= 2 : excluded from output
  • Articles are sorted by score (descending)

When disabled, the score subcommand passes data through unchanged.
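The threshold rules above can be sketched in a few lines. This ignores the LLM call and top_n batching; it only shows how scored articles are partitioned (`apply_scores` is a hypothetical name):

```python
def apply_scores(articles: list, ghost_threshold: int = 5) -> dict:
    """Apply the documented thresholds to articles already carrying a 'score'."""
    kept = sorted(
        (a for a in articles if a.get("score", 0) >= 3),  # score <= 2 is dropped
        key=lambda a: a["score"],
        reverse=True,  # sorted by score, descending
    )
    ghost_picks = [a for a in kept if a["score"] >= ghost_threshold]
    return {"scored": True, "articles": kept, "ghost_picks": ghost_picks}

digest = apply_scores([
    {"title": "Major breach", "score": 8},
    {"title": "Minor update", "score": 2},
    {"title": "Useful tool", "score": 4},
])
print([a["title"] for a in digest["articles"]])     # ['Major breach', 'Useful tool']
print([a["title"] for a in digest["ghost_picks"]])  # ['Major breach']
```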

Nextcloud output mode

The nextcloud output now defaults to append mode with a date separator. Each dispatch adds content below a ## YYYY-MM-DD HH:MM header, preserving previous entries.

Set "mode": "overwrite" in the output config to restore the old behavior:

json
{ "type": "nextcloud", "path": "/Veille/digest.md", "mode": "overwrite" }
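The append mode described above amounts to stacking dated sections. A local-file sketch of the same idea (the actual skill delegates the write to the nextcloud-files skill; `append_digest` is a hypothetical name):

```python
from datetime import datetime
from pathlib import Path

def append_digest(path: Path, content: str, now: datetime = None) -> None:
    """Append content under a '## YYYY-MM-DD HH:MM' header, keeping prior entries."""
    now = now or datetime.now()
    header = now.strftime("## %Y-%m-%d %H:%M")
    existing = path.read_text() if path.exists() else ""
    path.write_text(f"{existing}{header}\n\n{content}\n\n")
```

Each dispatch adds a new dated section at the end, so yesterday's digest stays readable above today's.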

File output configuration

The file output writes digests to the local filesystem. By default, only paths under ~/.openclaw/ are allowed. To authorize additional directories, use config.security.allowed_output_dirs:

json
{
  "security": {
    "allowed_output_dirs": [
      "~/Documents/veille",
      "/srv/digests"
    ]
  }
}

Blocked paths (always rejected, even if inside an allowed directory): .ssh, .gnupg, .config/systemd, crontab, /etc/, .bashrc, .profile, .bash_profile, .zshrc, .env

Content validation — written content is rejected if it:

  • Exceeds 1 MB
  • Contains shell shebangs (#!/), SSH keys, PGP blocks, or code injection patterns (eval(, exec(, __import__(, import os, import subprocess)

All blocked attempts are logged to stderr with the reason.
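A sketch of such a content gate, based on the items listed above. The exact regexes used by the skill are not documented here, so these patterns are assumptions:

```python
import re
from typing import Optional

MAX_BYTES = 1024 * 1024  # documented 1 MB limit

# Illustrative patterns matching the documented categories
SUSPICIOUS_PATTERNS = [
    r"^#!/",                                # shell shebang
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----",  # SSH/TLS private keys
    r"-----BEGIN PGP",                      # PGP blocks
    r"\beval\(|\bexec\(|__import__\(",      # code injection
    r"\bimport (os|subprocess)\b",
]

def reject_reason(content: str) -> Optional[str]:
    """Return why content should be rejected, or None if it passes."""
    if len(content.encode("utf-8")) > MAX_BYTES:
        return "content exceeds 1 MB"
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, content, re.MULTILINE):
            return f"suspicious pattern: {pattern}"
    return None

print(reject_reason("# Daily digest\n\n- article one\n"))  # None
print(reject_reason("#!/bin/sh\nrm -rf /"))                # flags the shebang
```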


Templates (agent usage)

Basic digest

python
# In an agent tool call (subprocess shown as a runnable equivalent):
import json, subprocess

result = subprocess.run(
    ["python3", "scripts/veille.py", "fetch", "--hours", "24",
     "--filter-seen", "--filter-topic"],
    capture_output=True, text=True, check=True,
)
data = json.loads(result.stdout)
# data["wrapped_listing"] is ready for LLM prompt injection
# data["count"] = number of new articles
# data["articles"] = list of article dicts

Prompt template

text
You are a news analyst. Here are today's articles:

{data["wrapped_listing"]}

Please summarize the 5 most important stories, focusing on security and tech.

Agent workflow example

text
1. Call veille fetch --filter-seen --filter-topic
2. Pipe through veille score (LLM scoring, if enabled)
3. If count > 0: pass wrapped_listing to LLM for analysis
4. LLM produces digest summary
5. Pipe through veille send (dispatches to configured outputs)

Pipeline (CLI)

bash
python3 scripts/veille.py fetch --filter-seen --filter-topic \
  | python3 scripts/veille.py score \
  | python3 scripts/veille.py send

Filtering by keyword (post-fetch)

python
import json

data = json.loads(fetch_output)  # fetch_output: stdout of `veille.py fetch`
security_articles = [
    a for a in data["articles"]
    if any(kw in a["title"].lower() for kw in ["cve", "vuln", "patch", "breach"])
]


Ideas

  • Add keyword-based filtering (--keywords security,cve,linux)
  • Add per-source TTL override in config
  • Export digest as HTML or Markdown
  • Schedule with cron: 0 8 * * * python3 veille.py fetch --filter-seen --filter-topic
  • Weight articles by source tier for LLM prioritization
  • Add OPML import/export for source list management
  • Integrate with ntfy or Telegram for real-time alerts on high-priority articles

Combine with

  • mail-client : send the digest by email after fetching

    text
    veille fetch --filter-seen | ... | mail-client send

  • nextcloud-files : archive the daily digest as a Markdown file

    text
    veille fetch --filter-seen | jq .wrapped_listing -r > /tmp/digest.md
    nextcloud-files upload /tmp/digest.md /Digests/$(date +%Y-%m-%d).md


Troubleshooting

See references/troubleshooting.md for detailed troubleshooting steps.

Common issues:

  • No articles returned: check --hours value, verify feed URLs in config
  • XML parse error on a feed: some feeds use non-standard XML; the skill skips broken items silently
  • All articles filtered as seen: run seen-stats to check store size; reset with rm seen_urls.json
  • Import error: ensure you run veille.py from its directory or via full path
  • File output blocked: path is outside ~/.openclaw/ — add the target directory to config.security.allowed_output_dirs (see File output configuration)

Installation

bash
openclaw install veille

Tags

#web-and-frontend-development

Quick Info

Category Development
Model Claude 3.5
Complexity Multi-Agent
Author romain-grosos
Last Updated 3/10/2026