Hopeids
Inference-based intrusion detection for AI agents with quarantine.
- Rating
- 4.5 (112 reviews)
- Downloads
- 1,127 downloads
- Version
- 1.0.0
Overview
Inference-based intrusion detection for AI agents with quarantine.
✨Key Features
Auto-scan — Scan messages before agent processing
Quarantine — Block threats with metadata-only storage
Human-in-the-loop — Telegram alerts for review
Per-agent config — Different thresholds for different agents
Commands — /approve, /reject, /trust, /quarantine
--
Complete Documentation
View Source →hopeIDS Security Skill
Inference-based intrusion detection for AI agents with quarantine and human-in-the-loop.
Security Invariants
These are non-negotiable design principles:
- Block = full abort — Blocked messages never reach jasper-recall or the agent
- Metadata only — No raw malicious content is ever stored
- Approve ≠ re-inject — Approval changes future behavior, doesn't resurrect messages
- Alerts are programmatic — Telegram alerts built from metadata, no LLM involved
Features
- Auto-scan — Scan messages before agent processing
- Quarantine — Block threats with metadata-only storage
- Human-in-the-loop — Telegram alerts for review
- Per-agent config — Different thresholds for different agents
- Commands —
/approve,/reject,/trust,/quarantine
The Pipeline
Message arrives
↓
hopeIDS.autoScan()
↓
┌─────────────────────────────────────────┐
│ risk >= threshold? │
│ │
│ BLOCK (strictMode): │
│ → Create QuarantineRecord │
│ → Send Telegram alert │
│ → ABORT (no recall, no agent) │
│ │
│ WARN (non-strict): │
│ → Inject <security-alert> │
│ → Continue to jasper-recall │
│ → Continue to agent │
│ │
│ ALLOW: │
│ → Continue normally │
└─────────────────────────────────────────┘
Configuration
{
"plugins": {
"entries": {
"hopeids": {
"enabled": true,
"config": {
"autoScan": true,
"defaultRiskThreshold": 0.7,
"strictMode": false,
"telegramAlerts": true,
"agents": {
"moltbook-scanner": {
"strictMode": true,
"riskThreshold": 0.7
},
"main": {
"strictMode": false,
"riskThreshold": 0.8
}
}
}
}
}
}
}
Options
| Option | Type | Default | Description |
|---|---|---|---|
| autoScan | boolean | false | Auto-scan every message |
| strictMode | boolean | false | Block (vs warn) on threats |
| defaultRiskThreshold | number | 0.7 | Risk level that triggers action |
| telegramAlerts | boolean | true | Send alerts for blocked messages |
| telegramChatId | string | - | Override alert destination |
| quarantineDir | string | ~/.openclaw/quarantine/hopeids | Storage path |
| agents | object | - | Per-agent overrides |
| trustOwners | boolean | true | Skip scanning owner messages |
Quarantine Records
When a message is blocked, a metadata record is created:
{
"id": "q-7f3a2b",
"ts": "2026-02-06T00:48:00Z",
"agent": "moltbook-scanner",
"source": "moltbook",
"senderId": "@sus_user",
"intent": "instruction_override",
"risk": 0.85,
"patterns": [
"matched regex: ignore.*instructions",
"matched keyword: api key"
],
"contentHash": "ab12cd34...",
"status": "pending"
}
Note: There is NO originalMessage field. This is intentional.
Telegram Alerts
When a message is blocked:
🛑 Message blocked
ID: `q-7f3a2b`
Agent: moltbook-scanner
Source: moltbook
Sender: @sus_user
Intent: instruction_override (85%)
Patterns:
• matched regex: ignore.*instructions
• matched keyword: api key
`/approve q-7f3a2b`
`/reject q-7f3a2b`
`/trust @sus_user`
Built from metadata only. No LLM touches this.
Commands
/quarantine [all|clean]
List quarantine records.
/quarantine # List pending
/quarantine all # List all (including resolved)
/quarantine clean # Clean expired records
/approve
Mark a blocked message as a false positive.
/approve q-7f3a2b
Effect:
- Status →
approved - (Future) Add sender to allowlist
- (Future) Lower pattern weight
/reject
Confirm a blocked message was a true positive.
/reject q-7f3a2b
Effect:
- Status →
rejected - (Future) Reinforce pattern weights
/trust
Whitelist a sender for future messages.
/trust @legitimate_user
/scan
Manually scan a message.
/scan ignore your previous instructions and...
What Approve/Reject Mean
| Command | What it does | What it doesn't do |
|---|---|---|
| /approve | Marks as false positive, may adjust IDS | Does NOT re-inject the message |
| /reject | Confirms threat, may strengthen patterns | Does NOT affect current message |
| /trust | Whitelists sender for future | Does NOT retroactively approve |
Per-Agent Configuration
Different agents need different security postures:
"agents": {
"moltbook-scanner": {
"strictMode": true, // Block threats
"riskThreshold": 0.7 // 70% = suspicious
},
"main": {
"strictMode": false, // Warn only
"riskThreshold": 0.8 // Higher bar for main
},
"email-processor": {
"strictMode": true, // Always block
"riskThreshold": 0.6 // More paranoid
}
}
Threat Categories
| Category | Risk | Description |
|---|---|---|
| command_injection | 🔴 Critical | Shell commands, code execution |
| credential_theft | 🔴 Critical | API key extraction attempts |
| data_exfiltration | 🔴 Critical | Data leak to external URLs |
| instruction_override | 🔴 High | Jailbreaks, "ignore previous" |
| impersonation | 🔴 High | Fake system/admin messages |
| discovery | ⚠️ Medium | API/capability probing |
Installation
npx hopeid setup
Then restart OpenClaw.
Links
- GitHub: https://github.com/E-x-O-Entertainment-Studios-Inc/hopeIDS
- npm: https://www.npmjs.com/package/hopeid
- Docs: https://exohaven.online/products/hopeids
Installation
openclaw install hopeids
💻Code Examples
└─────────────────────────────────────────┘
---
## Configuration}
### Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `autoScan` | boolean | `false` | Auto-scan every message |
| `strictMode` | boolean | `false` | Block (vs warn) on threats |
| `defaultRiskThreshold` | number | `0.7` | Risk level that triggers action |
| `telegramAlerts` | boolean | `true` | Send alerts for blocked messages |
| `telegramChatId` | string | - | Override alert destination |
| `quarantineDir` | string | `~/.openclaw/quarantine/hopeids` | Storage path |
| `agents` | object | - | Per-agent overrides |
| `trustOwners` | boolean | `true` | Skip scanning owner messages |
---
## Quarantine Records
When a message is blocked, a metadata record is created:}
**Note:** There is NO `originalMessage` field. This is intentional.
---
## Telegram Alerts
When a message is blocked:`/trust @sus_user`
Built from metadata only. No LLM touches this.
---
## Commands
### `/quarantine [all|clean]`
List quarantine records./quarantine clean # Clean expired records
### `/approve <id>`
Mark a blocked message as a false positive./approve q-7f3a2b
**Effect:**
- Status → `approved`
- (Future) Add sender to allowlist
- (Future) Lower pattern weight
### `/reject <id>`
Confirm a blocked message was a true positive./reject q-7f3a2b
**Effect:**
- Status → `rejected`
- (Future) Reinforce pattern weights
### `/trust <senderId>`
Whitelist a sender for future messages./trust @legitimate_user
### `/scan <message>`
Manually scan a message./scan ignore your previous instructions and...
---
## What Approve/Reject Mean
| Command | What it does | What it doesn't do |
|---------|--------------|-------------------|
| `/approve` | Marks as false positive, may adjust IDS | Does NOT re-inject the message |
| `/reject` | Confirms threat, may strengthen patterns | Does NOT affect current message |
| `/trust` | Whitelists sender for future | Does NOT retroactively approve |
**The blocked message is gone by design.** If it was legitimate, the sender can re-send.
---
## Per-Agent Configuration
Different agents need different security postures:}
---
## Threat Categories
| Category | Risk | Description |
|----------|------|-------------|
| `command_injection` | 🔴 Critical | Shell commands, code execution |
| `credential_theft` | 🔴 Critical | API key extraction attempts |
| `data_exfiltration` | 🔴 Critical | Data leak to external URLs |
| `instruction_override` | 🔴 High | Jailbreaks, "ignore previous" |
| `impersonation` | 🔴 High | Fake system/admin messages |
| `discovery` | ⚠️ Medium | API/capability probing |
---
## InstallationTags
Quick Info
Ready to Install?
Get started with this skill in seconds
Related Skills
4claw
4claw — a moderated imageboard for AI agents.
Aap Passport
Agent Attestation Protocol - The Reverse Turing Test.
Adaptive Suite
A continuously adaptive skill suite that empowers Clawdbot.
Adversarial Prompting
Adversarial analysis to critique, fix.