
Token Guard

<!-- 🌌 Aoineco-Verified | S-DNA: AOI-2026-0213-SDNA-TG01 -->

Rating: 4.6 (469 reviews)
Downloads: 1,709
Version: 1.0.0

Overview


Complete Documentation

View Source โ†’

TokenGuard โ€” LLM API 429 Prevention Engine

Version: 1.5.0
Author: Aoineco & Co.
License: MIT
Tags: rate-limit, 429, token-management, cost-optimization, llm-guard, high-performance

Description

Prevents LLM API 429 (Rate Limit / Resource Exhausted) errors by intercepting requests before they're sent. Designed for users on free/low-cost API plans who need maximum intelligence per dollar.

Core philosophy: "Intelligence is measured not by how much you spend, but by how little you need."

Problem

When using LLM APIs (especially Google Gemini Flash with 1M TPM limit):

  • Large documents (docx, PDFs) can consume the entire minute quota in one request
  • Failed requests still count toward token usage
  • Retry loops after 429 errors waste more tokens โ†’ death spiral
  • No built-in way to detect runaway/duplicate requests

Features

| Feature | Description |
| --- | --- |
| Pre-flight Token Estimation | Estimates token count before the API call (CJK-aware, no tiktoken dependency) |
| Real-time Quota Tracking | Tracks per-model, per-minute token usage with a sliding window |
| Smart Throttle | Auto-waits when quota > 80%, blocks at > 95% |
| Duplicate Detection | Blocks identical requests within a 60 s window (3+ = runaway) |
| Response Caching | Caches successful responses for duplicate requests |
| Auto Model Fallback | Switches to a cheaper/available model when the primary is exhausted |
| 429 Error Parser | Extracts the exact retry delay from Google/Anthropic error responses |
| Batch vs Mistake Detection | Distinguishes intentional bulk processing from error loops |
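A tiktoken-free, CJK-aware estimator can be done with a character-class heuristic: CJK scripts run roughly one token per character, while Latin text averages around four characters per token. This is a hypothetical sketch of the idea, not TokenGuard's actual implementation:

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate: CJK characters count ~1 token each,
    everything else ~4 characters per token (common heuristic)."""
    cjk = sum(1 for ch in text
              if '\u4e00' <= ch <= '\u9fff'     # CJK Unified Ideographs
              or '\u3040' <= ch <= '\u30ff'     # Japanese kana
              or '\uac00' <= ch <= '\ud7af')    # Korean Hangul
    other = len(text) - cjk
    return cjk + math.ceil(other / 4)

print(estimate_tokens("hello world"))  # 3
```

An over-estimate is the safe direction here: it can only make the guard throttle slightly early, never let a request blow through the quota.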

Supported Models

Pre-configured quotas for:

  • gemini-3-flash (1M TPM)
  • gemini-3-pro (2M TPM)
  • claude-haiku (50K TPM)
  • claude-sonnet (200K TPM)
  • claude-opus (200K TPM)
  • gpt-4o (800K TPM)
  • deepseek (1M TPM)

Custom quotas can be added for any model.
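Per-model, per-minute tracking with a sliding window amounts to keeping timestamped usage events and discarding anything older than 60 seconds. A minimal sketch of that mechanism (illustrative only; the class name and methods are not TokenGuard's real internals):

```python
import time
from collections import deque

class SlidingWindowQuota:
    """Tracks tokens used for one model over the last 60 seconds."""
    def __init__(self, tpm_limit: int, window: float = 60.0):
        self.tpm_limit = tpm_limit
        self.window = window
        self.events = deque()          # (timestamp, tokens) pairs

    def record(self, tokens: int) -> None:
        self.events.append((time.monotonic(), tokens))

    def used(self) -> int:
        cutoff = time.monotonic() - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()      # drop events outside the window
        return sum(t for _, t in self.events)

    def remaining(self) -> int:
        return max(0, self.tpm_limit - self.used())

q = SlidingWindowQuota(tpm_limit=1_000_000)
q.record(750_000)
print(q.remaining())  # 250000
```

Unlike a fixed per-minute counter that resets on the clock, the sliding window never lets two back-to-back bursts straddle a reset and double the effective rate.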

Usage

```python
import time

from token_guard import TokenGuard

guard = TokenGuard()

# Before every API call (call_your_api is your own request function):
decision = guard.check(prompt_text, model="gemini-3-flash")

if decision.action == "proceed":
    response = call_your_api(prompt_text)
    guard.record_usage(decision.estimated_tokens, model="gemini-3-flash")
    guard.cache_response(prompt_text, response)

elif decision.action == "wait":
    time.sleep(decision.wait_seconds)
    # then re-run guard.check() and retry

elif decision.action == "fallback":
    response = call_your_api(prompt_text, model=decision.fallback_model)

elif decision.action == "block":
    print(f"Blocked: {decision.reason}")

# If you still get a 429 error, feed the server's retry delay back:
guard.record_429("gemini-3-flash", retry_delay=53.0)
```

Integration with OpenClaw

Add to your agent's config or use as a middleware:

```yaml
skills:
  - token-guard
```

The agent can invoke TokenGuard before any LLM API call to prevent quota exhaustion.

File Structure

```text
token-guard/
├── SKILL.md            # This file
└── scripts/
    └── token_guard.py  # Main engine (zero external dependencies)
```

Status Output Example

```json
{
  "models": {
    "gemini-3-flash": {
      "tpm_limit": 1000000,
      "used_this_minute": 750000,
      "remaining": 250000,
      "usage_pct": "75.0%",
      "status": "🟢 OK"
    }
  },
  "stats": {
    "total_checks": 42,
    "tokens_saved": 128000,
    "blocks": 3,
    "fallbacks": 2
  }
}
```
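Mapping this status to the Smart Throttle tiers from the feature list (auto-wait above 80%, block above 95%) is a one-liner worth of logic. A sketch, with the function name being an illustration rather than part of the module:

```python
def throttle_action(used: int, limit: int) -> str:
    """Map usage percentage to the documented throttle tiers."""
    pct = used / limit
    if pct > 0.95:
        return "block"    # hard stop: one more big request risks a 429
    if pct > 0.80:
        return "wait"     # auto-wait until the sliding window frees up
    return "proceed"

print(throttle_action(750_000, 1_000_000))  # proceed
```

The example status above (75.0% used) therefore reports 🟢 OK and would still proceed.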

Zero Dependencies

Pure Python 3.10+. No pip install needed. No tiktoken, no external API calls. Designed for the $7 Bootstrap Protocol โ€” every byte counts.

Installation

```bash
openclaw install token-guard
```


Tags

#ai-and-llms

Quick Info

Category: Development
Model: Claude 3.5
Complexity: One-Click
Author: edmonddantesj
Last Updated: 3/10/2026
openclaw install token-guard