
Local-First LLM

Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs

Rating: 4.9 (453 reviews)
Downloads: 25,617
Version: 1.0.0

Overview

Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs.

Complete Documentation


# Local-First LLM

Route requests to a local LLM first; fall back to cloud only when necessary. Track every decision to show real token and cost savings.

## Quick Start

### 1. Check if a local LLM is running

```bash
python3 skills/local-first-llm/scripts/check_local.py
```

Returns JSON: `{ "any_available": true, "best": { "provider": "ollama", "models": [...] } }`
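
If you are driving the skill from your own code, the probe's JSON is easy to consume. A minimal sketch, assuming the script path and output shape shown above (the error handling is illustrative, not part of the skill):

```python
import json
import subprocess

# Run the availability probe and parse its JSON stdout.
result = subprocess.run(
    ["python3", "skills/local-first-llm/scripts/check_local.py"],
    capture_output=True, text=True, check=True,
)
status = json.loads(result.stdout)

if status["any_available"]:
    best = status["best"]
    print(f"Local provider ready: {best['provider']} ({', '.join(best['models'])})")
else:
    print("No local provider running; requests will fall back to cloud.")
```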

### 2. Route a request

```bash
python3 skills/local-first-llm/scripts/route_request.py \
  --prompt "Summarize this meeting transcript" \
  --tokens 800 \
  --local-available \
  --local-provider ollama
```

Returns: `{ "decision": "local", "reason": "...", "complexity_score": -1 }`

### 3. Log the outcome

After executing the request, record it:

```bash
python3 skills/local-first-llm/scripts/track_savings.py log \
  --tokens 800 \
  --model gpt-4o \
  --routed-to local
```

### 4. Show the dashboard

```bash
python3 skills/local-first-llm/scripts/dashboard.py
```


## Full Routing Workflow

```text
┌─────────────────────────────────────────────────────┐
│  1. check_local.py  →  is a local provider running? │
│                                                      │
│  2. route_request.py  →  local or cloud?             │
│     - sensitivity check  (private data → local)      │
│     - complexity score   (high score → cloud)        │
│     - availability gate  (no local → cloud)          │
│                                                      │
│  3. Execute with the chosen provider                 │
│                                                      │
│  4. track_savings.py log  →  record the outcome      │
│                                                      │
│  5. dashboard.py  →  show cumulative savings         │
└─────────────────────────────────────────────────────┘
```
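
The same loop in one place: a minimal glue sketch of steps 1–4, using only the script paths and flags shown in the Quick Start. The execute step is a stub, and the assumption that `--routed-to` accepts the router's `decision` value verbatim is ours:

```python
import json
import subprocess

def run_json(args: list[str]) -> dict:
    """Run a skill script and parse its JSON stdout."""
    out = subprocess.run(args, capture_output=True, text=True, check=True).stdout
    return json.loads(out)

prompt, tokens = "Summarize this meeting transcript", "800"

# 1. Is a local provider running?
status = run_json(["python3", "skills/local-first-llm/scripts/check_local.py"])

# 2. Ask the router for a decision, forwarding local availability.
route_args = [
    "python3", "skills/local-first-llm/scripts/route_request.py",
    "--prompt", prompt, "--tokens", tokens,
]
if status["any_available"]:
    route_args += ["--local-available", "--local-provider", status["best"]["provider"]]
decision = run_json(route_args)["decision"]

# 3. Execute with the chosen provider (see "Executing with a Local Provider" below).
# ...

# 4. Record the outcome so the dashboard can report savings.
subprocess.run(
    ["python3", "skills/local-first-llm/scripts/track_savings.py", "log",
     "--tokens", tokens, "--model", "gpt-4o", "--routed-to", decision],
    check=True,
)
```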


## Routing Rules (Summary)

| Condition | Route |
| --- | --- |
| No local provider available | ☁️ Cloud |
| Prompt contains sensitive data (`password`, `secret`, `api key`, `ssn`, etc.) | 🏠 Local |
| Complexity score ≥ 3 | ☁️ Cloud |
| Complexity score < 3 | 🏠 Local |

For full scoring details, see [references/routing-logic.md](references/routing-logic.md).
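
For intuition, the table collapses to a few lines of logic. This is a sketch of the summary rules only, with precedence taken from the table's top-to-bottom order; the real complexity scoring lives in references/routing-logic.md, so the score is a stand-in parameter here:

```python
# Keyword list taken from the table above; the actual skill may check more.
SENSITIVE = ("password", "secret", "api key", "ssn")

def route(prompt: str, complexity_score: int, local_available: bool) -> str:
    """Mirror the summary rules; not the skill's actual implementation."""
    if not local_available:
        return "cloud"        # availability gate
    if any(k in prompt.lower() for k in SENSITIVE):
        return "local"        # sensitive data stays on the machine
    return "cloud" if complexity_score >= 3 else "local"
```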


## Executing with a Local Provider

Once `route_request.py` returns `"decision": "local"`, send the request:

### Ollama

```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "YOUR_PROMPT", "stream": false}'
```

### LM Studio / llamafile (OpenAI-compatible)

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "YOUR_PROMPT"}]}'
```


## Dashboard

The dashboard reads from `~/.openclaw/local-first-llm/savings.json` (auto-created).

```text
┌─────────────────────────────────────────┐
│    🧠  Local-First LLM — Dashboard      │
├─────────────────────────────────────────┤
│  Local LLM:  ✅  ollama (llama3.2...)   │
├─────────────────────────────────────────┤
│  Total requests:         42             │
│  Routed locally:         31  (73.8%)    │
│  Routed to cloud:        11             │
├─────────────────────────────────────────┤
│  Tokens saved:       84,200             │
│  Cost saved:           $0.4210          │
└─────────────────────────────────────────┘
```
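
To consume the numbers programmatically rather than through the dashboard, read the file directly. Its exact schema isn't documented here, so this sketch just pretty-prints whatever `track_savings.py` has written:

```python
import json
from pathlib import Path

# Load and display the raw savings data behind the dashboard.
path = Path("~/.openclaw/local-first-llm/savings.json").expanduser()
print(json.dumps(json.loads(path.read_text()), indent=2))
```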

Reset savings data:

```bash
python3 skills/local-first-llm/scripts/track_savings.py reset
```



Installation

```bash
openclaw install local-first-llm
```

Tags

#coding_agents-and-ides #api

Quick Info

Category: Development
Model: Claude 3.5
Complexity: One-Click
Author: joelnishanth
Last Updated: 3/10/2026
Optimized for: Claude 3.5

Ready to Install?

Get started with this skill in seconds:

```bash
openclaw install local-first-llm
```