# Gradient Inference

Community skill (unofficial) for DigitalOcean Gradient AI Serverless Inference.

- **Rating:** 4.6 (307 reviews)
- **Downloads:** 8,179
- **Version:** 1.0.0
## 🦞 Gradient AI — Serverless Inference

> ⚠️ This is an unofficial community skill, not maintained by DigitalOcean. Use at your own risk.

> "Why manage GPUs when the ocean provides?" — ancient lobster proverb

Use DigitalOcean's Gradient Serverless Inference to call large language models without managing infrastructure. The API is OpenAI-compatible, so standard SDKs and patterns work — just point at `https://inference.do-ai.run/v1` and swim.
## Authentication

All requests need a Model Access Key in the `Authorization: Bearer` header.

```bash
export GRADIENT_API_KEY="your-model-access-key"
```

**Where to get one:** [DigitalOcean Console](https://cloud.digitalocean.com) → Gradient AI → Model Access Keys → Create Key.

📖 *[Full auth docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#create-a-model-access-key)*
## Tools

### 🔍 List Available Models

Window-shop for LLMs before you swipe the card.

```bash
python3 gradient_models.py                   # Pretty table
python3 gradient_models.py --json            # Machine-readable
python3 gradient_models.py --filter "llama"  # Search by name
```

Use this before hardcoding model IDs — models are added and deprecated over time.

**Direct API call:**

```bash
curl -s https://inference.do-ai.run/v1/models \
  -H "Authorization: Bearer $GRADIENT_API_KEY" | python3 -m json.tool
```

📖 *[Models reference](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/)*
### 💬 Chat Completions

The classic. Send structured messages (system/user/assistant roles), get a response. OpenAI-compatible, so you probably already know how this works.

```bash
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --system "You are a helpful assistant." \
  --prompt "Explain serverless inference in one paragraph."

# Different model
python3 gradient_chat.py \
  --model "llama3.3-70b-instruct" \
  --prompt "Write a haiku about cloud computing."
```

**Direct API call:**

```bash
curl -s https://inference.do-ai.run/v1/chat/completions \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-120b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
  }'
```

📖 *[Chat Completions docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#chat-completions)*
### ⚡ Responses API (Recommended)

DigitalOcean's [recommended endpoint](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#responses-api) for new integrations. Simpler request format and supports **prompt caching** — a.k.a. "stop paying twice for the same context."

```bash
# Basic usage
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --prompt "Summarize this earnings report." \
  --responses-api

# With prompt caching (saves cost on follow-up queries)
python3 gradient_chat.py \
  --model "openai-gpt-oss-120b" \
  --prompt "Now compare it to last quarter." \
  --responses-api --cache
```

**Direct API call:**

```bash
curl -s https://inference.do-ai.run/v1/responses \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-oss-120b",
    "input": "Explain prompt caching.",
    "store": true
  }'
```
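The Responses API body is flatter: one `input` string instead of a message array, with `store` toggling caching. A sketch of that payload (the `responses_payload` name is ours):

```python
def responses_payload(model: str, text: str, cache: bool = False) -> dict:
    """Assemble a Responses API request body."""
    payload = {"model": model, "input": text}
    if cache:
        payload["store"] = True  # opt into prompt caching
    return payload
```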
**When to use which:**

| | Chat Completions | Responses API |
|---|---|---|
| **Request format** | Array of messages with roles | Single `input` string |
| **Prompt caching** | ❌ | ✅ via `store: true` |
| **Multi-step tool use** | Manual | Built-in |
| **Best for** | Structured conversations | Simple queries, cost savings |

📖 *[Responses API docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#responses-api)*
### 🖼️ Generate Images

Turn text prompts into images. Because sometimes a chart isn't enough.

```bash
python3 gradient_image.py --prompt "A lobster trading stocks on Wall Street"
python3 gradient_image.py --prompt "Sunset over the NYSE" --output sunset.png
python3 gradient_image.py --prompt "Fintech logo" --json
```

**Direct API call:**

```bash
curl -s https://inference.do-ai.run/v1/images/generations \
  -H "Authorization: Bearer $GRADIENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dall-e-3",
    "prompt": "A lobster analyzing candlestick charts",
    "n": 1
  }'
```

📖 *[Image generation docs](https://docs.digitalocean.com/products/gradient-ai-platform/how-to/use-serverless-inference/#image-generation)*
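To persist the result, the response has to be decoded. A sketch that assumes the OpenAI-style response shape, where `data[0]` carries either a base64 payload (`b64_json`) or a hosted `url`; check the actual response before relying on this:

```python
import base64

def save_first_image(response: dict, out_path: str) -> str:
    """Save the first image from an images/generations response.

    Assumes OpenAI-style output: data[0] has "b64_json" or "url".
    Returns the local path if bytes were written, else the URL.
    """
    item = response["data"][0]
    if "b64_json" in item:
        with open(out_path, "wb") as f:
            f.write(base64.b64decode(item["b64_json"]))
        return out_path
    return item["url"]  # caller fetches the URL separately
```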
## 🧠 Model Selection Guide

Not all models are created equal. Choose wisely, young crustacean:

| Model | Best For | Speed | Quality | Context |
|-------|----------|-------|---------|---------|
| `openai-gpt-oss-120b` | Complex reasoning, analysis, writing | Medium | ★★★★★ | 128K |
| `llama3.3-70b-instruct` | General tasks, instruction following | Fast | ★★★★ | 128K |
| `deepseek-r1-distill-llama-70b` | Math, code, step-by-step reasoning | Slow | ★★★★★ | 128K |
| `qwen3-32b` | Quick triage, short tasks | Fastest | ★★★ | 32K |

> **🦞 Pro tip: Cost-aware routing.** Use a fast model (e.g., `qwen3-32b`) to score or triage, then only escalate to a strong model (e.g., `openai-gpt-oss-120b`) when depth is needed. Enable prompt caching for repeated context.

Always run `python3 gradient_models.py` to check what's currently available — the menu changes.

📖 *[Available models](https://docs.digitalocean.com/products/gradient-ai-platform/details/models/)*
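The routing tip boils down to one branch. An illustrative sketch, where the 0.5 threshold and the idea of a `difficulty` score (e.g., a rating you ask the fast model to produce) are our assumptions:

```python
# Cost-aware routing: cheap model triages, expensive model handles
# only the hard cases. Threshold and scoring scheme are illustrative.
FAST_MODEL = "qwen3-32b"
STRONG_MODEL = "openai-gpt-oss-120b"

def pick_model(difficulty: float, threshold: float = 0.5) -> str:
    """difficulty in [0, 1]; escalate at or above the threshold."""
    return STRONG_MODEL if difficulty >= threshold else FAST_MODEL
```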
### 💰 Model Pricing Lookup

Check what models cost *before* you rack up a bill. Scrapes the official [DigitalOcean pricing page](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/) — no API key needed.

```bash
python3 gradient_pricing.py                  # Pretty table
python3 gradient_pricing.py --json           # Machine-readable
python3 gradient_pricing.py --model "llama"  # Filter by model name
python3 gradient_pricing.py --no-cache       # Skip cache, fetch live
```

**How it works:**

- Fetches live pricing from DigitalOcean's docs (public page, no auth)
- Caches results for 24 hours in `/tmp/gradient_pricing_cache.json`
- Falls back to a bundled snapshot if the live fetch fails

> **🦞 Pro tip:** Run `python3 gradient_pricing.py --model "gpt-oss"` before choosing a model to see the cost difference between `gpt-oss-120b` ($0.10/$0.70) and `gpt-oss-20b` ($0.05/$0.45) per 1M tokens.

📖 *[Pricing docs](https://docs.digitalocean.com/products/gradient-ai-platform/details/pricing/)*
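The 24-hour cache check described above amounts to a file-age test. A sketch of that logic (the `cache_is_fresh` helper is our illustration, not the script's actual code):

```python
import time
from pathlib import Path

CACHE_PATH = Path("/tmp/gradient_pricing_cache.json")
CACHE_TTL_SECONDS = 24 * 3600  # 24 hours

def cache_is_fresh(path: Path, ttl: int = CACHE_TTL_SECONDS) -> bool:
    """True if the cache file exists and was written within the TTL."""
    return path.exists() and (time.time() - path.stat().st_mtime) < ttl
```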
## CLI Reference

All scripts accept `--json` for machine-readable output.

```text
gradient_models.py   [--json] [--filter QUERY]
gradient_chat.py     --prompt TEXT [--model ID] [--system TEXT]
                     [--responses-api] [--cache] [--temperature F]
                     [--max-tokens N] [--json]
gradient_image.py    --prompt TEXT [--model ID] [--output PATH]
                     [--size WxH] [--json]
gradient_pricing.py  [--json] [--model QUERY] [--no-cache]
```
## External Endpoints

| Endpoint | Purpose |
|---|---|
| `https://inference.do-ai.run/v1/models` | List available models |
| `https://inference.do-ai.run/v1/chat/completions` | Chat Completions API |
| `https://inference.do-ai.run/v1/responses` | Responses API (recommended) |
| `https://docs.digitalocean.com/.../pricing/` | Pricing page (scraped, public) |
| `https://inference.do-ai.run/v1/images/generations` | Image generation |
## Security & Privacy

- All requests go to `inference.do-ai.run` — DigitalOcean's own endpoint
- Your `GRADIENT_API_KEY` is sent as a Bearer token in the `Authorization` header
- No other credentials or local data leave the machine
- Model Access Keys are scoped to inference only — they can't manage your DO account
- Prompt caching entries are scoped to your account and automatically expire
## Trust Statement

By using this skill, prompts and data are sent to DigitalOcean's Gradient Inference API. Only install if you trust DigitalOcean with the content you send to their LLMs.
## Important Notes

- Run `python3 gradient_models.py` before assuming a model exists — they rotate
- All scripts exit with code 1 and print errors to stderr on failure
## Installation

```bash
openclaw install gradient-inference
```