# Ramalama CLI

Run and interact with AI agents.
Use when an alternative AI agent is better suited to a task. For example, working with sensitive data or solving simple tasks with a cheap and local agent, or accessing specialist models with unique capabilities.
## Overview
Use this skill to execute ramalama tasks in a consistent, low-risk workflow.
Prefer local discovery (`--help`, local config files, existing project scripts) before making assumptions about flags or runtime defaults.

Prefer ramalama when tasks need:
- flexible model sourcing (`hf://`, `oci://`, `rlcr://`, `url://`)
- containerized local inference with runtime/network/device controls
- RAG data packaging and serving
- benchmark/perplexity evaluation
- model conversion and registry push/pull flows
## Preflight
Run these checks before first invocation in a session:

```bash
ramalama version
podman info >/dev/null 2>&1 || docker info >/dev/null 2>&1
ramalama run --help
```

If serving on the default port, verify availability:

```bash
lsof -i :8080
```
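The engine check above can be wrapped into a reusable helper for scripts. A minimal sketch; `detect_engine` is an illustrative name, not a ramalama feature:

```bash
# detect_engine: pick a working container engine to pass to --engine.
# Prefers podman, falls back to docker, fails loudly otherwise.
detect_engine() {
  if podman info >/dev/null 2>&1; then
    echo podman
  elif docker info >/dev/null 2>&1; then
    echo docker
  else
    echo "no container engine available" >&2
    return 1
  fi
}

# Usage:
#   ramalama --engine "$(detect_engine)" run granite3.3:2b "<prompt>"
```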
## Decision Matrix
- One-shot inference: `ramalama run <model> "<prompt>"`
- Interactive chat loop: `ramalama run <model>`
- Serve OpenAI-compatible endpoint: `ramalama serve <model>`
- Query an existing endpoint: `ramalama chat --url <url> "<prompt>"`
- Build knowledge bundle from files/URLs: `ramalama rag <paths...> <destination>`
- Evaluate model performance/quality: `ramalama bench <model>` and `ramalama perplexity <model>`
- Inspect/source lifecycle operations: `inspect`, `pull`, `push`, `convert`, `list`, `rm`
## Usage
Start with top-level discovery:

```bash
ramalama --help
ramalama version
```

Apply global options before the subcommand when needed:

```bash
ramalama [--debug|--quiet] [--dryrun] [--engine podman|docker] [--nocontainer] [--runtime llama.cpp|vllm|mlx] [--store <path>] <subcommand> ...
```

Use command-level help before invoking unknown flags:

```bash
ramalama <subcommand> --help
```
## Known-Good Recipes
### 1) One-shot run

```bash
ramalama run granite3.3:2b "Summarize this in 3 bullets: <text>"
```

### 2) Detached service + API call

```bash
ramalama serve -d granite3.3:2b
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"granite3.3:2b","messages":[{"role":"user","content":"Hello"}]}'
```
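For scripted clients, it helps to wait for the detached server to come up before the first request. A minimal sketch; `wait_for_url` is an illustrative helper, and the `/v1/models` route is an assumption that may vary by runtime:

```bash
# wait_for_url: poll an HTTP URL until it responds successfully,
# retrying once per second up to a timeout (default 30 tries).
wait_for_url() {
  url=$1; tries=${2:-30}
  for _ in $(seq 1 "$tries"); do
    curl -sf "$url" >/dev/null && return 0
    sleep 1
  done
  return 1
}

# Usage (after `ramalama serve -d ...`; route is an assumption):
#   wait_for_url http://localhost:8080/v1/models || exit 1
```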
### 3) Direct Hugging Face source

```bash
ramalama serve hf://unsloth/gemma-3-270m-it-GGUF
```

### 4) RAG package then query

```bash
ramalama rag ./docs my-rag
ramalama run --rag my-rag granite3.3:2b "What are the auth requirements?"
```

### 5) Benchmark and list benchmark history

```bash
ramalama bench granite3.3:2b
ramalama benchmarks list
```
## Reliability Defaults
For agent automation, prefer explicit and deterministic flags:

```bash
ramalama --engine podman run -c 4096 --pull missing granite3.3:2b "<prompt>"
```

Recommended defaults:
- set `--engine` explicitly when the environment is mixed
- start with a smaller `-c`/`--ctx-size` on constrained hosts
- use `--pull missing` for faster repeat runs
- use one-shot non-interactive invocation for scripts
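These defaults can be pinned in a small wrapper function. A sketch; `llm_once` and its fixed flag values are illustrative choices, not ramalama conventions:

```bash
# llm_once: one-shot, non-interactive inference with pinned defaults.
# Engine, context size, and pull policy are fixed for reproducibility.
llm_once() {
  model=$1; shift
  ramalama --engine podman run -c 4096 --pull missing "$model" "$*"
}

# Usage:
#   llm_once granite3.3:2b "Summarize this in 3 bullets: <text>"
```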
## Troubleshooting
- Docker socket unavailable:
  - verify Docker is running, or use `--engine podman`
- Podman socket unavailable:
  - check `podman machine list` and start the machine if needed
- `timed out` during startup:
  - inspect container logs: `podman logs`
  - reduce context (`-c 4096`) and retry
- memory allocation failure:
  - use a smaller model and/or lower context size
- port conflict on 8080:
  - choose an alternate port via `-p`
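To sidestep port conflicts entirely, a script can ask the OS for a free port before serving. A sketch; `pick_free_port` is an illustrative helper that binds port 0 via the Python standard library:

```bash
# pick_free_port: print an unused TCP port to pass to `serve -p`.
# Binding port 0 asks the kernel to allocate a free ephemeral port.
pick_free_port() {
  python3 -c 'import socket; s=socket.socket(); s.bind(("",0)); print(s.getsockname()[1]); s.close()'
}

# Usage:
#   ramalama serve -d -p "$(pick_free_port)" granite3.3:2b
```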
## Notes
- `serve` exposes an OpenAI-compatible endpoint for external clients.
- Prefer JSON output flags where available (`list --json`, `inspect --json`) for robust parsing in automation.
- Use `ramalama chat --url` when the model is already served elsewhere.
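For the JSON parsing mentioned above, a sketch using the Python standard library; the helper name and the `name` field are assumptions, so confirm the actual schema from your version's `list --json` output first:

```bash
# list_model_names: extract model names from `ramalama list --json`.
# The "name" field is an assumption about the output schema.
list_model_names() {
  ramalama list --json \
    | python3 -c 'import json,sys; [print(m["name"]) for m in json.load(sys.stdin)]'
}
```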
## Installation

```bash
openclaw install ramalama-cli
```