
Use with Ollama

There are two ways to drive Chimera with Ollama, both supported as of v0.7.1:

| Path | Endpoint | Auth | What runs the model |
|---|---|---|---|
| A. Ollama Cloud direct | https://ollama.com | Authorization: Bearer $OLLAMA_API_KEY | Ollama’s GPUs (their datacenter) |
| B. Local daemon (passthrough or local) | http://localhost:11434 | none (or ollama signin for cloud passthrough) | Local daemon → forwards :cloud ids to Ollama / keeps non-cloud ids local |

Path A is simpler: no daemon, no ollama serve, just two env vars and any of the seven CLIs. Use Path B if you also want truly local models running on your own GPU.


Path A — Ollama Cloud direct (recommended for cloud models)
  1. Sign up at ollama.com
  2. Generate an API key at ollama.com/settings/keys
  3. Set two env vars and you’re done:
export OLLAMA_API_KEY=your_key_here
export OLLAMA_HOST=https://ollama.com

That’s it. No daemon, no ollama serve, no ollama pull. Requests go straight to Ollama’s API with bearer-token auth.
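Before launching a CLI, a quick preflight can catch the two most common Path A mistakes: an empty key, or an OLLAMA_HOST that still points at a local daemon. A minimal bash sketch; check_path_a_env is a hypothetical helper, not part of Chimera or Ollama:

```bash
# Hypothetical preflight helper (not part of Chimera): verifies the two
# env vars Path A needs before you launch a CLI.
check_path_a_env() {
  if [ -z "${OLLAMA_API_KEY:-}" ]; then
    echo "OLLAMA_API_KEY is unset or empty" >&2
    return 1
  fi
  # Path A talks to the hosted API directly, not to a local daemon.
  if [ "${OLLAMA_HOST:-}" != "https://ollama.com" ]; then
    echo "OLLAMA_HOST is '${OLLAMA_HOST:-}', expected https://ollama.com" >&2
    return 1
  fi
  echo "Path A environment looks OK"
}
```

Usage: check_path_a_env && chimera mink --model gpt-oss:120b-cloud -p "explain this repo"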

List the models your key has access to:

curl -s -H "Authorization: Bearer $OLLAMA_API_KEY" https://ollama.com/api/tags | jq '.models[].name'

You should see roughly 30 to 40 cloud models, including gpt-oss:120b, glm-5.1, qwen3-coder:480b, kimi-k2.6, and deepseek-v4-pro.

# Pick the codename for your task; chimera which --task "..." can suggest one
chimera mink --model gpt-oss:120b-cloud -p "explain this repo"
chimera otter --model glm-5.1:cloud -p "summarize TODO comments"
chimera ferret --model deepseek-v4-pro --sandbox workspace-write -p "fix tests"
chimera weasel --model gpt-oss:120b-cloud -p "list py files in chimera/" --json
chimera shrew --model qwen3-coder:480b-cloud -p "explain this directory"
chimera stoat --model kimi-k2.6-cloud -p "what is 8 - 2?"
chimera badger --model glm-5.1:cloud -p "verify the test suite passes"

The -cloud and :cloud suffixes work interchangeably. When $OLLAMA_HOST points at ollama.com, Chimera strips the suffix automatically for the direct API. When pointed at a local daemon, it keeps the suffix so the daemon knows to forward.
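The suffix rule can be sketched as a small bash function. wire_model_id is hypothetical and only mirrors the behavior described in the paragraph above; Chimera's actual implementation may handle more cases:

```bash
# Hypothetical sketch of the suffix rule; Chimera's real code may differ.
# Prints the model id that would be sent to the configured host.
wire_model_id() {
  local model="$1" host="${OLLAMA_HOST:-http://localhost:11434}"
  case "$host" in
    https://ollama.com|https://ollama.com/*)
      # Direct API: drop a trailing -cloud or :cloud suffix.
      model="${model%-cloud}"
      model="${model%:cloud}"
      ;;
    *)
      # Local daemon: keep the suffix so the daemon knows to forward.
      ;;
  esac
  printf '%s\n' "$model"
}
```

With OLLAMA_HOST=https://ollama.com, wire_model_id qwen3-coder:480b-cloud prints qwen3-coder:480b; with the host pointed at a local daemon, it prints the id unchanged.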

Every run is event-sourced. Inspect and resume:

chimera mink runs list
chimera mink runs show mink-20260514T034436-46db8b28
chimera resume <run-id> # auto-detects which CLI saved it
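The example run id appears to follow a codename-timestamp-hex layout. That layout is inferred from this single example, not documented here, and chimera resume may detect the saving CLI some other way (for instance from the event log). A hypothetical bash helper that splits an id of that shape, useful for scripting over chimera mink runs list output:

```bash
# Hypothetical helper: split a run id shaped like
# <codename>-<YYYYMMDDTHHMMSS>-<hex>. The layout is inferred from one example.
parse_run_id() {
  local id="$1"
  local re='^([a-z]+)-([0-9]{8}T[0-9]{6})-([0-9a-f]+)$'
  if [[ "$id" =~ $re ]]; then
    printf 'codename=%s timestamp=%s suffix=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    echo "unrecognized run id: $id" >&2
    return 1
  fi
}
```

For example, parse_run_id mink-20260514T034436-46db8b28 prints codename=mink timestamp=20260514T034436 suffix=46db8b28.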

Path B — Local daemon (for truly local models + cloud passthrough)

  1. Download Ollama and run ollama serve (or just let the app boot).
  2. Optional: pull local models, e.g. ollama pull qwen3-coder:30b, ollama pull glm-4.7-flash.
  3. Optional: ollama signin to enable :cloud-suffixed passthrough.
  4. Point Chimera at the daemon. Either drop in via the Anthropic-compatible adapter:
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama
chimera mink --model glm-5.1:cloud -p "..."

…or use the native Ollama provider (and don’t touch ANTHROPIC_*):

unset OLLAMA_HOST OLLAMA_API_KEY ANTHROPIC_BASE_URL
chimera shrew --model qwen3-coder:30b -p "..." # local, free, your hardware
chimera shrew --model glm-5.1:cloud -p "..." # daemon forwards to Ollama Cloud

Use this path when you want any of:

  • A local model running on your own GPU / Apple Silicon
  • The daemon’s keep_alive model caching for many small turns
  • SSH-key-based cloud auth (handled by ollama signin)

Recommended models (as of 2026-05-13, on ollama.com)

| Model id | Hosting | Approx context | Best for |
|---|---|---|---|
| gpt-oss:120b-cloud | Cloud | 128k | General-purpose, fast, free at time of writing |
| glm-5.1:cloud | Cloud | 128k+ | Coding / refactoring; Chimera benchmarks against it |
| kimi-k2.6:cloud | Cloud | 200k+ | Long sessions, tool-use heavy |
| kimi-k2-thinking | Cloud | 1M+ | Extended thinking, reasoning-heavy |
| qwen3-coder:480b-cloud | Cloud | 131k | Pure coding tasks |
| deepseek-v4-pro | Cloud | 262k | Latest DeepSeek; reasoning + code |
| minimax-m2.7:cloud | Cloud | 1M+ | Very long contexts |
| qwen3-coder | Local (~60GB) | 131k | Local coding on Apple Silicon / GPU |
| glm-4.7-flash | Local | 128k | Local coding, smaller footprint |

Browse the full live catalog:

curl -s -H "Authorization: Bearer $OLLAMA_API_KEY" https://ollama.com/api/tags | jq '.models[] | {name, size_gb: (.size / 1e9 | round)}'

  • chimera mink — TUI-first, slash-command palette. Reads ~/.claude/settings.json for hooks / permissions / MCP. Best for interactive sessions where you want a REPL with rich rendering.
  • chimera otter — Multi-client server. Three transports (one-shot, HTTP+SSE, ACP). Use when you want a long-running agent server other tools connect to.
  • chimera ferret — Sandbox-first. --sandbox read-only|workspace-write|danger-full-access + --approval suggest|auto|yolo. Best for destructive tasks where you want a safety net.
  • chimera weasel — Minimal harness. Four modes: interactive REPL, -p print, JSON-RPC over stdio, embedded SDK. Best for scripting / pipelines / programmatic use.
  • chimera shrew — Tuned for small local models. Smaller default --max-steps, restricted default tools, output-parser repair for text-mode tool calls, quality monitor for empty/hallucinated responses.
  • chimera stoat — Shell-mode toggle (Ctrl-X). Each REPL line either feeds the LLM or runs as a bash command. Mode-tagged history.
  • chimera badger — Strict harness posture. Tighter --max-steps, --rerun-on-failure, parity tracking. Best for repeatable / regression-style work.

Run chimera agents to see this same matrix any time. Or chimera which --task "describe what you want" to get a recommendation.
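If you script around Chimera and want a fixed mapping instead of calling chimera which each time, the list above can be condensed into a lookup. This is a hypothetical cheat-sheet, not chimera which's actual logic; the keywords are shorthand for each bullet's "best for" line:

```bash
# Hypothetical cheat-sheet (not chimera's own logic): map a rough need
# to the codename whose description above matches it.
# For a real recommendation, use: chimera which --task "..."
pick_codename() {
  case "$1" in
    interactive|repl|tui)      echo mink ;;
    server|http|acp)           echo otter ;;
    destructive|sandbox)       echo ferret ;;
    script|pipeline|json|sdk)  echo weasel ;;
    small-local|local)         echo shrew ;;
    shell|bash)                echo stoat ;;
    regression|repeatable)     echo badger ;;
    *) echo "no match for '$1'; try: chimera which --task ..." >&2; return 1 ;;
  esac
}
```

Usage: cn=$(pick_codename sandbox) && chimera "$cn" --model glm-5.1:cloud --sandbox workspace-write -p "fix tests"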


HTTP 401 — Auth rejected by upstream

Your $OLLAMA_API_KEY is wrong, expired, or unset. Verify it with the curl command above.

HTTP 404 ... /v1/chat/completions

Your shell still has a stale $ANTHROPIC_BASE_URL pointing at localhost:11434 while no daemon is running. Clear it with: unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN
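To catch this before a run, a bash sketch can check whether anything is listening at the address in $ANTHROPIC_BASE_URL. base_url_reachable and warn_if_stale_base_url are hypothetical helpers; they use bash's /dev/tcp redirection, so they only prove that a TCP port accepts connections, not that an Ollama daemon is behind it:

```bash
# Hypothetical check: does anything accept TCP connections at the
# host:port in the given URL (default: $ANTHROPIC_BASE_URL)?
base_url_reachable() {
  local url="${1:-${ANTHROPIC_BASE_URL:-}}"
  [ -n "$url" ] || return 1
  local hostport="${url#*://}"   # drop the scheme
  hostport="${hostport%%/*}"     # drop any path
  local host="${hostport%%:*}"
  local port="${hostport##*:}"
  # No explicit port: fall back to the scheme default.
  if [ "$port" = "$hostport" ]; then
    case "$url" in https://*) port=443 ;; *) port=80 ;; esac
  fi
  # Open a connection in a subshell; it closes when the subshell exits.
  (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Warn (and return non-zero) if ANTHROPIC_BASE_URL is set but unreachable.
warn_if_stale_base_url() {
  if [ -n "${ANTHROPIC_BASE_URL:-}" ] && ! base_url_reachable "$ANTHROPIC_BASE_URL"; then
    echo "ANTHROPIC_BASE_URL=$ANTHROPIC_BASE_URL is not reachable. Clear it with:" >&2
    echo "  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN" >&2
    return 1
  fi
}
```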

HTTP 503 — Upstream issue

Ollama’s upstream is having trouble. The provider returns a friendly error and a hint. Retry, or switch models: --model glm-5.1:cloud is usually up when others are down.
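The "switch models" advice can be automated with a small wrapper that tries a list of models in order. This is a sketch under an assumption the page doesn't confirm: that chimera exits with a non-zero status when the upstream returns an error such as HTTP 503. run_with_fallback is a hypothetical name:

```bash
# Hypothetical wrapper: try each model in turn until one run succeeds.
# ASSUMPTION: chimera exits non-zero on upstream errors (e.g. HTTP 503);
# verify this against your version before relying on it.
run_with_fallback() {
  local prompt="$1"; shift
  local model
  for model in "$@"; do
    if chimera mink --model "$model" -p "$prompt"; then
      return 0
    fi
    echo "model $model failed, trying next" >&2
  done
  return 1
}
```

Usage: run_with_fallback "explain this repo" gpt-oss:120b-cloud glm-5.1:cloud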

stoat: $MOONSHOT_API_KEY is required for kimi-* models (fixed in v0.7.1+)

Use the -cloud or :cloud suffix: chimera stoat --model kimi-k2.6-cloud. Bare kimi ids still need a Moonshot key.

Models are slow on first call

The daemon (path B) does cold-loads. Subsequent calls are fast due to keep_alive. Path A has no cold-load.
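If the first-call delay matters, you can pre-load a local model through the daemon before starting Chimera. Ollama's /api/generate endpoint loads a model into memory when the request has no prompt, and its keep_alive field sets how long the model stays resident. The helper names below are hypothetical, and whether the model stays loaded for the full duration also depends on the keep_alive value Chimera sends on its own requests, which this page doesn't specify:

```bash
# Build the JSON body for a load-only /api/generate request (no prompt).
warm_payload() {
  printf '{"model":"%s","keep_alive":"%s"}\n' "$1" "${2:-30m}"
}

# Ask the local daemon to load the model and keep it resident.
warm_model() {
  curl -sf http://localhost:11434/api/generate \
    -d "$(warm_payload "$1" "${2:-30m}")" >/dev/null
}
```

Usage: warm_model qwen3-coder:30b 1h && chimera shrew --model qwen3-coder:30b -p "..."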


| Path | Cost | Privacy |
|---|---|---|
| A: Cloud direct | Free at time of writing (Ollama’s loss-leader pricing). May change. | Calls + payloads go to Ollama’s servers. |
| B: Local daemon, non-cloud model | $0 | Stays on your machine. |
| B: Local daemon, :cloud model | Same as A; the daemon just proxies. | Same as A. |

For sensitive code, prefer Path B with a non-cloud model.
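To enforce that in scripts, a guard can refuse a run when either signal from the table above says the payload would leave your machine: OLLAMA_HOST pointing at Ollama Cloud, or a model id with a -cloud/:cloud suffix. assert_local_only is a hypothetical helper; it cannot tell whether a bare id such as deepseek-v4-pro is actually a cloud model, so treat it as a seatbelt, not a guarantee:

```bash
# Hypothetical guard (not part of Chimera): fail if this run would send
# code off the machine, based on OLLAMA_HOST and the model id suffix.
assert_local_only() {
  local model="$1"
  case "${OLLAMA_HOST:-}" in
    *ollama.com*)
      echo "OLLAMA_HOST points at Ollama Cloud (Path A)" >&2
      return 1 ;;
  esac
  case "$model" in
    *:cloud|*-cloud)
      echo "$model is a cloud id; the daemon would forward it" >&2
      return 1 ;;
  esac
  return 0
}
```

Usage: assert_local_only qwen3-coder:30b && chimera shrew --model qwen3-coder:30b -p "..."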