# Weasel Providers

## Providers — what weasel can talk to

`chimera weasel` reuses Chimera's standard provider stack
(`chimera/providers/factory.py`), so any provider that mink, otter,
or ferret can drive, weasel can drive. The difference is in defaults
and the resolution chain: weasel's resolver
(`chimera/weasel/providers.py`) prefers the broadest reach — it will
fall back to a local Ollama daemon if no hosted key is set, so the
zero-config path works on a fresh laptop.
This page is the weasel-specific layer. For the deep, line-numbered
tour of every adapter, see `docs/mink/providers.md`. Same code,
deeper notes.
## Resolution order

`build_provider(args)` walks this chain on every weasel invocation;
first match wins:

1. Explicit `args.model` (CLI `--model <id>`, SDK `model=`).
2. `$WEASEL_MODEL` environment variable.
3. `$ANTHROPIC_API_KEY` set → defaults to `claude-sonnet-4-6`.
4. `$OPENAI_API_KEY` set → defaults to `gpt-4o`.
5. `$OPENROUTER_API_KEY` set → defaults to `anthropic/claude-sonnet-4`.
6. Local Ollama daemon reachable on `$OLLAMA_HOST` (default `http://localhost:11434`) → first installed tag.
7. Friendly error pointing at the env vars above.
Explicit beats env beats default. So
`chimera weasel --model gpt-4o -p "..."` works even when
`$ANTHROPIC_API_KEY` is set.
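A minimal sketch of that chain, assuming hypothetical helper names
(`resolve_model`, `_first_ollama_tag`); the real resolver lives in
`chimera/weasel/providers.py`:

```python
import json
import os
import urllib.request

def _first_ollama_tag() -> str | None:
    """Probe the local daemon's /api/tags; None if unreachable."""
    host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=1) as resp:
            models = json.load(resp).get("models", [])
        return models[0]["name"] if models else None
    except OSError:
        return None

def resolve_model(cli_model: str | None) -> str:
    """First match wins, mirroring the numbered list above."""
    if cli_model:                                # 1. explicit --model / model=
        return cli_model
    if os.environ.get("WEASEL_MODEL"):           # 2. env override
        return os.environ["WEASEL_MODEL"]
    if os.environ.get("ANTHROPIC_API_KEY"):      # 3-5. hosted defaults
        return "claude-sonnet-4-6"
    if os.environ.get("OPENAI_API_KEY"):
        return "gpt-4o"
    if os.environ.get("OPENROUTER_API_KEY"):
        return "anthropic/claude-sonnet-4"
    tag = _first_ollama_tag()                    # 6. local daemon probe
    if tag:
        return tag
    raise SystemExit(                            # 7. friendly error
        "weasel: no provider configured; set $ANTHROPIC_API_KEY, "
        "$OPENAI_API_KEY, or $OPENROUTER_API_KEY, or start `ollama serve`."
    )
```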
## Anthropic (default for hosted)

The first-class hosted target. `claude-sonnet-4-6` is the default
when `$ANTHROPIC_API_KEY` is set: it streams, tool-calls cleanly,
supports extended thinking, and has prompt caching for long sessions.
- Setup:

  ```sh
  uv sync --extra anthropic
  export ANTHROPIC_API_KEY=sk-ant-...
  ```

- Use:

  ```sh
  chimera weasel -p "review this PR"
  chimera weasel --model claude-opus-4 -p "long-form refactor"
  ```

- Wired: streaming, tool calls, async, extended thinking,
  prompt caching, vision. See `chimera/providers/anthropic.py`.
- Anthropic-compatible endpoints: the same provider also accepts
  `ANTHROPIC_BASE_URL` + `ANTHROPIC_AUTH_TOKEN`, so weasel can route
  through GLM-4.6, Moonshot, or any third-party gateway that speaks
  the Messages API:

  ```sh
  export ANTHROPIC_BASE_URL=https://api.z.ai/v1/anthropic
  export ANTHROPIC_AUTH_TOKEN=...
  chimera weasel --model glm-4.6 -p "..."
  ```
## OpenAI

Use when you have an OpenAI key and want `gpt-4o`, `o3`, or any
GPT-5-class model weasel recognizes.

- Setup:

  ```sh
  uv sync --extra openai
  export OPENAI_API_KEY=sk-...
  ```

- Use:

  ```sh
  chimera weasel --model gpt-4o -p "draft a release note"
  chimera weasel --model o3-mini -p "prove this invariant"
  ```

- Wired: native streaming, tool calls, async (`AsyncOpenAI`),
  reasoning-token tracking for o-series, prompt-cache hit accounting,
  vision (`gpt-4o`), JSON mode.
## OpenRouter

OpenRouter is one of weasel's first-class targets because the
`vendor/name` model id convention
(`anthropic/claude-sonnet-4`, `google/gemini-2.5-pro`,
`meta-llama/llama-3.3-70b`) lets a single key fan out across
providers.
Routing rule: when `$OPENROUTER_API_KEY` is set and the resolved
model id contains a `/`, weasel hands it to the OpenAI-compatible
adapter pointed at `https://openrouter.ai/api/v1`. A bare
`claude-sonnet-4-6` with both `$OPENROUTER_API_KEY` and
`$ANTHROPIC_API_KEY` set still goes direct to Anthropic — the `/`
separator is the explicit signal.
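A minimal sketch of that routing predicate (`pick_adapter` is a
hypothetical name; the real logic lives in weasel's resolver):

```python
import os

def pick_adapter(model_id: str) -> str:
    """Illustrative sketch of the `/` routing rule described above."""
    if os.environ.get("OPENROUTER_API_KEY") and "/" in model_id:
        # vendor/name ids fan out via the OpenAI-compatible adapter
        # pointed at https://openrouter.ai/api/v1
        return "openrouter"
    if model_id.startswith("claude"):
        return "anthropic"            # bare ids go direct to the vendor
    if model_id.startswith(("gpt-", "o3")):
        return "openai"
    return "ollama"                   # local tags fall through

# pick_adapter("anthropic/claude-sonnet-4") -> "openrouter" (key set)
# pick_adapter("claude-sonnet-4-6")         -> "anthropic"  (no slash)
```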
- Setup:

  ```sh
  export OPENROUTER_API_KEY=sk-or-...
  ```

- Use:

  ```sh
  chimera weasel --model anthropic/claude-sonnet-4 -p "..."
  chimera weasel --model google/gemini-2.5-pro -p "..."
  chimera weasel --model meta-llama/llama-3.3-70b -p "..."
  ```

- Wired: non-streaming `complete()` plus base-class shim for
  streaming. Tool calls forwarded as standard OpenAI deltas. No
  provider-side caching.
## Ollama (local)

Same `OllamaProvider` mink and otter use, same `/api/chat` endpoint,
same `keep_alive: 60m`, same `:cloud` tag handling. Weasel falls
back to Ollama automatically when no hosted key is set, which makes
the zero-config laptop story work:

```sh
brew install ollama
ollama serve &
ollama pull qwen3:32b
chimera weasel -p "explain this repo"   # picks qwen3:32b
```

- Use:

  ```sh
  chimera weasel --model qwen3:32b -p "summarize"
  chimera weasel --model glm-5.1:cloud -p "long-context refactor"
  chimera weasel --model kimi-k2.6:cloud -p "deep reasoning"
  ```

- Wired: native streaming over NDJSON, tool calls (`tool_calls` on
  `done: false` chunks), `think: true` for `kimi*` tags, per-request
  `num_ctx`, configurable `OLLAMA_HOST` for remote daemons.
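For reference, consuming that NDJSON stream by hand looks roughly
like this — a sketch against Ollama's public `/api/chat` API, not
weasel internals:

```python
import json
import urllib.request

# Stream a chat response from a local Ollama daemon. Each NDJSON line
# is one chunk; content (and tool_calls) arrive on "done": false
# chunks, and the final line has "done": true.
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps({
        "model": "qwen3:32b",
        "messages": [{"role": "user", "content": "summarize this repo"}],
        "stream": True,
        "keep_alive": "60m",
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:
        if not line.strip():
            continue
        chunk = json.loads(line)
        if not chunk.get("done"):
            print(chunk["message"].get("content", ""), end="", flush=True)
```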
## llama.cpp (llama-server)

Direct integration with llama.cpp's OpenAI-compatible HTTP server.
Useful for running quantized GGUF models on the metal without going
through Ollama.

```sh
./llama-server -m ./models/qwen3-32b.Q4_K_M.gguf --port 8080
chimera weasel \
  --model qwen3-32b \
  --base-url http://localhost:8080/v1 \
  -p "list files"
```

- Wired: non-streaming + streaming chat completions, tool calls
  (when the model supports them), no caching, no thinking.
- The OpenAI-compatible adapter (`chimera/providers/compatible.py`)
  is what does the work; `--base-url` is the only extra flag.
## Modal

Modal-hosted vLLM container exposing OpenAI-shape
`/v1/chat/completions`. Weasel inherits the same adapter as mink —
useful when you've stood up an open-weight model on Modal.

- Setup:

  ```sh
  pip install modal httpx
  modal token new
  ```

- Use: `--model` only auto-routes Anthropic / OpenAI / OpenRouter /
  Ollama. To call Modal, build the provider in Python:

  ```python
  from chimera.providers import create_provider

  provider = create_provider(
      "modal",
      model="meta-llama/Llama-3.3-70B",
      base_url="https://your-org--llm-app-serve.modal.run/v1",
  )
  ```
## Custom providers

Implement `Provider` (`chimera/providers/base.py`) and call
`register_provider("my-name", factory)`
(`chimera/providers/registry.py`). Factory signature:
`factory(model=..., api_key=..., base_url=..., **kw)`. After
registration, `create_provider("my-name", model=...)` works
identically to the built-ins.
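A minimal sketch of the shape this takes. The `complete()` method
name and its signature below are assumptions for illustration; check
`chimera/providers/base.py` for the real abstract interface:

```python
from chimera.providers.base import Provider
from chimera.providers.registry import register_provider

class EchoProvider(Provider):
    """Toy provider that echoes the last user message.
    The complete() signature here is illustrative, not canonical."""

    def __init__(self, model=None, api_key=None, base_url=None, **kw):
        self.model = model

    def complete(self, messages, **kw):
        return messages[-1]["content"]

# The class itself satisfies the documented factory signature
# factory(model=..., api_key=..., base_url=..., **kw):
register_provider("echo", EchoProvider)

# After registration this works like any built-in:
# create_provider("echo", model="echo-1")
```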
To make weasel pick your provider automatically:

- Hand it an explicit model with a recognizable prefix (extend
  `_infer_provider`), or
- Construct the provider yourself and pass it to the embedded SDK
  `Agent` constructor:

  ```python
  from chimera.weasel.sdk import Agent
  from chimera.providers import create_provider

  provider = create_provider("my-name", model="my-1.0")
  agent = Agent(provider=provider)
  ```
Self-registration on import (e.g. in your package `__init__.py`)
mirrors the built-ins.
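For example, a package can register itself the moment it is imported
(a sketch; `my_provider` and `MyProvider` are placeholder names):

```python
# my_provider/__init__.py — registers on import, like the built-ins.
from chimera.providers.registry import register_provider

from .provider import MyProvider  # your Provider subclass

register_provider("my-name", MyProvider)
```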
## Choosing a provider

| Concern | Pick |
|---|---|
| Default, “just works” | `anthropic` (`claude-sonnet-4-6`) |
| One key, many vendors | `openrouter` (`anthropic/...`, `google/...`, `meta-llama/...`) |
| Privacy / local | `ollama` (`qwen3:32b`) or llama.cpp |
| Cheap + fast | `compatible` against Groq / DeepSeek / Together |
| Vision-heavy | `anthropic` (claude) or `openai` (`gpt-4o`) |
| Long context (>200k) | `google` (Gemini 1M), `anthropic` (200k), Kimi (262k) |
| Reasoning-tuned | `openai` (`o3`, `o3-mini`) or `anthropic` (claude-opus + thinking) |
## Mixing providers in one session

Weasel holds one provider per process. To swap mid-session, exit
and re-launch with a different `--model` or different env. The REPL
`/model` slash command cycles through the model list passed via
`--models <a>,<b>,<c>` — it rebuilds the provider on each switch.
In RPC mode, the host process can spawn a fresh
`chimera weasel --mode rpc` subprocess per provider it needs to drive.
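A rough illustration of that pattern (the RPC wire format is out of
scope here; this only shows the one-subprocess-per-provider idea):

```python
import subprocess

# One weasel RPC subprocess per provider the host wants to drive.
# Model ids are the same examples used elsewhere on this page.
workers = {
    model: subprocess.Popen(
        ["chimera", "weasel", "--mode", "rpc", "--model", model],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    for model in ("claude-sonnet-4-6", "gpt-4o", "qwen3:32b")
}

# The host then speaks the RPC protocol over each worker's
# stdin/stdout pair and routes each request to the right worker.
```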
## Auth storage

Weasel shares Chimera's credential storage with the rest of the CLI:

| Path | Source | Mode |
|---|---|---|
| `~/.chimera/credentials.json` | OAuth-issued tokens, refresh tokens | `0o600` |
| `~/.chimera/auth.json` | `AuthManager.set_token()` | default |

`CredentialStore._write` chmods to `0o600` after each save
(`chimera/auth/store.py`).
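The pattern is the standard write-then-tighten dance; a standalone
sketch (not the actual `CredentialStore` code):

```python
import json
import os
from pathlib import Path

def write_credentials(path: Path, creds: dict) -> None:
    """Persist credentials, then restrict to owner read/write only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(creds))
    os.chmod(path, 0o600)  # same post-save chmod the store applies

# write_credentials(Path.home() / ".chimera" / "credentials.json",
#                   {"access_token": "..."})
```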
## Proxy mode for teams

For an org-wide gateway sitting in front of Anthropic-shaped
providers:

```sh
export ANTHROPIC_BASE_URL=http://proxy.internal:8000
export ANTHROPIC_AUTH_TOKEN=team-issued-jwt
chimera weasel --model claude-sonnet-4-6 -p "..."
```

`AnthropicProvider` honors both env vars. The dedicated proxy
provider (`chimera/providers/proxy.py`) is the alternative when your
gateway speaks its own JSON shape rather than the Anthropic wire
protocol.
## --list-models

Weasel ships a `--list-models` flag that asks each configured
provider for its catalogue and prints the union:

```sh
chimera weasel --list-models
chimera weasel --list-models --json | jq '.[] | select(.provider == "anthropic")'
```

The list is provider-driven (Anthropic and OpenAI return live
catalogues; Ollama returns installed tags) and updates every time
the CLI is launched — there's no static model file to keep current.
## Troubleshooting

| Symptom | Likely cause / fix |
|---|---|
| `weasel: no provider configured` | Set one of `$ANTHROPIC_API_KEY`, `$OPENAI_API_KEY`, `$OPENROUTER_API_KEY`, or pass `--model <id>` / `$WEASEL_MODEL`. Or `ollama serve`. |
| `401` / `403` | Wrong key. `printenv \| grep -E '(ANTHROPIC\|OPENAI\|OPENROUTER)_'` to verify. |
| OpenRouter not used despite key | Model id needs the `vendor/name` `/` separator. |
| `ImportError: pip install chimera-run[anthropic]` | `uv sync --extra anthropic` to pull the SDK. |
| `Cannot infer provider from model name '...'` | Pass `--model <id>` with a known prefix (`claude-*`, `gpt-*`, `gemini-*`, `glm-*`, `kimi-*`, `qwen*`, …). |
| Streaming hangs on first call | Cloud cold start. Anthropic / OpenAI typically warm in <2s; Ollama Cloud needs `keep_alive: 60m`. |
| `tool_calls` always empty on Ollama | You hit `/v1/chat/completions` instead of `/api/chat`. Set `OLLAMA_HOST` to the daemon root, not `.../v1`. |
| llama.cpp returns 404 | Confirm the OpenAI compat path. `llama-server` exposes `/v1/chat/completions` by default; pass that as the base URL. |
## See also

- `modes.md` — provider behavior is identical across modes.
- `docs/mink/providers.md` — line-numbered deep dive into every adapter.
- `security-and-trademarks.md` — auth storage + redaction posture.