# Mink Providers

## Providers — what mink can talk to

`chimera mink` defaults to Ollama (`kimi-k2.6:cloud`, see
`chimera/mink/cli.py:34`), but the underlying provider stack is pluggable
through `chimera.providers.create_provider()`
(`chimera/providers/factory.py:36`). Six adapters self-register at import
time via `_ensure_builtins_registered()`
(`chimera/providers/registry.py:55`); a seventh (`proxy`) is registered by
`chimera/providers/proxy.py:119`.
## Built-in registry

| Name | Module | Self-registers at |
|---|---|---|
| `anthropic` | `chimera/providers/anthropic.py` | line 437 |
| `openai` | `chimera/providers/openai.py` | line 375 |
| `google` | `chimera/providers/google.py` | line 165 |
| `ollama` | `chimera/providers/ollama.py` | line 392 |
| `compatible` | `chimera/providers/compatible.py` | line 150 |
| `modal` | `chimera/providers/modal.py` | line 171 |
| `proxy` | `chimera/providers/proxy.py` | line 119 |
`create_provider()` picks one of these by:

- explicit `provider_type=`,
- prefix on the model name (`claude-*`, `gpt-*`, `gemini-*`, `glm-*`, `kimi-*`, `llama*`, `qwen*`, `mistral*`, `phi*`),
- `ProviderCatalog` lookup (`chimera/providers/catalog.py:79`),
- loose env-var fallback.

See `_infer_provider()` (`chimera/providers/factory.py:113`) for the exact
order, including the env-var override that routes anything with
`ANTHROPIC_BASE_URL` or `ANTHROPIC_AUTH_TOKEN` set through the Anthropic
adapter.
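Both routes end in the same factory call. A minimal sketch, assuming `provider_type` may be omitted so that `_infer_provider()` kicks in (the gateway URL below is hypothetical):

```python
from chimera.providers import create_provider

# Inferred from the model-name prefix (claude-* routes to anthropic):
provider = create_provider(model="claude-sonnet-4-20250514")

# Explicit provider_type wins over inference: here a Claude-named model
# is forced through the generic OpenAI-compatible adapter instead.
provider = create_provider(
    "compatible",
    model="claude-sonnet-4",
    base_url="https://gateway.example.com/v1",  # hypothetical endpoint
)
```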
## Ollama (default for mink)

Local or Ollama-Cloud models served by the `ollama` daemon. The adapter
talks to `/api/chat` (NOT the OpenAI-compat shim at `/v1/...`, which
silently drops `tool_calls` from streaming chunks; see the comment at
`chimera/providers/ollama.py:25-32`).

- Use when: running locally for privacy, or hitting Ollama Cloud
  (`*:cloud` tags) for hosted Kimi / Qwen3 / GLM-4.6.
- Setup:

  ```bash
  brew install ollama     # or: curl -fsSL https://ollama.com/install.sh | sh
  ollama serve &          # daemon on :11434
  ollama signin           # only needed for :cloud tags
  ollama pull qwen3:32b   # local fallback
  ```

- Auth options: none for local; Ollama Cloud uses `ollama signin`
  (browser flow stored in the daemon, not in `~/.chimera/`).
- Use with mink:

  ```bash
  chimera mink --model kimi-k2.6:cloud -p "list files then read README.md"
  chimera mink --model qwen3:32b -p "explain this repo"
  ```

- What’s wired: streaming yes (NDJSON over `/api/chat`,
  `chimera/providers/ollama.py:214`); tool calls yes (native `tool_calls`
  field, `chimera/providers/ollama.py:269-295`); thinking yes for `kimi*`
  only (`chimera/providers/ollama.py:50`); JSON mode no on cloud Kimi (the
  server ignores the `format` field, see `quickstart.md`); vision varies
  by model (Kimi 2.6 weak); prompt caching no.
- Defaults: `num_ctx=131072`, `keep_alive="60m"`, `OLLAMA_HOST` env
  override, ctx auto-bumped to `262144` for `kimi*` (see
  `chimera/mink/cli.py:300`).
- Limits: `tool_choice="required"` is silently dropped when `think:true`
  (`chimera/providers/ollama.py:103`); `:cloud` weights live on
  Moonshot/Ollama infra; the first cold-start call can take 10-30s.
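To see why the adapter insists on the native endpoint, here is a minimal `httpx` sketch against the daemon (illustrative only; the adapter itself streams NDJSON and handles this internally, and the tool schema below is made up):

```python
import httpx

# Hit the daemon root's native chat endpoint, not the /v1 compat shim,
# which drops tool_calls from streamed chunks.
resp = httpx.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:32b",
        "stream": False,  # one JSON object back instead of NDJSON
        "messages": [{"role": "user", "content": "Weather in Oslo?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    },
    timeout=120.0,
)
# Native endpoint: tool_calls arrives intact on the message.
print(resp.json()["message"].get("tool_calls"))
```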
## Anthropic

The Anthropic SDK against the official api.anthropic.com endpoint, OR
against any Anthropic-wire-compatible endpoint via `ANTHROPIC_BASE_URL`
(GLM-4.6 via api.z.ai, Moonshot, Ollama’s Anthropic-compat shim, etc.).

- Use when: you have an Anthropic API key, OR a third-party provider
  that speaks the Anthropic Messages API.
- Setup:

  ```bash
  uv sync --extra anthropic   # installs the anthropic SDK
  export ANTHROPIC_API_KEY=sk-ant-...
  # OR, for an Anthropic-compat endpoint:
  export ANTHROPIC_BASE_URL=https://api.z.ai/v1/anthropic
  export ANTHROPIC_AUTH_TOKEN=...
  ```

- Auth options:
  - `ANTHROPIC_API_KEY` — first-class env var (`chimera/providers/anthropic.py:53`)
  - `ANTHROPIC_AUTH_TOKEN` — Bearer-token alias for OAuth-issued tokens or third-party endpoints (same line, fallback)
  - `AuthManager.get_token("anthropic")` — pulls from `~/.chimera/auth.json` or the environment (`chimera/auth/manager.py:36-37`, `chimera/providers/anthropic.py:48`)
  - OAuth device or browser flow via `OAuthDeviceFlow` / `OAuthBrowserFlow` (`chimera/auth/oauth.py:20`, `chimera/auth/oauth.py:182`); tokens persist to `~/.chimera/credentials.json` (mode `0o600`, `chimera/auth/store.py:62`)
- Use with mink:

  ```bash
  chimera mink --model claude-sonnet-4-20250514 -p "review this PR"
  ```

- What’s wired: streaming yes (`chimera/providers/anthropic.py:204`); tool
  calls yes; native async (`AsyncAnthropic`,
  `chimera/providers/anthropic.py:345`); extended thinking yes
  (`enable_thinking=True`, `chimera/providers/anthropic.py:120-125`);
  prompt caching yes (opt-in via `enable_cache=True`,
  `chimera/providers/anthropic.py:131-143`); vision yes (Claude models);
  JSON mode via the tool-calling pattern.
- Limits: when `enable_thinking` is on, temperature is forced to 1
  (`chimera/providers/anthropic.py:125`); `ImportError` if
  `chimera-run[anthropic]` is not installed.
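For programmatic use, a minimal sketch, assuming `create_provider()` forwards the `enable_thinking=` / `enable_cache=` kwargs described above to the adapter:

```python
from chimera.providers import create_provider

provider = create_provider(
    "anthropic",
    model="claude-sonnet-4-20250514",
    enable_thinking=True,  # extended thinking; adapter pins temperature to 1
    enable_cache=True,     # opt-in prompt caching (anthropic.py:131-143)
)
```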
## OpenAI

The official `openai` SDK against api.openai.com, or any
`/v1/chat/completions` endpoint via `base_url=`.

- Use when: GPT-4o, o1, o3, Codex, or any provider that ships an
  OpenAI-SDK-compatible endpoint and you want native streaming +
  reasoning-token tracking.
- Setup:

  ```bash
  uv sync --extra openai   # installs the openai SDK
  export OPENAI_API_KEY=sk-...
  ```

- Auth options:
  - `OPENAI_API_KEY` (`chimera/providers/openai.py:52`)
  - `AuthManager.get_token("openai")` (`chimera/providers/openai.py:46-50`)
- Use with mink:

  ```bash
  chimera mink --model gpt-4o -p "draft a release note"
  chimera mink --model o3-mini -p "prove this invariant"
  ```

- What’s wired: streaming yes (`chimera/providers/openai.py:142`); tool
  calls yes (delta-accumulated by `index`,
  `chimera/providers/openai.py:174-213`); native async (`AsyncOpenAI`,
  `chimera/providers/openai.py:222`); reasoning-token tracking for
  o-series models (`chimera/providers/openai.py:96-99`); prompt-cache hit
  accounting (`chimera/providers/openai.py:100-104`); vision yes (gpt-4o);
  JSON mode yes (via standard SDK options the adapter passes through
  unchanged).
- Limits: the thinking-level enum is accepted but not forwarded — the
  provider relies on the OpenAI SDK’s native reasoning behavior; no
  Anthropic-style cache control.
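The `base_url=` hook means the same adapter can drive any OpenAI-SDK-compatible server. A sketch with a hypothetical endpoint and placeholder key:

```python
from chimera.providers import create_provider

# Same adapter, non-OpenAI host; the endpoint URL here is made up.
provider = create_provider(
    "openai",
    model="gpt-4o",
    base_url="https://llm-gateway.example.com/v1",
    api_key="sk-any-bearer-shaped-token",
)
```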
## OpenAI-compatible

Generic adapter for any `/v1/chat/completions` endpoint. Used by
OpenRouter, Together, Fireworks, Groq, vLLM, SGLang, LM Studio, LiteLLM,
the Anthropic Coding API in OpenAI-compat mode, and the `bedrock/`,
`azure/`, etc. catalog entries (`chimera/providers/catalog.py:57-76`).

- Use when: you have an OpenAI-shaped endpoint and don’t want the full
  `openai` SDK dependency.
- Setup:

  ```bash
  # No extra needed — uses httpx (already a dep of mink/Ollama).
  export OPENAI_API_KEY=...   # or any bearer-style token
  ```

- Auth options: `OPENAI_API_KEY` env (or any token passed as `api_key=`);
  custom headers via the `headers=` constructor kwarg
  (`chimera/providers/compatible.py:29`).
- Use with mink:

  ```bash
  # via the catalog (resolves base_url + api_key from env)
  chimera mink --model groq/llama-3.3-70b -p "..."
  chimera mink --model deepseek-chat -p "..."
  ```

- What’s wired: non-streaming `complete()` only
  (`chimera/providers/compatible.py:44`); tool calls yes; async and
  streaming fall back to the base-class wrappers (`base.py:61`,
  `base.py:107`); no thinking, no caching.
- Limits: no native streaming — the base-class `stream()` shim yields the
  whole response as one chunk; this is fine for short completions but
  loses the token-by-token feel. Use the `openai` adapter if you need
  real streaming against an OpenAI-shaped endpoint.
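Constructing it directly is useful when the catalog doesn't know your endpoint. A sketch; `headers=` is the documented constructor kwarg (`compatible.py:29`), while the URL, model, and header values are illustrative:

```python
from chimera.providers import create_provider

provider = create_provider(
    "compatible",
    model="llama-3.3-70b-versatile",
    base_url="https://api.groq.com/openai/v1",
    api_key="gsk_...",
    headers={"X-Team": "platform"},  # extra headers, forwarded on each request
)
```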
## Google Gemini

The `google-generativeai` SDK against the Gemini API.

- Use when: you want Gemini 2.0 Flash, Gemini 1.5 Pro, etc.
- Setup:

  ```bash
  uv sync --extra google          # if shipped; otherwise:
  pip install google-generativeai
  export GOOGLE_API_KEY=...       # or GEMINI_API_KEY
  ```

- Auth options:
  - `GOOGLE_API_KEY` or `GEMINI_API_KEY` (`chimera/auth/manager.py:39`, `chimera/providers/google.py:45`)
  - `AuthManager.get_token("google")`
- Use with mink:

  ```bash
  chimera mink --model gemini-2.0-flash -p "summarize this doc"
  ```

- What’s wired: non-streaming `complete()` only; tool calls yes (function
  declarations, `chimera/providers/google.py:120`); 1M context windows
  (`chimera/providers/google.py:23-26`); system messages folded into the
  first user turn as `[System] ...` (`chimera/providers/google.py:101`).
- Limits: no streaming — the base-class shim is used; no async override;
  system-prompt handling is approximate (Gemini wants a separate
  `system_instruction` field; this adapter inlines it instead); schema
  cleanup strips `$ref`/`oneOf`/`additionalProperties` and other keywords
  unsupported by Gemini (`chimera/providers/google.py:132-146`); the
  thinking param is accepted but ignored.
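A toy illustration of what "folded into the first user turn" means (not the adapter's actual code; see `google.py:101` for that):

```python
# Toy illustration: merge system messages into the first user turn.
def fold_system(messages: list[dict]) -> list[dict]:
    system = " ".join(m["content"] for m in messages if m["role"] == "system")
    rest = [m for m in messages if m["role"] != "system"]
    if system and rest and rest[0]["role"] == "user":
        rest[0] = {**rest[0], "content": f"[System] {system}\n\n{rest[0]['content']}"}
    return rest

msgs = [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is mink?"},
]
print(fold_system(msgs)[0]["content"])
# [System] Answer in one sentence.
#
# What is mink?
```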
## Modal

A Modal-hosted vLLM container exposing an OpenAI-shaped
`/v1/chat/completions`. The adapter does not deploy the container — it
just calls a URL you already have.

- Use when: you’ve deployed an open-weight model on Modal GPUs and want
  mink to drive it.
- Setup:

  ```bash
  pip install modal httpx
  modal token new                 # browser flow
  export MODAL_TOKEN_ID=...       # optional, if you want to skip the browser
  export MODAL_TOKEN_SECRET=...
  # then deploy your container and capture its public URL
  ```

- Auth options: `MODAL_TOKEN_ID` + `MODAL_TOKEN_SECRET` env vars
  (`chimera/providers/modal.py:47-48`); per-container auth is up to your
  vLLM image.
- Use with mink: not directly — `mink --model` only auto-routes to
  Ollama. Use the catalog or `create_provider()` from Python:

  ```python
  from chimera.providers import create_provider

  provider = create_provider(
      "modal",
      model="meta-llama/Llama-3.3-70B",
      base_url="https://your-org--llm-app-serve.modal.run/v1",
  )
  ```

- What’s wired: non-streaming `complete()` only; tool calls yes (via the
  OpenAI-compat shape vLLM emits).
- Limits: no streaming, no async, no thinking, no caching; `base_url` is
  required — `_get_base_url()` raises if you forget it
  (`chimera/providers/modal.py:55`).
## Proxy

An HTTP relay for centralized key management, cost tracking, or running
agents in environments without direct API egress.

- Use when: a team wants one server holding the API keys, with mink
  clients pointing at it. The proxy translates Chimera’s wire format to
  whatever upstream provider it controls.
- Setup: stand up a server that exposes `POST /api/complete` (and
  optionally `/api/stream`) accepting Chimera’s payload shape; see
  `chimera/providers/proxy.py:18-26` for the contract.
- Auth options: Bearer token via the `api_key=` constructor arg, sent as
  `Authorization: Bearer <token>` (`chimera/providers/proxy.py:69`).
- Use with mink: like Modal, only via `create_provider()` from Python:

  ```python
  from chimera.providers import create_provider

  provider = create_provider(
      "proxy",
      base_url="http://proxy.internal:8000",
      api_key="team-token",
      model="claude-sonnet-4",
  )
  ```

- What’s wired: synchronous `complete()` only (uses stdlib
  `urllib.request`, no httpx); tool calls forwarded as plain dicts;
  thinking level forwarded as a string for the proxy to interpret
  (`chimera/providers/proxy.py:62-65`).
- Limits: no streaming, no async; the proxy must report `usage` itself —
  the adapter trusts whatever JSON comes back; the default
  `context_window=128000` is hardcoded (`chimera/providers/proxy.py:97`).
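For orientation, a toy stand-in for the server side; the field names here are assumptions, not the real contract (that lives at `chimera/providers/proxy.py:18-26`):

```python
# Toy proxy stub: accepts POST /api/complete and echoes the last message.
# A real gateway would verify the Bearer token and call an upstream provider.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/complete":
            self.send_error(404)
            return
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reply = {
            # Field names are guesses at the payload shape:
            "content": "echo: " + body.get("messages", [{}])[-1].get("content", ""),
            "usage": {"input_tokens": 0, "output_tokens": 0},  # adapter trusts this
        }
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(reply).encode())

HTTPServer(("", 8000), Handler).serve_forever()
```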
## Custom providers

Implement `Provider` (`chimera/providers/base.py:47`) and call
`register_provider("my-name", factory)` (`chimera/providers/registry.py:14`).
Factory signature: `factory(model=..., api_key=..., base_url=..., **kw)`.
After registration, `create_provider("my-name", model=...)` works
identically to the built-ins. Drop the factory call into your package’s
`__init__.py` so import triggers self-registration, mirroring the
built-ins (e.g. `chimera/providers/anthropic.py:436`).
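A skeleton, assuming `Provider` requires at minimum a `complete()` method; check `chimera/providers/base.py:47` for the real interface, since the method and return shapes below are guesses:

```python
from chimera.providers.base import Provider
from chimera.providers.registry import register_provider

class EchoProvider(Provider):
    """Toy provider that parrots the last user message."""

    def __init__(self, model=None, api_key=None, base_url=None, **kw):
        self.model = model

    def complete(self, messages, **kw):
        # Return shape is an assumption; mirror what base.py expects.
        return {"role": "assistant", "content": messages[-1]["content"]}

def _factory(model=None, api_key=None, base_url=None, **kw):
    return EchoProvider(model=model, api_key=api_key, base_url=base_url, **kw)

# Run at import time (e.g. in your package's __init__.py):
register_provider("echo", _factory)
```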
## Choosing a provider

| Concern | Pick |
|---|---|
| Privacy / local | `ollama` with a local model (`qwen3:32b`, `llama3.1:70b-instruct`) |
| Lowest latency | `ollama` local on GPU; `groq/*` via `compatible` |
| Best capability | `anthropic` (claude-sonnet/opus) or `openai` (gpt-4o, o3) |
| Cost-sensitive | `compatible` with `deepseek-chat` or `groq/llama-3.3-70b` |
| Vision-heavy | `anthropic` (claude-sonnet) or `openai` (gpt-4o); avoid Kimi 2.6 |
| Long context | `google` (Gemini, 1M) or `anthropic` (Claude, 200k) or Kimi (262k) |
| Tools + streaming + thinking, all in one | `anthropic` (Claude with `enable_thinking=True`) |
## Mixing providers in one session

Three ways to set the model, highest precedence first:

- CLI flag: `chimera mink --model <name>`.
- `settings.json` `model:` key, loaded by `load_mink_settings()`
  (`chimera/mink/settings.py:274`). The CLI default `kimi-k2.6:cloud` is
  treated as “user did not pass `--model`”, so an agent’s frontmatter
  `model:` can override it (`chimera/mink/cli.py:796-801`).
- Env vars `ANTHROPIC_MODEL` / `OPENAI_MODEL` are picked up only when no
  `--model` and no settings model is set
  (`chimera/providers/factory.py:81-82`).

A single `chimera mink` process holds one provider for the whole
session. To swap mid-session, exit and relaunch (or, programmatically,
build a new Agent with `create_provider(...)`; see the sketch below).
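A sketch of the programmatic route; only `create_provider()` is documented here, so the `Agent` constructor shape is an assumption:

```python
from chimera.providers import create_provider

local = create_provider("ollama", model="qwen3:32b")
cloud = create_provider("anthropic", model="claude-sonnet-4-20250514")

# Hypothetical usage: build a fresh Agent per provider, one task each,
# e.g. Agent(provider=local).run(...) then Agent(provider=cloud).run(...)
```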
## Credentials directory

| Path | Source | Mode |
|---|---|---|
| `~/.chimera/credentials.json` | OAuth-issued tokens, refresh tokens | `0o600` |
| `~/.chimera/auth.json` | `AuthManager.set_token()` | default |
| `~/.chimera/sessions/*.jsonl` | Session tree per-cwd (mink REPL) | default |
| `~/.chimera/eventlog/mink-*/` | `chimera mink -p` persisted runs | default |

`CredentialStore._write` chmods to `0o600` after each save
(`chimera/auth/store.py:62`); `auth.json` is a plain JSON file written
by `AuthManager.set_token()` (`chimera/auth/manager.py:154`).
## Proxy mode for teams

For an org-wide gateway sitting in front of Anthropic-shaped providers:

```bash
export ANTHROPIC_BASE_URL=http://proxy.internal:8000
export ANTHROPIC_AUTH_TOKEN=team-issued-jwt
chimera mink --model claude-sonnet-4 -p "..."
```

`AnthropicProvider` honors both env vars
(`chimera/providers/anthropic.py:53,58`); the proxy can be a thin
pass-through, a key-rotating relay, or a budget-enforcing gatekeeper. The
dedicated `proxy` provider (above) is the alternative when your gateway
speaks its own JSON shape rather than the Anthropic wire protocol.
## Troubleshooting

| Symptom | Likely cause / fix |
|---|---|
| 401 / 403 | Wrong key or wrong env var. Check `AuthManager.status()` or inspect the relevant env vars with `printenv`. |
| `ImportError: pip install chimera-run[...]` | The provider’s optional extra is missing. `uv sync --extra anthropic` (or `openai`, `google`). |
| `Cannot infer provider from model name '...'` | Pass `provider_type=...` explicitly, or use a known prefix; see `_infer_provider()` for the list. |
| Streaming hangs / first call is slow | Cloud cold start. Ollama Cloud needs `keep_alive: 60m` (mink sets this). Anthropic / OpenAI typically warm in <2s. |
| `tool_calls` always empty on Ollama | You hit `/v1/chat/completions` instead of `/api/chat`. Set `OLLAMA_HOST` to the daemon root, not `.../v1`. |
| `temperature must be 1 with thinking` | Anthropic extended thinking forces `temperature=1` (see `anthropic.py:125`). Drop your custom temperature. |
| Ollama Kimi rejects `tool_choice: "required"` | The adapter silently drops it when `think:true` is also set (`ollama.py:103`); use `auto`/`none`. |
| Gemini complains about `$ref` / `oneOf` | The adapter strips them (`google.py:132`); if you see one pass through, check you’re hitting the right adapter. |