Mink Model Compatibility

This page lists every Ollama model that has been end-to-end smoke-tested against
`chimera mink` on this machine, along with the captured run IDs that produced
the numbers. Every row is reproducible from the eventlog directories cited.
The same two smokes were run for every model:

| Smoke | Purpose | Command (paraphrased) |
|---|---|---|
| A: text-only | proves provider wiring | `mink -p "Say hello in three words." --max-steps 1` |
| B: tool dispatch | proves the model emits a real `tool_calls` block that mink can route to the bash tool | `mink -p "Use the bash tool to run 'echo HELLO_FROM_<MODEL>' and report the result." --max-steps 4` |

Both are invoked with `--permission-mode bypassPermissions --output-format json`.
“Tool calls” below refers to `tool_calls_total` from the per-run
`~/.chimera/eventlog/<run_id>/summary.json`. Wall-clock is `ended_at - started_at`
from the same file. Costs are whatever `OllamaProvider` reported back to the run
summary; cloud models that the local Ollama daemon does not price come back as
$0.0000 (the cost is borne by the upstream account, not surfaced through
`/api/chat`).
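If you script against these summaries, the derivation is mechanical. A minimal sketch, assuming `started_at`/`ended_at` parse as ISO-8601 timestamps (this report does not pin the exact format) and using the `<run_id>` placeholder from above:

```python
# Recompute this report's metrics from one run's summary.json.
# Assumption: started_at/ended_at parse with datetime.fromisoformat.
import json
from datetime import datetime
from pathlib import Path

summary = json.loads(
    (Path.home() / ".chimera" / "eventlog" / "<run_id>" / "summary.json").read_text()
)
wall_clock = (
    datetime.fromisoformat(summary["ended_at"])
    - datetime.fromisoformat(summary["started_at"])
).total_seconds()
print(summary["tool_calls_total"], f"{wall_clock:.1f}s", summary["cost_usd"])
```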
Recommended models
| Model | Kind | Context (default) | Tool calls observed | Smoke A turn | Smoke B turn | Cost / turn (B) | Notes |
|---|---|---|---|---|---|---|---|
| `glm-5.1:cloud` | cloud | 131072 | 1 | 30.0s | 17.0s | $0.0204 | Validated working baseline. Slowest cold-start in this batch (30s on first call). |
| `glm-5:cloud` | cloud | 131072 | 1 | 1.0s | 9.0s | $0.0214 | Fastest text response; dispatches tools cleanly. Cheap-ish per turn. |
| `kimi-k2.5:cloud` | cloud | 131072 | 1 | 4.0s | 17.0s | $0.0000 | Reasoning model (`think:true` auto-enabled by the provider). Steady tool use. |
| `kimi-k2.6:cloud` | cloud | 131072 | 1 | 9.0s | 11.0s | $0.0000 | Default mink model. Reasoning enabled by default. |
| `minimax-m2:cloud` | cloud | 131072 | 1 | 1.0s | 9.0s | $0.0000 | Tied for fastest tool turn. |
| `minimax-m2.7:cloud` | cloud | 131072 | 1 | 2.0s | 10.0s | $0.0000 | Successor of m2; same tool-use behavior. |
| `qwen3.5:cloud` | cloud | 131072 | 1 | 3.0s | 10.0s | $0.0000 | Clean tool dispatch, fast. |
| `gpt-oss:120b-cloud` | cloud | 131072 | 1 | 1.0s | 9.0s | $0.0000 | Tied for fastest tool turn. One transient 500 from `/api/chat` on the first try; succeeded on retry. |
| `llama3.2:3b` | local | 131072 | 1 | 22.0s (text smoke failed)* | 49.0s | $0.0000 | Smallest viable local model. Slow and unreliable on the free-form text smoke; surprisingly does dispatch the bash tool. |

\*Smoke A returned `success=false` (“Max steps reached”) for `llama3.2:3b`. See
“Models that failed or are partial” below.
Per-model details
Setup commands assume Ollama is running at `http://localhost:11434`. Cloud-tagged
models (`*:cloud`) are manifest-only pulls (a few hundred bytes each) and require
`ollama signin` to be authenticated against an account that has access to the
cloud model catalog. Local models are full weight downloads.
glm-5.1:cloud

- Setup: `ollama signin` (once), then `ollama pull glm-5.1:cloud`.
- Strengths: Stable, accurate tool dispatch; tracks the prompt closely.
- Weaknesses: Saw a 30s cold-start on the first text request of the session; subsequent calls are sub-10s.
- Suggested flags: `--max-steps 50` (default), `--tool-timeout 120`.
- Evidence:
  - Smoke A: `~/.chimera/eventlog/mink-20260425T182550-523915ec/summary.json`
  - Smoke B: `~/.chimera/eventlog/mink-20260425T183450-71742dd6/summary.json`
glm-5:cloud

- Setup: `ollama pull glm-5:cloud`.
- Strengths: Fastest text response in this batch (1.0s), good tool formatting.
- Weaknesses: Slightly higher per-turn cost surfaced through the provider ($0.0214 on smoke B).
- Suggested flags: `--max-steps 50`, `--tool-timeout 120`.
- Evidence:
  - Smoke A: `~/.chimera/eventlog/mink-20260425T182630-1a0e2683/summary.json`
  - Smoke B: `~/.chimera/eventlog/mink-20260425T183512-430cc00a/summary.json`
kimi-k2.5:cloud

- Setup: `ollama pull kimi-k2.5:cloud`.
- Strengths: Reasoning model; the Ollama provider auto-sets `think:true` for any tag starting with `kimi`. Good for multi-step planning.
- Weaknesses: ~17s on smoke B is mid-pack; reasoning tokens add latency.
- Suggested flags: `--max-steps 50`, `--tool-timeout 180` (give reasoning room).
- Evidence:
  - Smoke A: `~/.chimera/eventlog/mink-20260425T182631-dce51214/summary.json`
  - Smoke B: `~/.chimera/eventlog/mink-20260425T183521-d621617c/summary.json`
kimi-k2.6:cloud

- Setup: `ollama pull kimi-k2.6:cloud`. This is the mink default (see `_DEFAULT_MODEL` in `chimera/mink/cli.py`).
- Strengths: Reasoning enabled by default; well-tuned for the Chimera loop; this is the model the rest of the project benchmarks against.
- Weaknesses: Cold-start can exceed 2 minutes on a freshly signed-in account (not observed in this run because the account was already warm).
- Suggested flags: defaults are tuned for it. `--tool-timeout 300` if you expect a cold cloud node.
- Evidence:
  - Smoke A: `~/.chimera/eventlog/mink-20260425T182651-acb8b23a/summary.json`
  - Smoke B: `~/.chimera/eventlog/mink-20260425T183616-d60db3fd/summary.json`
minimax-m2:cloud

- Setup: `ollama pull minimax-m2:cloud`.
- Strengths: Tied for fastest tool turn (9.0s on smoke B).
- Weaknesses: None observed in smoke; weights are older than m2.7.
- Suggested flags: `--max-steps 50`, `--tool-timeout 120`.
- Evidence:
  - Smoke A: `~/.chimera/eventlog/mink-20260425T182704-975eb417/summary.json`
  - Smoke B: `~/.chimera/eventlog/mink-20260425T183645-fec8628f/summary.json`
minimax-m2.7:cloud

- Setup: `ollama pull minimax-m2.7:cloud`.
- Strengths: Newer minimax weights; tool dispatch is clean, runs fast.
- Weaknesses: None observed in smoke.
- Suggested flags: `--max-steps 50`, `--tool-timeout 120`.
- Evidence:
  - Smoke A: `~/.chimera/eventlog/mink-20260425T182641-5e4c98b3/summary.json`
  - Smoke B: `~/.chimera/eventlog/mink-20260425T183544-9f4db463/summary.json`
qwen3.5:cloud

- Setup: `ollama pull qwen3.5:cloud`.
- Strengths: Fast, clean tool dispatch; verbose, but usefully so.
- Weaknesses: None observed in smoke.
- Suggested flags: `--max-steps 50`, `--tool-timeout 120`.
- Evidence:
  - Smoke A: `~/.chimera/eventlog/mink-20260425T182643-5be3897c/summary.json`
  - Smoke B: `~/.chimera/eventlog/mink-20260425T183554-bb818c61/summary.json`
gpt-oss:120b-cloud

- Setup: `ollama pull gpt-oss:120b-cloud`.
- Strengths: Tied for fastest tool turn (9.0s). Useful as a third-party baseline for tool-use comparison.
- Weaknesses: Returned a 500 from `/api/chat` on the first smoke-B attempt (cold cloud node, transient); succeeded on the immediate retry. Plan for a one-shot retry around this model.
- Suggested flags: `--max-steps 50`, `--tool-timeout 180`. Wrap calls in retry-on-500 if you script against it (see the sketch after this list).
- Evidence:
  - Smoke A: `~/.chimera/eventlog/mink-20260425T182650-c027e2e2/summary.json`
  - Smoke B (retry): `~/.chimera/eventlog/mink-20260425T183635-7391c9e5/summary.json`
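A minimal retry sketch for scripting against this model. Assumptions: the transient 500 surfaces as a non-zero exit code from the CLI (the HTTP status itself is not exposed through `chimera mink`), and a single retry is enough, matching the behavior observed above:

```python
# One-shot retry around a mink smoke, for transient 500s from /api/chat
# on a cold cloud node. Assumes a failed run exits non-zero.
import subprocess

CMD = [
    "chimera", "mink", "--model", "gpt-oss:120b-cloud",
    "--permission-mode", "bypassPermissions", "--output-format", "json",
    "-p", "Use the bash tool to run 'echo HELLO_FROM_GPT_OSS' and report the result.",
    "--max-steps", "4",
]

for attempt in (1, 2):  # one-shot retry, mirroring the observed recovery
    if subprocess.run(CMD).returncode == 0:
        break
    if attempt == 1:
        print("first attempt failed; retrying once")
```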
llama3.2:3b (local)

- Setup: `ollama pull llama3.2:3b`.
- Strengths: Pure local, zero cost, and it surprisingly does dispatch the bash tool (smoke B succeeded).
- Weaknesses: Free-form smoke A failed with `success=false` and "Max steps reached" after 22s: at `--max-steps 1` the 3B model spent the budget without producing a stoppable message. Smoke B took 49s, the slowest in the entire batch.
- Suggested flags: Use as a fallback only: `--max-steps 6`, `--tool-timeout 120`. Do not rely on it for production tool loops.
- Evidence:
  - Smoke A (failed): `~/.chimera/eventlog/mink-20260425T183421-c1be4b5d/summary.json`
  - Smoke B (passed): `~/.chimera/eventlog/mink-20260425T183920-8e864999/summary.json`
Models that failed or are partial

| Model | Status | Reason |
|---|---|---|
| `glm-4.7-flash` (local, 19 GB) | failed | First-load `httpx.ReadTimeout` even at a 480s overall timeout. The 19 GB MoE weights take longer than the default httpx read window to spin up on this hardware. Would need a custom Modelfile that pre-warms with `keep_alive` raised, or a longer client read timeout in `OllamaProvider` (see the sketch below). |
| `qwen3.5` (local, non-cloud tag) | failed to pull | `Error: pull model manifest: 412 ... requires a newer version of Ollama`. Upgrade Ollama via https://ollama.com/download to use this tag. The cloud variant `qwen3.5:cloud` works fine. |
| `llama3.2:3b` (local) | partial | Smoke A failed ("Max steps reached"); smoke B passed but at 49s. Treat as fallback only. |
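For the `glm-4.7-flash` first-load timeout, one pre-warm sketch. It leans on documented Ollama behavior (a `/api/chat` call with an empty `messages` array loads the model into memory); the 900s read timeout is an arbitrary, generous guess for this hardware:

```python
# Pre-warm a large local model before handing it to mink, so the first
# real /api/chat call does not hit the client's default read timeout.
# Assumptions: Ollama loads a model when /api/chat receives an empty
# messages list; 900s is a guessed read window for 19 GB MoE weights.
import httpx

httpx.post(
    "http://localhost:11434/api/chat",
    json={"model": "glm-4.7-flash", "messages": [], "keep_alive": "60m"},
    timeout=httpx.Timeout(900.0),
).raise_for_status()
```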
How to add your own model

The Ollama provider lives at `chimera/providers/ollama.py`. Notable defaults:

- `_DEFAULT_NUM_CTX = 131_072` (131k context window, pushed to Ollama as `num_ctx`).
- `_DEFAULT_KEEP_ALIVE = "60m"` (the model stays warm for an hour after the last call).
- `think=True` is auto-enabled when the model tag starts with `kimi`. Override per-call by passing `think=...` through provider kwargs.
To register a custom Ollama model:

- Create a Modelfile pointing at your base weights and any system-prompt tuning. Example: `FROM kimi-k2.6:cloud` plus your own `SYSTEM` (see the sketch after this list).
- `ollama create my-agent -f ./Modelfile`
- `chimera mink --model my-agent ...` The provider will dispatch through the same `/api/chat` path, so tool-calling parity is preserved.
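A sketch of that flow end to end; the `SYSTEM` text and the `my-agent` name are illustrative placeholders:

```python
# Write a Modelfile that layers a custom system prompt over the base tag,
# then register it with Ollama. The SYSTEM text here is a placeholder.
from pathlib import Path
import subprocess

Path("Modelfile").write_text(
    "FROM kimi-k2.6:cloud\n"
    'SYSTEM """You are a terse engineering agent for this repository."""\n'
)
subprocess.run(["ollama", "create", "my-agent", "-f", "Modelfile"], check=True)
# Then: chimera mink --model my-agent ...
```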
For a fully custom provider (e.g., raise `num_ctx`, change `keep_alive`,
disable thinking on a kimi tag), instantiate `OllamaProvider` directly in
your own script:

```python
from chimera.providers.ollama import OllamaProvider

provider = OllamaProvider(
    model="kimi-k2.6:cloud",
    context_length=200_000,
    keep_alive="2h",
    think=False,
)
```

Switching models
There are two supported ways to switch the model mink talks to:

- Per-invocation via `--model`:

  ```bash
  chimera mink --model glm-5:cloud -p "..."
  chimera mink --model minimax-m2.7:cloud -p "..."
  ```

- Per-environment fallback via `CHIMERA_MINK_FALLBACK`: `chimera mink` first probes `/api/tags` for the requested `--model`. If the tag is not present locally and `CHIMERA_MINK_FALLBACK` is set, it falls back to that tag instead of failing. The default fallback is `qwen3:32b` (see `chimera/mink/cli.py`).

  ```bash
  export CHIMERA_MINK_FALLBACK=glm-5.1:cloud
  chimera mink --model some-not-yet-pulled-model -p "..."
  # -> uses glm-5.1:cloud
  ```

  The legacy `CHIMERA_CC_FALLBACK` is still accepted with a deprecation warning.
There is no `CHIMERA_MINK_MODEL` env var; the default model is hardcoded
in `_DEFAULT_MODEL` in `chimera/mink/cli.py` (currently `kimi-k2.6:cloud`).
If you want a session-wide default, alias it:

```bash
alias mink="chimera mink --model glm-5.1:cloud"
```

Catalog refresh — wave 13 (2026-05)
The default `ProviderCatalog` (`chimera/providers/catalog.py`) ships
explicit `ModelConfig` bindings for the seven model families below.
Routing inference (`chimera/providers/factory.py:_infer_provider`) is
covered by `tests/providers/test_catalog_refresh.py`.
| Model id | Provider | Endpoint / env | Pricing (in/out per Mtok) | Notes |
|---|---|---|---|---|
| `qwen3-coder` | ollama | `$OLLAMA_HOST` | $0 / $0 | Local Ollama tag (Alibaba). DashScope API path: pass `provider_type="compatible"` + `base_url=https://dashscope-intl.aliyuncs.com/compatible-mode/v1`. |
| `qwen3-coder-30b` | ollama | `$OLLAMA_HOST` | $0 / $0 | Same family, 30B coder. |
| `qwen3-32b` | ollama | `$OLLAMA_HOST` | $0 / $0 | Same family, 32B general. |
| `glm-4.6` | anthropic | `https://api.z.ai/api/anthropic` + `$ANTHROPIC_AUTH_TOKEN` | $0.6 / $2.2 (placeholder) | Zhipu Anthropic-compat. TODO: confirm rates against docs.z.ai. |
| `glm-5.1` | anthropic | `https://api.z.ai/api/anthropic` + `$ANTHROPIC_AUTH_TOKEN` | $2 / $8 (mirrors glm-5) | Same endpoint as glm-5; pricing TODO until Zhipu publishes a 5.1 sheet. |
| `deepseek-v3.1-terminus` | compatible | `https://api.deepseek.com/v1` + `$DEEPSEEK_API_KEY` | $0.27 / $1.10 (placeholder) | DeepSeek hosted OpenAI-compat. |
| `deepseek-coder-v3` | compatible | `https://api.deepseek.com/v1` + `$DEEPSEEK_API_KEY` | $0.27 / $1.10 (placeholder) | Coder line; longest-prefix matched ahead of `deepseek-chat`. |
| `gpt-oss-120b` | ollama | `$OLLAMA_HOST` | $0 / $0 | OpenAI open weights via Ollama. Routed to ollama by an explicit `gpt-oss` prefix that fires before the `gpt-*` → OpenAI rule. |
| `gpt-oss-20b` | ollama | `$OLLAMA_HOST` | $0 / $0 | Smaller OSS sibling. |
| `kimi-k2-0905-preview` | anthropic | `https://api.moonshot.ai/anthropic` + `$MOONSHOT_API_KEY` | $0.6 / $2.5 (placeholder) | Moonshot Anthropic-compat. `:cloud` Kimi tags stay served by Ollama. |
| `kimi-k2.5` | anthropic | `https://api.moonshot.ai/anthropic` + `$MOONSHOT_API_KEY` | $0.6 / $2.5 (placeholder) | Same endpoint, k2.5 GA line. |
| `mistral-codestral-2511` | ollama | `$OLLAMA_HOST` | $0 / $0 | Mistral coder. For the Mistral hosted API, override with `provider_type="compatible"` + `base_url=https://api.mistral.ai/v1` + `$MISTRAL_API_KEY`. |
| `gemma3-27b-instruct` | ollama | `$OLLAMA_HOST` | $0 / $0 | Google open weights. Routed via a new `gemma` prefix; hosted Gemini stays on the `gemini-*` → Google branch. |
Pricing entries flagged (placeholder) are educated guesses; refresh them
once the upstream vendor publishes per-SKU rates. Local Ollama tags
report $0 because `/api/chat` does not surface a price field; the
real cost is hardware plus electricity.
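The routing notes above reduce to longest-prefix matching with a `:cloud` escape hatch. An illustrative sketch reconstructed from the Notes column, not the actual `_infer_provider` code:

```python
# Illustrative longest-prefix router. The real implementation lives in
# chimera/providers/factory.py:_infer_provider and may differ; this rule
# table is reconstructed from the catalog notes above.
_PREFIX_RULES = {
    "gpt-oss": "ollama",          # fires before the generic gpt-* rule
    "gpt-": "openai",
    "gemma": "ollama",
    "gemini-": "google",
    "deepseek-coder": "compatible",
    "deepseek-": "compatible",
    "qwen3": "ollama",
    "glm-": "anthropic",
    "kimi-": "anthropic",
}

def infer_provider(model_id: str) -> str:
    if model_id.endswith(":cloud"):
        return "ollama"  # :cloud tags stay served by the local daemon
    matches = [p for p in _PREFIX_RULES if model_id.startswith(p)]
    if not matches:
        raise ValueError(f"no routing rule for {model_id!r}")
    return _PREFIX_RULES[max(matches, key=len)]  # longest prefix wins

assert infer_provider("gpt-oss-120b") == "ollama"
assert infer_provider("deepseek-coder-v3") == "compatible"
assert infer_provider("kimi-k2.5") == "anthropic"
assert infer_provider("kimi-k2.5:cloud") == "ollama"
```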
Reproducing this report
Every smoke in this report was run against a live Ollama daemon at
`http://localhost:11434` from this checkout. To reproduce a single row:

```bash
# Smoke A
timeout 90 uv run python -m chimera.cli.main mink \
  --model <MODEL> --permission-mode bypassPermissions --output-format json \
  -p "Say hello in three words." --max-steps 1

# Smoke B
timeout 180 uv run python -m chimera.cli.main mink \
  --model <MODEL> --permission-mode bypassPermissions --output-format json \
  -p "Use the bash tool to run 'echo HELLO_FROM_<TAG>' and report the result." \
  --max-steps 4
```

Each invocation prints a `[mink] run saved as mink-<timestamp>-<id> at
~/.chimera/eventlog/mink-<timestamp>-<id>/` line; the `summary.json` in that
directory carries `tool_calls_total`, `cost_usd`, `started_at`, `ended_at`,
and `success`, the same fields cited above.
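To script a full row, capture the run id from that line and load the summary. A sketch, assuming the `[mink] run saved as ...` line keeps the exact format quoted above and lands on stdout or stderr:

```python
# Run one smoke, pull the run id out of the "[mink] run saved as" line,
# and load the summary fields this report cites. Assumes the printed
# line format quoted above.
import json, re, subprocess
from pathlib import Path

proc = subprocess.run(
    ["uv", "run", "python", "-m", "chimera.cli.main", "mink",
     "--model", "glm-5:cloud",
     "--permission-mode", "bypassPermissions", "--output-format", "json",
     "-p", "Say hello in three words.", "--max-steps", "1"],
    capture_output=True, text=True,
)
match = re.search(r"run saved as (mink-\S+) at", proc.stdout + proc.stderr)
summary = json.loads(
    (Path.home() / ".chimera" / "eventlog" / match.group(1) / "summary.json").read_text()
)
print({k: summary[k] for k in ("tool_calls_total", "cost_usd", "success")})
```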