
Use with Ollama

Point Chimera at Ollama’s Anthropic-compatible endpoint and run the full agent against a local Qwen model or a hosted cloud model with zero code changes.

Local models are free and private. Cloud models are capable enough for real coding work. Either way, there is nothing to install on the Chimera side beyond what you already have.


Prerequisites

  • Ollama 0.6 or later — ollama.com/download
  • For cloud models: an Ollama account — ollama.com
  • For local models: at least 16 GB RAM and a model with a 64k-token context window
  • Chimera installed: pip install "git+https://github.com/0bserver07/chimera.git#egg=chimera-run"
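Before going further, it can help to confirm the ollama binary is actually on PATH. A minimal sketch using only Python's standard library:

```python
import shutil

# Look up the ollama executable on PATH; None means it is not installed
# (or not visible to this shell).
path = shutil.which("ollama")
if path:
    print(f"ollama found at {path}")
else:
    print("ollama not on PATH - install it from ollama.com/download")
```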

Configure the environment

Ollama exposes an Anthropic-compatible API at http://localhost:11434. Set these three environment variables so Chimera’s Anthropic provider routes through Ollama:

export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""

The ANTHROPIC_AUTH_TOKEN value is a placeholder — Ollama does not validate it for local requests, but Chimera’s provider expects the variable to be set. For cloud models, sign in with ollama signin first.
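If Chimera complains about provider configuration, a small check like this (a sketch, not part of Chimera) shows which of the three variables is absent:

```python
import os

REQUIRED = ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_API_KEY")

def missing_vars() -> list[str]:
    """Return the names of required variables that are not set at all.

    ANTHROPIC_API_KEY may legitimately be an empty string; only a
    completely absent variable counts as missing here.
    """
    return [name for name in REQUIRED if name not in os.environ]

# Fill in the exports from above if they are absent, then re-check.
os.environ.setdefault("ANTHROPIC_BASE_URL", "http://localhost:11434")
os.environ.setdefault("ANTHROPIC_AUTH_TOKEN", "ollama")
os.environ.setdefault("ANTHROPIC_API_KEY", "")
print(missing_vars())
```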

Choose a model

Chimera agents need at least 64k tokens of context to operate reliably. Pick a model whose listed context window meets that bar.


Model                 Hosting   Context   Best for
kimi-k2.6:cloud       Cloud     200k+     Full coding agent, tool use, long sessions
glm-5.1:cloud         Cloud     128k      Coding and refactoring
qwen3.5:cloud         Cloud     128k      General-purpose, fast
minimax-m2.7:cloud    Cloud     1M+       Very long contexts
glm-4.7-flash         Local     128k      Fast local coding, no API costs
qwen3.5               Local     128k      Local general-purpose work

Browse the full cloud catalog at ollama.com/search?c=cloud. Verify the context window on the model’s page before choosing — numbers above are typical at time of writing.
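The 64k requirement can be encoded as a small guard. A sketch, using the typical context figures from the table above (verify each on the model's page, they are not authoritative):

```python
# Typical advertised context windows, in tokens (check each model's page).
CONTEXT_WINDOWS = {
    "kimi-k2.6:cloud": 200_000,
    "glm-5.1:cloud": 128_000,
    "qwen3.5:cloud": 128_000,
    "minimax-m2.7:cloud": 1_000_000,
    "glm-4.7-flash": 128_000,
    "qwen3.5": 128_000,
}

MIN_CONTEXT = 64_000  # Chimera agents need at least this much

def meets_context_bar(model: str) -> bool:
    """Return True if the model's listed context window is at least 64k tokens."""
    return CONTEXT_WINDOWS.get(model, 0) >= MIN_CONTEXT

print(meets_context_bar("qwen3.5"))     # True
print(meets_context_bar("tiny-model"))  # unknown models fail the bar: False
```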


Minimal provider call

import os

import chimera

# Route Chimera's Anthropic provider through the local Ollama server.
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:11434"
os.environ["ANTHROPIC_AUTH_TOKEN"] = "ollama"

provider = chimera.create_provider(model="kimi-k2.6:cloud")
response = provider.complete("Write a Python one-liner that reverses a list.")
print(response.content)

Full coding agent

import asyncio
import os

from chimera.assembly.coding_agent import CodingAgent

os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:11434"
os.environ["ANTHROPIC_AUTH_TOKEN"] = "ollama"

agent = CodingAgent(model="kimi-k2.6:cloud")

async def main():
    # Stream agent events, printing each event type and a preview of its content.
    async for event in agent.run("Create hello.py that prints 'hi from ollama'."):
        print(event.type.value, getattr(event.data, "content", "")[:100])

asyncio.run(main())

Example scripts

Two runnable scripts live in the repo:

  • examples/provider/ollama_quickstart.py — minimal provider call
  • examples/agent/ollama_coding_agent.py — full agent loop on a trivial task

Set the env vars above, then run either script.


Context windows

Chimera (and Claude Code) assume at least 64k tokens of context. Many smaller local models ship with 4k or 8k defaults. Check the model card, and if Ollama exposes a num_ctx parameter for your model, set it when pulling or serving so the value matches what the model actually supports.
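One way to pin a larger context for a local model is a custom Modelfile. A sketch (whether num_ctx can go this high depends on the model, and the qwen3.5-64k tag name is arbitrary):

```
FROM qwen3.5
PARAMETER num_ctx 65536
```

Build it with ollama create qwen3.5-64k -f Modelfile, then point Chimera at qwen3.5-64k.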

A model that advertises 128k but runs with a 4k effective context will truncate tool results silently and cause the agent to loop or lose state.
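A rough guard against silent truncation can be sketched with the common approximation of about 4 characters per token. This is a heuristic, not a tokenizer, and the 8k reserve figure is an assumption:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

def fits_context(prompt: str, effective_ctx: int, reserve: int = 8_000) -> bool:
    """Check the prompt leaves room for output, keeping `reserve` tokens spare."""
    return estimate_tokens(prompt) + reserve <= effective_ctx

# A 4k effective context leaves no room once tool results accumulate.
print(fits_context("x" * 40_000, effective_ctx=4_096))   # False
print(fits_context("x" * 40_000, effective_ctx=65_536))  # True
```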


Troubleshooting

  • ValueError: Cannot infer provider — set ANTHROPIC_BASE_URL so Chimera’s provider detection picks the Anthropic route.
  • Connection refused — Ollama is not running. Start it with ollama serve.
  • model not found — pull the model first: ollama pull kimi-k2.6:cloud.
  • Tool calls fail silently or return empty — some smaller local models have weak tool-use. Switch to kimi-k2.6:cloud or glm-5.1:cloud before debugging your own code.
  • Agent truncates mid-task — the model’s effective context is below 64k. Pick a larger model or raise num_ctx.
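For the “Connection refused” case, a quick diagnostic is to try opening a TCP connection to Ollama’s default port (a sketch using only the standard library; 11434 is Ollama’s default):

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 1.0) -> bool:
    """Return True if something is listening on the Ollama port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if ollama_reachable():
    print("Ollama is listening on localhost:11434")
else:
    print("Connection refused - start the server with `ollama serve`")
```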

How it works

Chimera’s Anthropic provider talks to Ollama’s Anthropic-compatible endpoint. Cloud models like kimi-k2.6:cloud and glm-5.1:cloud work end-to-end with the full coding agent — tools, streaming, compaction, everything. Local models work for inference, but tool-use quality varies by model; pick one that advertises function-calling support.

A separate native Ollama provider exists in Chimera for non-Anthropic-compatible use cases (raw /api/generate and /api/chat). For the coding agent, the Anthropic-compatible path is simpler and gets you streaming and tool use out of the box.

If a model does not support tool use at all, the agent will fail on its first tool call. There is no graceful fallback — choose a tool-capable model.