
Use DeepSeek-V4

DeepSeek-V4 is wired into Chimera’s model catalog with three entries across two transports:

| Model id | Transport | Endpoint |
| --- | --- | --- |
| `deepseek-v4` | OpenAI-compatible | `https://api.deepseek.com/v1` |
| `deepseek-v4-pro` | OpenAI-compatible | `https://api.deepseek.com/v1` |
| `deepseek-v4-pro:cloud` | Ollama | `$OLLAMA_HOST` (cloud passthrough) |

The bare ids hit DeepSeek’s hosted OpenAI-compatible endpoint. The :cloud-tagged id routes through Ollama’s cloud passthrough (the same backend as `ollama run deepseek-v4-pro:cloud`), which is useful when you already have an Ollama sign-in and prefer the unified endpoint. You can also pull the model into a local Ollama daemon for fully offline use.

Context window: 128k tokens for every variant.

Pick one of the three paths below. Prerequisites per path:

Direct API:
  • A DeepSeek API key, exported as DEEPSEEK_API_KEY

Ollama cloud passthrough:
  • Ollama 0.6+ — ollama.com/download
  • An Ollama account: ollama signin
  • export OLLAMA_HOST=https://ollama.com (the default)

Local Ollama:
  • Ollama 0.6+
  • Sufficient RAM and storage to host the model locally — run ollama pull deepseek-v4-pro first (sanity checks below)
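
For either Ollama path, you can sanity-check the prerequisites with standard Ollama CLI commands:

```sh
ollama --version   # expect 0.6 or newer
ollama list        # after `ollama pull`, deepseek-v4-pro should appear here
```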

All three paths require Chimera to be installed:

```sh
pip install chimera-run
```
Path 1: Direct DeepSeek API

```python
import os

from chimera.providers.factory import create_provider

os.environ["DEEPSEEK_API_KEY"] = "sk-..."  # your DeepSeek key

provider = create_provider(model="deepseek-v4-pro")
response = provider.complete("Write a Python one-liner that reverses a list.")
print(response.content)
```

The factory resolves deepseek-v4 / deepseek-v4-pro / deepseek-chat / deepseek-reasoner to the OpenAI-compatible provider pointed at https://api.deepseek.com/v1 with the DEEPSEEK_API_KEY env var.
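
Under the hood this is straightforward id matching. A minimal sketch of the idea — the helper and field names here are hypothetical, not the actual chimera.providers.factory code:

```python
import os

# Hypothetical sketch of the factory's id resolution; the real helper and
# field names inside chimera.providers.factory may differ.
DEEPSEEK_IDS = {"deepseek-v4", "deepseek-v4-pro", "deepseek-chat", "deepseek-reasoner"}

def resolve_transport(model: str) -> dict:
    if model.endswith(":cloud"):
        # Trailing :cloud short-circuits to the Ollama transport (see Path 2).
        return {"transport": "ollama", "host": os.environ.get("OLLAMA_HOST", "https://ollama.com")}
    if model in DEEPSEEK_IDS:
        return {
            "transport": "openai-compatible",
            "base_url": "https://api.deepseek.com/v1",
            "api_key": os.environ.get("DEEPSEEK_API_KEY"),
        }
    raise ValueError(f"Model {model!r} not in catalog")
```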

Path 2: Ollama cloud passthrough

```sh
ollama signin
export OLLAMA_HOST=https://ollama.com
```

```python
from chimera.providers.factory import create_provider

provider = create_provider(model="deepseek-v4-pro:cloud")
response = provider.complete("Refactor this function for clarity.")
print(response.content)
```

The :cloud suffix forces the Ollama transport regardless of any prefix matching elsewhere in the catalog. The factory inspects the trailing :cloud (and friends like glm-5.1:cloud, kimi-k2.6:cloud) to short-circuit to the Ollama provider.
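
For example, assuming the routing just described, the tag should win even when a DeepSeek key is present in the environment:

```python
import os

from chimera.providers.factory import create_provider

os.environ["DEEPSEEK_API_KEY"] = "sk-..."

# The trailing :cloud tag takes precedence, so this request goes out via the
# Ollama transport, not api.deepseek.com.
provider = create_provider(model="deepseek-v4-pro:cloud")
```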

Path 3: Local Ollama

```sh
ollama pull deepseek-v4-pro
ollama serve   # if not already running on :11434
```

```python
import os

from chimera.providers.factory import create_provider

os.environ["OLLAMA_HOST"] = "http://localhost:11434"

# When OLLAMA_HOST points at a local daemon, the factory will route through
# the local Ollama transport instead of the public DeepSeek API.
provider = create_provider(model="deepseek-v4-pro")
response = provider.complete("Implement a binary search.")
print(response.content)
```

For full offline operation, drop DEEPSEEK_API_KEY from your environment so the factory cannot accidentally fall back to the hosted API.
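
One way to guarantee that within a session is to drop the key from the process environment before creating the provider (plain stdlib, nothing Chimera-specific):

```python
import os

from chimera.providers.factory import create_provider

# Remove the hosted-API key for this process so the factory cannot see it.
os.environ.pop("DEEPSEEK_API_KEY", None)

provider = create_provider(model="deepseek-v4-pro")  # now resolves to local Ollama
```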

To drive the full coding agent with DeepSeek-V4:

```python
import asyncio

from chimera.assembly.coding_agent import CodingAgent

agent = CodingAgent(model="deepseek-v4-pro")

async def main():
    result = await agent.arun(
        "Add a CLI flag --json to scripts/format_report.py and write a test."
    )
    print(result.output)

asyncio.run(main())
```

CodingAgent wires up the default tools, a ReAct loop, and the right environment for the workspace. The model= kwarg flows through create_provider.
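
Concretely, that means the constructor presumably hands model= straight to the factory — a sketch of the idea, not the actual CodingAgent source:

```python
# Hypothetical sketch; the real CodingAgent internals may differ.
from chimera.providers.factory import create_provider

class CodingAgentSketch:
    def __init__(self, model: str):
        # Same id-resolution rules as the direct create_provider calls above.
        self.provider = create_provider(model=model)
```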

The same ids work from the CLI:

```sh
# Synthesize against the direct API
chimera synthesize "Build a calculator REST API" --tests tests/ --model deepseek-v4-pro

# Eval HumanEval against the cloud passthrough
chimera eval --benchmark humaneval --model deepseek-v4-pro:cloud --limit 10

# Otter REPL against the direct API
otter chat --model deepseek-v4

# Code REPL with model cycling: try DeepSeek-V4 first, then fall back to GLM-5
chimera code --models deepseek-v4-pro,glm-5
```

The cost catalog ships placeholder pricing copied from deepseek-reasoner ($0.55 input / $2.19 output per Mtok) for every V4 variant. DeepSeek had not published a V4 rate card at the time of writing — refresh chimera/providers/cost.py once the official pricing lands.
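
If you need to patch the numbers before an official update, the entries are plain data. A sketch of what the placeholder might look like — the actual structure of chimera/providers/cost.py may differ:

```python
# Hypothetical shape of the placeholder entries; check chimera/providers/cost.py
# for the real structure before editing.
V4_PLACEHOLDER_PRICING = {
    "deepseek-v4":           {"input_per_mtok": 0.55, "output_per_mtok": 2.19},
    "deepseek-v4-pro":       {"input_per_mtok": 0.55, "output_per_mtok": 2.19},
    "deepseek-v4-pro:cloud": {"input_per_mtok": 0.55, "output_per_mtok": 2.19},
}
```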

Troubleshooting:

  • No API key found for deepseek-v4-pro — set DEEPSEEK_API_KEY (or use the :cloud variant for Ollama).
  • OLLAMA_HOST is unreachable — start ollama serve, or set OLLAMA_HOST=https://ollama.com for cloud.
  • Model 'deepseek-v4-foo' not in catalog — only deepseek-v4, deepseek-v4-pro, and deepseek-v4-pro:cloud are wired up. Custom variants need a ModelConfig registered against chimera.providers.catalog.
  • Slow context-window saturation — every V4 variant ships with 128k of context. If you’re hitting the limit on long sessions, enable chimera.compaction.thresholds.ThresholdCompaction in your LoopConfig (see the sketch after this list).
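
A sketch of that last wiring — the LoopConfig import path and kwarg are assumptions; only the chimera.compaction.thresholds.ThresholdCompaction path comes from the list above:

```python
# Hypothetical wiring; the LoopConfig location and the compaction kwarg are
# assumptions, not confirmed API.
from chimera.compaction.thresholds import ThresholdCompaction
from chimera.loop import LoopConfig  # assumed module path

config = LoopConfig(compaction=ThresholdCompaction())
```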