Skip to content

Cost Tracking

The cost tracking module provides granular, per-call token accounting with cache-aware pricing, per-step breakdowns, per-model summaries, budget enforcement, and context window utilization monitoring. Attach a CostTracker to an agent to track exactly how much each step costs, where cache hits save money, and how close you are to your budget.

from chimera.providers.cost_tracker import CostTracker
tracker = CostTracker(budget=5.00)
usage = tracker.record_usage(
model="claude-sonnet-4",
input_tokens=1000,
output_tokens=500,
cache_read_tokens=200,
)
print(f"Cost so far: ${tracker.total:.4f}")
print(f"Cache hit rate: {usage.cache_hit_rate:.1%}")
print(f"Budget remaining: ${tracker.remaining:.4f}")
ClassModuleDescription
TokenUsagechimera.providers.cost_trackerDataclass for a single LLM call: input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, reasoning_tokens, total_tokens, cost, model, timestamp. Properties: cache_hit_rate, effective_input_tokens.
StepUsagechimera.providers.cost_trackerAggregated usage for one agent step: step_index, calls (list of TokenUsage), start_time, end_time. Properties: total_input_tokens, total_output_tokens, total_cache_read_tokens, total_cache_write_tokens, total_reasoning_tokens, total_cost, duration.
CostTrackerchimera.providers.cost_trackerMain tracker. Records calls via record() or record_usage(), tracks per-model and per-step breakdowns, enforces budgets, and provides summary statistics.
CostLimitExceededchimera.providers.cost_trackerException raised when the cost budget is exceeded.

For simple cost tracking without token-level detail, use record():

from chimera.providers.cost_tracker import CostTracker
tracker = CostTracker(budget=10.00)
tracker.record(0.05, model="claude-sonnet-4")
tracker.record(0.12, model="gpt-4o")
print(tracker.total) # 0.17
print(tracker.breakdown()) # {"claude-sonnet-4": 0.05, "gpt-4o": 0.12}
print(tracker.remaining) # 9.83

Use record_usage() for full token-level accounting with automatic cost calculation:

from chimera.providers.cost_tracker import CostTracker
tracker = CostTracker()
usage = tracker.record_usage(
model="claude-sonnet-4",
input_tokens=5000,
output_tokens=1200,
cache_read_tokens=3000,
cache_write_tokens=500,
reasoning_tokens=0,
)
print(f"This call cost: ${usage.cost:.4f}")
print(f"Cache hit rate: {usage.cache_hit_rate:.1%}")
print(f"Effective input tokens: {usage.effective_input_tokens}")

Wrap agent steps with start_step() / end_step() to group LLM calls by step:

tracker = CostTracker()
tracker.start_step(0)
tracker.record_usage(model="claude-sonnet-4", input_tokens=1000, output_tokens=200)
tracker.record_usage(model="claude-sonnet-4", input_tokens=1500, output_tokens=300)
step = tracker.end_step()
print(f"Step 0: {step.total_cost:.4f} USD, {step.duration:.2f}s")
print(f"Total calls in step: {len(step.calls)}")
# Find the most expensive step across the session
expensive = tracker.most_expensive_step()

When a budget is set, CostLimitExceeded is raised as soon as the budget is exceeded:

from chimera.providers.cost_tracker import CostTracker, CostLimitExceeded
tracker = CostTracker(budget=0.10)
try:
tracker.record_usage(
model="claude-opus-4",
input_tokens=100_000,
output_tokens=5_000,
)
except CostLimitExceeded as e:
print(f"Stopped: {e}")

The summary() method returns a comprehensive dict of all tracked metrics:

info = tracker.summary()
# {
# "total_cost": 0.17,
# "total_calls": 5,
# "total_input_tokens": 15000,
# "total_output_tokens": 3200,
# "total_cache_read_tokens": 8000,
# "total_cache_write_tokens": 1000,
# "total_reasoning_tokens": 0,
# "cache_hit_rate": 0.348,
# "context_utilization": 0.0,
# "budget": 10.0,
# "budget_remaining": 9.83,
# "by_model": {"claude-sonnet-4": {...}, "gpt-4o": {...}},
# "steps": 3,
# "most_expensive_step": 1,
# }

Register a callback that fires after each recorded call:

def on_update(usage):
print(f"[{usage.model}] +${usage.cost:.4f}")
tracker = CostTracker(on_usage_update=on_update)
tracker.record_usage(model="claude-sonnet-4", input_tokens=500, output_tokens=100)
# prints: [claude-sonnet-4] +$0.0030

chimera.cli.cost_estimator adds a pre-flight version of cost tracking — answer “how much will this turn cost?” before paying for the round-trip. Wave-11 (A8-W11-COST-ESTIMATE) wires it into the otter -p one-shot path; Wave-12 (W12-9-REPL-COST-GATING) extends the gate to the interactive REPL.

from chimera.cli.cost_estimator import estimate_cost, ModelNotPriced
estimate = estimate_cost(
model="claude-sonnet-4-6",
prompt="implement a JSON parser",
expected_output_tokens=2000,
)
print(estimate.total_usd) # 0.0184
print(estimate.input_tokens) # 7 (chars-÷-4 rule)

Token counts use the chars-÷-4 rule of thumb (good to ~10-20 % on English / code). Pricing comes from chimera.providers.cost.PRICING with longest-prefix-match semantics so glm-5-air resolves through glm-5. Unknown models raise ModelNotPriced (subclass of KeyError) so callers can detect missing entries rather than silently estimating zero.

Flag / commandLayerBehaviour
--estimate-costotter -pPrint the estimate and exit (no provider call).
--max-cost FLOATone-shot + REPLRefuse the turn when the estimate exceeds the cap.
/max-cost <usd>REPLRaise / clear the active cap mid-session. Empty argument clears.
/force-sendREPLBypass the cap once for the next turn only.

The REPL gate runs before the agent thread spins up, so a refused turn costs zero. Refusal goes to stderr; stdout stays clean for jq-style consumers. See chimera/cli/code.py::_gate_turn_by_cost and chimera/cli/slash_commands.py for the wiring shared across codenames.

  • Agent: Each Agent instance holds a CostTracker. The ReAct loop calls start_step() / end_step() around each iteration and record_usage() after every provider call.
  • REPL /cost command: Displays the current tracker’s summary() in a formatted table, including per-model breakdown, cache stats, and budget status.
  • REPL cost gating (W12-9): Each user turn is cost-estimated before submission; turns exceeding --max-cost are refused with a friendly hint pointing at /max-cost and /force-send. The wiring lives at chimera.cli.code so any CLI delegating to run_code inherits it.
  • EventBus: A StepCost event is emitted after each step with the StepUsage data, enabling middleware and plugins to react to cost changes.
  • Provider layer: Providers call record_usage() on the tracker after each LLM response, passing token counts from the API response headers.
  • Built-in pricing: chimera.providers.cost.PRICING includes per-million-token rates for Claude, GPT, GLM, DeepSeek, Kimi, Qwen, GPT-OSS, Mistral-Codestral and Gemma3 (see Providers for the catalog refresh). Unknown models fall back to the "default" tier.
from chimera.providers.cost_tracker import (
CostLimitExceeded,
CostTracker,
StepUsage,
TokenUsage,
)
from chimera.cli.cost_estimator import (
CostEstimate,
ModelNotPriced,
estimate_cost,
format_estimate,
)