# Critic
`chimera.critic` adds in-loop evaluation to agent reasoning. A critic
inspects each action (or only the final response) and scores it against
quality criteria. When the score falls below a configurable threshold, the
loop automatically injects a refinement prompt and retries, giving the agent
a chance to self-correct before returning its answer.
## Quick Start

```python
from chimera.critic import LLMCritic, CriticConfig, CriticMode
from chimera.providers import create_provider

provider = create_provider("anthropic")

critic = LLMCritic(
    provider=provider,
    config=CriticConfig(
        mode=CriticMode.FINISH_ONLY,
        success_threshold=0.8,
        max_refinement_iterations=3,
    ),
)
```
## Key Classes

| Class | Description |
|---|---|
| `CriticMode` | Enum controlling when evaluation runs: `ALL_ACTIONS`, `FINISH_ONLY`, or `TOOL_AND_FINISH` |
| `CriticResult` | Dataclass with `score` (0.0 to 1.0), `passed` (bool), `feedback` (str), and `details` (dict) |
| `CriticConfig` | Dataclass configuring `mode`, `success_threshold`, `max_refinement_iterations`, and an optional `critic_model` |
| `Critic` | ABC with `evaluate(context, current_action) -> CriticResult` and `get_followup_prompt()` |
| `LLMCritic` | Uses an LLM provider to score actions on correctness, safety, efficiency, and completeness |
| `ChecklistCritic` | Evaluates actions against a user-supplied checklist of requirements |
| `CriticMixin` | Mixin for loop classes that adds automatic evaluation and iterative refinement |
## LLMCritic

`LLMCritic` sends recent conversation history and the current action to an
LLM, which returns a structured `SCORE` / `PASSED` / `FEEDBACK` response.
```python
from chimera.critic import LLMCritic, CriticConfig, CriticMode
from chimera.providers import create_provider

provider = create_provider("anthropic")

# Use a different (cheaper) model for the critic
critic = LLMCritic(
    provider=provider,
    config=CriticConfig(
        mode=CriticMode.FINISH_ONLY,
        success_threshold=0.7,
        max_refinement_iterations=2,
        critic_model="claude-haiku-4-5-20251001",
    ),
)

# Evaluate an action
result = critic.evaluate(context, action)
print(result.score)     # e.g. 0.85
print(result.passed)    # True
print(result.feedback)  # "Response is correct but could handle edge case X"
```

You can supply a custom evaluation prompt to focus on domain-specific criteria:
```python
critic = LLMCritic(
    provider=provider,
    evaluation_prompt=(
        "You are a security reviewer. Score the action from 0.0 to 1.0 "
        "based on whether it introduces security vulnerabilities.\n\n"
        "Respond in this exact format:\n"
        "SCORE: <float>\nPASSED: <true/false>\nFEEDBACK: <one paragraph>"
    ),
)
```
## ChecklistCritic

`ChecklistCritic` evaluates actions against a list of concrete requirements.
The LLM marks each requirement as satisfied or not and computes an overall
score.
```python
from chimera.critic import ChecklistCritic, CriticConfig
from chimera.providers import create_provider

provider = create_provider("anthropic")

critic = ChecklistCritic(
    checklist=[
        "All new functions have docstrings",
        "No hardcoded credentials",
        "Error handling covers network failures",
        "Unit tests are included",
    ],
    provider=provider,
    config=CriticConfig(success_threshold=0.75),
)

result = critic.evaluate(context, action)
# result.score reflects the fraction of checklist items satisfied
```
## CriticConfig

| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | `CriticMode` | `FINISH_ONLY` | When to evaluate actions |
| `success_threshold` | `float` | `0.8` | Minimum score to pass |
| `max_refinement_iterations` | `int` | `3` | Maximum retries before accepting |
| `critic_model` | `str \| None` | `None` | Optional model override for the critic |
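Every field has a default, so a config only needs to override what differs. For example, a stricter setup that reviews every action (the values here are illustrative, not recommendations):

```python
from chimera.critic import CriticConfig, CriticMode

# Evaluate every action, require a 0.9 score, and allow a single retry.
strict_config = CriticConfig(
    mode=CriticMode.ALL_ACTIONS,
    success_threshold=0.9,
    max_refinement_iterations=1,
)
```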
## CriticMode

| Value | Behavior |
|---|---|
| `ALL_ACTIONS` | Evaluate every action (tool calls and text responses) |
| `FINISH_ONLY` | Evaluate only the final text response (no `tool_calls`) |
| `TOOL_AND_FINISH` | Evaluate both tool calls and the final response |
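For orientation, the enum has roughly the shape below. This is a sketch derived from the table; the member values are assumptions, and the real definition should be imported from `chimera.critic`:

```python
from enum import Enum

# Sketch of the enum shape implied by the table above; member values
# are assumed, not taken from the library source.
class CriticMode(Enum):
    ALL_ACTIONS = "all_actions"
    FINISH_ONLY = "finish_only"
    TOOL_AND_FINISH = "tool_and_finish"
```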
## CriticResult

| Field | Type | Description |
|---|---|---|
| `score` | `float` | Evaluation score from 0.0 to 1.0 |
| `passed` | `bool` | Whether the score met the configured threshold |
| `feedback` | `str \| None` | Actionable feedback for improvement |
| `details` | `dict \| None` | Optional structured metadata |
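As a rough picture, the result maps onto a small dataclass like the sketch below. Field names come from the table; the exact class layout and the `passed` derivation shown in the comments are assumptions:

```python
from dataclasses import dataclass

# Sketch of the result shape implied by the table above; the real class
# is chimera.critic.CriticResult.
@dataclass
class CriticResultSketch:
    score: float                 # 0.0 to 1.0
    passed: bool                 # typically score >= success_threshold
    feedback: str | None = None  # actionable feedback for improvement
    details: dict | None = None  # optional structured metadata
```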
## Refinement prompts

When an evaluation fails, `Critic.get_followup_prompt()` generates a
refinement message that includes the score, threshold, iteration count, and
feedback:
```python
prompt = critic.get_followup_prompt(result, iteration=1)
# "Your previous response scored 65% (threshold: 80%).
#  Iteration 1/3.
#
#  Feedback: The function is missing input validation for negative values.
#
#  Please revise your response addressing the feedback above."
```
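A sketch of how such a message can be assembled from a `CriticResult`; the wording mirrors the example above, but the exact template used by the library is an assumption:

```python
# Illustrative reimplementation of the followup-prompt format shown above;
# the real method is Critic.get_followup_prompt().
def build_followup_prompt(result, iteration, threshold=0.8, max_iterations=3):
    return (
        f"Your previous response scored {result.score:.0%} "
        f"(threshold: {threshold:.0%}).\n"
        f"Iteration {iteration}/{max_iterations}.\n\n"
        f"Feedback: {result.feedback}\n\n"
        "Please revise your response addressing the feedback above."
    )
```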
## CriticMixin

`CriticMixin` is mixed into loop classes (e.g. `ReActLoop`) to wire critic
evaluation directly into the reasoning cycle. It provides two key methods:

- `_should_evaluate(action)`: checks the critic's `CriticMode` to decide whether this action type should be evaluated.
- `_evaluate_and_maybe_refine(context, action, event_bus)`: runs the critic, publishes a `CriticEvent`, and returns `(should_continue, followup_message)`.
```python
from chimera.critic import CriticMixin

class MyLoop(CriticMixin):
    def __init__(self, critic=None):
        self.critic = critic
        self._refinement_iteration = 0

    def step(self, context, action, event_bus=None):
        should_retry, followup = self._evaluate_and_maybe_refine(
            context,
            action,
            event_bus,
        )
        if should_retry:
            # Inject the followup prompt and re-run the step
            context.add_message(Message.user(followup))
            return self.step(context, self.get_next_action(context))
        return action
```

When the critic score meets the threshold, `_refinement_iteration` resets to 0.
When `max_refinement_iterations` is exhausted, the loop accepts the current
action and moves on.
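In pseudocode terms, that accept/retry decision reduces to the sketch below. It mirrors the behavior described above but omits the `CriticEvent` publishing, and the `self.critic.config` attribute is an assumption:

```python
def evaluate_and_maybe_refine(self, context, action):
    # Returns (should_retry, followup_message); sketch only, not library source.
    result = self.critic.evaluate(context, action)
    if result.passed:
        self._refinement_iteration = 0   # success: reset the counter
        return False, None
    if self._refinement_iteration >= self.critic.config.max_refinement_iterations:
        self._refinement_iteration = 0   # budget exhausted: accept anyway
        return False, None
    self._refinement_iteration += 1
    followup = self.critic.get_followup_prompt(result, self._refinement_iteration)
    return True, followup                # retry with a refinement prompt
```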
## Integration

The critic system integrates with the rest of Chimera through two mechanisms:

- **LoopConfig**: A `Critic` instance (with its `CriticConfig`) can be passed via `LoopConfig` to any loop class that uses `CriticMixin`.
- **EventBus**: Each evaluation publishes a `CriticEvent` containing the score, pass/fail status, feedback, and iteration number:
```python
from chimera.events.types import CriticEvent

# CriticEvent fields:
#   type = "critic"
#   score: float (e.g. 0.85)
#   passed: bool (True/False)
#   feedback: str | None
#   iteration: int (refinement iteration number)
```

This allows external observers (dashboards, logging middleware, session storage) to track critic evaluations without coupling to the loop internals.
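For example, a logging observer might look like the sketch below. The `event_bus.subscribe(...)` call and its signature are assumptions about the EventBus API, not confirmed by this page:

```python
from chimera.events.types import CriticEvent

def log_critic_event(event: CriticEvent) -> None:
    # Print a one-line summary of each critic evaluation.
    status = "PASS" if event.passed else "FAIL"
    print(f"[critic] {status} score={event.score:.2f} "
          f"iteration={event.iteration} feedback={event.feedback!r}")

# Assumed subscription API; adapt to the actual EventBus interface.
event_bus.subscribe("critic", log_critic_event)
```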
## Import Reference

```python
from chimera.critic import (
    Critic,
    CriticConfig,
    CriticMode,
    CriticResult,
    LLMCritic,
    ChecklistCritic,
    CriticMixin,
)
```
## Related

- LoopConfig: wire a critic into any loop
- Loops: `RetryLoop` / `Reflexion` are critic-aware
- Events: subscribe to `CriticEvent` from the EventBus