Rerun on Failure
Rerun on Failure
Section titled “Rerun on Failure”Rerun-on-failure is the load-bearing distinction between badger and
the other Chimera coding-agent CLIs. When the agent’s first attempt
produces an unambiguous failure marker, the harness resets and retries
with a refined prompt up to --max-reruns extra attempts.
Enable rerun
Section titled “Enable rerun”chimera badger -p "Fix the broken tests in tests/api/" \ --rerun-on-failure --max-reruns 2Total attempts = 1 + max_reruns. With the default budget, that’s up
to three attempts.
What counts as a failure marker?
Section titled “What counts as a failure marker?”Detection is deliberately conservative. A false negative (a real failure that slips past) is preferred over a false positive (rerunning a healthy trajectory and burning more tokens).
The current marker list:
| Marker | Pattern |
|---|---|
| pytest test failure | ^FAILED \S+:: |
| pytest summary | ^=+\s*\d+\s+failed |
| Python syntax error | ^\s*SyntaxError\b |
| Python traceback | ^Traceback (most recent call last): |
| JavaScript syntax error | \bSyntaxError\b.*Unexpected |
| Rust compile error | ^error\[E\d+\] |
| Node assertion error | \bAssertionError(?:\s*\[ERR_\w+\])?: |
| Explicit failure marker | `^(BUILD FAILED |
When any of these fire on the trajectory output, rerun is
triggered. When none fire, the result is returned as-is — even if
the agent reported success=false.
Refined prompt
Section titled “Refined prompt”The next attempt’s prompt is the original prompt prefixed by a short directive that names the detected markers:
[badger rerun 1] The previous attempt produced pytest test failure,Python traceback. Re-read the failing output, isolate the rootcause, fix it, and re-run any failing tests before reportingcompletion. Do not claim done until verification passes.
Original task:<original prompt>This mirrors the harness-rewrite tradition’s “verify before claiming done” discipline.
Inside the REPL
Section titled “Inside the REPL”The /rerun slash command shows or sets the per-session budget:
> /rerun/rerun: rerun_on_failure=False max_reruns=2> /rerun on/rerun: enabled (max_reruns=2)> /rerun 5/rerun: enabled, max_reruns=5> /rerun off/rerun: disabledThe same machinery is exposed as a Python coroutine:
from chimera.badger.rerun import run_with_rerun
result = await run_with_rerun(agent, prompt, env=env, max_reruns=2)run_with_rerun accepts any object with an
async_run(prompt, env=...) -> result method, so it composes with the
canonical Chimera Agent and with custom subclasses.
Why opt-in?
Section titled “Why opt-in?”Rerun costs real tokens. The harness-rewrite posture treats that cost as a budget, not a free safety net. Operators turn rerun on when the workload warrants it (test fixes, code modernization), and leave it off for one-shot exploratory queries.