
verify_answer — Python boolean cross-check

verify_answer runs a Python snippet that should print `True` if a candidate answer is correct and `False` otherwise. Use it instead of a full test suite when a single yes/no check is enough.

| Arg | Type | Required | Default | Description |
|---|---|---|---|---|
| `code` | string | yes | | Python code that prints `True` or `False`. |
| `timeout` | integer | no | `30` | Wall-clock seconds before the snippet is killed. |
Example arguments:

```json
{
  "code": "answer = 42\nprint(answer == 6 * 7)"
}
```
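From Python:

```python
from chimera.tools.verify import VerifyAnswerTool

tool = VerifyAnswerTool()
# `local_env` is an execution environment created beforehand;
# its construction is not covered on this page.
result = tool.execute(
    {"code": "print(sum(range(11)) == 55)"},
    env=local_env,
)
print(result.metadata["verified"])  # True
```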
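The `timeout` argument bounds how long a snippet may run. Below is a minimal local sketch of that behavior, assuming the snippet runs in a fresh Python subprocess. `run_snippet` is a hypothetical helper, and `VerifyAnswerTool` may execute code differently (for example, inside the environment passed as `env`).

```python
import subprocess
import sys
from typing import Optional


def run_snippet(code: str, timeout: int = 30) -> Optional[str]:
    """Hypothetical helper: run `code` in a fresh Python interpreter.

    Returns the captured stdout, or None if the snippet exceeds
    `timeout` wall-clock seconds.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return None
    return completed.stdout


print(repr(run_snippet("answer = 42\nprint(answer == 6 * 7)")))  # 'True\n'
print(run_snippet("import time\ntime.sleep(5)", timeout=1))  # None
```

`subprocess.run` kills the child process when the timeout expires and then raises `TimeoutExpired`, which matches the "killed" wording in the argument table.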

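`result.metadata["verified"]` holds the parsed boolean. The snippet's output is stripped of surrounding whitespace and lowercased, and only the string `true` counts as `True`; any other output, such as `False`, `yes`, or `1`, is reported as `False`.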
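The parsing rule can be sketched as follows. This assumes the whole captured stdout is compared, not just its last line; `parse_verified` is a hypothetical name, not part of the tool's API.

```python
def parse_verified(stdout: str) -> bool:
    """Hypothetical parser: True only if the output, stripped of
    surrounding whitespace and lowercased, is exactly 'true'."""
    return stdout.strip().lower() == "true"


for raw in ["True\n", "  TRUE ", "False\n", "yes", "True\nTrue\n", ""]:
    print(repr(raw), "->", parse_verified(raw))
```

Under that assumption, a snippet that prints debugging output alongside its verdict (or prints the verdict twice) is reported as `False`, so the snippet should print exactly one line.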

Typical uses:

  • Math benchmarks (AIMO, MATH-500): extract the integer answer and check it in Python.
  • Synthesis tasks: a quick sanity check before running the full test suite.

Related tools:

  • test: full test suite runner.
  • think: record reasoning without running anything.
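For the math-benchmark use case, the candidate answer is typically embedded in a snippet that recomputes the answer independently. A minimal sketch, assuming a toy problem ("how many integers from 1 to 1000 are divisible by 3 or 5?") and a hypothetical helper `build_check`; the local `subprocess` run only stands in for verify_answer so the snippet can be tried without Chimera.

```python
import subprocess
import sys


def build_check(candidate: int) -> str:
    """Return a hypothetical `code` argument that prints True only if
    `candidate` equals an independently recomputed answer."""
    return (
        f"candidate = {candidate}\n"
        "# Brute force: count integers in 1..1000 divisible by 3 or 5.\n"
        "expected = sum(1 for n in range(1, 1001) if n % 3 == 0 or n % 5 == 0)\n"
        "print(candidate == expected)"
    )


# Local stand-in for verify_answer: run each snippet in a fresh interpreter.
results = {}
for candidate in (467, 466):
    completed = subprocess.run(
        [sys.executable, "-c", build_check(candidate)],
        capture_output=True,
        text=True,
        timeout=30,
    )
    results[candidate] = completed.stdout.strip()

print(results)  # {467: 'True', 466: 'False'}
```

Recomputing the answer by brute force, rather than restating the inclusion-exclusion formula that produced the candidate, keeps the check independent of the reasoning being verified.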