verify_answer — Python boolean cross-check
verify_answer runs a Python snippet that should print True if a candidate answer is correct and False otherwise. It is a lightweight alternative to a full test suite when correctness comes down to a single boolean check.
Schema
| Arg | Type | Required | Default | Description |
|---|---|---|---|---|
| `code` | string | yes | none | Python code that prints `True` or `False`. |
| `timeout` | integer | no | 30 | Wall-clock seconds before the snippet is killed. |
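The `timeout` argument caps how long the snippet may run. The sketch below shows one way a check like this could run locally. It assumes each call gets a fresh Python subprocess and that a timeout or crash counts as a failed check. `run_snippet` is a hypothetical helper, not the chimera implementation.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: int = 30) -> bool:
    """Hypothetical helper: run `code` in a fresh interpreter and read its verdict.

    Assumption: a timeout or a non-zero exit status counts as a failed check.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # wall-clock seconds, mirroring the `timeout` arg
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return False
    if proc.returncode != 0:
        return False
    # Mirrors the documented rule: only a stripped, lowercased "true" passes.
    return proc.stdout.strip().lower() == "true"


print(run_snippet("print(sum(range(11)) == 55)"))  # True
print(run_snippet("import time; time.sleep(5)", timeout=1))  # False
```

Running each snippet in a separate interpreter keeps a runaway or crashing check from affecting the caller. The per-call startup cost is the tradeoff.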
Example invocation
```json
{ "code": "answer = 42\nprint(answer == 6 * 7)" }
```

```python
from chimera.tools.verify import VerifyAnswerTool

tool = VerifyAnswerTool()
result = tool.execute(
    {"code": "print(sum(range(11)) == 55)"},
    env=local_env,  # an execution environment you have already set up
)
print(result.metadata["verified"])  # True
```

Output sample

```
True
```

`result.metadata["verified"]` holds the parsed boolean. The snippet's output counts as verified only if it equals `true` after stripping whitespace and lowercasing. Any other output is recorded as `False`.
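The parsing rule fits in a few lines. This is a sketch of the documented behavior, not the library's code. `parse_verdict` is a hypothetical name. The sketch also assumes the rule applies to the snippet's entire stdout.

```python
def parse_verdict(stdout: str) -> bool:
    """Hypothetical sketch of the documented rule: only a stripped,
    lowercased "true" counts as verified; everything else is False."""
    return stdout.strip().lower() == "true"


# Loose answers such as "1" or "yes", and extra output, do not count.
for raw in ["True\n", "  true ", "TRUE", "False\n", "1", "yes", "", "debug\nTrue"]:
    print(repr(raw), "->", parse_verdict(raw))
```

Under this reading, a stray debug print before the final `True` is enough to fail the check. Keep the snippet's stdout to the single verdict.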
When to use it
- Math benchmarks (AIMO, MATH-500): extract the integer answer and check it in Python.
- Synthesis: a quick sanity check before running the full test suite.
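For math benchmarks, a typical flow is to pull the final integer out of a model completion and wrap it in a snippet for `code`. The sketch below assumes the answer is the last `\boxed{...}` in the completion. That is a common MATH-style convention, not something this tool requires. `extract_boxed_int` and `build_payload` are hypothetical helpers.

```python
import re
from typing import Optional

# Assumption: the final answer is written as \boxed{<integer>}.
BOXED_INT = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def extract_boxed_int(completion: str) -> Optional[int]:
    """Return the integer in the last \\boxed{...}, or None if there is none."""
    matches = BOXED_INT.findall(completion)
    return int(matches[-1]) if matches else None


def build_payload(candidate: int, reference_expr: str, timeout: int = 30) -> dict:
    """Build verify_answer arguments comparing `candidate` with a Python expression."""
    code = f"answer = {candidate}\nprint(answer == ({reference_expr}))"
    return {"code": code, "timeout": timeout}


completion = r"Adding 0 through 10 gives \boxed{55}."
candidate = extract_boxed_int(completion)
if candidate is not None:
    payload = build_payload(candidate, "sum(range(11))")
    print(payload["code"])
```

The resulting dict has the same shape as the arguments in the example invocation. Keeping the reference as a Python expression lets the snippet do the arithmetic itself instead of trusting a hard-coded number.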