Aider Polyglot

Aider Polyglot (multi-language code edits)

Aider Polyglot is the standard “code edit” benchmark: instead of generating code from scratch, the agent receives a buggy or incomplete file and must edit it. It comprises 225 exercises across Python, JavaScript, Rust, Go, C++, and Java.

References:

Status: TODO (adapter wired; not benchmarked yet)

| Run | Score |
| --- | --- |
| Chimera | NOT RUN |

The adapter supports two grading modes that coexist on a per-task basis:

| Mode | Grader |
| --- | --- |
| diff-match | Compare the agent’s patch against the gold patch line-for-line. Strict. |
| test-pass | Run the language’s native test runner against the patched tree. Loose. |

Diff-match takes precedence. The adapter falls through to test-pass when a task carries tests but no canonical patch.
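The precedence rule and the strict comparison can be sketched as follows. This is a minimal illustration, not the adapter's actual code: the `Task` dataclass, its field names, and the `pick_mode` and `diff_match` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Hypothetical task record; field names are assumptions, not the adapter's schema.
    exercise: str
    gold_patch: Optional[str] = None  # canonical patch, if the task ships one
    tests_path: Optional[str] = None  # test file, if the task ships tests


def pick_mode(task: Task) -> str:
    """Diff-match wins when a gold patch exists; otherwise fall through to test-pass."""
    if task.gold_patch is not None:
        return "diff-match"
    if task.tests_path is not None:
        return "test-pass"
    raise ValueError(f"{task.exercise}: no gold patch and no tests to grade against")


def diff_match(agent_patch: str, gold_patch: str) -> bool:
    """Strict line-for-line comparison.

    splitlines() ignores the line-ending style and one trailing newline;
    every other difference, including whitespace, counts as a mismatch.
    """
    return agent_patch.splitlines() == gold_patch.splitlines()
```

A task that carries both a gold patch and tests is graded by diff-match under this sketch, which mirrors the rule stated above.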

git clone https://github.com/Aider-AI/polyglot-benchmark ~/.chimera/datasets/aider-polyglot

from chimera.eval.benchmarks import AiderPolyglot
from chimera.eval.harness import Harness

# All languages
bench = AiderPolyglot(dataset_path="~/.chimera/datasets/aider-polyglot")
print(bench.name())  # "aider-polyglot"

# Filter to a subset of languages
bench = AiderPolyglot(
    dataset_path="~/.chimera/datasets/aider-polyglot",
    languages=["python", "rust"],
)
print(bench.name())  # "aider-polyglot:python+rust"

# my_agent is the agent instance under evaluation
harness = Harness(agent=my_agent, benchmark=bench)
results = harness.run()

{
  "exercise": "reverse-string",
  "language": "rust",
  "instructions": "Reverse a string. ...",
  "files": {"src/lib.rs": "pub fn reverse(s: &str) -> String { todo!() }"},
  "tests_path": "tests/reverse_string.rs"
}
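To make the record above concrete, here is a rough sketch of how its `files` map could be written out to a scratch workspace before an agent edits it. The `materialize` helper and the `<language>/<exercise>` directory layout are assumptions for illustration, not part of the adapter.

```python
import json
import tempfile
from pathlib import Path


def materialize(task: dict, root: Path) -> Path:
    """Write each entry of task["files"] under root/<language>/<exercise>/ (assumed layout)."""
    workspace = root / task["language"] / task["exercise"]
    for rel_path, content in task["files"].items():
        target = workspace / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
    return workspace


# The example record from the text, parsed from JSON.
record = json.loads("""
{
  "exercise": "reverse-string",
  "language": "rust",
  "instructions": "Reverse a string. ...",
  "files": {"src/lib.rs": "pub fn reverse(s: &str) -> String { todo!() }"},
  "tests_path": "tests/reverse_string.rs"
}
""")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = materialize(record, Path(tmp))
        print(sorted(p.relative_to(ws).as_posix() for p in ws.rglob("*") if p.is_file()))
```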

In test-pass mode, the agent’s patch is applied and the language’s test command (cargo test, pytest, go test, or npm test) is then run inside the workspace. The runners live under chimera/eval/benchmarks/runners/.
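A test-pass runner along these lines might look like the sketch below. It is not the code under chimera/eval/benchmarks/runners/. The language keys, the `command_for` and `run_tests` helpers, and the timeout default are assumptions; only the four commands come from the text above.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Commands named in the text; the language keys are assumptions.
TEST_COMMANDS: Dict[str, List[str]] = {
    "rust": ["cargo", "test"],
    "python": ["pytest"],
    "go": ["go", "test"],
    "javascript": ["npm", "test"],
}


def command_for(language: str) -> List[str]:
    """Look up the test command for a language, failing loudly if none is configured."""
    try:
        return TEST_COMMANDS[language]
    except KeyError:
        raise ValueError(f"no test runner configured for {language!r}") from None


def run_tests(workspace: Path, language: str, timeout_s: float = 300.0) -> bool:
    """Run the test command inside the workspace; a pass means exit code 0."""
    try:
        proc = subprocess.run(
            command_for(language),
            cwd=workspace,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # A hung test suite or a missing toolchain is treated as a failure here.
        return False
    return proc.returncode == 0
```

Treating a missing toolchain as a failed task is one possible choice; reporting it as skipped instead would keep setup gaps from lowering the score.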

  • The dataset is not pip-installable — clone the GitHub repo directly.
  • Each language brings its own toolchain (rustc, cargo, go, etc.). Run inside a fat Docker image or skip the languages you don’t have set up.
  • Aider’s own leaderboard uses test-pass mode with their custom diff format. To match their numbers, use mode="test-pass" and Aider’s edit format.
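One way to act on the toolchain note is to detect which languages are usable on the current machine before building the benchmark. The sketch below uses `shutil.which` from the standard library. The `REQUIRED_TOOLS` mapping and the `available_languages` helper are hypothetical, and the executables listed per language are a guess to adjust for your setup.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Executables each language is assumed to need; adjust to your own setup.
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "python": ["pytest"],
    "rust": ["rustc", "cargo"],
    "go": ["go"],
    "javascript": ["npm"],
}


def available_languages(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the languages whose required executables are all found on PATH."""
    return [
        lang
        for lang, tools in REQUIRED_TOOLS.items()
        if all(which(tool) is not None for tool in tools)
    ]


if __name__ == "__main__":
    print(available_languages())
```

The resulting list could then be passed as the `languages=` argument of `AiderPolyglot`, so that exercises for missing toolchains are skipped rather than failed.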