
Validation

chimera.training.validation splits a test suite into training and validation sets to detect overfitting during synthesis. The agent synthesizes against training tests only; validation tests are held out and evaluated afterwards.


ValidationSplit splits the test files from a Spec.tests_dir into two temporary directories. The split happens at the file level (rather than per test function) to avoid import and fixture issues.
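The file-level split can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation; the helper name `split_files` and the `test_*.py` glob pattern are hypothetical:

```python
import random
from pathlib import Path

def split_files(tests_dir, ratio=0.3, seed=None):
    """Partition test files into (train, val) lists at the file level."""
    # Sort first so the shuffle starts from a deterministic base order.
    files = sorted(Path(tests_dir).glob("test_*.py"))
    rng = random.Random(seed)  # same seed -> same split
    shuffled = files[:]
    rng.shuffle(shuffled)
    # Hold out roughly `ratio` of the files (at least one, if any exist).
    n_val = max(1, round(len(shuffled) * ratio)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]  # (train_files, val_files)
```

Splitting whole files rather than individual tests keeps each file's imports, fixtures, and module-level setup together in one directory.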

Constructor parameters:

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| `spec` | `Spec` | required | A Spec with `tests_dir` set |
| `ratio` | `float` | `0.3` | Fraction of test files held out for validation |
| `seed` | `int \| None` | `None` | Random seed for reproducible splits |

Key properties: train_spec, val_spec, train_files, val_files.

ValidationSplit.evaluate() returns a dataclass with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| `train_pass_rate` | `float` | Pass rate on training tests |
| `val_pass_rate` | `float` | Pass rate on held-out validation tests |
| `overfit_gap` | `float` | `train_pass_rate - val_pass_rate` |
| `train_passed` / `val_passed` | `int` | Counts of passing tests |
| `train_total` / `val_total` | `int` | Total test counts |
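The derived fields follow directly from the counts. A minimal sketch of that relationship, using the field names from the table above (the class name `ValidationResult` is an assumption, not confirmed by the source):

```python
from dataclasses import dataclass

@dataclass
class ValidationResult:
    train_passed: int
    train_total: int
    val_passed: int
    val_total: int

    @property
    def train_pass_rate(self) -> float:
        return self.train_passed / self.train_total if self.train_total else 0.0

    @property
    def val_pass_rate(self) -> float:
        return self.val_passed / self.val_total if self.val_total else 0.0

    @property
    def overfit_gap(self) -> float:
        # A large positive gap means the agent does much better on the
        # tests it synthesized against than on held-out tests: overfitting.
        return self.train_pass_rate - self.val_pass_rate
```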
```python
from chimera.training.spec import Spec
from chimera.training.validation import ValidationSplit

spec = Spec.from_tests("tests/", description="Build a calculator")
split = ValidationSplit(spec, ratio=0.3, seed=42)

# Use split.train_spec for synthesis (agent never sees validation tests)
result = trainer.synthesize(spec=split.train_spec, ...)

# Evaluate on held-out tests
val_result = split.evaluate(env)
print(f"Train: {val_result.train_pass_rate:.0%}")
print(f"Val: {val_result.val_pass_rate:.0%}")
print(f"Overfit gap: {val_result.overfit_gap:.0%}")
```
  • Training concepts — strategies and the synthesis loop
  • Spec Inference — auto-generate tests from source
  • examples/synthesis/validation_split.py — runnable example