Validation
chimera.training.validation splits a test suite into training and validation
sets to detect overfitting during synthesis. The agent synthesizes against
training tests only; validation tests are held out and evaluated afterwards.
Key Classes
ValidationSplit
Splits test files from a Spec.tests_dir into two temporary directories.
The split is done at the file level (not individual test functions) to
avoid import/fixture issues.
Constructor parameters:
| Param | Type | Default | Description |
|---|---|---|---|
| spec | Spec | required | A Spec with tests_dir set |
| ratio | float | 0.3 | Fraction of test files held out for validation |
| seed | int \| None | None | Random seed for reproducible splits |
Key properties: train_spec, val_spec, train_files, val_files.
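
The file-level split described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: split_test_files is a hypothetical helper, the file names are made up, and the rounding rule used to turn ratio into a file count is an assumption, since the documentation does not specify it.

```python
import random


def split_test_files(files, ratio=0.3, seed=None):
    """Hypothetical helper: partition test files into (train, val) lists."""
    # Sort first so the result does not depend on the order files were listed in.
    shuffled = sorted(files)
    # A private RNG keeps the split reproducible: same seed, same shuffle.
    rng = random.Random(seed)
    rng.shuffle(shuffled)
    # Assumption: the held-out count is ratio * len(files), rounded.
    n_val = round(len(shuffled) * ratio)
    val = sorted(shuffled[:n_val])
    train = sorted(shuffled[n_val:])
    return train, val


# Made-up file names, for illustration only.
names = ("add", "sub", "mul", "div", "pow", "mod", "neg", "abs", "sqrt", "log")
files = [f"tests/test_{name}.py" for name in names]

train_files, val_files = split_test_files(files, ratio=0.3, seed=42)
print(len(train_files), len(val_files))  # prints: 7 3
```

Whole files move to one side or the other, so fixtures and imports shared within a file stay together, which is the reason given above for not splitting at the level of individual test functions.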
ValidationResult
Dataclass returned by ValidationSplit.evaluate():
| Field | Type | Description |
|---|---|---|
| train_pass_rate | float | Pass rate on training tests |
| val_pass_rate | float | Pass rate on held-out validation tests |
| overfit_gap | float | train_pass_rate - val_pass_rate |
| train_passed / val_passed | int | Counts of passing tests |
| train_total / val_total | int | Total test counts |
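
To make the relationship between these fields concrete, here is a small stand-in that derives the rates and the gap from the counts. ValidationResultSketch is a hypothetical class that only mirrors the documented field names; the real dataclass may store the rates directly, and returning 0.0 for an empty test set is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ValidationResultSketch:
    """Hypothetical stand-in mirroring the documented fields; not the chimera class."""

    train_passed: int
    train_total: int
    val_passed: int
    val_total: int

    @property
    def train_pass_rate(self) -> float:
        # Assumption: an empty training set counts as a 0.0 pass rate.
        return self.train_passed / self.train_total if self.train_total else 0.0

    @property
    def val_pass_rate(self) -> float:
        # Assumption: an empty validation set counts as a 0.0 pass rate.
        return self.val_passed / self.val_total if self.val_total else 0.0

    @property
    def overfit_gap(self) -> float:
        # Documented definition: train_pass_rate - val_pass_rate.
        return self.train_pass_rate - self.val_pass_rate


# Made-up counts: every training test passes, but only 4 of 6 held-out tests do.
r = ValidationResultSketch(train_passed=14, train_total=14, val_passed=4, val_total=6)
print(f"Train: {r.train_pass_rate:.0%}")      # Train: 100%
print(f"Val: {r.val_pass_rate:.0%}")          # Val: 67%
print(f"Overfit gap: {r.overfit_gap:.0%}")    # Overfit gap: 33%
```

A large positive gap means the synthesized code passes the tests it was built against but fails tests it never saw, which is the overfitting signal this module is meant to surface.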

```python
from chimera.training.spec import Spec
from chimera.training.validation import ValidationSplit

spec = Spec.from_tests("tests/", description="Build a calculator")
split = ValidationSplit(spec, ratio=0.3, seed=42)

# Use split.train_spec for synthesis (agent never sees validation tests)
result = trainer.synthesize(spec=split.train_spec, ...)

# Evaluate on held-out tests
val_result = split.evaluate(env)
print(f"Train: {val_result.train_pass_rate:.0%}")
print(f"Val: {val_result.val_pass_rate:.0%}")
print(f"Overfit gap: {val_result.overfit_gap:.0%}")
```

Related
- Training concepts: strategies and the synthesis loop
- Spec Inference: auto-generate tests from source
- examples/synthesis/validation_split.py: runnable example