Evaluation workflows¶
This guide covers the low-level konfai EVALUATION workflow.
Evaluation in KonfAI compares saved dataset groups, not in-memory model outputs. That is why a typical workflow is:
train
predict
evaluate the written predictions
Minimal command¶
konfai EVALUATION -y --config Evaluation.yml
What evaluation writes¶
Evaluation writes to:
Evaluations/<train_name>/Metric_TRAIN.json
optionally Evaluations/<train_name>/Metric_VALIDATION.json
The output directory is controlled by:
Evaluator.train_name in the YAML
--evaluations-dir on the CLI
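The path layout above can be sketched in Python. This is a minimal illustration, not KonfAI's own code: the helper name `metric_paths` is hypothetical, and it assumes that `--evaluations-dir` replaces the `Evaluations` root while `Evaluator.train_name` supplies the subfolder.

```python
from pathlib import Path


def metric_paths(train_name: str, evaluations_dir: str = "Evaluations") -> dict[str, Path]:
    """Return where the metric files are expected for one evaluation run.

    Hypothetical helper: it only mirrors the layout described in this guide.
    Assumption: evaluations_dir stands in for the --evaluations-dir value.
    """
    root = Path(evaluations_dir) / train_name
    return {
        "train": root / "Metric_TRAIN.json",
        # The validation file is optional and may not exist on disk.
        "validation": root / "Metric_VALIDATION.json",
    }


paths = metric_paths("TRAIN_01")
print(paths["train"].as_posix())
```

Checking `paths["validation"].exists()` after a run is a quick way to see whether a validation report was produced.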
What the JSON contains¶
The evaluator writes JSON with two sections:
case for per-case values
aggregates for summary statistics such as mean, std, percentiles, min, max, and count
This structure is implemented by konfai.evaluator.Statistics.
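A metrics file can be inspected with a few lines of Python. The sketch below relies only on the two top-level keys, `case` and `aggregates`. The nesting beneath them is not specified here, so the code flattens whatever it finds into dotted paths. The helper names and the sample payload (metric name, case names, numbers) are made up for illustration.

```python
import json


def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, node


def summarize(report: dict) -> dict:
    """Split a metrics report into flat per-case and aggregate tables."""
    return {
        "case": dict(flatten(report.get("case", {}))),
        "aggregates": dict(flatten(report.get("aggregates", {}))),
    }


# Illustrative payload only: the nesting and the numbers are assumptions.
sample = json.loads(
    '{"case": {"metric_a": {"case_001": 0.5, "case_002": 1.5}},'
    ' "aggregates": {"metric_a": {"mean": 1.0, "count": 2}}}'
)
flat = summarize(sample)
for path, value in flat["aggregates"].items():
    print(path, value)
```

To read a real report, replace `sample` with the result of `json.load` on the opened metrics file.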
Pairing targets and predictions¶
Evaluation relies on dataset_filenames and groups_src to align:
the predicted output group
the reference target group
any optional mask or auxiliary group
For example, the synthesis evaluation example combines:
./Dataset:a:mha
./Predictions/TRAIN_01/Dataset:i:mha
The i flag keeps only cases present in both sources.
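The effect of the i flag can be pictured as a set intersection over case names. The sketch below is not KonfAI's loader: `common_cases` and the case names are hypothetical, and the code only illustrates which cases survive and which are dropped.

```python
def common_cases(reference_cases, predicted_cases):
    """Return case names present in both sources, in reference order."""
    predicted = set(predicted_cases)
    return [name for name in reference_cases if name in predicted]


# Hypothetical case names, for illustration only.
reference = ["case_001", "case_002", "case_003"]
predicted = ["case_002", "case_003", "case_004"]

kept = common_cases(reference, predicted)
# Cases that exist on one side only and are therefore left out.
dropped = sorted(set(reference) ^ set(predicted))

print(kept)     # ['case_002', 'case_003']
print(dropped)  # ['case_001', 'case_004']
```

A non-empty `dropped` list is worth checking before an evaluation run, since mismatched case names silently shrink the set of evaluated cases.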
Validation reports¶
Evaluator.Dataset.validation can optionally target a case list or an explicit case selector. When it is set, KonfAI writes a separate validation metrics JSON (Metric_VALIDATION.json) in addition to Metric_TRAIN.json.
Common evaluation mistakes¶
prediction and target datasets do not share the same case names
output group names in metrics do not exist in the loaded dataset
the evaluation file still points to an old prediction folder
label definitions in the metric do not match the dataset encoding