Training workflows
This guide covers the low-level konfai TRAIN and konfai RESUME workflows.
Use this mode when you want full control over:
the dataset structure
preprocessing and augmentation
the model graph
losses and metrics
checkpointing and validation
Start from a shipped example
The repository currently provides two strong starting points:
examples/Segmentation/Config.yml for a simple multiclass segmentation baseline
examples/Synthesis/Config.yml for a richer image-synthesis workflow
For most new users, the segmentation example is the easiest first template.
Minimal command
From the example directory (for instance examples/Segmentation), run:
konfai TRAIN -y --gpu 0 --config Config.yml
If you do not have a GPU available, use --cpu 1 instead of --gpu 0.
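If the same command has to run on machines with and without a GPU, a small wrapper can pick the flag for you. This is a minimal sketch, assuming an NVIDIA setup where nvidia-smi -L lists each device on a line starting with "GPU 0:", "GPU 1:", and so on. The device_flags function is a hypothetical helper, not part of KonfAI.

```shell
# Hypothetical helper (not part of KonfAI): print the device flags for konfai.
# Assumption: an NVIDIA GPU shows up in "nvidia-smi -L" as a line starting with "GPU 0:".
device_flags() {
  if command -v nvidia-smi >/dev/null 2>&1 &&
     nvidia-smi -L 2>/dev/null | grep -q '^GPU 0:'; then
    echo "--gpu 0"   # at least one GPU detected: use the first one
  else
    echo "--cpu 1"   # no usable GPU: fall back to the CPU flag from this guide
  fi
}
```

Then, from the example directory:
konfai TRAIN -y $(device_flags) --config Config.yml
The command substitution is left unquoted on purpose, so that the shell splits it into the flag and its value.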
What training writes
Training writes into two top-level directories:
Checkpoints/<train_name>/ for model checkpoints
Statistics/<train_name>/ for TensorBoard logs, copied configs, and train/validation case lists
The <train_name> folder name comes from Trainer.train_name in the YAML configuration.
TensorBoard
Enable TensorBoard from the CLI:
konfai TRAIN -y --gpu 0 --config Config.yml -tb
KonfAI allocates a free local port automatically when TensorBoard is enabled.
Resume training
Resume from an existing checkpoint with RESUME:
konfai RESUME -y --config Config.yml \
--model Checkpoints/SEG_BASELINE/<checkpoint>.pt
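To resume from the most recent checkpoint without typing its file name, the shell can pick the newest file for you. This is a minimal sketch, assuming the checkpoints are the *.pt files directly inside Checkpoints/<train_name>/ and that the most recently modified one is the checkpoint you want. The latest_checkpoint function is a hypothetical helper, not a KonfAI command.

```shell
# Hypothetical helper (not part of KonfAI): print the most recently modified
# *.pt file in the given checkpoint directory, or fail if there is none.
latest_checkpoint() {
  # ls -t sorts newest first; errors (e.g. no matching file) are silenced.
  # grep . passes the line through and exits non-zero when nothing was printed.
  ls -t "$1"/*.pt 2>/dev/null | head -n 1 | grep .
}
```

For example:
ckpt=$(latest_checkpoint Checkpoints/SEG_BASELINE) && konfai RESUME -y --config Config.yml --model "$ckpt"
Chaining with && means RESUME is only launched when a checkpoint was actually found.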
You can also override the default output directories with --checkpoints-dir and --statistics-dir:
konfai TRAIN -y --config Config.yml \
--checkpoints-dir ./Checkpoints \
--statistics-dir ./Statistics
Training checklist
Before launching a new run, verify:
dataset_filenames points to the right data
every group named in groups_src exists on disk
train_name is unique, unless you intend to overwrite an existing run
the output names used in outputs_criterions match real model modules
validation is set appropriately for your dataset size
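Two of these checks are easy to script. This is a minimal sketch, assuming you pass the train_name value and plain filesystem dataset paths by hand (it does not parse Config.yml), and that outputs go to the default Checkpoints/ and Statistics/ folders in the current directory. The preflight function is a hypothetical helper, not part of KonfAI.

```shell
# Hypothetical pre-launch check (not part of KonfAI).
# Usage: preflight <train_name> <dataset_path>...
# Returns non-zero if an output folder for <train_name> already exists
# or if any given dataset path is missing.
preflight() {
  train=$1
  shift
  rc=0
  # A reused train_name would write into existing output folders.
  for dir in "Checkpoints/$train" "Statistics/$train"; do
    if [ -e "$dir" ]; then
      echo "warning: $dir already exists (train_name reused?)" >&2
      rc=1
    fi
  done
  # Every dataset path passed on the command line must exist.
  for data_path in "$@"; do
    if [ ! -e "$data_path" ]; then
      echo "error: dataset path not found: $data_path" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, preflight SEG_BASELINE ./Dataset && konfai TRAIN -y --gpu 0 --config Config.yml only starts training when both checks pass. The remaining items (groups_src, outputs_criterions, validation) depend on your configuration and are easier to review by eye.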
Advanced training patterns
KonfAI supports several advanced training patterns visible in the codebase and examples:
dataset-level patch extraction through Dataset.Patch
model-level patching through Model.<Class>.Patch
multiple criteria per output and per target
exponential moving average (EMA) through ema_decay
selective logging with data_log
multi-process execution through the distributed runner
For a concrete advanced example, see the GAN variant in examples/Synthesis/Config_GAN.yml.