Training workflows

This guide covers the low-level konfai TRAIN and konfai RESUME workflows.

Use this mode when you want full control over:

Start from a shipped example

The repository currently provides two strong starting points:

examples/Segmentation/Config.yml for a simple multiclass segmentation baseline
examples/Synthesis/Config.yml for a richer image-synthesis workflow

For most new users, the segmentation example is the easiest first template.

From the example directory:

konfai TRAIN -y --gpu 0 --config Config.yml

If you do not have a GPU available, use --cpu 1 instead of --gpu 0.
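
The GPU/CPU choice above can be scripted when the same launch command has to run on machines with and without a GPU. The following is a minimal sketch, not part of KonfAI: `konfai_device_args` is a hypothetical helper, and it assumes that a working `nvidia-smi` on the PATH is a good enough signal that GPU 0 is usable.

```shell
# Hypothetical helper (not shipped with KonfAI): print the device flags to pass to konfai.
# Assumption: if `nvidia-smi -L` runs successfully, GPU 0 is usable.
konfai_device_args() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "--gpu 0"
  else
    echo "--cpu 1"
  fi
}

# Usage (the unquoted expansion is intentional so the flag and its value split into two words):
# konfai TRAIN -y $(konfai_device_args) --config Config.yml
```

The helper only emits the two flag forms already shown in this guide, so the rest of the command line stays unchanged.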

Training writes into two top-level directories:

Checkpoints/<train_name>/ for model checkpoints
Statistics/<train_name>/ for TensorBoard logs, copied configs, and train/validation case lists

The output folder name comes from Trainer.train_name in the YAML.
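
To find the output folders for a given config without opening the YAML by hand, the name can be read from the file. This is a rough sketch under stated assumptions: `konfai_train_name` is a hypothetical helper, and it assumes `train_name` appears as a plain `train_name: VALUE` line with no trailing comment. It does not parse YAML properly.

```shell
# Hypothetical helper (not shipped with KonfAI): print the first train_name value in a YAML file.
# Assumption: the key sits on its own line, e.g. "  train_name: SEG_BASELINE".
konfai_train_name() {
  sed -n 's/^[[:space:]]*train_name:[[:space:]]*//p' "$1" | head -n 1 | tr -d "\"'"
}

# Usage: list the two output folders for the run described by Config.yml.
# name="$(konfai_train_name Config.yml)"
# ls "Checkpoints/${name}/" "Statistics/${name}/"
```

For configs with anchors, multi-line values, or several `train_name` keys, a real YAML parser would be the safer choice.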

Enable TensorBoard from the CLI:

konfai TRAIN -y --gpu 0 --config Config.yml -tb

KonfAI allocates a free local port automatically when TensorBoard is enabled.

Resume from an existing checkpoint with RESUME:

konfai RESUME -y --config Config.yml \
  --model Checkpoints/SEG_BASELINE/<checkpoint>.pt
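
When the exact checkpoint file name is not known in advance, one option is to pick the most recently written `.pt` file in the run folder. The sketch below assumes that the newest file by modification time is the one to resume from, which may not match how a given run names or ranks its checkpoints. `latest_checkpoint` is a hypothetical helper, not a KonfAI command.

```shell
# Hypothetical helper (not shipped with KonfAI): print the most recently modified .pt file.
# Assumptions: checkpoint file names contain no newlines, and "newest" is the one you want.
latest_checkpoint() {
  ls -t "$1"/*.pt 2>/dev/null | head -n 1
}

# Usage: resume only if a checkpoint was actually found.
# ckpt="$(latest_checkpoint Checkpoints/SEG_BASELINE)"
# [ -n "$ckpt" ] && konfai RESUME -y --config Config.yml --model "$ckpt"
```

The helper prints nothing when the folder holds no `.pt` files, so the guard in the usage lines avoids calling RESUME with an empty path.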

You can also redirect the checkpoint and statistics output directories:

konfai TRAIN -y --config Config.yml \
  --checkpoints-dir ./Checkpoints \
  --statistics-dir ./Statistics

Before launching a new run, verify:

KonfAI supports several advanced training patterns visible in the codebase and examples:

For a concrete advanced example, see the GAN variant in examples/Synthesis/Config_GAN.yml.