Execution flow
KonfAI ships three low-level workflows and one higher-level app layer.
Low-level workflows
The konfai CLI dispatches to three public functions:
konfai.trainer.train
konfai.predictor.predict
konfai.evaluator.evaluate
Each wrapper prepares a small execution context, then instantiates the corresponding configured object:
Trainer
Predictor
Evaluator
The key environment variables are documented in Environment variables.
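The dispatch described above can be pictured as a lookup table from a workflow name to an entry function. The following is a minimal sketch, not KonfAI's actual CLI code: the three functions are stand-ins for konfai.trainer.train, konfai.predictor.predict, and konfai.evaluator.evaluate (whose real signatures are not shown on this page), and using TRAIN, PREDICTION, and EVALUATION as command names is an assumption based on the labels used in this document.

```python
from typing import Callable, Dict


# Stand-ins for the three public functions. They only return the name of the
# configured object each real wrapper is described as instantiating.
def train() -> str:
    return "Trainer"


def predict() -> str:
    return "Predictor"


def evaluate() -> str:
    return "Evaluator"


# Assumed command names, taken from the labels used in this document.
WORKFLOWS: Dict[str, Callable[[], str]] = {
    "TRAIN": train,
    "PREDICTION": predict,
    "EVALUATION": evaluate,
}


def dispatch(command: str) -> str:
    """Look up the workflow for `command` and run it."""
    try:
        workflow = WORKFLOWS[command]
    except KeyError:
        expected = ", ".join(sorted(WORKFLOWS))
        raise ValueError(
            f"unknown workflow {command!r}; expected one of: {expected}"
        ) from None
    return workflow()


if __name__ == "__main__":
    print(dispatch("TRAIN"))
```

A table-driven dispatcher keeps the mapping in one place, so an unknown command fails with a clear message before any execution context is prepared.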
What happens during training
At a high level, TRAIN does the following:
parse Config.yml into a Trainer
prepare the dataset and its train/validation split
initialize the model graph, losses, and schedulers
run the training loop
save checkpoints and logs
copy the active config into the statistics directory
Outputs are written to:
Checkpoints/<train_name>/
Statistics/<train_name>/
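The output layout and the config copy can be illustrated with a short standard-library sketch. It is not KonfAI's implementation: the helper name prepare_training_outputs and its parameters are hypothetical, and only the directory names and the "copy the active config into the statistics directory" step come from the text above.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_training_outputs(root, train_name, config_path):
    """Create the two output directories and copy the config next to the logs.

    Hypothetical helper: only the directory names follow the documented layout.
    """
    root = Path(root)
    checkpoints = root / "Checkpoints" / train_name
    statistics = root / "Statistics" / train_name
    for directory in (checkpoints, statistics):
        directory.mkdir(parents=True, exist_ok=True)

    # Keep a copy of the active config with the run's statistics so the run
    # can be traced back to the settings that produced it.
    copied = Path(shutil.copy2(config_path, statistics / Path(config_path).name))
    return checkpoints, statistics, copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "Config.yml"
        config.write_text("# placeholder config\n")
        for path in prepare_training_outputs(tmp, "my_run", config):
            print(path)
```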
What happens during prediction
PREDICTION:
parses Prediction.yml into a Predictor
loads one or more checkpoints
prepares the inference dataset
runs the model in prediction mode
writes output datasets defined in outputs_dataset
copies Prediction.yml into the prediction directory
Outputs are written to:
Predictions/<train_name>/
What happens during evaluation
EVALUATION:
parses Evaluation.yml into an Evaluator
loads the dataset pairs needed for metric computation
validates that configured output and target groups exist
computes per-case and aggregate metrics
writes JSON reports
copies the evaluation config into the evaluation directory
Outputs are written to:
Evaluations/<train_name>/Metric_TRAIN.json
optionally Evaluations/<train_name>/Metric_VALIDATION.json
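The evaluation steps above (check that the required groups exist, compute per-case and aggregate metrics, write a JSON report) can be sketched as follows. This is an illustration under stated assumptions, not KonfAI's evaluator: the group names "output" and "target", the mean-absolute-error metric, the report keys "case" and "aggregate", and both helper names are hypothetical. Only the Evaluations/<train_name>/Metric_TRAIN.json path pattern comes from this page.

```python
import json
import tempfile
from pathlib import Path
from statistics import mean


def mean_absolute_error(output, target):
    """Toy metric used only for this illustration."""
    return mean(abs(o - t) for o, t in zip(output, target))


def evaluate_cases(cases, required_groups=("output", "target")):
    """Validate groups, then compute per-case and aggregate metrics."""
    per_case = {}
    for name, groups in cases.items():
        missing = [g for g in required_groups if g not in groups]
        if missing:
            # Fail early, before any metric is computed for this case.
            raise KeyError(f"case {name!r} is missing groups: {missing}")
        per_case[name] = {
            "MAE": mean_absolute_error(groups["output"], groups["target"])
        }
    aggregate = {"MAE": mean(m["MAE"] for m in per_case.values())}
    # The report schema here is an assumption, not KonfAI's actual format.
    return {"case": per_case, "aggregate": aggregate}


def write_report(report, root, train_name, split="TRAIN"):
    """Write the report to Evaluations/<train_name>/Metric_<split>.json."""
    out_dir = Path(root) / "Evaluations" / train_name
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"Metric_{split}.json"
    path.write_text(json.dumps(report, indent=2))
    return path


if __name__ == "__main__":
    demo = {"case_0": {"output": [1.0, 2.0], "target": [1.0, 4.0]}}
    with tempfile.TemporaryDirectory() as tmp:
        print(write_report(evaluate_cases(demo), tmp, "my_run").read_text())
```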
Programmatic vs CLI entrypoints
The same workflows can also be built programmatically through:
build_train(...)
build_predict(...)
build_evaluate(...)
This is useful when you want to validate a config before launching the full runtime.
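One way to use a builder for validation is to call it and report any construction error without starting the run. The sketch below is hedged accordingly: the arguments of build_train(...), build_predict(...), and build_evaluate(...) are elided on this page, so the wrapper takes the builder as a parameter, and fake_build_train is a hypothetical stand-in that exists only to make the example runnable.

```python
from typing import Any, Callable, Optional, Tuple


def try_build(
    builder: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Optional[Any], Optional[Exception]]:
    """Call `builder` and return (object, None) or (None, error).

    Exceptions are caught broadly on purpose: the goal is to surface any
    configuration problem before a full run is launched.
    """
    try:
        return builder(*args, **kwargs), None
    except Exception as exc:
        return None, exc


def fake_build_train(config: dict) -> dict:
    """Hypothetical stand-in for a real builder; not part of KonfAI."""
    if "model" not in config:
        raise KeyError("model")
    return {"kind": "Trainer", "config": config}


if __name__ == "__main__":
    built, error = try_build(fake_build_train, {"model": "placeholder"})
    print("ok" if error is None else f"invalid config: {error!r}")
```

With the real package, the stand-in would be replaced by the actual builder and whatever arguments it expects.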
Distributed execution
The execution layer is handled by the distributed runtime utilities in konfai.utils.runtime.
From the code, this layer is responsible for:
setting CUDA_VISIBLE_DEVICES
handling overwrite and verbosity flags
launching TensorBoard when requested
spawning worker processes with torch.multiprocessing.spawn
initializing torch.distributed with a local TCP port
This means that even local multi-process execution uses the same distributed bootstrap logic.
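The bootstrap pattern described above can be sketched with PyTorch's public API. This is not the code in konfai.utils.runtime: the helper names, the gloo backend, and the way the port is chosen are all assumptions made for illustration. PyTorch is imported lazily so the port and environment helpers work without it.

```python
import os
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port on the local machine.

    The port is released before it is reused, so another process could in
    principle grab it first; that is acceptable for a sketch.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def restrict_visible_gpus(gpu_ids):
    """Set CUDA_VISIBLE_DEVICES; an empty sequence hides every GPU."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


def _worker(rank, world_size, init_method):
    """Per-process entry point: join the process group, sync, then leave."""
    import torch.distributed as dist  # lazy import: only workers need torch

    dist.init_process_group(
        backend="gloo",  # CPU backend, assumed here so the sketch runs anywhere
        init_method=init_method,
        world_size=world_size,
        rank=rank,
    )
    try:
        dist.barrier()  # every rank reaches this point before continuing
    finally:
        dist.destroy_process_group()


def launch(world_size, gpu_ids=()):
    """Spawn `world_size` workers that rendezvous over a local TCP port."""
    import torch.multiprocessing as mp

    restrict_visible_gpus(gpu_ids)
    init_method = f"tcp://127.0.0.1:{find_free_port()}"
    # spawn calls _worker(rank, world_size, init_method) in each process.
    mp.spawn(_worker, args=(world_size, init_method), nprocs=world_size, join=True)
    return init_method
```

Because the spawn start method re-imports the main module in each child, launch(world_size=2) should be called from under an `if __name__ == "__main__":` guard in a script.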
Apps
konfai-apps is the higher-level interface. It packages low-level prediction,
evaluation, uncertainty, and fine-tuning workflows into reusable app bundles.
See KonfAI Apps.