konfai package
Subpackages
Submodules
konfai.evaluator module
Evaluation workflow classes and helpers for KonfAI.
- class konfai.evaluator.CriterionsAttr
Bases: object
Container for additional metadata or configuration attributes related to a loss criterion.
This class is currently empty but acts as a placeholder for future extension. It is passed along with each loss function to allow parameterization or inspection of its behavior.
Use cases may include:
  - Weighting of individual loss terms
  - Conditional activation
  - Logging preferences
- class konfai.evaluator.CriterionsLoader(criterions_loader={'default|torch:nn:CrossEntropyLoss|Dice|NCC': <konfai.evaluator.CriterionsAttr object>})
Bases: object
Loader for multiple criterion modules to be applied between a model output and one or more targets.
Each loss module (e.g., Dice, CrossEntropy, NCC) is dynamically loaded using its fully-qualified classpath and is associated with a CriterionsAttr configuration object.
- Parameters:
criterions_loader (dict[str, CriterionsAttr]) – A mapping from module classpaths (as strings) to CriterionsAttr instances. The module path is parsed and instantiated via get_module.
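As a rough illustration of how a colon-separated classpath such as 'torch:nn:CrossEntropyLoss' might be turned into a Python object, the sketch below uses only the standard library. The helper name `resolve_classpath` is hypothetical, and the parsing rule (last component is the attribute, the rest is the module path) is an assumption; the actual behavior of get_module is not shown on this page and may differ.

```python
import importlib


def resolve_classpath(classpath: str):
    """Hypothetical helper: map 'pkg:sub:Name' to the object pkg.sub.Name.

    Assumption: the last colon-separated component is the attribute name
    and the preceding components form the module path.
    """
    *module_parts, attr_name = classpath.split(":")
    if not module_parts:
        raise ValueError(f"expected 'module:Name', got {classpath!r}")
    module = importlib.import_module(".".join(module_parts))
    return getattr(module, attr_name)


# Standard-library stand-in for a loss class such as 'torch:nn:CrossEntropyLoss'.
ordered_dict_cls = resolve_classpath("collections:OrderedDict")
```

A dynamic lookup of this kind lets a configuration file name any importable class without the loader having to know it in advance.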
- class konfai.evaluator.TargetCriterionsLoader(targets_criterions={'default': <konfai.evaluator.CriterionsLoader object>})
Bases: object
Loader class for handling multiple target groups with associated criterion configurations.
This class allows defining a set of criterion loaders (e.g., Dice, BCE, MSE) for each target group to be used during evaluation or training. Each target group corresponds to one or more loss functions, all linked to a specific model output.
- Parameters:
targets_criterions (dict[str, CriterionsLoader]) – Dictionary mapping each target group name to a CriterionsLoader instance that defines its associated loss functions.
- class konfai.evaluator.Statistics(filename)
Bases: object
Utility class to accumulate, structure, and write evaluation metric results.
This class is used to:
  - Collect metrics for each dataset sample.
  - Compute aggregate statistics (mean, std, percentiles, etc.).
  - Export all results in a structured JSON format, including both per-case and aggregate values.
- Parameters:
filename (Path) – Path to the output JSON file that will store the final results.
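The collect, aggregate, and export pattern described above can be sketched with the standard library alone. This is not the Statistics class itself: the class name `MetricAccumulator`, its methods, and the JSON layout (`"case"` and `"aggregates"` keys) are all assumptions made for illustration.

```python
import json
import statistics
import tempfile
from pathlib import Path


class MetricAccumulator:
    """Hypothetical stand-in for a per-case metric collector."""

    def __init__(self, filename: Path) -> None:
        self.filename = Path(filename)
        self.cases = {}  # case name -> {metric name: value}

    def add(self, case_name: str, values: dict) -> None:
        # Merge the new metric values into the record for this case.
        self.cases.setdefault(case_name, {}).update(values)

    def aggregates(self) -> dict:
        # Regroup values by metric name, then summarize each metric.
        per_metric = {}
        for values in self.cases.values():
            for name, value in values.items():
                per_metric.setdefault(name, []).append(value)
        return {
            name: {
                "mean": statistics.fmean(vals),
                "std": statistics.pstdev(vals),
                "min": min(vals),
                "max": max(vals),
            }
            for name, vals in per_metric.items()
        }

    def write(self) -> None:
        # Assumed layout: per-case values next to aggregate values.
        payload = {"case": self.cases, "aggregates": self.aggregates()}
        self.filename.write_text(json.dumps(payload, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    acc = MetricAccumulator(Path(tmp) / "metrics.json")
    acc.add("case_001", {"Dice": 0.5})
    acc.add("case_002", {"Dice": 1.0})
    acc.write()
    report = json.loads(acc.filename.read_text())
```

Keeping per-case values alongside the aggregates makes it possible to trace an unusual mean back to the individual samples that caused it.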
- class konfai.evaluator.Evaluator(train_name='default|TRAIN_01', metrics={'default': <konfai.evaluator.TargetCriterionsLoader object>}, dataset={'dataset_filenames': ['default|./Dataset:mha'], 'groups_src': {'default': {'default|group_dest': {'transforms': [], 'patch_transforms': []}}}, 'patch': None, 'use_cache': True, 'subset': <konfai.data.data_manager.PredictionSubset object>, 'batch_size': 1, 'validation': None, 'inline_augmentations': False, 'data_augmentations_list': {}})
Bases: DistributedObject
Distributed evaluation engine for computing metrics on model predictions.
This class handles the evaluation of predicted outputs using predefined metric loaders. It supports multi-output and multi-target configurations, computes aggregated statistics across training and validation datasets, and synchronizes results across processes.
Evaluation results are stored in JSON format and optionally displayed during iteration.
- Parameters:
train_name (str) – Unique name of the evaluation run, used for logging and output folders.
metrics (dict[str, TargetCriterionsLoader]) – Dictionary mapping output groups to loaders of target metrics.
dataset (DataMetric) – Dataset provider configured for evaluation mode.
- statistics_train
Object used to store training evaluation metrics.
- Type:
- statistics_validation
Object used to store validation evaluation metrics.
- Type:
- setup(world_size)
Prepare the evaluator for distributed metric computation.
This method performs the following steps:
  - Checks whether previous evaluation results exist and optionally overwrites them.
  - Creates the output directory and copies the current configuration file for reproducibility.
  - Loads the evaluation dataset according to the world size.
- Parameters:
world_size (int) – Number of processes in the distributed evaluation setup.
- update(batch_sample, statistics)
Compute metrics for a batch and update running statistics.
- Parameters:
batch_sample (dict[str, BatchDataItem]) – The batch sample object containing tensors and their metadata.
statistics (Statistics) – The statistics object to update (train or validation).
- Returns:
Dictionary of computed metric values with keys in the format 'output_group:target_group:MetricName'.
- Return type:
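To make the 'output_group:target_group:MetricName' key format concrete, here is a small sketch that flattens a nested result into such keys and splits a key back into its parts. Both helper names and the nested input layout are hypothetical; only the key format itself comes from the description above.

```python
def flatten_metrics(nested: dict) -> dict:
    """Hypothetical helper: {output: {target: {metric: value}}} -> flat keys."""
    flat = {}
    for output_group, targets in nested.items():
        for target_group, metrics in targets.items():
            for metric_name, value in metrics.items():
                flat[f"{output_group}:{target_group}:{metric_name}"] = value
    return flat


def split_metric_key(key: str) -> tuple:
    # Split on the first two colons only, so a metric name that itself
    # contains colons (for example a classpath) stays in one piece.
    output_group, target_group, metric_name = key.split(":", 2)
    return output_group, target_group, metric_name


flat = flatten_metrics({"output": {"Labels": {"Dice": 0.9, "NCC": 0.8}}})
parts = split_metric_key("output:Labels:torch:nn:CrossEntropyLoss")
```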
- run_process(world_size, global_rank, gpu, dataloaders)
Execute the distributed evaluation loop over the training and validation datasets.
This method iterates through the provided DataLoaders (train and optionally validation), updates the metric statistics using the configured metrics dictionary, and synchronizes the results across all processes. On global rank 0, the metrics are saved as JSON files.
Metrics are displayed in real time using tqdm progress bars, showing a summary of the current batch’s computed values.
- Parameters:
world_size (int) – Total number of distributed processes.
global_rank (int) – Global rank of the current process (used for writing results).
gpu (int) – Local GPU ID used for synchronization.
dataloaders (list[DataLoader]) – A list containing one or two DataLoaders:
  - dataloaders[0] is used for training evaluation.
  - dataloaders[1] (optional) is used for validation evaluation.
Notes
Only the main process (global_rank == 0) writes final results to disk.
- konfai.evaluator.build_evaluate(evaluations_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Evaluation.yml'), evaluations_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Evaluations'))
Build and return the configured evaluation workflow without executing it.
- konfai.evaluator.evaluate(overwrite=False, gpu=[], cpu=1, quiet=False, tb=False, evaluations_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Evaluation.yml'), evaluations_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Evaluations'))
Build and execute the configured evaluation workflow.
This compatibility wrapper preserves the historical CLI-facing API while delegating the pure build step to build_evaluate().
- Return type:
DistributedObject
konfai.main module
Command-line entrypoints for KonfAI workflows, apps, and services.
- konfai.main.main()
Entry point for the konfai command-line interface.
This function builds the top-level CLI parser and delegates the full argument parsing and command dispatching to _run(parser).
Supported commands are:
  - TRAIN
  - RESUME
  - PREDICTION
  - EVALUATION
Notes
The actual execution logic is implemented in konfai.trainer.train, konfai.predictor.predict, and konfai.evaluator.evaluate.
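The parse-then-dispatch flow described for main() can be approximated with argparse. Only the four command names are taken from this page; the parser layout, the `dispatch` helper, the handler table, and the choice to route RESUME to the training handler are assumptions, and the real CLI may be structured differently (for example with subparsers).

```python
import argparse

# The four command names come from the text above; everything else is illustrative.
COMMANDS = ("TRAIN", "RESUME", "PREDICTION", "EVALUATION")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="konfai-sketch")
    parser.add_argument("command", choices=COMMANDS, help="workflow to run")
    return parser


def dispatch(argv, handlers):
    """Parse argv and call the handler registered for the chosen command."""
    args = build_parser().parse_args(argv)
    return handlers[args.command]()


# Placeholder handlers standing in for the train, predict, and evaluate
# functions. Assumption: RESUME is served by the training handler.
handlers = {
    "TRAIN": lambda: "train",
    "RESUME": lambda: "train",
    "PREDICTION": lambda: "predict",
    "EVALUATION": lambda: "evaluate",
}
selected = dispatch(["EVALUATION"], handlers)
```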
- konfai.main.cluster()
Entry point for running KonfAI with cluster-oriented CLI arguments.
This command extends the standard KonfAI CLI with a “Cluster manager arguments” group (job name, nodes, memory, time limit, resubmit), then delegates parsing and command dispatching to _run(parser).
Notes
This function only defines extra CLI arguments before delegating to _run.
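Extending an existing parser with a named argument group is a standard argparse pattern, sketched below. The group title is quoted from the description above, but every flag name, type, and default here is hypothetical; the actual cluster options are not listed on this page.

```python
import argparse


def add_cluster_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Attach a 'Cluster manager arguments' group; all flag names are assumed."""
    group = parser.add_argument_group("Cluster manager arguments")
    group.add_argument("--job-name", type=str, required=True)
    group.add_argument("--nodes", type=int, default=1)
    group.add_argument("--memory", type=int, default=16, help="memory in GB")
    group.add_argument("--time-limit", type=int, default=60, help="minutes")
    group.add_argument("--resubmit", action="store_true")
    return parser


parser = add_cluster_arguments(argparse.ArgumentParser(prog="konfai-cluster-sketch"))
args = parser.parse_args(["--job-name", "demo", "--nodes", "2", "--resubmit"])
```

Grouping the options this way only affects how they appear in the help output; parsing behaves the same as for ungrouped arguments.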
konfai.predictor module
Prediction workflow classes, reductions, and export helpers for KonfAI.
- class konfai.predictor.Reduction
Bases: ABC
Abstract reduction applied across model ensemble or augmentation outputs.
- class konfai.predictor.Mean
Bases: Reduction
Average ensemble or augmentation predictions element-wise.
- class konfai.predictor.Median
Bases: Reduction
Compute the element-wise median across prediction tensors.
- class konfai.predictor.Concat
Bases: Reduction
Concatenate prediction tensors along the channel dimension.
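The three reductions can be illustrated on plain nested lists. This is a pure-Python sketch, not the library's implementation: the real classes operate on tensors, and the function names and the (channels x voxels) list layout used here are assumptions.

```python
import statistics

# Each prediction is a list of channels; each channel is a flat list of values.


def reduce_mean(predictions):
    """Element-wise mean across predictions (same shape as one prediction)."""
    return [
        [statistics.fmean(values) for values in zip(*channels)]
        for channels in zip(*predictions)
    ]


def reduce_median(predictions):
    """Element-wise median across predictions (same shape as one prediction)."""
    return [
        [statistics.median(values) for values in zip(*channels)]
        for channels in zip(*predictions)
    ]


def reduce_concat(predictions):
    """Stack all channels of all predictions along the channel axis."""
    return [channel for prediction in predictions for channel in prediction]


# Three single-channel predictions with two values each.
predictions = [[[0.0, 1.0]], [[1.0, 1.0]], [[5.0, 4.0]]]
mean_out = reduce_mean(predictions)
median_out = reduce_median(predictions)
concat_out = reduce_concat(predictions)
```

In this example the median ignores the outlying third prediction while the mean is pulled toward it, which is the usual reason to prefer a median when one ensemble member may be unreliable.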
- class konfai.predictor.OutputDataset(filename, group, before_reduction_transforms, after_reduction_transforms, final_transforms, patch_combine, reduction)
Bases: Dataset, NeedDevice, ABC
Abstract prediction sink that accumulates model outputs and writes them to disk.
Concrete subclasses define how layers are accumulated across patches, augmentations, and multiple models before the final prediction volume is materialized.
- abstractmethod add_layer(index_dataset, index_augmentation, index_patch, layer, dataset, attribute=None)
- class konfai.predictor.OutSameAsGroupDataset(same_as_group='default', dataset_filename='default|./Dataset:mha', group='default', before_reduction_transforms={'default|Normalize': <konfai.data.transform.TransformLoader object>}, after_reduction_transforms={'default|Normalize': <konfai.data.transform.TransformLoader object>}, final_transforms={'default|Normalize': <konfai.data.transform.TransformLoader object>}, patch_combine=None, reduction='Mean')
Bases: OutputDataset
Output dataset that mirrors the geometry and transform chain of an input group.
This is the default output writer used by KonfAI prediction workflows.
- class konfai.predictor.OutputDatasetLoader(name_class='OutSameAsGroupDataset')
Bases: object
Factory that instantiates output dataset classes from predictor config.
- class konfai.predictor.ModelComposite(model, combine)
Bases: Network
A composite model that replicates a given base network multiple times and combines their outputs.
This class is designed to handle model ensembles or repeated predictions from the same architecture. It creates nb_models deep copies of the input model, each with its own name and output branch, and aggregates their outputs using a provided Reduction strategy (e.g., mean, median).
- Parameters:
- load(state_sources)
Load weights for each sub-model in the composite from the corresponding state dictionaries.
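The replicate, load, and combine idea can be sketched without any deep-learning dependency. `ScaleModel` and `CompositeSketch` are hypothetical toy classes; in particular, passing `nb_models` to the constructor and the shape of the state dictionaries are assumptions, since the real ModelComposite signature shown above takes only `model` and `combine`.

```python
import copy
import statistics


class ScaleModel:
    """Toy model that multiplies every input value by a single weight."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def load(self, state: dict) -> None:
        self.scale = state["scale"]

    def __call__(self, inputs):
        return [self.scale * value for value in inputs]


class CompositeSketch:
    """Hypothetical ensemble wrapper: deep-copy a base model, then reduce outputs."""

    def __init__(self, model, nb_models: int, combine) -> None:
        # Independent copies, so loading weights into one leaves the others untouched.
        self.models = [copy.deepcopy(model) for _ in range(nb_models)]
        self.combine = combine

    def load(self, state_sources) -> None:
        if len(state_sources) != len(self.models):
            raise ValueError("expected one state source per sub-model")
        for model, state in zip(self.models, state_sources):
            model.load(state)

    def __call__(self, inputs):
        outputs = [model(inputs) for model in self.models]
        # Apply the reduction element-wise across the sub-model outputs.
        return [self.combine(values) for values in zip(*outputs)]


composite = CompositeSketch(ScaleModel(1.0), nb_models=2, combine=statistics.fmean)
composite.load([{"scale": 1.0}, {"scale": 3.0}])
ensemble_output = composite([1.0, 2.0])
```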
- class konfai.predictor.Predictor(model=<konfai.network.network.ModelLoader object>, dataset={'dataset_filenames': ['default|./Dataset'], 'groups_src': {'default': {'default|Labels': {'transforms': [], 'patch_transforms': []}}}, 'patch': <konfai.data.patching.DatasetPatch object>, 'use_cache': False, 'subset': <konfai.data.data_manager.PredictionSubset object>, 'batch_size': 1, 'validation': None, 'inline_augmentations': False, 'data_augmentations_list': {'DataAugmentation_0': <konfai.data.augmentation.DataAugmentationsList object>}}, combine='Mean', train_name='name', manual_seed=None, gpu_checkpoints=None, autocast=False, outputs_dataset={'default|Default': <konfai.predictor.OutputDatasetLoader object>}, data_log=None)
Bases: DistributedObject
KonfAI’s main prediction controller.
This class orchestrates the prediction phase by:
  - Loading model weights from checkpoint(s) or URL(s)
  - Preparing datasets and output configurations
  - Managing distributed inference with optional multi-GPU support
  - Applying transformations and saving predictions
  - Optionally logging results to TensorBoard
- dataset
Dataset manager for prediction data.
- Type:
- outputs_dataset
Mapping from layer names to output writers.
- Type:
- setup(world_size)
Set up the predictor for inference.
This method performs all necessary initialization steps before running predictions:
  - Ensures output directories exist, and optionally prompts the user before overwriting existing predictions.
  - Copies the current configuration file (Prediction.yml) into the output directory for reproducibility.
  - Dynamically loads pretrained weights from local files or remote URLs.
  - Wraps the base model into a ModelComposite to support ensemble inference.
  - Initializes the prediction dataloader, with proper distribution across available GPUs.
- Parameters:
world_size (int) – Total number of processes or GPUs used for distributed prediction.
- run_process(world_size, global_rank, local_rank, dataloaders)
Launch prediction on the given process rank.
- Parameters:
world_size (int) – Total number of processes.
global_rank (int) – Rank of the current process.
local_rank (int) – Local device rank.
dataloaders (list[DataLoader]) – List of data loaders for prediction.
- konfai.predictor.build_predict(models, prediction_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Prediction.yml'), predictions_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Predictions'))
Build and return the configured prediction workflow without executing it.
- Parameters:
- Returns:
Configured predictor object ready to be executed by the runtime wrapper.
- Return type:
DistributedObject
- konfai.predictor.predict(models, overwrite=False, gpu=[], cpu=1, quiet=False, tb=False, prediction_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Prediction.yml'), predictions_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Predictions'))
Build and execute the configured prediction workflow.
This compatibility wrapper preserves the historical CLI-facing API while delegating the pure build step to build_predict().
- Return type:
DistributedObject
konfai.trainer module
Training workflow entrypoints and orchestration for KonfAI.
- class konfai.trainer.EarlyStoppingBase
Bases: object
Minimal protocol for early stopping strategies used by Trainer.
- class konfai.trainer.EarlyStopping(monitor=None, patience=10, min_delta=0.0, mode='min')
Bases: EarlyStoppingBase
Implements early stopping logic with configurable patience and monitored metrics.
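Patience-based early stopping is commonly implemented along the lines below. This sketch reuses the parameter names `patience`, `min_delta`, and `mode` from the signature above, but the class `EarlyStoppingSketch`, its `step` method, and the exact improvement rule are assumptions, not KonfAI's code; the `monitor` parameter (selection of which metrics to watch) is left out.

```python
class EarlyStoppingSketch:
    """Hypothetical patience counter; not the konfai.trainer.EarlyStopping class."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0, mode: str = "min") -> None:
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode
        self.best = None
        self.counter = 0

    def step(self, value: float) -> bool:
        """Record one validation value; return True when training should stop."""
        if self.best is None:
            improved = True
        elif self.mode == "min":
            # An improvement must beat the best value by more than min_delta.
            improved = value < self.best - self.min_delta
        else:
            improved = value > self.best + self.min_delta
        if improved:
            self.best = value
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


stopper = EarlyStoppingSketch(patience=2, min_delta=0.01, mode="min")
decisions = [stopper.step(loss) for loss in (1.0, 0.9, 0.895, 0.899)]
```

In this run the last two losses improve on 0.9 by less than `min_delta`, so they count as stalls and the second one exhausts the patience of 2.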
- class konfai.trainer.Trainer(model=<konfai.network.network.ModelLoader object>, dataset={'dataset_filenames': ['default|./Dataset:mha'], 'groups_src': {'default|Labels': {'default|Labels': {'transforms': [], 'patch_transforms': []}}}, 'patch': <konfai.data.patching.DatasetPatch object>, 'use_cache': True, 'subset': <konfai.data.data_manager.TrainSubset object>, 'batch_size': 1, 'validation': 0.2, 'inline_augmentations': False, 'data_augmentations_list': {'DataAugmentation_0': <konfai.data.augmentation.DataAugmentationsList object>}}, train_name='default|TRAIN_01', manual_seed=None, epochs=100, it_validation=None, it_lr_update=None, autocast=False, gradient_checkpoints=None, gpu_checkpoints=None, ema_decay=0, data_log=None, early_stopping=None, save_checkpoint_mode='BEST')
Bases: DistributedObject
Public API for training a model using the KonfAI framework. Wraps setup, checkpointing, resuming, logging, and launching of the distributed _Trainer.
Main responsibilities:
  - Initialization from config (via @config)
  - Model and EMA setup
  - Checkpoint loading and saving
  - Distributed setup and launch
- Parameters:
model (ModelLoader) – Loader for model architecture.
dataset (DataTrain) – Training/validation dataset.
train_name (str) – Training session name.
epochs (int) – Number of epochs to run.
autocast (bool) – Enable AMP training.
gradient_checkpoints (list[str] | None) – Modules to use gradient checkpointing on.
gpu_checkpoints (list[str] | None) – Modules to pin on specific GPUs.
ema_decay (float) – EMA decay factor.
early_stopping (EarlyStopping | None) – Optional early stopping config.
save_checkpoint_mode (str) – Either “BEST” or “ALL”.
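For readers unfamiliar with an EMA decay factor, the sketch below shows the common exponential-moving-average update, ema = decay * ema + (1 - decay) * weights, on plain lists of floats. The function `ema_update` is hypothetical, and whether KonfAI applies ema_decay with exactly this convention is an assumption.

```python
def ema_update(ema_weights, weights, decay: float):
    """One EMA step over flat weight lists (common convention, assumed here)."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    return [
        decay * ema_value + (1.0 - decay) * weight
        for ema_value, weight in zip(ema_weights, weights)
    ]


# Two training steps that both produce the same raw weights.
ema = [0.0, 0.0]
for weights in ([1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, weights, decay=0.5)
```

Under this convention a decay of 0 makes the average simply track the latest weights, while values close to 1 smooth over many steps.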
- setup(world_size)
Initializes the training environment:
  - Clears previous outputs (unless resuming)
  - Initializes model and EMA
  - Loads checkpoint (if resuming)
  - Prepares dataloaders
- Parameters:
world_size (int) – Total number of distributed processes.
- run_process(world_size, global_rank, local_rank, dataloaders)
Launches the actual training process via the internal _Trainer class. Wraps the model with DDP or a CPU fallback, attaches EMA, and starts training.
- Parameters:
world_size (int) – Total number of distributed processes.
global_rank (int) – Global rank of the current process.
local_rank (int) – Local rank within the node.
dataloaders (list[DataLoader]) – Training and validation dataloaders.
- konfai.trainer.build_train(command=State.TRAIN, model=None, config=PosixPath('Config.yml'), checkpoints_dir=PosixPath('Checkpoints'), statistics_dir=PosixPath('Statistics'))
Build and return the configured training workflow without executing it.
- Parameters:
- Returns:
Configured trainer object ready to be executed by the runtime wrapper.
- Return type:
DistributedObject
- konfai.trainer.train(command=State.TRAIN, overwrite=False, model=None, gpu=[], cpu=None, quiet=False, tensorboard=False, config=PosixPath('Config.yml'), checkpoints_dir=PosixPath('Checkpoints'), statistics_dir=PosixPath('Statistics'))
Build and execute the configured training workflow.
This compatibility wrapper preserves the historical CLI-facing API while delegating the pure build step to build_train().
- Return type:
DistributedObject
Module contents
Top-level helpers and runtime utilities exposed by the KonfAI package.
- konfai.checkpoints_directory()
Return the configured checkpoint output directory.
- Return type:
- konfai.predictions_directory()
Return the configured prediction output directory.
- Return type:
- konfai.evaluations_directory()
Return the configured evaluation output directory.
- Return type:
- konfai.statistics_directory()
Return the configured statistics output directory.
- Return type:
- konfai.config_file()
Return the active configuration file used by the current workflow.
- Return type:
- konfai.konfai_state()
Return the current KonfAI workflow state stored in the environment.
- Return type:
- konfai.konfai_root()
Return the root configuration section name for the current workflow.
- Return type:
- class konfai.RemoteServer(host, port, token)
Bases: object
Connection settings for a remote KonfAI Apps server.
- konfai.get_available_devices(remote_server=None, timeout_s=2.0)
Return the available GPU indices and their display names.
- konfai.get_vram(devices, remote_server=None, timeout_s=2.0)
Return used and total VRAM in gigabytes for the selected devices.
- konfai.current_date()
Return the current timestamp formatted for KonfAI output folders.
- Return type:
- konfai.check_server(remote_server, timeout_s=2.0)
Check whether a remote KonfAI Apps server is reachable and healthy.
- Parameters:
remote_server (RemoteServer) – Remote server connection settings.
timeout_s (float) – HTTP timeout used for the health check.
- Returns:
A boolean success flag and a human-readable status message.
- Return type:
- exception konfai.KonfAIPackagesError
Bases: RuntimeError
Raised when required Python packages for KonfAI are missing or broken.
- konfai.assert_konfai_install()
Raise KonfAIPackagesError if the KonfAI dependency check fails.
- Return type:
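A dependency check that raises a dedicated error can be sketched with importlib. The names `PackagesErrorSketch` and `assert_packages_installed` are hypothetical, and the list of packages KonfAI actually verifies, as well as how it detects a broken install, is not documented here; this version only tests whether top-level packages can be located.

```python
import importlib.util


class PackagesErrorSketch(RuntimeError):
    """Hypothetical counterpart of KonfAIPackagesError for this sketch."""


def assert_packages_installed(required) -> None:
    """Raise PackagesErrorSketch if any top-level package cannot be found."""
    # find_spec returns None for a missing top-level module without importing it.
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    if missing:
        raise PackagesErrorSketch("Missing required packages: " + ", ".join(missing))


# Passes silently because both modules ship with the standard library.
assert_packages_installed(["json", "pathlib"])
```

Raising a RuntimeError subclass lets callers catch installation problems specifically while still treating them as ordinary runtime failures elsewhere.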