konfai package

Subpackages

Submodules

konfai.evaluator module

Evaluation workflow classes and helpers for KonfAI.

class konfai.evaluator.CriterionsAttr[source]

Bases: object

Container for additional metadata or configuration attributes related to a loss criterion.

This class is currently empty but acts as a placeholder for future extension. It is passed along with each loss function to allow parameterization or inspection of its behavior.

Use cases may include:
  • Weighting of individual loss terms

  • Conditional activation

  • Logging preferences

class konfai.evaluator.CriterionsLoader(criterions_loader={'default|torch:nn:CrossEntropyLoss|Dice|NCC': <konfai.evaluator.CriterionsAttr object>})[source]

Bases: object

Loader for multiple criterion modules to be applied between a model output and one or more targets.

Each loss module (e.g., Dice, CrossEntropy, NCC) is dynamically loaded using its fully-qualified classpath and is associated with a CriterionsAttr configuration object.

Parameters:

criterions_loader (dict[str, CriterionsAttr]) – A mapping from module classpaths (as strings) to CriterionsAttr instances. The module path is parsed and instantiated via get_module.
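
As a rough illustration of how a colon-separated classpath such as torch:nn:CrossEntropyLoss could be resolved to a class, the sketch below uses importlib. The helper name resolve_classpath is hypothetical and is not KonfAI's get_module; the real parsing rules (including the default| prefix and short names such as Dice) may differ.

```python
import importlib


def resolve_classpath(classpath: str):
    """Hypothetical helper: resolve 'package:module:Name' to an object.

    This is an assumption about the convention, not KonfAI's get_module.
    """
    *module_parts, attr_name = classpath.split(":")
    if not module_parts:
        raise ValueError(f"expected 'module:Name', got {classpath!r}")
    # Join the leading parts into a dotted module path and import it.
    module = importlib.import_module(".".join(module_parts))
    return getattr(module, attr_name)


# Standard-library targets keep the sketch runnable without PyTorch.
ordered_dict_cls = resolve_classpath("collections:OrderedDict")
mapping_cls = resolve_classpath("collections:abc:Mapping")
```

Under the same assumption, resolve_classpath("torch:nn:CrossEntropyLoss") would return the PyTorch loss class when PyTorch is installed.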

get_criterions(output_group, target_group)[source]
Return type:

dict[Module, CriterionsAttr]

class konfai.evaluator.TargetCriterionsLoader(targets_criterions={'default': <konfai.evaluator.CriterionsLoader object>})[source]

Bases: object

Loader class for handling multiple target groups with associated criterion configurations.

This class allows defining a set of criterion loaders (e.g., Dice, BCE, MSE) for each target group to be used during evaluation or training. Each target group corresponds to one or more loss functions, all linked to a specific model output.

Parameters:

targets_criterions (dict[str, CriterionsLoader]) – Dictionary mapping each target group name to a CriterionsLoader instance that defines its associated loss functions.

get_targets_criterions(output_group)[source]

Retrieve the criterion modules and their attributes for a specific output group.

This function prepares the loss functions to be applied for a given model output, grouped by their target group.

Parameters:

output_group (str) – Name of the model output group (e.g., “output_segmentation”).

Returns:

A nested dictionary where the first key is the target group name, and the value is a dictionary mapping each loss module to its attributes.

Return type:

dict[str, dict[Module, CriterionsAttr]]

class konfai.evaluator.Statistics(filename)[source]

Bases: object

Utility class to accumulate, structure, and write evaluation metric results.

This class is used to:
  • Collect metrics for each dataset sample.

  • Compute aggregate statistics (mean, std, percentiles, etc.).

  • Export all results in a structured JSON format, including both per-case and aggregate values.

Parameters:

filename (Path) – Path to the output JSON file that will store the final results.

add(values, name_dataset)[source]

Add a set of metric values for a given dataset case.

Parameters:
  • values (dict[str, float]) – Dictionary of metric names and their values.

  • name_dataset (str) – Identifier (e.g., case name) for the sample.

Return type:

None

static get_statistic(values)[source]

Compute statistical aggregates for a list of metric values.

Parameters:

values (list[float]) – Values to summarize.

Returns:

A dictionary containing:
  • max, min, std

  • 25th, 50th, and 75th percentiles

  • mean and count

Return type:

dict[str, float]
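
The aggregates listed above can be sketched with the standard library alone. The function name summarize, the dictionary key names, and the choice of population standard deviation are assumptions of this sketch, not the values used by get_statistic.

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Summarize metric values; key names here are illustrative assumptions."""
    # method="inclusive" interpolates linearly between the sorted samples.
    # statistics.quantiles needs at least two values on older Python versions.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "max": max(values),
        "min": min(values),
        "std": statistics.pstdev(values),  # population std; an assumption
        "25pc": q25,
        "50pc": q50,
        "75pc": q75,
        "mean": statistics.fmean(values),
        "count": float(len(values)),
    }


summary = summarize([0.80, 0.85, 0.90, 0.95])
```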

write(outputs)[source]

Write the collected and aggregated statistics to the configured output file.

The output JSON structure contains:
  • case: All individual metrics per sample.

  • aggregates: Global statistics computed over all cases.

Parameters:

outputs (list[dict[str, dict[str, Any]]]) – List of metric dictionaries to merge and serialize.

Return type:

None
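
A minimal sketch of a report with the two documented top-level keys, case and aggregates, is shown below. The case names, the metric name, and the nesting inside each key are hypothetical; only the two top-level keys come from the description above.

```python
import json
import statistics

# Hypothetical per-case metrics, keyed by case name then metric name.
per_case = {
    "case_001": {"Dice": 0.90},
    "case_002": {"Dice": 0.80},
}

# Aggregate each metric over all cases (only the mean and count here).
metric_names = sorted({name for metrics in per_case.values() for name in metrics})
aggregates = {
    name: {
        "mean": statistics.fmean(m[name] for m in per_case.values() if name in m),
        "count": sum(1 for m in per_case.values() if name in m),
    }
    for name in metric_names
}

report = {"case": per_case, "aggregates": aggregates}
report_json = json.dumps(report, indent=2)
```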

read()[source]
class konfai.evaluator.Evaluator(train_name='default|TRAIN_01', metrics={'default': <konfai.evaluator.TargetCriterionsLoader object>}, dataset={'dataset_filenames': ['default|./Dataset:mha'], 'groups_src': {'default': {'default|group_dest': {'transforms': [], 'patch_transforms': []}}}, 'patch': None, 'use_cache': True, 'subset': <konfai.data.data_manager.PredictionSubset object>, 'batch_size': 1, 'validation': None, 'inline_augmentations': False, 'data_augmentations_list': {}})[source]

Bases: DistributedObject

Distributed evaluation engine for computing metrics on model predictions.

This class handles the evaluation of predicted outputs using predefined metric loaders. It supports multi-output and multi-target configurations, computes aggregated statistics across training and validation datasets, and synchronizes results across processes.

Evaluation results are stored in JSON format and optionally displayed during iteration.

Parameters:
  • train_name (str) – Unique name of the evaluation run, used for logging and output folders.

  • metrics (dict[str, TargetCriterionsLoader]) – Dictionary mapping output groups to loaders of target metrics.

  • dataset (DataMetric) – Dataset provider configured for evaluation mode.

statistics_train

Object used to store training evaluation metrics.

Type:

Statistics

statistics_validation

Object used to store validation evaluation metrics.

Type:

Statistics

dataloader

DataLoaders for training and validation sets.

Type:

list[DataLoader]

metric_path

Path to the evaluation output directory.

Type:

str

metrics

Instantiated metrics organized by output and target groups.

Type:

dict

setup(world_size)[source]

Prepare the evaluator for distributed metric computation.

This method performs the following steps:
  • Checks whether previous evaluation results exist and optionally overwrites them.

  • Creates the output directory and copies the current configuration file for reproducibility.

  • Loads the evaluation dataset according to the world size.

Parameters:

world_size (int) – Number of processes in the distributed evaluation setup.

update(batch_sample, statistics)[source]

Compute metrics for a batch and update running statistics.

Parameters:
  • batch_sample (dict[str, BatchDataItem]) – The batch sample object containing tensors and their metadata.

  • statistics (Statistics) – The statistics object to update (train or validation).

Returns:

Dictionary of computed metric values with keys in the format 'output_group:target_group:MetricName'.

Return type:

dict[str, float]
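
The key format can be illustrated with two small helpers. Both function names are hypothetical, and the sketch assumes that group names never contain a colon.

```python
def metric_key(output_group: str, target_group: str, metric_name: str) -> str:
    """Join the three parts with ':' as in the documented key format."""
    return f"{output_group}:{target_group}:{metric_name}"


def split_metric_key(key: str) -> tuple[str, str, str]:
    """Split a key back into (output_group, target_group, metric_name)."""
    output_group, target_group, metric_name = key.split(":", 2)
    return output_group, target_group, metric_name


# "Labels" and "Dice" are placeholder names for this example.
key = metric_key("output_segmentation", "Labels", "Dice")
parts = split_metric_key(key)
```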

run_process(world_size, global_rank, gpu, dataloaders)[source]

Execute the distributed evaluation loop over the training and validation datasets.

This method iterates through the provided DataLoaders (train and optionally validation), updates the metric statistics using the configured metrics dictionary, and synchronizes the results across all processes. On the global rank 0, the metrics are saved as JSON files.

Metrics are displayed in real-time using tqdm progress bars, showing a summary of the current batch’s computed values.

Parameters:
  • world_size (int) – Total number of distributed processes.

  • global_rank (int) – Global rank of the current process (used for writing results).

  • gpu (int) – Local GPU ID used for synchronization.

  • dataloaders (list[DataLoader]) – A list containing one or two DataLoaders: dataloaders[0] is used for training evaluation, and the optional dataloaders[1] is used for validation evaluation.

Notes

  • Only the main process (global_rank == 0) writes final results to disk.

konfai.evaluator.build_evaluate(evaluations_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Evaluation.yml'), evaluations_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Evaluations'))[source]

Build and return the configured evaluation workflow without executing it.

Parameters:
  • evaluations_file (Path | str) – Evaluation configuration file.

  • evaluations_dir (Path | str) – Directory where metrics and JSON reports are written.

Returns:

Configured evaluator object ready to be executed by the runtime wrapper.

Return type:

DistributedObject

konfai.evaluator.evaluate(overwrite=False, gpu=[], cpu=1, quiet=False, tb=False, evaluations_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Evaluation.yml'), evaluations_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Evaluations'))[source]

Build and execute the configured evaluation workflow.

This compatibility wrapper preserves the historical CLI-facing API while delegating the pure build step to build_evaluate().

Return type:

DistributedObject

konfai.main module

Command-line entrypoints for KonfAI workflows, apps, and services.

konfai.main.main()[source]

Entry point for the konfai command-line interface.

This function builds the top-level CLI parser and delegates the full argument parsing and command dispatching to _run(parser).

Supported commands are:
  • TRAIN

  • RESUME

  • PREDICTION

  • EVALUATION

Notes

The actual execution logic is implemented in konfai.trainer.train, konfai.predictor.predict, and konfai.evaluator.evaluate.
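
For orientation, a command-style parser over the four documented commands might look like the following argparse sketch. The program name, the build_parser helper, and the --overwrite and --gpu flags are assumptions made for illustration; the options actually defined by konfai.main may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the real KonfAI options may differ."""
    parser = argparse.ArgumentParser(prog="konfai-sketch")
    parser.add_argument(
        "command",
        choices=["TRAIN", "RESUME", "PREDICTION", "EVALUATION"],
        help="Workflow to run.",
    )
    # Flag names below are illustrative assumptions.
    parser.add_argument("--overwrite", action="store_true")
    parser.add_argument("--gpu", type=int, nargs="*", default=[])
    return parser


args = build_parser().parse_args(["EVALUATION", "--gpu", "0", "1"])
```

A dispatcher would then branch on args.command to call the matching workflow function.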

konfai.main.cluster()[source]

Entry point for running KonfAI with cluster-oriented CLI arguments.

This command extends the standard KonfAI CLI with a “Cluster manager arguments” group (job name, nodes, memory, time limit, resubmit), then delegates parsing and command dispatching to _run(parser).

Notes

  • This function only defines extra CLI arguments before delegating to _run.

konfai.predictor module

Prediction workflow classes, reductions, and export helpers for KonfAI.

class konfai.predictor.Reduction[source]

Bases: ABC

Abstract reduction applied across model ensemble or augmentation outputs.

class konfai.predictor.Mean[source]

Bases: Reduction

Average ensemble or augmentation predictions element-wise.

class konfai.predictor.Median[source]

Bases: Reduction

Compute the element-wise median across prediction tensors.

class konfai.predictor.Concat[source]

Bases: Reduction

Concatenate prediction tensors along the channel dimension.
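
The three reductions can be illustrated on plain Python lists. This is only a sketch of the element-wise idea: the real classes operate on tensors, and the sample values below are made up.

```python
import statistics

# Three hypothetical single-channel predictions for the same four voxels.
predictions = [
    [0.1, 0.4, 0.9, 0.2],
    [0.3, 0.4, 0.7, 0.2],
    [0.2, 0.7, 0.8, 0.8],
]

# Mean: average the predictions position by position.
mean_pred = [statistics.fmean(column) for column in zip(*predictions)]

# Median: middle value at each position, less sensitive to one outlier.
median_pred = [statistics.median(column) for column in zip(*predictions)]

# Concat: keep every prediction as its own channel instead of merging them.
concat_pred = list(predictions)
```

In the last voxel, one prediction (0.8) disagrees with the other two (0.2). The mean moves toward it while the median does not, which is the usual reason to prefer a median over noisy ensembles.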

class konfai.predictor.OutputDataset(filename, group, before_reduction_transforms, after_reduction_transforms, final_transforms, patch_combine, reduction)[source]

Bases: Dataset, NeedDevice, ABC

Abstract prediction sink that accumulates model outputs and writes them to disk.

Concrete subclasses define how layers are accumulated across patches, augmentations, and multiple models before the final prediction volume is materialized.

prepare(name_layer)[source]
Return type:

None

set_datasets(datasets)[source]
Return type:

None

abstractmethod setup(datasets, groups)[source]
set_patch_config(patch_size, overlap, nb_data_augmentation)[source]
Return type:

None

to(device)[source]
abstractmethod add_layer(index_dataset, index_augmentation, index_patch, layer, dataset, attribute=None)[source]
is_done(index)[source]
Return type:

bool

abstractmethod get_output(index, number_of_channels_per_model, dataset)[source]
Return type:

Tensor

write_prediction(index, name, layer)[source]
Return type:

None

class konfai.predictor.OutSameAsGroupDataset(same_as_group='default', dataset_filename='default|./Dataset:mha', group='default', before_reduction_transforms={'default|Normalize': <konfai.data.transform.TransformLoader object>}, after_reduction_transforms={'default|Normalize': <konfai.data.transform.TransformLoader object>}, final_transforms={'default|Normalize': <konfai.data.transform.TransformLoader object>}, patch_combine=None, reduction='Mean')[source]

Bases: OutputDataset

Output dataset that mirrors the geometry and transform chain of an input group.

This is the default output writer used by KonfAI prediction workflows.

add_layer(index_dataset, index_augmentation, index_patch, layer, dataset, attribute=None)[source]
setup(datasets, groups)[source]
get_output(index, number_of_channels_per_model, dataset)[source]
Return type:

Tensor

class konfai.predictor.OutputDatasetLoader(name_class='OutSameAsGroupDataset')[source]

Bases: object

Factory that instantiates output dataset classes from predictor config.

get_output_dataset(layer_name)[source]
Return type:

OutputDataset

class konfai.predictor.ModelComposite(model, combine)[source]

Bases: Network

A composite model that replicates a given base network multiple times and combines their outputs.

This class is designed to handle model ensembles or repeated predictions from the same architecture. It creates nb_models deep copies of the input model, each with its own name and output branch, and aggregates their outputs using a provided Reduction strategy (e.g., mean, median).

Parameters:
  • model (Network) – The base network to replicate.

  • nb_models (int) – Number of copies of the model to create.

  • combine (Reduction) – The reduction method used to combine outputs from all model replicas.

combine

The reduction method used during forward inference.

Type:

Reduction

load(state_sources)[source]

Load weights for each sub-model in the composite from the corresponding state dictionaries.

Parameters:

state_sources (list[dict[str, Any] | Path | str]) – One checkpoint source per model replica.

forward(data_dict, output_layers=[])[source]

Perform a forward pass on all model replicas and aggregate their outputs.

Parameters:
  • data_dict (dict[tuple[str, bool], Tensor]) – A dictionary mapping (group_name, requires_grad) to input tensors.

  • output_layers (list[str]) – List of output layer names to extract from each sub-model.

Returns:

Aggregated output for each layer, after applying the reduction.

Return type:

list[tuple[str, list[int], Tensor]]

class konfai.predictor.Predictor(model=<konfai.network.network.ModelLoader object>, dataset={'dataset_filenames': ['default|./Dataset'], 'groups_src': {'default': {'default|Labels': {'transforms': [], 'patch_transforms': []}}}, 'patch': <konfai.data.patching.DatasetPatch object>, 'use_cache': False, 'subset': <konfai.data.data_manager.PredictionSubset object>, 'batch_size': 1, 'validation': None, 'inline_augmentations': False, 'data_augmentations_list': {'DataAugmentation_0': <konfai.data.augmentation.DataAugmentationsList object>}}, combine='Mean', train_name='name', manual_seed=None, gpu_checkpoints=None, autocast=False, outputs_dataset={'default|Default': <konfai.predictor.OutputDatasetLoader object>}, data_log=None)[source]

Bases: DistributedObject

KonfAI’s main prediction controller.

This class orchestrates the prediction phase by:
  • Loading model weights from checkpoint(s) or URL(s)

  • Preparing datasets and output configurations

  • Managing distributed inference with optional multi-GPU support

  • Applying transformations and saving predictions

  • Optionally logging results to TensorBoard

model

The neural network model to use for prediction.

Type:

Network

dataset

Dataset manager for prediction data.

Type:

DataPrediction

combine_classpath

Path to the reduction strategy (e.g., “Mean”).

Type:

str

autocast

Whether to enable AMP inference.

Type:

bool

outputs_dataset

Mapping from layer names to output writers.

Type:

dict[str, OutputDataset]

data_log

List of tensors to log during inference.

Type:

list[str] | None

setup(world_size)[source]

Set up the predictor for inference.

This method performs all necessary initialization steps before running predictions:
  • Ensures output directories exist, and optionally prompts the user before overwriting existing predictions.

  • Copies the current configuration file (Prediction.yml) into the output directory for reproducibility.

  • Dynamically loads pretrained weights from local files or remote URLs.

  • Wraps the base model into a ModelComposite to support ensemble inference.

  • Initializes the prediction dataloader, with proper distribution across available GPUs.

Parameters:

world_size (int) – Total number of processes or GPUs used for distributed prediction.

set_models(path_to_models)[source]
Return type:

None

run_process(world_size, global_rank, local_rank, dataloaders)[source]

Launch prediction on the given process rank.

Parameters:
  • world_size (int) – Total number of processes.

  • global_rank (int) – Rank of the current process.

  • local_rank (int) – Local device rank.

  • dataloaders (list[DataLoader]) – List of data loaders for prediction.

konfai.predictor.build_predict(models, prediction_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Prediction.yml'), predictions_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Predictions'))[source]

Build and return the configured prediction workflow without executing it.

Parameters:
  • models (list[Path]) – One or more checkpoint files to load for prediction.

  • prediction_file (Path | str) – Prediction configuration file.

  • predictions_dir (Path | str) – Directory where prediction outputs are written.

Returns:

Configured predictor object ready to be executed by the runtime wrapper.

Return type:

DistributedObject

konfai.predictor.predict(models, overwrite=False, gpu=[], cpu=1, quiet=False, tb=False, prediction_file=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Prediction.yml'), predictions_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/konfai/checkouts/latest/docs/source/Predictions'))[source]

Build and execute the configured prediction workflow.

This compatibility wrapper preserves the historical CLI-facing API while delegating the pure build step to build_predict().

Return type:

DistributedObject

konfai.trainer module

Training workflow entrypoints and orchestration for KonfAI.

class konfai.trainer.EarlyStoppingBase[source]

Bases: object

Minimal protocol for early stopping strategies used by Trainer.

is_stopped()[source]
Return type:

bool

get_score(values)[source]
class konfai.trainer.EarlyStopping(monitor=None, patience=10, min_delta=0.0, mode='min')[source]

Bases: EarlyStoppingBase

Implements early stopping logic with configurable patience and monitored metrics.

monitor

Metrics to monitor.

Type:

list[str]

patience

Number of checks with no improvement before stopping.

Type:

int

min_delta

Minimum change to qualify as improvement.

Type:

float

mode

“min” or “max” depending on optimization direction.

Type:

str
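
The interplay of patience, min_delta, and mode can be sketched as follows. PatienceStopper and its update method are hypothetical names; this is a generic patience-based stopper written under the assumption that one score is checked per validation step, not the logic of konfai.trainer.EarlyStopping.

```python
class PatienceStopper:
    """Hypothetical patience-based stopper; not KonfAI's EarlyStopping."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0, mode: str = "min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode
        self.best = None  # best score seen so far
        self.bad_checks = 0  # consecutive checks without improvement

    def update(self, score: float) -> bool:
        """Record one check and return True once patience is exhausted."""
        if self.best is None:
            improved = True
        elif self.mode == "min":
            improved = score < self.best - self.min_delta
        else:
            improved = score > self.best + self.min_delta
        if improved:
            self.best = score
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = PatienceStopper(patience=2, min_delta=0.01, mode="min")
history = [stopper.update(loss) for loss in [1.00, 0.90, 0.895, 0.899]]
```

Here the last two losses improve on 0.90 by less than min_delta, so they count as two checks without improvement and the stopper triggers on the fourth check.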

is_stopped()[source]
Return type:

bool

get_score(values)[source]
class konfai.trainer.Trainer(model=<konfai.network.network.ModelLoader object>, dataset={'dataset_filenames': ['default|./Dataset:mha'], 'groups_src': {'default|Labels': {'default|Labels': {'transforms': [], 'patch_transforms': []}}}, 'patch': <konfai.data.patching.DatasetPatch object>, 'use_cache': True, 'subset': <konfai.data.data_manager.TrainSubset object>, 'batch_size': 1, 'validation': 0.2, 'inline_augmentations': False, 'data_augmentations_list': {'DataAugmentation_0': <konfai.data.augmentation.DataAugmentationsList object>}}, train_name='default|TRAIN_01', manual_seed=None, epochs=100, it_validation=None, it_lr_update=None, autocast=False, gradient_checkpoints=None, gpu_checkpoints=None, ema_decay=0, data_log=None, early_stopping=None, save_checkpoint_mode='BEST')[source]

Bases: DistributedObject

Public API for training a model using the KonfAI framework. Wraps setup, checkpointing, resuming, logging, and launching the distributed _Trainer.

Main responsibilities:
  • Initialization from config (via @config)

  • Model and EMA setup

  • Checkpoint loading and saving

  • Distributed setup and launch

Parameters:
  • model (ModelLoader) – Loader for model architecture.

  • dataset (DataTrain) – Training/validation dataset.

  • train_name (str) – Training session name.

  • manual_seed (int | None) – Random seed.

  • epochs (int) – Number of epochs to run.

  • it_validation (int | None) – Validation interval.

  • it_lr_update (int | None) – Learning rate update interval.

  • autocast (bool) – Enable AMP training.

  • gradient_checkpoints (list[str] | None) – Modules to use gradient checkpointing on.

  • gpu_checkpoints (list[str] | None) – Modules to pin on specific GPUs.

  • ema_decay (float) – EMA decay factor.

  • data_log (list[str] | None) – Logging instructions.

  • early_stopping (EarlyStopping | None) – Optional early stopping config.

  • save_checkpoint_mode (str) – Either “BEST” or “ALL”.

setup(world_size)[source]

Initializes the training environment:
  • Clears previous outputs (unless resuming)

  • Initializes model and EMA

  • Loads checkpoint (if resuming)

  • Prepares dataloaders

Parameters:

world_size (int) – Total number of distributed processes.

set_model(path_to_model)[source]
Return type:

None

run_process(world_size, global_rank, local_rank, dataloaders)[source]

Launches the actual training process via the internal _Trainer class. Wraps the model with DDP or a CPU fallback, attaches EMA, and starts training.

Parameters:
  • world_size (int) – Total number of distributed processes.

  • global_rank (int) – Global rank of the current process.

  • local_rank (int) – Local rank within the node.

  • dataloaders (list[DataLoader]) – Training and validation dataloaders.

konfai.trainer.build_train(command=State.TRAIN, model=None, config=PosixPath('Config.yml'), checkpoints_dir=PosixPath('Checkpoints'), statistics_dir=PosixPath('Statistics'))[source]

Build and return the configured training workflow without executing it.

Parameters:
  • command (State) – Training command variant, typically State.TRAIN or State.RESUME.

  • model (Path | str | None) – Checkpoint path used when resuming training.

  • config (Path | str) – Training configuration file.

  • checkpoints_dir (Path | str) – Output directory for checkpoints.

  • statistics_dir (Path | str) – Output directory for statistics and logs.

Returns:

Configured trainer object ready to be executed by the runtime wrapper.

Return type:

DistributedObject

konfai.trainer.train(command=State.TRAIN, overwrite=False, model=None, gpu=[], cpu=None, quiet=False, tensorboard=False, config=PosixPath('Config.yml'), checkpoints_dir=PosixPath('Checkpoints'), statistics_dir=PosixPath('Statistics'))[source]

Build and execute the configured training workflow.

This compatibility wrapper preserves the historical CLI-facing API while delegating the pure build step to build_train().

Return type:

DistributedObject

Module contents

Top-level helpers and runtime utilities exposed by the KonfAI package.

konfai.checkpoints_directory()[source]

Return the configured checkpoint output directory.

Return type:

Path

konfai.predictions_directory()[source]

Return the configured prediction output directory.

Return type:

Path

konfai.evaluations_directory()[source]

Return the configured evaluation output directory.

Return type:

Path

konfai.statistics_directory()[source]

Return the configured statistics output directory.

Return type:

Path

konfai.config_file()[source]

Return the active configuration file used by the current workflow.

Return type:

Path

konfai.konfai_state()[source]

Return the current KonfAI workflow state stored in the environment.

Return type:

str

konfai.konfai_root()[source]

Return the root configuration section name for the current workflow.

Return type:

str

class konfai.RemoteServer(host, port, token)[source]

Bases: object

Connection settings for a remote KonfAI Apps server.

get_headers()[source]

Return the HTTP headers required to talk to the remote server.

Return type:

dict[str, str]

get_url()[source]

Return the base URL of the remote server.

Return type:

str

konfai.cuda_visible_devices()[source]

Return the GPU indices visible to the current process.

Returns:

GPU ids exposed through CUDA_VISIBLE_DEVICES or detected by PyTorch.

Return type:

list[int]
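
The environment-variable half of this lookup can be sketched as below. Both helper names are hypothetical, only integer indices are handled (not GPU UUIDs), and the PyTorch detection fallback mentioned above is left out.

```python
from collections.abc import Mapping


def parse_visible_devices(value: str) -> list[int]:
    """Parse a comma-separated CUDA_VISIBLE_DEVICES value into integer ids."""
    return [int(part) for part in value.split(",") if part.strip()]


def visible_ids_from_env(env: Mapping[str, str]) -> list[int]:
    """Read CUDA_VISIBLE_DEVICES from a mapping such as os.environ."""
    # An unset variable yields an empty list in this sketch.
    return parse_visible_devices(env.get("CUDA_VISIBLE_DEVICES", ""))


example_ids = visible_ids_from_env({"CUDA_VISIBLE_DEVICES": "0,2,3"})
unset_ids = visible_ids_from_env({})
```

In real use the mapping passed in would be os.environ.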

konfai.get_available_devices(remote_server=None, timeout_s=2.0)[source]

Return the available GPU indices and their display names.

Parameters:
  • remote_server (RemoteServer | None) – Remote server to query instead of the local machine.

  • timeout_s (float) – HTTP timeout used for remote requests.

Returns:

Available device indices and the corresponding device names.

Return type:

tuple[list[int], list[str]]

konfai.get_ram(remote_server=None, timeout_s=2.0)[source]

Return used and total RAM in gigabytes.

Parameters:
  • remote_server (RemoteServer | None) – Remote server to query instead of the local machine.

  • timeout_s (float) – HTTP timeout used for remote requests.

Returns:

Used RAM and total RAM in gigabytes.

Return type:

tuple[float, float]

konfai.get_vram(devices, remote_server=None, timeout_s=2.0)[source]

Return used and total VRAM in gigabytes for the selected devices.

Parameters:
  • devices (list[int]) – GPU indices to inspect.

  • remote_server (RemoteServer | None) – Remote server to query instead of the local machine.

  • timeout_s (float) – HTTP timeout used for remote requests.

Returns:

Used VRAM and total VRAM in gigabytes.

Return type:

tuple[float, float]

konfai.current_date()[source]

Return the current timestamp formatted for KonfAI output folders.

Return type:

str

konfai.check_server(remote_server, timeout_s=2.0)[source]

Check whether a remote KonfAI Apps server is reachable and healthy.

Parameters:
  • remote_server (RemoteServer) – Remote server connection settings.

  • timeout_s (float) – HTTP timeout used for the health check.

Returns:

A boolean success flag and a human-readable status message.

Return type:

tuple[bool, str]

konfai.check_konfai_install()[source]

Check that KonfAI dependencies are importable.

Returns:

A pair containing a global success flag and a report dictionary with the keys missing, errors, and versions.

Return type:

tuple[bool, dict]
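
A dependency check of this shape can be sketched with importlib. The function name check_packages, the package list, and the value formats inside the report are assumptions; only the three report keys (missing, errors, versions) come from the description above.

```python
import importlib
from importlib import metadata


def check_packages(names: list[str]) -> tuple[bool, dict]:
    """Try importing each package and collect a report (illustrative only)."""
    report = {"missing": [], "errors": {}, "versions": {}}
    for name in names:
        try:
            importlib.import_module(name)
        except ModuleNotFoundError:
            report["missing"].append(name)
            continue
        except Exception as exc:  # installed but failing at import time
            report["errors"][name] = repr(exc)
            continue
        try:
            report["versions"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Standard-library modules have no distribution metadata.
            report["versions"][name] = "unknown"
    ok = not report["missing"] and not report["errors"]
    return ok, report


# The second name is deliberately one that should not exist.
ok, report = check_packages(["json", "surely_not_installed_pkg_xyz"])
```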

exception konfai.KonfAIPackagesError[source]

Bases: RuntimeError

Raised when required Python packages for KonfAI are missing or broken.

konfai.assert_konfai_install()[source]

Raise KonfAIPackagesError if the KonfAI dependency check fails.

Return type:

None