Mixup and MixupMP Training & Ensembles with TorchUncertainty#

This tutorial illustrates how to train models using Mixup and MixupMP, following the same structure as the Packed-Ensembles tutorial.

In this notebook we will show:

  1. Standard Mixup training
  2. Assembling Mixup-trained models into a Deep Ensemble
  3. MixupMP training
  4. Assembling MixupMP models the same way

Throughout this notebook we use PyTorch Lightning and TorchUncertainty for training.

Note: This script does not actually run training — it shows the configuration and conceptual usage only.

1. Training with Mixup#

To train a Mixup model using TorchUncertainty with Lightning, you need to configure the routine’s mixup_params:

  • mixtype: "mixup" tells the routine to use standard Mixup augmentation.

  • mixup_alpha controls the strength of mixing: the mixing coefficient λ is drawn from Beta(α, α).

Internally, Mixup forms convex combinations of pairs of images and their labels, and the cross entropy is computed against the resulting soft labels.
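For intuition, here is a minimal sketch of a single Mixup training step, assuming a standard classifier model and integer class labels. This is the generic Mixup recipe, not TorchUncertainty's internal implementation:

import torch
import torch.nn.functional as F

def mixup_step(model, x, y, num_classes, alpha=2.0):
    # Draw the mixing coefficient lambda from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Mix each sample with a randomly chosen partner from the same batch.
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    # Mix the one-hot targets with the same coefficient (soft labels).
    y_soft = (
        lam * F.one_hot(y, num_classes).float()
        + (1 - lam) * F.one_hot(y[perm], num_classes).float()
    )
    # Soft-label cross entropy on the mixed batch.
    logits = model(x_mix)
    return -(y_soft * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()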

The following config snippet focuses only on the Mixup setup; refer to your standard training setup for the remaining keys.

# Example Lightning CLI config for Mixup training:
mixup_config = r"""
# lightning.pytorch==2.1.3
seed_everything: false
eval_after_fit: true

trainer:
  accelerator: gpu
  devices: 1
  precision: 16-mixed
  max_epochs: 200
  logger:
    class_path: lightning.pytorch.loggers.TensorBoardLogger
    init_args:
      save_dir: logs/wideresnet28x10
      name: mixup
      default_hp_metric: false
  callbacks:
    - class_path: torch_uncertainty.callbacks.TUClsCheckpoint
    - class_path: lightning.pytorch.callbacks.LearningRateMonitor
      init_args:
        logging_interval: step
    - class_path: lightning.pytorch.callbacks.EarlyStopping
      init_args:
        monitor: val/cls/Acc
        patience: 1000
        check_finite: true

routine:
  model:
    class_path: torch_uncertainty.models.classification.wideresnet28x10
    init_args:
      in_channels: 3
      num_classes: 10
      dropout_rate: 0.0
      style: cifar
  num_classes: 10
  loss: CrossEntropyLoss

  # Mixup-specific parameters
  mixup_params:
    mixtype: "mixup"
    mixup_alpha: 2

data:
  root: ./data
  batch_size: 128
  num_workers: 4

optimizer:
  class_path: torch.optim.SGD
  init_args:
    lr: 0.1
    momentum: 0.9
    weight_decay: 5e-4
    nesterov: true

lr_scheduler:
  class_path: torch.optim.lr_scheduler.MultiStepLR
  init_args:
    milestones:
      - 60
      - 120
      - 160
    gamma: 0.2
"""

2. How to Ensemble Mixup Models#

Once you have trained multiple Mixup models (e.g., several runs of the previous config with different seeds), you can assemble them with deep_ensembles. This is identical to ensembling vanilla models, except that the ckpt_paths entries point to your Mixup-trained checkpoints.

TorchUncertainty’s deep_ensembles helper will load each checkpoint and produce an ensemble model that averages predictions across all members.
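If you prefer to build the ensemble in Python rather than through the CLI, the same arguments can be passed to deep_ensembles directly. The sketch below mirrors the init_args of the config that follows; the keyword names and exact signature may differ across TorchUncertainty versions, and the checkpoint paths are placeholders:

from torch_uncertainty.models import deep_ensembles
from torch_uncertainty.models.classification import wideresnet28x10

# Architecture shared by all ensemble members.
core_model = wideresnet28x10(
    in_channels=3, num_classes=10, style="cifar", dropout_rate=0.0
)

# Wrap the Mixup-trained checkpoints into a single ensemble model.
ensemble = deep_ensembles(
    core_model,
    num_estimators=4,
    task="classification",
    ckpt_paths=[
        "path/to/mixup/version_0.ckpt",
        "path/to/mixup/version_1.ckpt",
        "path/to/mixup/version_2.ckpt",
        "path/to/mixup/version_3.ckpt",
    ],
)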

ensemble_mixup_config = r"""
# lightning.pytorch==2.1.3
seed_everything: false
eval_after_fit: true

trainer:
  accelerator: gpu
  devices: 1
  precision: 16-mixed
  max_epochs: 200
  logger:
    class_path: lightning.pytorch.loggers.TensorBoardLogger
    init_args:
      save_dir: logs/wideresnet28x10
      name: mixup_ensemble
      default_hp_metric: false
  callbacks:
    - class_path: torch_uncertainty.callbacks.TUClsCheckpoint
    - class_path: lightning.pytorch.callbacks.LearningRateMonitor
      init_args:
        logging_interval: step
    - class_path: lightning.pytorch.callbacks.EarlyStopping
      init_args:
        monitor: val/cls/Acc
        patience: 1000
        check_finite: true

routine:
  model:
    class_path: torch_uncertainty.models.deep_ensembles
    init_args:
      core_models:
        class_path: torch_uncertainty.models.classification.wideresnet28x10
        init_args:
          in_channels: 3
          num_classes: 10
          style: cifar
          dropout_rate: 0.0
      num_estimators: 4
      task: classification
      # Replace with the paths to your trained Mixup checkpoints
      ckpt_paths:
        - path/to/mixup/version_0.ckpt
        - path/to/mixup/version_1.ckpt
        - path/to/mixup/version_2.ckpt
        - path/to/mixup/version_3.ckpt

  num_classes: 10
  is_ensemble: true
  format_batch_fn:
    class_path: torch_uncertainty.transforms.RepeatTarget
    init_args:
      num_repeats: 4

data:
  root: ./data
  batch_size: 128
"""

3. Training with MixupMP#

MixupMP is a Mixup variant built on the martingale posterior construction: according to the MixupMP paper, it produces posterior samples whose predictive distribution is more realistic than that of deep ensembles alone.

The two key differences from standard Mixup are:
  • You use a specialized loss: MixupMPLoss

  • You set mixtype: "mixupmp" in mixup_params

With these settings, the routine samples augmented predictive variations during training, as prescribed by the MixupMP methodology for uncertainty quantification.
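As a sketch, the same routine can also be assembled programmatically. The ClassificationRoutine import and its exact signature are assumptions that may vary across TorchUncertainty versions; the loss and mixup_params mirror the config below:

from torch_uncertainty.losses import MixupMPLoss
from torch_uncertainty.models.classification import wideresnet28x10
from torch_uncertainty.routines import ClassificationRoutine

model = wideresnet28x10(
    in_channels=3, num_classes=10, style="cifar", dropout_rate=0.0
)

# MixupMPLoss with default parameters, paired with the "mixupmp" mixtype.
routine = ClassificationRoutine(
    model=model,
    num_classes=10,
    loss=MixupMPLoss(),
    mixup_params={"mixtype": "mixupmp", "mixup_alpha": 2},
)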

mixupmp_config = r"""
# lightning.pytorch==2.1.3
seed_everything: false
eval_after_fit: true

trainer:
  accelerator: gpu
  devices: 1
  precision: 16-mixed
  max_epochs: 200
  logger:
    class_path: lightning.pytorch.loggers.TensorBoardLogger
    init_args:
      save_dir: logs/wideresnet28x10
      name: mixupmp
      default_hp_metric: false
  callbacks:
    - class_path: torch_uncertainty.callbacks.TUClsCheckpoint
    - class_path: lightning.pytorch.callbacks.LearningRateMonitor
      init_args:
        logging_interval: step
    - class_path: lightning.pytorch.callbacks.EarlyStopping
      init_args:
        monitor: val/cls/Acc
        patience: 1000
        check_finite: true

routine:
  model:
    class_path: torch_uncertainty.models.classification.wideresnet28x10
    init_args:
      in_channels: 3
      num_classes: 10
      style: cifar
      dropout_rate: 0.0

  num_classes: 10

  # Use MixupMP-specific loss with default parameters
  loss: torch_uncertainty.losses.MixupMPLoss

  mixup_params:
    mixtype: "mixupmp"
    mixup_alpha: 2

data:
  root: ./data
  batch_size: 128
  num_workers: 4

optimizer:
  class_path: torch.optim.SGD
  init_args:
    lr: 0.1
    momentum: 0.9
    weight_decay: 5e-4
    nesterov: true

lr_scheduler:
  class_path: torch.optim.lr_scheduler.MultiStepLR
  init_args:
    milestones:
      - 60
      - 120
      - 160
    gamma: 0.2
"""

4. How to Ensemble MixupMP Models#

Ensembling MixupMP models is conceptually the same as ensembling any other model: train N runs (with different seeds / settings), then point the ckpt_paths entries of the Deep Ensembles config to the resulting checkpoints.

The only difference is that each ensemble member is a MixupMP-trained checkpoint rather than one from standard or Mixup training.

Use the exact same deep_ensembles format as shown above (Section 2), but give it the paths to your MixupMP checkpoints.
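Concretely, only the checkpoint paths change relative to the Section 2 config. The fragment below uses placeholder paths for your own MixupMP runs:

ensemble_mixupmp_ckpt_paths = r"""
      # Replace with the paths to your trained MixupMP checkpoints
      ckpt_paths:
        - path/to/mixupmp/version_0.ckpt
        - path/to/mixupmp/version_1.ckpt
        - path/to/mixupmp/version_2.ckpt
        - path/to/mixupmp/version_3.ckpt
"""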

5. References#

For more information on Mixup Ensembles, we refer to the following resources:

  • Combining Ensembles and Data Augmentation Can Harm Your Calibration, ICLR 2021 (Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, and Dustin Tran)

  • Uncertainty Quantification and Deep Ensembles, NeurIPS 2021 (Rahul Rahaman and Alexandre H. Thiery)

For more information on MixupMP, we refer to the following resource:

  • Posterior Uncertainty Quantification in Neural Networks using Data Augmentation, AISTATS 2024 (Luhuan Wu and Sinead Williamson)
