.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_tutorials/Post_Hoc_Methods/tutorial_deup.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_tutorials_Post_Hoc_Methods_tutorial_deup.py: DEUP: Direct Epistemic Uncertainty Prediction with TorchUncertainty ==================================================================== DEUP estimates the *epistemic* component of uncertainty by training a lightweight error-predictor ``g`` on out-of-fold generalization errors collected from a held-out calibration set (Algorithm 2 in Lahlou et al. 2023). Once fitted, ``g(x)`` returns a non-negative score: higher means "the base model is more likely to be wrong here." This tutorial has two parts: 1. **Synthetic walkthrough** - illustrates the DEUP API on random tabular data. 2. **CIFAR-10 + ClassificationRoutine** - integrates DEUP with a pretrained ResNet-18 for OOD detection against SVHN, the standard CIFAR-10 OOD benchmark. How DEUP works: ~~~~~~~~~~~~~~~ 1. Run the (already-trained) base model on a held-out calibration set and compute per-sample errors (cross-entropy for classification, squared error for regression). 2. Perform K-fold cross-validation *on those calibration errors* to train K lightweight error-predictor MLPs. The out-of-fold predictions become the targets for the final predictor, acting as generalization-error proxies. 3. Train the final error predictor ``g`` on all calibration features to predict those OOF targets. At inference, ``g(x) ≥ 0`` is the epistemic uncertainty estimate. Reference: Lahlou et al. (2023). *DEUP: Direct Epistemic Uncertainty Prediction.* TMLR. https://openreview.net/forum?id=eGLdVRvvfQ .. GENERATED FROM PYTHON SOURCE LINES 34-36 1. Imports ~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 36-45 .. code-block:: Python import os import torch from torch import nn from torch.utils.data import DataLoader, TensorDataset from torch_uncertainty.post_processing import DEUP .. GENERATED FROM PYTHON SOURCE LINES 46-52 2. Synthetic classification task ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We create a small random dataset to illustrate the DEUP API without any expensive training. In practice the base model should be *pre-trained*; here we use random weights just to show the interface. .. GENERATED FROM PYTHON SOURCE LINES 52-63 .. code-block:: Python torch.manual_seed(0) n_cal, in_dim, n_classes = 100, 8, 5 x_cal = torch.randn(n_cal, in_dim) y_cal = torch.randint(0, n_classes, (n_cal,)) cal_loader = DataLoader(TensorDataset(x_cal, y_cal), batch_size=32) model = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, n_classes)) # In a real use-case, load a pre-trained model here. .. GENERATED FROM PYTHON SOURCE LINES 64-70 3. Fit DEUP on the calibration split ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``DEUP.fit`` collects per-sample cross-entropy errors from the base model on the calibration loader, runs K-fold cross-validation to build OOF error estimates, and trains the final error predictor ``g`` on those estimates. .. GENERATED FROM PYTHON SOURCE LINES 70-81 .. code-block:: Python deup = DEUP( task="classification", model=model, num_folds=5, hidden_dim=32, max_epochs=30, device="cpu", ) deup.fit(cal_loader) .. GENERATED FROM PYTHON SOURCE LINES 82-88 4. Epistemic uncertainty at inference ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``deup(x)`` takes raw inputs, runs them through the base model to extract features, and returns a non-negative epistemic score per sample. Higher scores signal that the base model is more likely to be unreliable on that input. .. GENERATED FROM PYTHON SOURCE LINES 88-96 .. code-block:: Python x_test = torch.randn(10, in_dim) uncertainty = deup(x_test) print("Epistemic scores g(x):", uncertainty) probs = deup.predict_proba(x_test) print("Base-model likelihood shape:", probs.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none Epistemic scores g(x): tensor([0.8570, 0.8676, 0.8819, 0.8829, 0.8861, 0.8889, 0.8522, 0.8715, 0.8617, 0.8622]) Base-model likelihood shape: torch.Size([10, 5]) .. GENERATED FROM PYTHON SOURCE LINES 97-107 5. Apply DEUP on CIFAR-10 with the ClassificationRoutine ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In this section we integrate DEUP with TorchUncertainty's :class:`~torch_uncertainty.routines.ClassificationRoutine` to evaluate OOD detection performance on CIFAR-10 versus SVHN. The routine fits the error predictor **automatically** on the validation split at the start of ``trainer.test()``, so no manual ``fit()`` call is needed here. .. GENERATED FROM PYTHON SOURCE LINES 107-117 .. code-block:: Python import torch from huggingface_hub import hf_hub_download from torch_uncertainty import TUTrainer from torch_uncertainty.datamodules import CIFAR10DataModule from torch_uncertainty.models.classification.resnet import resnet from torch_uncertainty.post_processing import DEUP from torch_uncertainty.routines import ClassificationRoutine .. GENERATED FROM PYTHON SOURCE LINES 118-124 6. Load a pretrained ResNet-18 from Hugging Face ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We use a CIFAR-style ResNet-18 (3x3 first convolution, no max-pooling) from TorchUncertainty's HuggingFace hub. The CIFAR-style variant preserves more spatial information on small 32x32 images than the standard ImageNet variant. .. GENERATED FROM PYTHON SOURCE LINES 124-132 .. code-block:: Python cifar_model = resnet(in_channels=3, num_classes=10, arch=18, style="cifar", conv_bias=False) ckpt_path = hf_hub_download(repo_id="torch-uncertainty/resnet18_c10", filename="resnet18_c10.ckpt") weights = torch.load(ckpt_path, map_location="cpu", weights_only=True) cifar_model.load_state_dict(weights) cifar_model = cifar_model.cuda().eval() .. GENERATED FROM PYTHON SOURCE LINES 133-145 7. DataModule and Trainer ~~~~~~~~~~~~~~~~~~~~~~~~~ :class:`~torch_uncertainty.datamodules.CIFAR10DataModule` with ``eval_ood=True`` automatically provides the SVHN out-of-distribution test loader alongside the standard CIFAR-10 test loader. The key argument here is ``postprocess_set="val"``: it tells the routine to fit the DEUP error predictor on the *validation* split rather than the test set, avoiding any data leakage. We reserve 10% of the training set as validation via ``val_split=0.1``. The routine automatically builds this split before fitting DEUP, so no manual ``setup`` call is needed. .. GENERATED FROM PYTHON SOURCE LINES 145-162 .. code-block:: Python datamodule = CIFAR10DataModule( root=os.environ.get("TU_DATA_DIR", "data"), batch_size=256, num_workers=4, eval_ood=True, val_split=0.1, postprocess_set="val", ) trainer = TUTrainer( accelerator="gpu", devices=1, max_epochs=1, enable_progress_bar=True, ) .. GENERATED FROM PYTHON SOURCE LINES 163-175 8. ClassificationRoutine with DEUP post-processing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Passing ``post_processing=deup_c10`` and ``ood_criterion="deup"`` wires DEUP into the routine: - At the start of ``trainer.test()``, the routine calls ``deup_c10.fit()`` on the validation dataloader (``postprocess_set="val"``). - During the test loop, the DEUP epistemic score is used as the OOD detection criterion: higher score ⟹ more likely OOD. No loss or optimizer is needed since we are only running evaluation. .. GENERATED FROM PYTHON SOURCE LINES 175-192 .. code-block:: Python deup_c10 = DEUP( task="classification", hidden_dim=64, max_epochs=50, device="cuda", ) routine = ClassificationRoutine( model=cifar_model, num_classes=10, loss=None, eval_ood=True, post_processing=deup_c10, ood_criterion="deup", ) .. GENERATED FROM PYTHON SOURCE LINES 193-203 9. Evaluate OOD detection with DEUP ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``trainer.test()`` runs the following sequence automatically: 1. Fit the DEUP error predictor on the CIFAR-10 validation split. 2. Compute in-distribution classification metrics on the CIFAR-10 test set. 3. Compute OOD detection metrics (AUROC, AUPR, FPR95) using SVHN as OOD data. OOD detection results are reported under the ``ood/`` prefix. .. GENERATED FROM PYTHON SOURCE LINES 203-206 .. code-block:: Python results_deup = trainer.test(routine, datamodule=datamodule) .. rst-class:: sphx-glr-script-out .. code-block:: none ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ Classification ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ Acc │ 93.380% │ │ Brier │ 0.10812 │ │ Entropy │ 0.08849 │ │ NLL │ 0.26405 │ └──────────────┴───────────────────────────┘ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ Calibration ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ ECE │ 3.537% │ │ MCE │ 23.670% │ │ SmECE │ 10.143% │ │ aECE │ 3.500% │ └──────────────┴───────────────────────────┘ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ OOD Detection ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ AUPR │ 81.944% │ │ AUROC │ 67.990% │ │ Entropy │ 0.35549 │ │ FPR95 │ 79.300% │ │ SCOD_AUGRC │ 0.32517 │ │ SCOD_AURC │ 0.58219 │ │ SCOD_Cov_5R… │ nan │ │ SCOD_Risk_8… │ 0.69073 │ └──────────────┴───────────────────────────┘ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ Selective Classification ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ AUGRC │ 0.779% │ │ AURC │ 0.959% │ │ Cov_5Risk │ 96.510% │ │ Risk_80Cov │ 1.200% │ └──────────────┴───────────────────────────┘ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ Complexity ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ flops │ 284.38 G │ │ params │ 11.17 M │ └──────────────┴───────────────────────────┘ Testing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102/102 0:00:06 • 0:00:00 14.63it/s .. GENERATED FROM PYTHON SOURCE LINES 207-213 10. Compare with the Maximum Softmax Probability baseline ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We can swap the OOD criterion to compare DEUP against the Maximum Softmax Probability (MSP) baseline — no re-training or re-fitting required. Only the OOD detection scores change; in-distribution metrics remain identical. .. GENERATED FROM PYTHON SOURCE LINES 213-220 .. code-block:: Python from torch_uncertainty.ood_criteria import MaxSoftmaxCriterion routine.ood_criterion = MaxSoftmaxCriterion() results_msp = trainer.test(routine, datamodule=datamodule) .. rst-class:: sphx-glr-script-out .. code-block:: none ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ Classification ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ Acc │ 93.380% │ │ Brier │ 0.10812 │ │ Entropy │ 0.08849 │ │ NLL │ 0.26405 │ └──────────────┴───────────────────────────┘ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ Calibration ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ ECE │ 3.537% │ │ MCE │ 23.670% │ │ SmECE │ 10.143% │ │ aECE │ 3.500% │ └──────────────┴───────────────────────────┘ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ OOD Detection ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ AUPR │ 90.246% │ │ AUROC │ 82.969% │ │ Entropy │ 0.35549 │ │ FPR95 │ 56.050% │ │ SCOD_AUGRC │ 0.29513 │ │ SCOD_AURC │ 0.47860 │ │ SCOD_Cov_5R… │ nan │ │ SCOD_Risk_8… │ 0.67224 │ └──────────────┴───────────────────────────┘ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ Selective Classification ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ AUGRC │ 0.779% │ │ AURC │ 0.959% │ │ Cov_5Risk │ 96.510% │ │ Risk_80Cov │ 1.200% │ └──────────────┴───────────────────────────┘ ┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ Complexity ┃ ┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ flops │ 284.38 G │ │ params │ 11.17 M │ └──────────────┴───────────────────────────┘ Testing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102/102 0:00:06 • 0:00:00 14.58it/s .. GENERATED FROM PYTHON SOURCE LINES 221-236 The ``ood/`` rows in both tables allow a direct comparison between the DEUP epistemic score and the MSP confidence score as OOD detectors on a well-trained ResNet-18. DEUP is expected to complement confidence-based criteria by focusing on the epistemic component of the model's uncertainty. References ---------- - **DEUP:** Lahlou, S., Jain, M., Nekoei, H., Butoi, V. I., Bertin, P., Rector-Brooks, J., ... & Bengio, Y. (2023). DEUP: Direct Epistemic Uncertainty Prediction. TMLR. `openreview `_. - **ResNet:** He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. CVPR 2016. - **MSP baseline:** Hendrycks, D., & Gimpel, K. (2017). A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. ICLR 2017. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 26.760 seconds) .. _sphx_glr_download_auto_tutorials_Post_Hoc_Methods_tutorial_deup.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: tutorial_deup.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: tutorial_deup.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: tutorial_deup.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_