Deep Evidential Classification on a Toy Example#

This tutorial provides an introductory overview of Deep Evidential Classification (DEC) using a practical example. We demonstrate an application of DEC by tackling the toy problem of fitting the MNIST dataset with a small convolutional neural network (a LeNet). The output of the network is modeled as a Dirichlet distribution, and the network is trained by minimizing the DEC loss function, which combines a Bayesian-risk squared-error loss with a regularization term based on the KL divergence.

DEC is an evidential approach to quantifying uncertainty in neural network classification models. The method places a prior distribution over the parameters of the Categorical likelihood, and the neural network predicts the parameters of this evidential (Dirichlet) distribution.
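Concretely, the Bayesian-risk squared-error part of this loss can be written directly from the Dirichlet parameters predicted by the network. The sketch below is purely illustrative (the dec_mse_term helper is not part of TorchUncertainty), assuming raw network outputs mapped to non-negative evidence with a ReLU and one-hot targets; the DECLoss used later in this tutorial handles both this term and the KL regularization for us.

import torch


def dec_mse_term(evidence: torch.Tensor, targets_one_hot: torch.Tensor) -> torch.Tensor:
    """Bayesian-risk squared-error term of the DEC loss (for illustration only)."""
    alpha = torch.relu(evidence) + 1  # Dirichlet concentration parameters (evidence + 1)
    strength = alpha.sum(dim=1, keepdim=True)  # Dirichlet strength S = sum of concentrations
    probs = alpha / strength  # expected class probabilities under the Dirichlet
    err = (targets_one_hot - probs) ** 2  # squared error between labels and expected probabilities
    var = probs * (1 - probs) / (strength + 1)  # variance of the class probabilities
    return (err + var).sum(dim=1).mean()  # sum over classes, average over the batch

The KL regularization term of the DEC loss penalizes evidence placed on incorrect classes by pulling the corresponding Dirichlet towards the uniform distribution; its influence is controlled by the reg_weight argument of DECLoss used below.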

Training a LeNet with DEC using TorchUncertainty models#

In this part, we train a neural network based on the model and routines already implemented in TorchUncertainty.

1. Loading the utilities#

To train a LeNet with the DEC loss function, we have to load the following utilities from TorchUncertainty:

  • our wrapper of the Lightning Trainer

  • the model: lenet, which lies in torch_uncertainty.models.classification.lenet

  • the classification training routine in the torch_uncertainty.routines

  • the evidential objective: the DECLoss from torch_uncertainty.losses

  • the datamodule that handles dataloaders & transforms: MNISTDataModule from torch_uncertainty.datamodules

We also need to define an optimizer using torch.optim.

from pathlib import Path

import torch
from torch import optim

from torch_uncertainty import TUTrainer
from torch_uncertainty.datamodules import MNISTDataModule
from torch_uncertainty.losses import DECLoss
from torch_uncertainty.models.classification import lenet
from torch_uncertainty.routines import ClassificationRoutine

# We also define the main hyperparameters.
# We set the number of epochs to some very low value for the sake of time.
MAX_EPOCHS = 3
BATCH_SIZE = 512

2. Creating the necessary variables#

In the following, we instantiate the trainer, define the root of the data folder, and create the datamodule and the model. We use the same MNIST classification example as in the original DEC paper.

trainer = TUTrainer(accelerator="gpu", devices=1, max_epochs=MAX_EPOCHS, enable_progress_bar=False)

# datamodule
root = Path() / "data"
datamodule = MNISTDataModule(root=root, batch_size=BATCH_SIZE, num_workers=8)

model = lenet(
    in_channels=datamodule.num_channels,
    num_classes=datamodule.num_classes,
)

3. The Loss and the Training Routine#

Next, we define the loss to be used during training. We then set up the training routine using the single-model classification routine torch_uncertainty.routines.ClassificationRoutine, providing the model, the DEC loss, the optimizer, and the remaining default arguments. As in the official DEC implementation, we use the Adam optimizer, here with a learning rate of 2e-2 and a weight decay of 0.005.

loss = DECLoss(reg_weight=1e-2)  # evidential loss: Bayesian-risk MSE + weighted KL regularization

routine = ClassificationRoutine(
    model=model,
    num_classes=datamodule.num_classes,
    loss=loss,
    optim_recipe=optim.Adam(model.parameters(), lr=2e-2, weight_decay=0.005),
)

4. Gathering Everything and Training the Model#

trainer.fit(model=routine, datamodule=datamodule)
trainer.test(model=routine, datamodule=datamodule)
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric  ┃      Classification       ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│     Acc      │          89.130%          │
│    Brier     │          0.20532          │
│   Entropy    │          0.02269          │
│     NLL      │            inf            │
└──────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric  ┃        Calibration        ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│     ECE      │          9.973%           │
│     aECE     │          9.942%           │
└──────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric  ┃ Selective Classification  ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    AUGRC     │          2.692%           │
│     AURC     │          5.143%           │
│  Cov@5Risk   │           nan%            │
│  Risk@80Cov  │          3.712%           │
└──────────────┴───────────────────────────┘

[{'test/cal/ECE': 0.09973254799842834, 'test/cal/aECE': 0.0994182825088501, 'test/cls/Acc': 0.8913000226020813, 'test/cls/Brier': 0.20532234013080597, 'test/cls/NLL': inf, 'test/sc/AUGRC': 0.026920348405838013, 'test/sc/AURC': 0.05142790824174881, 'test/sc/Cov@5Risk': nan, 'test/sc/Risk@80Cov': 0.03712499886751175, 'test/cls/Entropy': 0.022689994424581528}]

5. Testing the Model#

Now that the model is trained, let’s test it on MNIST.

import matplotlib.pyplot as plt
import numpy as np
import torchvision
import torchvision.transforms.functional as F


def imshow(img) -> None:
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


def rotated_mnist(angle: int) -> None:
    """Rotate MNIST images and show images and confidence.

    Args:
        angle: Rotation angle in degrees.
    """
    rotated_images = F.rotate(images, angle)
    # print rotated images
    plt.axis("off")
    imshow(torchvision.utils.make_grid(rotated_images[:4, ...], padding=0))
    print("Ground truth: ", " ".join(f"{labels[j]}" for j in range(4)))

    evidence = routine(rotated_images)  # raw evidence predicted by the network
    alpha = torch.relu(evidence) + 1  # Dirichlet concentration parameters
    strength = torch.sum(alpha, dim=1, keepdim=True)  # total evidence (Dirichlet strength)
    probs = alpha / strength  # expected class probabilities
    entropy = -1 * torch.sum(probs * torch.log(probs), dim=1, keepdim=True)  # predictive entropy
    for j in range(4):
        predicted = torch.argmax(probs[j, :])
        print(
            f"Predicted digits for the image {j}: {predicted} with strength "
            f"{strength[j, 0]:.3f} and entropy {entropy[j, 0]:.3f}."
        )


dataiter = iter(datamodule.val_dataloader())
images, labels = next(dataiter)

with torch.no_grad():
    routine.eval()
    rotated_mnist(0)
    rotated_mnist(45)
    rotated_mnist(90)
[Figure: grids of the first four MNIST digits shown at rotations of 0°, 45°, and 90°]
Ground truth:  7 2 1 0
Predicted digits for the image 0: 7 with strength 80.768 and entropy 0.594.
Predicted digits for the image 1: 2 with strength 107.146 and entropy 0.473.
Predicted digits for the image 2: 1 with strength 89.193 and entropy 0.549.
Predicted digits for the image 3: 0 with strength 78.317 and entropy 0.804.
Ground truth:  7 2 1 0
Predicted digits for the image 0: 9 with strength 14.962 and entropy 2.123.
Predicted digits for the image 1: 2 with strength 43.203 and entropy 1.050.
Predicted digits for the image 2: 1 with strength 41.893 and entropy 1.479.
Predicted digits for the image 3: 9 with strength 40.976 and entropy 1.275.
Ground truth:  7 2 1 0
Predicted digits for the image 0: 7 with strength 23.761 and entropy 1.680.
Predicted digits for the image 1: 2 with strength 49.202 and entropy 1.006.
Predicted digits for the image 2: 2 with strength 49.859 and entropy 1.351.
Predicted digits for the image 3: 2 with strength 30.770 and entropy 1.812.
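
Note how the Dirichlet strength drops and the entropy rises as the digits are rotated away from their canonical orientation. If a single scalar uncertainty score is preferred, the strength can be converted into the uncertainty mass u = K / S from the DEC paper; the helper below is a hypothetical utility written for this tutorial, not part of TorchUncertainty:

def uncertainty_mass(alpha: torch.Tensor) -> torch.Tensor:
    """Uncertainty mass u = K / S: close to 1 when the model has gathered little evidence."""
    num_classes = alpha.shape[1]  # K, the number of classes
    strength = alpha.sum(dim=1)  # S, the total evidence per sample
    return num_classes / strength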

References#

  • Deep Evidential Classification: Murat Sensoy, Lance Kaplan, and Melih Kandemir (2018). Evidential Deep Learning to Quantify Classification Uncertainty. In NeurIPS 2018.
