AdaptiveCalibrationError

class torch_uncertainty.metrics.classification.AdaptiveCalibrationError(task, num_bins=10, norm='l1', num_classes=None, ignore_index=None, validate_args=True, **kwargs)

Computes the Adaptive Top-label Calibration Error (ACE) for classification tasks.

The Adaptive Calibration Error measures how well predicted probabilities are calibrated, using bins whose edges adapt to the distribution of the predicted probabilities. Unlike uniform (equal-width) binning, adaptive binning places roughly the same number of samples in every bin, so no bin's statistics rest on only a handful of predictions.

Given top-class confidences \(\hat{p}_i\) and accuracies \(a_i = \mathbf{1}[\hat{y}_i = y_i]\), the \(N\) samples are sorted by confidence and split into \(M\) bins \(B_1, \dots, B_M\) each containing (approximately) the same number of samples. Three norms are available:

Adaptive Calibration Error (ACE):

\[\text{ACE} = \sum_{m=1}^{M} \frac{|B_m|}{N} \left| \operatorname{acc}(B_m) - \operatorname{conf}(B_m) \right|\]

Maximum Adaptive Calibration Error (MACE):

\[\text{MACE} = \max_{m} \left| \operatorname{acc}(B_m) - \operatorname{conf}(B_m) \right|\]

Root Mean Square Adaptive Calibration Error (RMACE):

\[\text{RMACE} = \sqrt{\sum_{m=1}^{M} \frac{|B_m|}{N} \left( \operatorname{acc}(B_m) - \operatorname{conf}(B_m) \right)^2}\]

where \(\operatorname{acc}(B_m) = \tfrac{1}{|B_m|}\sum_{i \in B_m} a_i\) is the fraction of correct predictions in bin \(m\), \(\operatorname{conf}(B_m) = \tfrac{1}{|B_m|}\sum_{i \in B_m} \hat{p}_i\) is the mean predicted confidence in bin \(m\), and \(|B_m|/N\) is the fraction of total samples in bin \(m\).
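
The three formulas above can be sketched in plain Python. This is an illustrative reference under stated assumptions, not the library's implementation: the helper names `top_label_stats` and `adaptive_calibration_errors` are hypothetical, bins are formed by cutting the sorted samples into contiguous chunks of near-equal size, and empty bins (more bins than samples) are skipped.

```python
import math


def top_label_stats(probs, targets):
    """Top-class confidence and 0/1 accuracy for each row of class probabilities."""
    confidences, accuracies = [], []
    for row, y in zip(probs, targets):
        y_hat = max(range(len(row)), key=row.__getitem__)  # predicted class
        confidences.append(row[y_hat])
        accuracies.append(1.0 if y_hat == y else 0.0)
    return confidences, accuracies


def adaptive_calibration_errors(confidences, accuracies, num_bins=10):
    """Return ACE ("l1"), MACE ("max"), and RMACE ("l2") over equal-count bins."""
    n = len(confidences)
    # Sort samples by confidence, keeping each accuracy paired with its confidence.
    pairs = sorted(zip(confidences, accuracies), key=lambda pair: pair[0])
    base, extra = divmod(n, num_bins)  # the first `extra` bins get one more sample
    ace, mace, mean_sq = 0.0, 0.0, 0.0
    start = 0
    for m in range(num_bins):
        size = base + (1 if m < extra else 0)
        if size == 0:
            continue  # assumption: empty bins contribute nothing
        chunk = pairs[start:start + size]
        start += size
        conf = sum(p for p, _ in chunk) / size  # conf(B_m)
        acc = sum(a for _, a in chunk) / size   # acc(B_m)
        gap = abs(acc - conf)
        weight = size / n                       # |B_m| / N
        ace += weight * gap
        mace = max(mace, gap)
        mean_sq += weight * gap ** 2
    return {"l1": ace, "max": mace, "l2": math.sqrt(mean_sq)}


# Six samples, three bins of two samples each.
confidences = [0.9, 0.8, 0.7, 0.6, 0.95, 0.55]
accuracies = [1, 1, 0, 1, 1, 0]
errors = adaptive_calibration_errors(confidences, accuracies, num_bins=3)
print(errors)
```

Here the sorted bins have gaps of 0.075, 0.25, and 0.075, so ACE is their average (about 0.133), MACE is 0.25, and RMACE is about 0.157. How ties and uneven splits are handled is a design choice; the library may resolve them differently.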

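This metric is particularly useful when predictions are concentrated in a narrow region of the probability space, for example when most confidences lie close to 1. With equal-width bins, most bins would then be empty or nearly empty, whereas adaptive bins remain evenly populated.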
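
To make the contrast concrete, the following sketch counts how many samples land in each bin under the two schemes. The function names are hypothetical, and the left-closed equal-width convention used here is an assumption that may differ from any particular library.

```python
def uniform_bin_counts(confidences, num_bins):
    """Count samples per equal-width bin on [0, 1]."""
    counts = [0] * num_bins
    for p in confidences:
        # Clamp so that p == 1.0 falls into the last bin.
        idx = min(int(p * num_bins), num_bins - 1)
        counts[idx] += 1
    return counts


def adaptive_bin_counts(num_samples, num_bins):
    """Sizes of equal-count bins: sorted samples cut into near-equal chunks."""
    base, extra = divmod(num_samples, num_bins)
    return [base + (1 if m < extra else 0) for m in range(num_bins)]


# Confidences concentrated near 1, as is common for over-confident models.
confidences = [0.99, 0.98, 0.97, 0.96, 0.95, 0.94, 0.93, 0.92, 0.91, 0.55]

print(uniform_bin_counts(confidences, num_bins=5))        # [0, 0, 1, 0, 9]
print(adaptive_bin_counts(len(confidences), num_bins=5))  # [2, 2, 2, 2, 2]
```

With equal-width bins, nine of the ten samples share one bin and three bins are empty; with equal-count bins, each bin holds two samples.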

Parameters:
  • task – Specifies the task type, either "binary" or "multiclass".

  • num_bins – Number of adaptive bins \(M\), each holding approximately the same number of samples. Defaults to 10.

  • norm – Norm used to aggregate the per-bin errors: "l1" (ACE), "l2" (RMACE), or "max" (MACE). Defaults to "l1".

  • num_classes – Number of classes for "multiclass" tasks. Required when task is "multiclass".

  • ignore_index – Target label to ignore; samples with this label do not contribute to the metric. Defaults to None.

  • validate_args – Whether to validate input arguments. Defaults to True.

  • kwargs – Additional keyword arguments passed to the metric.

Example

import torch

from torch_uncertainty.metrics.classification.adaptive_calibration_error import (
    AdaptiveCalibrationError,
)

# Binary classification example
predicted_probs = torch.tensor([0.95, 0.85, 0.15, 0.05])
true_labels = torch.tensor([1, 1, 0, 0])

metric = AdaptiveCalibrationError(
    task="binary",
    num_bins=5,
    norm="l1",
)

calibration_error = metric(predicted_probs, true_labels)
print(f"Calibration Error (Binary): {calibration_error}")
# Output : Calibration Error (Binary): 0.1

Note

  • Adaptive binning adjusts the width of each bin so that every bin contains approximately the same number of samples.

  • If task="multiclass", num_classes must be provided; otherwise, a TypeError will be raised.

Warning

  • Ensure that num_classes matches the actual number of classes in the dataset for multiclass tasks.

References

[1] Nixon et al., Measuring calibration in deep learning, CVPR Workshops, 2019.

See also