CalibrationError#

class torch_uncertainty.metrics.classification.CalibrationError(task, adaptive=False, num_bins=10, norm='l1', num_classes=None, ignore_index=None, validate_args=True, **kwargs)[source]#

Computes the Calibration Error for classification tasks.

This metric evaluates how well a model's predicted probabilities match the observed frequency of correct predictions: a model is calibrated when, among all predictions made with confidence \(c\), a fraction \(c\) are correct. Calibration is crucial for assessing the reliability of probabilistic predictions, especially for downstream decision-making tasks.

Three norms are available for measuring calibration error:

Expected Calibration Error (ECE) [1]:

\[\text{ECE} = \sum_{i=1}^N b_i \lvert p_i - c_i \rvert\]

Maximum Calibration Error (MCE):

\[\text{MCE} = \max_{i} \lvert p_i - c_i \rvert\]

Root Mean Square Calibration Error (RMSCE):

\[\text{RMSCE} = \sqrt{\sum_{i=1}^N b_i (p_i - c_i)^2}\]
Here:
  • \(p_i\) is the accuracy of bin \(i\) (fraction of correct predictions).

  • \(c_i\) is the mean predicted confidence in bin \(i\).

  • \(b_i\) is the fraction of total samples falling into bin \(i\).

Bins are constructed either uniformly in the range \([0, 1]\) or adaptively (if adaptive=True).
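For concreteness, the ECE formula above can be computed by hand. The following is a minimal sketch (not part of the library; the helper manual_ece is hypothetical) using uniform, right-closed bins:

import torch

def manual_ece(confidences, correct, num_bins=10):
    """Compute ECE = sum_i b_i * |p_i - c_i| over uniform bins."""
    edges = torch.linspace(0, 1, num_bins + 1)
    n = confidences.numel()
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Samples whose confidence falls into the bin (lo, hi].
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            b = in_bin.sum() / n                # b_i: fraction of samples in the bin
            p = correct[in_bin].float().mean()  # p_i: accuracy within the bin
            c = confidences[in_bin].mean()      # c_i: mean confidence within the bin
            ece = ece + b * (p - c).abs()
    return ece

conf = torch.tensor([0.9, 0.8, 0.7, 0.8])  # confidence of the predicted class
hits = torch.tensor([1, 1, 1, 1])          # whether each prediction was correct
print(manual_ece(conf, hits, num_bins=5))  # ≈ 0.2

Exact boundary handling may differ from the library's binning, so small discrepancies with CalibrationError are expected.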

Parameters:
  • task (str) – Specifies the task type, either "binary" or "multiclass".

  • adaptive (bool, optional) – Whether to use adaptive binning. Defaults to False.

  • num_bins (int, optional) – Number of bins to divide the probability space. Defaults to 10.

  • norm (str, optional) – Specifies the norm used to aggregate the per-bin errors: "l1" (ECE), "l2" (RMSCE), or "max" (MCE). Defaults to "l1".

  • num_classes (int, optional) – Number of classes for "multiclass" tasks. Required when task is "multiclass".

  • ignore_index (int, optional) – Index to ignore during calculations. Defaults to None.

  • validate_args (bool, optional) – Whether to validate input arguments. Defaults to True.

  • **kwargs – Additional keyword arguments for the metric.

Example:

import torch

from torch_uncertainty.metrics.classification.calibration_error import (
    CalibrationError,
)

# Example for binary classification
predicted_probs = torch.tensor([0.9, 0.8, 0.3, 0.2])
true_labels = torch.tensor([1, 1, 0, 0])

metric = CalibrationError(
    task="binary",
    num_bins=5,
    norm="l1",
    adaptive=False,
)

calibration_error = metric(predicted_probs, true_labels)
print(f"Calibration Error: {calibration_error}")
# Output: Calibration Error: 0.199

Note

Bins are either uniformly spaced in \([0, 1]\) (adaptive=False) or sized adaptively so that each bin contains roughly the same number of samples (adaptive=True).
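For instance, a minimal sketch comparing the two binning strategies on the same predictions (the values and the two-bin setup are illustrative; because the adaptive bins follow the empirical confidence distribution, the two estimates generally differ):

import torch

from torch_uncertainty.metrics.classification.calibration_error import (
    CalibrationError,
)

preds = torch.tensor([0.95, 0.90, 0.85, 0.60, 0.30])
target = torch.tensor([1, 1, 0, 1, 0])

# Same data, uniform vs. adaptive bins.
uniform = CalibrationError(task="binary", num_bins=2, adaptive=False)
adaptive = CalibrationError(task="binary", num_bins=2, adaptive=True)
print(uniform(preds, target), adaptive(preds, target))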

Warning

If task="multiclass", num_classes must be an integer; otherwise, a TypeError is raised.
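For example, a minimal multiclass sketch (per-class probabilities as predictions, integer labels as targets; the values are illustrative):

import torch

from torch_uncertainty.metrics.classification.calibration_error import (
    CalibrationError,
)

probs = torch.tensor(
    [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.2, 0.3, 0.5],
    ]
)
labels = torch.tensor([0, 1, 2])

# num_classes is mandatory for the multiclass task.
metric = CalibrationError(task="multiclass", num_classes=3, num_bins=10)
print(metric(probs, labels))

# CalibrationError(task="multiclass")  # omitting num_classes raises a TypeError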

References

[1] Naeini et al. Obtaining well calibrated probabilities using Bayesian binning. In AAAI, 2015.

See also

See the torchmetrics CalibrationError for the underlying metric. Our version of the metric is a wrapper around the torchmetrics metric that adds plotting functionality.
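A minimal sketch of that plotting functionality, assuming the wrapper follows the torchmetrics convention where plot() returns a (figure, axes) pair:

import torch

from torch_uncertainty.metrics.classification.calibration_error import (
    CalibrationError,
)

metric = CalibrationError(task="binary", num_bins=10)
metric.update(torch.tensor([0.9, 0.8, 0.3, 0.2]), torch.tensor([1, 1, 0, 0]))

# Draw the reliability diagram from the accumulated metric state.
fig, ax = metric.plot()
fig.savefig("reliability_diagram.png")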