CalibrationError
- class torch_uncertainty.metrics.classification.CalibrationError(task, adaptive=False, num_bins=10, norm='l1', num_classes=None, ignore_index=None, validate_args=True, **kwargs)
Computes the Calibration Error for classification tasks.
This metric evaluates how well a model’s predicted probabilities align with the empirically observed outcomes. Calibration is crucial for assessing the reliability of probabilistic predictions, especially for downstream decision-making tasks.
Three norms are available for measuring calibration error:
Expected Calibration Error (ECE):
\[\text{ECE} = \sum_{i=1}^N b_i \lvert p_i - c_i \rvert\]

Maximum Calibration Error (MCE):

\[\text{MCE} = \max_{i} \lvert p_i - c_i \rvert\]

Root Mean Square Calibration Error (RMSCE):

\[\text{RMSCE} = \sqrt{\sum_{i=1}^N b_i (p_i - c_i)^2}\]

Here:

- \(p_i\) is the accuracy of bin \(i\) (fraction of correct predictions).
- \(c_i\) is the mean predicted confidence in bin \(i\).
- \(b_i\) is the fraction of total samples falling into bin \(i\).
Bins are constructed either uniformly in the range \([0, 1]\) or adaptively (if adaptive=True).
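To make the ECE definition concrete, below is a minimal, illustrative sketch of the computation for a binary task with uniform bins. It is not the library's implementation; the helper name ece_uniform_bins and the 0.5 decision threshold are assumptions made for this example only.

import torch


def ece_uniform_bins(probs, targets, num_bins=10):
    """Illustrative binary ECE with uniform bins (not the library code)."""
    preds = (probs >= 0.5).long()  # predicted class, assuming a 0.5 threshold
    confidences = torch.where(preds == 1, probs, 1 - probs)  # confidence in the predicted class
    correct = (preds == targets).float()
    bin_edges = torch.linspace(0, 1, num_bins + 1)
    ece = torch.tensor(0.0)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            b_i = in_bin.float().mean()       # fraction of samples in bin i
            p_i = correct[in_bin].mean()      # accuracy of bin i
            c_i = confidences[in_bin].mean()  # mean confidence in bin i
            ece = ece + b_i * (p_i - c_i).abs()
    return ece


# Roughly 0.2 for the binary data used in the Example below.
print(ece_uniform_bins(torch.tensor([0.9, 0.8, 0.3, 0.2]), torch.tensor([1, 1, 0, 0]), num_bins=5))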
- Parameters:
  - task (str) – Specifies the task type, either "binary" or "multiclass".
  - adaptive (bool, optional) – Whether to use adaptive binning. Defaults to False.
  - num_bins (int, optional) – Number of bins to divide the probability space. Defaults to 10.
  - norm (str, optional) – Specifies the type of norm to use: "l1", "l2", or "max". Defaults to "l1".
  - num_classes (int, optional) – Number of classes for "multiclass" tasks. Required when task is "multiclass".
  - ignore_index (int, optional) – Index to ignore during calculations. Defaults to None.
  - validate_args (bool, optional) – Whether to validate input arguments. Defaults to True.
  - **kwargs – Additional keyword arguments for the metric.
Example:
import torch

from torch_uncertainty.metrics.classification.calibration_error import (
    CalibrationError,
)

# Example for binary classification
predicted_probs = torch.tensor([0.9, 0.8, 0.3, 0.2])
true_labels = torch.tensor([1, 1, 0, 0])

metric = CalibrationError(
    task="binary",
    num_bins=5,
    norm="l1",
    adaptive=False,
)
calibration_error = metric(predicted_probs, true_labels)
print(f"Calibration Error: {calibration_error}")
# Output: Calibration Error: 0.199
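For task="multiclass", the metric expects per-sample class probabilities of shape (num_samples, num_classes) together with integer labels, and num_classes must be provided. The following is a hedged usage sketch with made-up values; the resulting value depends entirely on the data.

import torch

from torch_uncertainty.metrics.classification.calibration_error import (
    CalibrationError,
)

# Per-sample probability vectors over three classes (illustrative values)
predicted_probs = torch.tensor(
    [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.2, 0.3, 0.5],
        [0.6, 0.3, 0.1],
    ]
)
true_labels = torch.tensor([0, 1, 2, 1])

metric = CalibrationError(
    task="multiclass",
    num_classes=3,  # required for multiclass tasks, see the Warning below
    num_bins=5,
    norm="l1",
)
print(metric(predicted_probs, true_labels))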
Note
Bins are either uniformly distributed in \([0, 1]\) or adaptively sized (if adaptive=True).
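As an illustration of that difference, the hedged sketch below evaluates the same synthetic predictions with uniform and adaptive binning; adaptive binning typically places the bin edges so that each bin holds a comparable number of samples, so the two settings generally return different values.

import torch

from torch_uncertainty.metrics.classification.calibration_error import (
    CalibrationError,
)

torch.manual_seed(0)
probs = torch.rand(200)               # synthetic binary confidences
labels = torch.randint(0, 2, (200,))  # synthetic binary labels

uniform_ece = CalibrationError(task="binary", num_bins=10, adaptive=False)
adaptive_ece = CalibrationError(task="binary", num_bins=10, adaptive=True)

# Same predictions, different binning strategies: the values generally differ
# when the confidences are unevenly distributed over [0, 1].
print(uniform_ece(probs, labels), adaptive_ece(probs, labels))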
Warning
If task="multiclass", num_classes must be an integer; otherwise, a TypeError is raised.
References
[1] Naeini et al. Obtaining well calibrated probabilities using Bayesian binning. In AAAI, 2015.
See also
See the original TorchMetrics CalibrationError for details. Our version of the metric is a wrapper around the original metric that adds plotting functionality.
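A minimal sketch of that plotting functionality, assuming the wrapper follows the usual torchmetrics convention where plot() returns a Matplotlib figure and axes:

import torch

from torch_uncertainty.metrics.classification.calibration_error import (
    CalibrationError,
)

metric = CalibrationError(task="binary", num_bins=10)
metric.update(torch.rand(512), torch.randint(0, 2, (512,)))

# Assumption: plot() returns (fig, ax), as with other torchmetrics-style metrics.
fig, ax = metric.plot()
fig.savefig("calibration_plot.png")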