PackedLinear
- class torch_uncertainty.layers.PackedLinear(in_features, out_features, alpha, num_estimators, gamma=1, bias=True, first=False, last=False, implementation='conv1d', device=None, dtype=None)

Packed-Ensembles-style Linear layer.

This layer computes a fully-connected operation for a given number of estimators (num_estimators).

- Parameters:
  - in_features (int) – Number of input features of the linear layer.
  - out_features (int) – Number of channels produced by the linear layer.
  - alpha (float) – The width multiplier of the linear layer.
  - num_estimators (int) – The number of estimators grouped in the layer.
  - gamma (int, optional) – The number of groups within each estimator. Defaults to 1.
  - bias (bool, optional) – If True, adds a learnable bias to the output. Defaults to True.
  - first (bool, optional) – Whether this is the first layer of the network. Defaults to False.
  - last (bool, optional) – Whether this is the last layer of the network. Defaults to False.
  - implementation (str, optional) – The implementation to use. Available implementations:
    - "conv1d" (default): The conv1d implementation of the linear layer.
    - "sparse": The sparse implementation of the linear layer.
    - "full": The full implementation of the linear layer.
    - "einsum": The einsum implementation of the linear layer.
  - device (torch.device, optional) – The device to use for the layer’s parameters. Defaults to None.
  - dtype (torch.dtype, optional) – The dtype to use for the layer’s parameters. Defaults to None.
- Shape:
  - Input: If first is True: \((B, \ast, H_{\text{in}})\) where \(B\) is the batch size, \(\ast\) means any number of additional dimensions, and \(H_{\text{in}}=\text{in\_features}\). Otherwise: \((B, \ast, H_{\text{in}} \times \alpha)\).
  - Output: If last is True: \((B, \ast, H_{\text{out}} \times M)\) where \(H_{\text{out}}=\text{out\_features}\) and \(M=\text{num\_estimators}\). Otherwise: \((B, \ast, H_{\text{out}} \times \alpha)\).
- Explanation Note:
  Increasing alpha will increase the number of channels of the ensemble, increasing its representation capacity. Increasing gamma will increase the number of groups in the network and therefore reduce the number of parameters.
Note
Each ensemble member will only see \(\frac{\text{in\_features}}{\text{num\_estimators}}\) features, so when using gamma you should make sure that in_features and out_features are both divisible by num_estimators \(\times\) gamma. If they are not, the number of input and output features will be changed automatically to comply with this constraint.