preprocessor
Submodule for aggregating and preprocessing raw data from neural network layers.
Classes in this module compute the variables that are collected during training.
These include, but are not limited to, neuron activations and norms of vectorized tensors that appear in the network.
These values can come from the outputs, parameters or gradients of layers.
Preprocessors are expected to be called from one of the objects in monitorch.gatherer.
- class monitorch.preprocessor.AbstractBackwardPreprocessor
Bases: AbstractPreprocessor
Base class for all preprocessors that aggregate data obtained from the backward pass.
Subclasses of AbstractBackwardPreprocessor process gradients with respect to the inputs or outputs of a module.
- abstract process_bw(name: str, module, grad_input, grad_output)
Processes backward pass data.
- Parameters:
name (str) – Name of the module whose data is processed.
module (torch.nn.Module) – Module object from which the data is processed
grad_input (torch.Tensor) – Gradients with respect to input of module.
grad_output (torch.Tensor) – Gradients with respect to output of module.
- class monitorch.preprocessor.AbstractForwardPreprocessor
Bases: AbstractPreprocessor
Base class for all preprocessors that aggregate data obtained from the forward pass.
Subclasses of AbstractForwardPreprocessor process the input and output of a module. They expect the module to take a single tensor and output a single tensor, hence the name feed-forward preprocessor.
- abstract process_fw(name: str, module, layer_input, layer_output)
Processes forward pass data.
- Parameters:
name (str) – Name of the module whose data is processed.
module (torch.nn.Module) – The module whose inputs and outputs are processed.
layer_input (torch.Tensor) – Input to the module.
layer_output (torch.Tensor) – Output of the module.
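The forward-preprocessor contract described above can be illustrated with a small sketch. The classes below are hypothetical stand-ins written only for this example (they are not imported from monitorch), and plain lists of floats take the place of tensors. The sketch assumes nothing beyond the documented `process_fw` signature and the `value` property.

```python
from abc import ABC, abstractmethod
from typing import Any


class PreprocessorSketch(ABC):
    """Hypothetical stand-in for AbstractPreprocessor."""

    @property
    @abstractmethod
    def value(self) -> dict[str, Any]:
        ...


class ForwardPreprocessorSketch(PreprocessorSketch):
    """Hypothetical stand-in for AbstractForwardPreprocessor."""

    @abstractmethod
    def process_fw(self, name, module, layer_input, layer_output) -> None:
        ...


class MeanOutput(ForwardPreprocessorSketch):
    """Toy subclass: stores the mean of every output it sees, per layer name."""

    def __init__(self) -> None:
        self._state: dict[str, list[float]] = {}

    def process_fw(self, name, module, layer_input, layer_output) -> None:
        # layer_output is a flat list of floats here; the real method receives a tensor.
        mean = sum(layer_output) / len(layer_output)
        self._state.setdefault(name, []).append(mean)

    @property
    def value(self) -> dict[str, Any]:
        return self._state


pre = MeanOutput()
pre.process_fw("fc1", None, [0.0, 1.0], [1.0, 3.0])
pre.process_fw("fc1", None, [0.0, 1.0], [2.0, 4.0])
```

After the two calls, `pre.value` holds one list of means under the key `"fc1"`.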
- class monitorch.preprocessor.AbstractModulePreprocessor
Bases: AbstractPreprocessor
Base class for all preprocessors that process a module on its own.
Does not restrict usage by requiring inputs, outputs or gradients of the module.
- class monitorch.preprocessor.AbstractPreprocessor
Bases: ABC
Base class for all preprocessors.
- abstract property value: dict[str, Any]
Values computed by the preprocessor for every layer it processes, keyed by layer name.
- Returns:
Result of the computations done since the preprocessor was created or last reset.
- Return type:
dict[str, Any]
- class monitorch.preprocessor.AbstractTensorPreprocessor
Bases: AbstractPreprocessor
Base class for all preprocessors that process a single tensor.
Subclasses are mostly preprocessors that process gradients obtained during the backward pass. Those preprocessors cannot be implemented as AbstractBackwardPreprocessor subclasses, because backward hooks are executed before the gradients stored in tensors are updated.
- class monitorch.preprocessor.ExplicitCall(train_loss_str, non_train_loss_str)
Bases: AbstractPreprocessor
Class for accumulating data passed by explicit call.
Objects of the ExplicitCall class are provided by PyTorchInspector to lenses as a foreign preprocessor. ExplicitCall implements methods to interact directly with its data. Its primary use is to track loss and other performance metrics for the LossMetrics lens.
- Parameters:
train_loss_str (str) – String to save training loss under.
non_train_loss_str (str) – String to save development, validation or test loss under.
- state
Aggregated data indexed by their names.
- Type:
dict[str, Any]
- train_loss_str
String to save training loss under.
- Type:
str
- non_train_loss_str
String to save non-training loss under.
- Type:
str
- push_loss(value: float, *, train: bool, running: bool = True)
A utility function to save loss.
A shorthand that chooses whether the loss is running and which name to push it under.
- Parameters:
value (float) – Value of the loss to be saved.
train (bool) – Whether the loss should be saved under train_loss_str or non_train_loss_str.
running (bool) – Indicates whether push_running() or push_memory() should be used.
- push_memory(name: str, value) → None
Appends value to the container under name, creating a list if there is none.
- Parameters:
name (str) – Name under which the value will be saved.
value – The value to be saved.
- push_running(name: str, value: float) → None
Appends value to the container under name, creating a RunningMeanVar if there is none.
- Parameters:
name (str) – Name under which the value will be saved.
value – The value to be saved.
- property value: dict[str, Any]
Values computed by the preprocessor for every layer it processes, keyed by layer name.
- Returns:
Result of the computations done since the preprocessor was created or last reset.
- Return type:
dict[str, Any]
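The difference between push_memory() and push_running() can be sketched in plain Python. Both classes below are hypothetical stand-ins: the internals of RunningMeanVar are not documented here, so the accumulator assumes Welford's online algorithm, and the sketch also assumes that `running=True` routes to push_running().

```python
from typing import Any


class RunningMeanVarSketch:
    """Hypothetical running mean/variance accumulator (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self) -> float:
        # Population variance; 0.0 before any value has been seen.
        return self._m2 / self.count if self.count else 0.0


class ExplicitCallSketch:
    """Stand-in that mirrors the documented push_* methods."""

    def __init__(self, train_loss_str: str, non_train_loss_str: str) -> None:
        self.train_loss_str = train_loss_str
        self.non_train_loss_str = non_train_loss_str
        self.state: dict[str, Any] = {}

    def push_memory(self, name: str, value) -> None:
        # Keeps every value: the container is a list.
        self.state.setdefault(name, []).append(value)

    def push_running(self, name: str, value: float) -> None:
        # Keeps only aggregates: the container is a running accumulator.
        self.state.setdefault(name, RunningMeanVarSketch()).update(value)

    def push_loss(self, value: float, *, train: bool, running: bool = True) -> None:
        name = self.train_loss_str if train else self.non_train_loss_str
        (self.push_running if running else self.push_memory)(name, value)


calls = ExplicitCallSketch("train_loss", "val_loss")
for loss in (1.0, 2.0, 3.0):
    calls.push_loss(loss, train=True)             # running aggregate
calls.push_loss(0.5, train=False, running=False)  # kept as a list
```

The running container uses constant memory per name, while the list container grows with every call but preserves individual values.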
- class monitorch.preprocessor.GradientActivation(death: bool, inplace: bool, eps: float = 1e-08)
Bases: AbstractTensorPreprocessor
Preprocessor class to compute gradient activations and death.
We define a neuron to be active if it has a non-zero gradient at any datapoint in a batch iteration; otherwise it is dead. This preprocessor calculates the death rate and activations over an epoch. The death rate is the proportion of dead neurons in each batch. It can be further aggregated into a mean or median across all batch iterations in an epoch.
- Parameters:
death (bool) – Flag indicating if the death rate should be computed.
inplace (bool) – Flag indicating whether to collect data in place using RunningMeanVar or to stack them into a list.
eps (float) – Numerical constant below which a value is regarded as zero.
- process_tensor(name: str, grad)
Computes activation and death rate on a gradient.
Transforms the gradient into a boolean mask and applies reduce_activation_to_activation_rates(). Activation rates are saved and used to compute the death rate.
- Parameters:
name (str) – Name of the source of the gradient.
grad (torch.Tensor) – Gradient tensor to compute activations from.
- property value: dict[str, Any]
See base class.
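The definition of a dead neuron given above can be made concrete with a short sketch. It is not the library's implementation: the helper `death_rate` is hypothetical, and the `[batch][neurons]` nested-list layout is an assumption made so that the example runs without PyTorch.

```python
def death_rate(grad_batch: list[list[float]], eps: float = 1e-8) -> float:
    """Proportion of neurons whose gradient stays within eps of zero for every sample."""
    n_neurons = len(grad_batch[0])
    # Boolean mask: True where the gradient magnitude exceeds eps.
    mask = [[abs(g) > eps for g in sample] for sample in grad_batch]
    # A neuron is active if at least one sample in the batch activates it.
    active = [any(sample[j] for sample in mask) for j in range(n_neurons)]
    return 1.0 - sum(active) / n_neurons


# Two samples, four neurons; neurons 1 and 3 never receive a gradient.
batch = [
    [0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, -0.1, 0.0],
]
rate = death_rate(batch)
```

Two of the four neurons are dead in this batch, so `rate` is 0.5.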
- class monitorch.preprocessor.GradientGeometry(correlation: bool, normalize: bool, inplace: bool, eps: float = 1e-08)
Bases: AbstractTensorPreprocessor
Preprocessor to keep track of parameters’ gradients.
Computes the (optionally normalized) L2 norm of the gradient tensor. Optionally computes the correlation between consecutive gradients, normalized to fit into the [-1, 1] range, for further investigation of gradient oscillations.
- Parameters:
correlation (bool) – Indicator if the correlation must be computed.
normalize (bool) – Indicator if the gradient norm should be divided by the square root of the number of elements.
inplace (bool) – Flag indicating whether to collect data in place using RunningMeanVar or to stack them into a list.
- process_tensor(name: str, grad) → None
Computes (normalized) L2 norm and optionally correlation with previous gradient.
The first gradient is taken to be 0.0 with norm 1.0.
- Parameters:
name (str) – Name of source of gradient.
grad (torch.Tensor) – Gradient tensor to be processed.
- property value: dict[str, Any]
See base class.
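The two quantities described for GradientGeometry can be sketched as follows. The helper names are hypothetical, gradients are flat lists of floats, and the correlation is assumed to be a cosine similarity with `eps` guarding the division; the actual role of `eps` in the library is not documented here.

```python
import math


def l2_norm(vec: list[float], normalize: bool = False) -> float:
    """L2 norm, optionally divided by the square root of the number of elements."""
    norm = math.sqrt(sum(x * x for x in vec))
    return norm / math.sqrt(len(vec)) if normalize else norm


def correlation(prev: list[float], curr: list[float], eps: float = 1e-8) -> float:
    """Cosine similarity of two gradients; eps avoids division by zero (assumed role)."""
    dot = sum(p * c for p, c in zip(prev, curr))
    return dot / (l2_norm(prev) * l2_norm(curr) + eps)


g1 = [3.0, 4.0]
g2 = [-3.0, -4.0]   # opposite direction: an oscillating update
g3 = [4.0, -3.0]    # orthogonal direction

norm = l2_norm(g1)
rms = l2_norm(g1, normalize=True)
flip = correlation(g1, g2)
orthogonal = correlation(g1, g3)
```

A correlation near -1 between consecutive gradients, as for `g1` and `g2`, indicates that the gradient direction flips from one step to the next.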
- class monitorch.preprocessor.LossModule(inplace: bool, evaluation_from_grad: bool)
Bases: AbstractForwardPreprocessor
Module to record a single-valued loss.
Aggregates loss from loss modules (e.g. torch.nn.MSELoss or torch.nn.NLLLoss). It can be accessed later.
- Parameters:
inplace (bool) – Indicator if RunningMeanVar or list should be used for aggregation.
evaluation_from_grad (bool) – Flag indicating if evaluation passes should be determined from gradients or from module.training.
- process_fw(name: str, module, layer_input, layer_output)
Saves loss passed as layer output.
- Parameters:
name (str) – Name of the module. Ignored.
module (torch.nn.Module) – The module object. Ignored.
layer_input (torch.Tensor) – Input to loss module. Ignored.
layer_output (torch.Tensor) – Loss tensor. Must have single element.
- Raises:
AttributeError – If layer_output has no elements or more than one element.
- set_loss_strs(train_loss_str: str, non_train_loss_str: str)
Defines names for the training and test/validation/development loss. The given strings will be used for indexing in value.
- Parameters:
train_loss_str (str) – String used for training loss.
non_train_loss_str (str) – String used for test/validation/development loss.
- property value: dict[str, Any]
See base class.
- class monitorch.preprocessor.OutputActivation(death: bool, inplace: bool, record_eval: bool, evaluation_from_grad: bool, eps: float = 1e-08, channel_last: bool = False)
Bases: AbstractForwardPreprocessor
Preprocessor to record activations of outputs.
We say that a neuron or a channel is activated if its output is non-zero (information is propagated forward). If a neuron is not activated for any sample in a batch, we say it is dead. The death rate is the proportion of dead neurons relative to the layer size.
- Parameters:
death (bool) – Indicator if the death rate is to be collected.
inplace (bool) – Indicator if RunningMeanVar or list should be used for aggregation.
record_eval (bool) – Indicator if outputs during evaluation must be preprocessed.
evaluation_from_grad (bool) – Flag indicating if evaluation passes should be determined from gradients or from module.training.
eps (float) – Numerical constant below which a value is regarded as zero.
channel_last (bool) – If True, expects data in [batch, seq_len, ..., features] format where the feature/channel dimension is last (e.g. transformer outputs). If False (default), expects PyTorch’s standard [batch, features, spatial_dims, ...] format.
- process_fw(name: str, module, layer_input, layer_output) → None
Computes activation from layer output.
Flattens spatial dimensions, computes activations and saves each sample. Computes the death rate if death=True was set.
- Parameters:
name (str) – Name of the module whose outputs are processed.
module (torch.nn.Module) – Module object whose outputs are processed.
layer_input (torch.Tensor) – Input of the layer. Ignored in this method.
layer_output (torch.Tensor) – Outputs to compute activations from.
- property value: dict[str, Any]
See base class.
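The effect of `channel_last` on how an output is read can be sketched with nested lists. This is only an illustration: the helpers are hypothetical, each sample is restricted to two dimensions, and the rule that a channel is active when any position exceeds `eps` is an assumption; the library's own reduction may differ.

```python
def channel_active(sample, channel_last: bool, eps: float = 1e-8) -> list[bool]:
    """Per-channel activation for one sample given as a 2D nested list."""
    if channel_last:
        # [seq_len][features] -> [features][seq_len]
        sample = [list(col) for col in zip(*sample)]
    # Assumption: a channel is active if any position exceeds eps in magnitude.
    return [any(abs(x) > eps for x in channel) for channel in sample]


def death_rate(batch, channel_last: bool = False, eps: float = 1e-8) -> float:
    """Share of channels that are inactive for every sample in the batch."""
    per_sample = [channel_active(s, channel_last, eps) for s in batch]
    n_channels = len(per_sample[0])
    dead = sum(not any(s[c] for s in per_sample) for c in range(n_channels))
    return dead / n_channels


# One sample, 2 channels x 3 positions; channel 1 is silent everywhere.
channels_first = [[[0.0, 1.5, 0.0], [0.0, 0.0, 0.0]]]
# The same data laid out as 3 positions x 2 channels.
channels_last = [[[0.0, 0.0], [1.5, 0.0], [0.0, 0.0]]]

rate_first = death_rate(channels_first)
rate_last = death_rate(channels_last, channel_last=True)
```

Both layouts describe the same data, so the two rates agree once the flag matches the layout.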
- class monitorch.preprocessor.OutputGradientGeometry(correlation: bool, normalize: bool, inplace: bool, eps: float = 1e-08)
Bases: AbstractBackwardPreprocessor
Preprocessor to keep track of outputs’ gradients.
Computes the (optionally normalized) L2 norm of the gradient tensor. Optionally computes the correlation between consecutive gradients, normalized to fit into the [-1, 1] range, for further investigation of gradient oscillations.
- Parameters:
correlation (bool) – Indicator if the correlation must be computed.
normalize (bool) – Indicator if the gradient norm should be divided by the square root of the number of elements.
inplace (bool) – Flag indicating whether to collect data in place using RunningMeanVar or to stack them into a list.
- process_bw(name: str, module, grad_input, grad_output) → None
Computes (normalized) L2 norm and optionally computes correlation with the previous output gradient.
The first gradient is taken to be 0.0 with norm 1.0.
- Parameters:
name (str) – Name of the module whose output gradients are recorded.
module (torch.nn.Module) – The module object. Ignored.
grad_input – Gradients with respect to the input of the layer. Ignored.
grad_output – Gradients with respect to the outputs of the layer. Assumes the layer outputs a single tensor and therefore has a single output gradient.
- property value: dict[str, Any]
See base class.
- class monitorch.preprocessor.OutputNorm(normalize: bool, inplace: bool, record_eval: bool, evaluation_from_grad: bool, channel_last: bool = False)
Bases: AbstractForwardPreprocessor
Preprocessor to compute norms of outputs.
Flattens the spatial and channel/neuron dimensions of the output, computes the L2 norm (or RMS, if normalized) of the flattened vectors and takes the mean over a batch.
- Parameters:
normalize (bool) – Indicator if the output norm should be normalized by the square root of the number of elements in a single sample's output.
inplace (bool) – Indicator if RunningMeanVar or list should be used for aggregation.
record_eval (bool) – Indicator if outputs during evaluation must be preprocessed.
evaluation_from_grad (bool) – Flag indicating if evaluation passes should be determined from gradients or from module.training.
channel_last (bool) – If True, expects data in [batch, seq_len, ..., features] format where the feature/channel dimension is last (e.g. transformer outputs). If False (default), expects PyTorch’s standard [batch, features, spatial_dims, ...] format. The norm computation is equivalent in both cases, since all non-batch dimensions are flattened before computing the L2 norm.
- process_fw(name: str, module, layer_input, layer_output)
Computes mean output norm.
Flattens spatial and channel dimensions, computes the (normalized) norm of individual samples and saves their average.
- Parameters:
name (str) – Name of the module whose outputs are processed.
module (torch.nn.Module) – Module object whose outputs are processed.
layer_input (torch.Tensor) – Input of the layer. Ignored in this method.
layer_output (torch.Tensor) – Outputs to compute norm from.
- property value: dict[str, Any]
Values computed by the preprocessor for every layer it processes, keyed by layer name.
- Returns:
Result of the computations done since the preprocessor was created or last reset.
- Return type:
dict[str, Any]
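The per-sample norm followed by a batch mean can be sketched in plain Python. The helper `mean_output_norm` is hypothetical and takes already flattened samples as lists of floats; with `normalize=True` it divides each norm by the square root of the sample size, which gives the RMS.

```python
import math


def mean_output_norm(batch: list[list[float]], normalize: bool) -> float:
    """Mean over the batch of the per-sample L2 norm (RMS when normalize=True)."""
    norms = []
    for sample in batch:  # each sample is already flattened
        norm = math.sqrt(sum(x * x for x in sample))
        if normalize:
            norm /= math.sqrt(len(sample))
        norms.append(norm)
    return sum(norms) / len(norms)


batch = [[3.0, 4.0], [6.0, 8.0]]
l2_mean = mean_output_norm(batch, normalize=False)
rms_mean = mean_output_norm(batch, normalize=True)
```

The RMS variant is independent of layer width, which makes values easier to compare between layers of different sizes.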
- class monitorch.preprocessor.ParameterDifferenceGeometry(correlation: bool, normalize: bool, inplace: bool, eps: float = 1e-08)
Bases: AbstractTensorPreprocessor
Preprocessor to keep track of how parameters evolve between preprocessor calls by inspecting their updates.
Its main use is to inspect the behaviour of the optimizer update step.
Computes the (optionally normalized) L2 norm of parameter updates. Optionally computes the correlation between consecutive parameter differences, normalized to fit into the [-1, 1] range, for further investigation.
- Parameters:
correlation (bool) – Indicator if the correlation must be computed.
normalize (bool) – Indicator if the norm should be divided by the square root of the number of elements.
inplace (bool) – Flag indicating whether to collect data in place using RunningMeanVar or to stack them into a list.
- process_tensor(name: str, param: Tensor) → None
Computes (normalized) L2 norm and optionally correlation with previous difference.
- Parameters:
name (str) – Name of source of parameter.
param (torch.Tensor) – Parameter tensor to be processed.
- property value: dict[str, Any]
See base class.
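Tracking parameter updates between calls can be sketched as below. The class `UpdateNormSketch` is a hypothetical stand-in that works on lists of floats. It assumes that the first call for a name only stores a snapshot, which may differ from what the library does on its first call.

```python
import math


class UpdateNormSketch:
    """Hypothetical tracker of the L2 norm of parameter updates between calls."""

    def __init__(self, normalize: bool = False) -> None:
        self.normalize = normalize
        self._previous: dict[str, list[float]] = {}
        self.norms: dict[str, list[float]] = {}

    def process_tensor(self, name: str, param: list[float]) -> None:
        prev = self._previous.get(name)
        if prev is not None:
            diff = [p - q for p, q in zip(param, prev)]
            norm = math.sqrt(sum(d * d for d in diff))
            if self.normalize:
                norm /= math.sqrt(len(diff))
            self.norms.setdefault(name, []).append(norm)
        # Copy so that later in-place changes do not alter the stored snapshot.
        self._previous[name] = list(param)


tracker = UpdateNormSketch()
weights = [1.0, 1.0]
tracker.process_tensor("fc.weight", weights)  # first call only stores a snapshot
weights[0] += 3.0
weights[1] += 4.0                             # simulated optimizer step
tracker.process_tensor("fc.weight", weights)
```

Copying the parameter matters because optimizers usually modify parameters in place; without the copy the stored snapshot would change together with the live values and every difference would be zero.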
- class monitorch.preprocessor.ParameterNorm(attrs: list[str], normalize: bool, inplace: bool)
Bases: AbstractModulePreprocessor
Preprocessor computing norms of parameters.
Computes the norm of the parameters listed in attrs_ for every module that is passed to process_module().
- Parameters:
attrs (list[str]) – List of attributes for which the norm will be computed.
normalize (bool) – Flag indicating whether the norm should be normalized by tensor size. If true, computes the RMS of tensor values; otherwise the L2 norm.
inplace (bool) – Flag indicating if RunningMeanVar or list will be used.
- attrs_
List of attributes to compute norm for.
- Type:
list[str]
- process_module(name: str, module)
Computes norms of all attrs_.
Uses torch.linalg.vector_norm to compute the L2 norm of the module’s attributes. If normalize is true, divides the norm by the square root of the number of elements in the attribute.
- property value: OrderedDict[str, Any]
See base class.
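The attribute lookup and norm computation can be sketched without PyTorch. The helper `attr_norms` is hypothetical, a `SimpleNamespace` with flattened lists stands in for a module, and the norm is computed by hand where the documented implementation uses torch.linalg.vector_norm.

```python
import math
from types import SimpleNamespace


def attr_norms(module, attrs: list[str], normalize: bool) -> dict[str, float]:
    """L2 norm (or RMS when normalize=True) of each listed attribute."""
    result = {}
    for attr in attrs:
        values = getattr(module, attr)
        norm = math.sqrt(sum(v * v for v in values))
        if normalize:
            norm /= math.sqrt(len(values))
        result[attr] = norm
    return result


# Stand-in for a module with flattened 'weight' and 'bias' attributes.
layer = SimpleNamespace(weight=[1.0, 1.0, 1.0, 1.0], bias=[0.0, 2.0])
raw = attr_norms(layer, ["weight", "bias"], normalize=False)
rms = attr_norms(layer, ["weight", "bias"], normalize=True)
```

Here both attributes have the same L2 norm, while the RMS separates them because the bias has fewer elements.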