preprocessor#

Submodule for aggregating and preprocessing raw data from neural net layers.

Classes in this module compute the variables that are collected during training. These include, but are not limited to, activations of neurons and norms of vectorized tensors appearing in the neural net. The values can come from the outputs, parameters or gradients of layers. Preprocessors are expected to be called from one of the objects in monitorch.gatherer.

class monitorch.preprocessor.AbstractBackwardPreprocessor[source]#

Bases: AbstractPreprocessor

Base class for all preprocessors that aggregate data obtained from the backward pass.

Subclasses of AbstractBackwardPreprocessor process gradients with respect to the inputs or outputs of a module.

abstract process_bw(name: str, module, grad_input, grad_output)[source]#

Processes backward pass data.

Parameters:

name (str) – Name of the module whose data is processed.
module (torch.nn.Module) – Module object from which the data is processed
grad_input (torch.Tensor) – Gradients with respect to input of module.
grad_output (torch.Tensor) – Gradients with respect to output of module.

class monitorch.preprocessor.AbstractForwardPreprocessor[source]#

Bases: AbstractPreprocessor

Base class for all preprocessors that aggregate data obtained from the forward pass.

Subclasses of AbstractForwardPreprocessor process the input and output of a module. The module is expected to take a single tensor and return a single tensor, hence the name feed-forward preprocessor.

abstract process_fw(name: str, module, layer_input, layer_output)[source]#

Processes forward pass data.

Parameters:

name (str) – Name of the module whose data is processed.
module (torch.nn.Module) – The module whose inputs and outputs are processed.
layer_input (torch.Tensor) – Input to the module.
layer_output (torch.Tensor) – Output of the module.

class monitorch.preprocessor.AbstractModulePreprocessor[source]#

Bases: AbstractPreprocessor

Base class for all preprocessors that process a module on its own.

Does not restrict usage by requiring inputs, outputs or gradients of the module.

abstract process_module(name: str, module)[source]#

Processes module.

Parameters:

name (str) – Name of the module.
module (torch.nn.Module) – The module object.

class monitorch.preprocessor.AbstractPreprocessor[source]#

Bases: ABC

Base class for all preprocessors.

abstract finish_sync() → None[source]#

Finishes the synchronization of the data started by start_sync().

abstract reset() → None[source]#: Resets preprocessor for further computation

abstract start_sync(dst_rank: int = 0) → None[source]#

Starts synchronization of the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

abstract property value: dict[str, Any]#

Values computed by the preprocessor for all layers that it is processing, keyed by layer name.

Returns:: Result of computations done from creation of preprocessor or last reset.
Return type:: dict[str, Any]
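
The lifecycle implied by these four abstract members can be sketched with a toy subclass. The classes below are hypothetical stand-ins that mirror the documented interface instead of importing monitorch; the `process` method and the `CallCounter` name are assumptions made for illustration, and the sync methods are no-ops, as they could be in a single-process run.

```python
from abc import ABC, abstractmethod
from typing import Any


class PreprocessorSketch(ABC):
    """Hypothetical stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def start_sync(self, dst_rank: int = 0) -> None: ...

    @abstractmethod
    def finish_sync(self) -> None: ...

    @abstractmethod
    def reset(self) -> None: ...

    @property
    @abstractmethod
    def value(self) -> dict[str, Any]: ...


class CallCounter(PreprocessorSketch):
    """Toy preprocessor: counts how many times each layer name was seen."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def process(self, name: str) -> None:  # illustrative, not a monitorch method
        self._counts[name] = self._counts.get(name, 0) + 1

    def start_sync(self, dst_rank: int = 0) -> None:
        pass  # nothing to gather in a single process

    def finish_sync(self) -> None:
        pass

    def reset(self) -> None:
        self._counts = {}

    @property
    def value(self) -> dict[str, Any]:
        return dict(self._counts)


counter = CallCounter()
counter.process("fc1")
counter.process("fc1")
counter.process("fc2")
snapshot = counter.value   # results accumulated since creation
counter.reset()            # start a fresh accumulation window
```

After `reset()`, `value` only reflects what is processed from that point on, which matches the "from creation of preprocessor or last reset" wording above.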

class monitorch.preprocessor.AbstractTensorPreprocessor[source]#

Bases: AbstractPreprocessor

Base class for all preprocessors that process a single tensor.

Subclasses are mostly preprocessors that process gradients obtained during the backward pass. Those preprocessors cannot be made subclasses of AbstractBackwardPreprocessor, because backward hooks are executed before the gradients stored in the tensors are updated.

abstract process_tensor(name, tensor)[source]#

Processes tensor.

Parameters:

name (str) – Name of the source of tensor
tensor (torch.Tensor) – Tensor to be processed

class monitorch.preprocessor.ExplicitCall(train_loss_str, non_train_loss_str)[source]#

Bases: AbstractPreprocessor

Class for accumulating data passed by explicit call.

Objects of the ExplicitCall class are provided by PyTorchInspector to lenses as a foreign preprocessor. ExplicitCall implements methods to interact directly with its data. Its primary use is to track loss and other performance metrics for the LossMetrics lens.

Parameters:

train_loss_str (str) – String to save training loss under.
non_train_loss_str (str) – String to save development, validation or test loss under.

state#

Aggregated data indexed by their names.

Type:: dict[str, Any]

train_loss_str#

String to save training loss under.

Type:: str

non_train_loss_str#

String to save non-training loss under.

Type:: str

finish_sync() → None[source]#: Finish syncing the data started by start_sync().

push_loss(value: float, *, train: bool, running: bool = True)[source]#

A utility function to save loss.

A shorthand to choose whether loss is running and what name to push it under.

Parameters:

value (float) – Value of loss to be saved.
train (bool) – Whether loss should be saved under train_loss_str or non_train_loss_str
running (bool) – Indicates whether push_running() (True) or push_memory() (False) should be used.

push_memory(name: str, value) → None[source]#

Appends value to container under name and creates a list if there is none.

Parameters:

name (str) – Name under which the value will be saved.
value – The value to be saved.

push_running(name: str, value: float) → None[source]#

Appends value to container under name and creates a RunningMeanVar if there is none.

Parameters:

name (str) – Name under which the value will be saved.
value – The value to be saved.
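
The difference between the two push methods can be sketched as follows: `push_memory` keeps every value in a list, while `push_running` folds values into a running mean and variance. `RunningStats` is a hypothetical stand-in for RunningMeanVar (whose actual interface is not shown on this page), implemented here with Welford's online algorithm, and `CallStore` is an illustrative container, not the ExplicitCall implementation.

```python
class RunningStats:
    """Hypothetical stand-in for RunningMeanVar using Welford's algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self) -> float:
        # population variance; 0.0 until at least one value was seen
        return self._m2 / self.count if self.count else 0.0


class CallStore:
    """Illustrative container with the three documented push methods."""

    def __init__(self, train_loss_str: str, non_train_loss_str: str) -> None:
        self.state = {}
        self.train_loss_str = train_loss_str
        self.non_train_loss_str = non_train_loss_str

    def push_memory(self, name, value):
        # create the list on first use, then append
        self.state.setdefault(name, []).append(value)

    def push_running(self, name, value):
        # create the running statistics on first use, then update them
        self.state.setdefault(name, RunningStats()).update(value)

    def push_loss(self, value, *, train, running=True):
        name = self.train_loss_str if train else self.non_train_loss_str
        (self.push_running if running else self.push_memory)(name, value)


store = CallStore("train_loss", "val_loss")
for loss in (1.0, 2.0, 3.0):
    store.push_loss(loss, train=True)                  # running statistics
    store.push_loss(loss, train=False, running=False)  # full history
```

The running variant uses constant memory per name, while the list variant keeps every value for later inspection.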

reset() → None[source]#: Resets preprocessor for further computation

start_sync(dst_rank: int = 0) → None[source]#

Syncs the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: dict[str, Any]#

Values computed by the preprocessor for all layers that it is processing, keyed by layer name.

Returns:: Result of computations done from creation of preprocessor or last reset.
Return type:: dict[str, Any]

class monitorch.preprocessor.GradientActivation(death: bool, inplace: bool, eps: float = 1e-08)[source]#

Bases: AbstractTensorPreprocessor

Preprocessor class to compute gradient activations and death.

We define a neuron to be active if it has a non-zero gradient at any datapoint in a batch iteration; it is dead otherwise. This preprocessor calculates death rate and activations over an epoch. Death rate is the proportion of dead neurons in each batch. It can be further aggregated into a mean or median across all batch iterations in an epoch.

Parameters:

death (bool) – Flag indicating if death rate should be computed.
inplace (bool) – Flag indicating whether to collect data inplace using RunningMeanVar or to stack them into a list.
eps (float) – Numerical constant under which value is regarded as a zero.

finish_sync(dst_rank: int = 0) → None[source]#: Finishes syncing the data with the dst_rank.

process_tensor(name: str, grad)[source]#

Computes activation and death rate on a gradient.

Transforms gradient into a boolean mask, applies reduce_activation_to_activation_rates(). Activation rates are saved and used to compute death rate.

Parameters:

name (str) – Name of a source of gradient.
grad (torch.Tensor) – Gradient tensor to compute activations from.
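
The definition of activity and death used by this class can be illustrated on plain Python lists. This is a sketch under assumptions: the gradient is given as a `[batch, neurons]` nested list instead of a torch.Tensor, and both helpers are hypothetical, not the library's reduce_activation_to_activation_rates().

```python
def activation_rates(grad, eps=1e-8):
    """Fraction of datapoints in the batch where each neuron has |grad| > eps."""
    batch_size = len(grad)
    n_neurons = len(grad[0])
    return [
        sum(abs(row[j]) > eps for row in grad) / batch_size
        for j in range(n_neurons)
    ]


def death_rate(grad, eps=1e-8):
    """Proportion of neurons with no gradient above eps anywhere in the batch."""
    rates = activation_rates(grad, eps)
    return sum(rate == 0.0 for rate in rates) / len(rates)


# Batch of 2 datapoints, 4 neurons; neurons 1 and 3 never receive gradient
# (1e-12 is below eps and therefore counts as zero).
grad = [
    [0.5, 0.0, -0.1, 0.0],
    [0.0, 0.0, 0.2, 1e-12],
]
rates = activation_rates(grad)
dead = death_rate(grad)
```

Here two of the four neurons are dead for this batch, so the death rate is 0.5.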

reset() → None[source]#: See base class.

start_sync(dst_rank: int = 0) → None[source]#

Syncs the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: dict[str, Any]#: See base class.

class monitorch.preprocessor.GradientGeometry(correlation: bool, normalize: bool, inplace: bool, eps: float = 1e-08)[source]#

Bases: AbstractTensorPreprocessor

Preprocessor to keep track of parameters’ gradients.

Computes the (normalized) L2 norm of the gradient tensor. Optionally computes the correlation between consecutive gradients, normalized to fit into the [-1, 1] range, for further investigation of gradient oscillations.

Parameters:

correlation (bool) – Indicator if correlation must be computed.
normalize (bool) – Indicator if gradient norm should be divided by square root of number of elements.
inplace (bool) – Flag indicating whether to collect data inplace using RunningMeanVar or to stack them into a list.

finish_sync() → None[source]#: Finishes synchronizing the data with the dst_rank.

process_tensor(name: str, grad) → None[source]#

Computes (normalized) L2 norm and optionally correlation with previous gradient.

The first gradient is taken to be 0.0 with norm 1.0.

Parameters:

name (str) – Name of source of gradient.
grad (torch.Tensor) – Gradient tensor to be processed.
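
The two quantities can be sketched on flat Python lists. This is only an illustration: the correlation is assumed here to be the cosine similarity of consecutive gradients, and `eps` is assumed to guard the division against zero norms; neither detail is confirmed by this page.

```python
import math


def l2_norm(vec, normalize=False):
    """L2 norm of a flat vector; divided by sqrt(len) when normalize is set."""
    norm = math.sqrt(sum(x * x for x in vec))
    return norm / math.sqrt(len(vec)) if normalize else norm


def correlation(prev, curr, eps=1e-8):
    """Cosine similarity of two flat vectors; eps guards against zero norms."""
    dot = sum(p * c for p, c in zip(prev, curr))
    return dot / (l2_norm(prev) * l2_norm(curr) + eps)


g1 = [3.0, 4.0]
g2 = [-3.0, -4.0]   # opposite direction: an oscillating gradient
norm = l2_norm(g1)
rms = l2_norm(g1, normalize=True)
corr = correlation(g1, g2)
```

A correlation close to -1 between consecutive gradients indicates that the gradient flips direction from one step to the next.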

reset() → None[source]#: See base class.

start_sync(dst_rank: int = 0) → None[source]#

Starts synchronizing the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: dict[str, Any]#: See base class.

class monitorch.preprocessor.LossModule(inplace: bool, evaluation_from_grad: bool)[source]#

Bases: AbstractForwardPreprocessor

Module to record single value loss.

Aggregates loss from loss modules (e.g. torch.nn.MSELoss or torch.nn.NLLLoss). It can be accessed later.

Parameters:

inplace (bool) – Indicator if RunningMeanVar or list should be used for aggregation.
evaluation_from_grad (bool) – Flag indicating whether evaluation passes should be detected from the gradient or from module.training.

finish_sync() → None[source]#

Finishes the synchronization of the data started by start_sync().

process_fw(name: str, module, layer_input, layer_output)[source]#

Saves loss passed as layer output.

Parameters:

name (str) – Name of the module. Ignored.
module (torch.nn.Module) – The module object. Ignored.
layer_input (torch.Tensor) – Input to loss module. Ignored.
layer_output (torch.Tensor) – Loss tensor. Must have single element.

Raises:

AttributeError – If layer_output has zero or more than one element.

reset() → None[source]#: See base class.

set_loss_strs(train_loss_str: str, non_train_loss_str: str)[source]#

Defines names for training and test/validation/development loss. Given strings will be used in value() for indexing.

Parameters:

train_loss_str (str) – String used for training loss.
non_train_loss_str (str) – String used for test/validation/development loss.

start_sync(dst_rank: int = 0) → None[source]#

Starts synchronization of the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: dict[str, Any]#: See base class.

class monitorch.preprocessor.OutputActivation(death: bool, inplace: bool, record_eval: bool, evaluation_from_grad: bool, eps: float = 1e-08, channel_last: bool = False)[source]#

Bases: AbstractForwardPreprocessor

Preprocessor to record activations of outputs.

We say that a neuron or a channel is activated if its output is non-zero (information is propagated forward). If a neuron is not activated for any sample in a batch, we say it is dead. Death rate is the proportion of dead neurons relative to layer size.

Parameters:

death (bool) – Indicator if death rate is to be collected.
inplace (bool) – Indicator if RunningMeanVar or list should be used for aggregation.
record_eval (bool) – Indicator if outputs during evaluation must be preprocessed.
evaluation_from_grad (bool) – Flag indicating whether evaluation passes should be detected from the gradient or from module.training.
eps (float) – Numerical constant under which value is regarded as a zero.
channel_last (bool) – If True, expects data in [batch, seq_len, ..., features] format where the feature/channel dimension is last (e.g. transformer outputs). If False (default), expects PyTorch’s standard [batch, features, spatial_dims, ...] format.

finish_sync(dst_rank: int = 0) → None[source]#: Finishes syncing the data with the dst_rank.

process_fw(name: str, module, layer_input, layer_output) → None[source]#

Computes activation from layer output.

Flattens spatial dimensions, computes activations and saves each sample. Computes death rate if death=True was set.

Parameters:

name (str) – Name of the module whose outputs are processed.
module (torch.nn.Module) – Module object whose outputs are processed.
layer_input (torch.Tensor) – Should be input of layer, but it is ignored in this method.
layer_output (torch.Tensor) – Outputs to compute activations from.
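
The effect of the channel_last flag can be sketched on nested lists. The helpers below are hypothetical and make one assumption that this page does not confirm: a channel counts as active for a sample if any of its spatial positions exceeds eps in absolute value.

```python
def channel_activity(sample, channel_last=False, eps=1e-8):
    """Per-channel flags for one sample: True if any position exceeds eps.

    sample is [features][positions] by default, or [positions][features]
    when channel_last is True.
    """
    if channel_last:
        sample = [list(col) for col in zip(*sample)]  # move features first
    return [any(abs(x) > eps for x in channel) for channel in sample]


def batch_death_rate(batch, channel_last=False, eps=1e-8):
    """Share of channels that are inactive for every sample in the batch."""
    flags = [channel_activity(s, channel_last, eps) for s in batch]
    n_channels = len(flags[0])
    dead = sum(not any(f[c] for f in flags) for c in range(n_channels))
    return dead / n_channels


# Two samples, 3 channels, 2 positions each, channel-first layout.
batch_cf = [
    [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]],
    [[2.0, 0.0], [0.0, 0.0], [0.3, 0.0]],
]
# The same data with the channel dimension last.
batch_cl = [[list(pos) for pos in zip(*s)] for s in batch_cf]

rate_cf = batch_death_rate(batch_cf)
rate_cl = batch_death_rate(batch_cl, channel_last=True)
```

Both layouts describe the same data, so they yield the same death rate once the flag matches the layout; only the middle channel is silent in every sample.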

reset() → None[source]#: See base class.

start_sync(dst_rank: int = 0) → None[source]#

Syncs the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: dict[str, Any]#: See base class.

class monitorch.preprocessor.OutputGradientGeometry(correlation: bool, normalize: bool, inplace: bool, eps: float = 1e-08)[source]#

Bases: AbstractBackwardPreprocessor

Preprocessor to keep track of outputs’ gradients.

Computes the (normalized) L2 norm of the gradient tensor. Optionally computes the correlation between consecutive gradients, normalized to fit into the [-1, 1] range, for further investigation of gradient oscillations.

Parameters:

correlation (bool) – Indicator if correlation must be computed.
normalize (bool) – Indicator if gradient norm should be divided by square root of number of elements.
inplace (bool) – Flag indicating whether to collect data inplace using RunningMeanVar or to stack them into a list.

finish_sync() → None[source]#: Finishes synchronizing the data with the dst_rank.

process_bw(name: str, module, grad_input, grad_output) → None[source]#

Computes (normalized) L2 norm and optionally computes correlation with previous output’s gradient.

The first gradient is taken to be 0.0 with norm 1.0.

Parameters:

name (str) – Name of the module whose output gradients are recorded.
module (torch.nn.Module) – The module object. Ignored.
grad_input – Gradients with respect to input of layer. Ignored.
grad_output – Gradients with respect to outputs of layer. Assumes layer outputs single tensor, thus having single output gradient.

reset() → None[source]#: See base class.

start_sync(dst_rank: int = 0) → None[source]#

Starts synchronizing the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: dict[str, Any]#: See base class.

class monitorch.preprocessor.OutputNorm(normalize: bool, inplace: bool, record_eval: bool, evaluation_from_grad: bool, channel_last: bool = False)[source]#

Bases: AbstractForwardPreprocessor

Preprocessor to compute norms of outputs.

Flattens spatial and channel/neuron dimensions of output, computes L2 norm or RMS (if normalized) of flattened vectors and takes mean over a batch.

Parameters:

normalize (bool) – Indicator if output norm should be normalized by square root of number of elements in single sample output.
inplace (bool) – Indicator if RunningMeanVar or list should be used for aggregation.
record_eval (bool) – Indicator if outputs during evaluation must be preprocessed.
evaluation_from_grad (bool) – Flag indicating whether evaluation passes should be detected from the gradient or from module.training.
channel_last (bool) – If True, expects data in [batch, seq_len, ..., features] format where the feature/channel dimension is last (e.g. transformer outputs). If False (default), expects PyTorch’s standard [batch, features, spatial_dims, ...] format. The norm computation is equivalent in both cases since all non-batch dimensions are flattened before computing the L2 norm.

finish_sync() → None[source]#

Finishes the synchronization of the data started by start_sync().

process_fw(name: str, module, layer_input, layer_output)[source]#

Computes mean output norm.

Flattens spatial and channel dimensions, computes (normalized) norm of individual samples and saves their average.

Parameters:

name (str) – Name of the module whose outputs are processed.
module (torch.nn.Module) – Module object whose outputs are processed.
layer_input (torch.Tensor) – Should be input of layer, but it is ignored in this method.
layer_output (torch.Tensor) – Outputs to compute norm from.
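
The computation described above can be sketched on nested lists, where each sample is flattened before its norm is taken. The helper names are hypothetical and plain lists stand in for torch.Tensor objects.

```python
import math


def flatten(nested):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    if not isinstance(nested, list):
        return [nested]
    return [x for item in nested for x in flatten(item)]


def mean_output_norm(batch, normalize=False):
    """Mean over the batch of per-sample L2 norms (RMS when normalize is set)."""
    norms = []
    for sample in batch:
        flat = flatten(sample)
        norm = math.sqrt(sum(x * x for x in flat))
        if normalize:
            norm /= math.sqrt(len(flat))
        norms.append(norm)
    return sum(norms) / len(norms)


# Two samples, each with 2 channels of 2 spatial positions.
batch = [
    [[3.0, 0.0], [0.0, 4.0]],   # L2 norm 5.0, RMS 2.5
    [[1.0, 1.0], [1.0, 1.0]],   # L2 norm 2.0, RMS 1.0
]
mean_l2 = mean_output_norm(batch)
mean_rms = mean_output_norm(batch, normalize=True)
```

Because every non-batch dimension is flattened, the result does not depend on whether the channel dimension comes first or last.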

reset() → None[source]#: Resets preprocessor for further computation

start_sync(dst_rank: int = 0) → None[source]#

Syncs the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: dict[str, Any]#

Value computed by preprocessor for all layers, that it is processing, identified by name.

Returns:: Result of computations done from creation of preprocessor or last reset.
Return type:: dict[str, Any]

class monitorch.preprocessor.ParameterDifferenceGeometry(correlation: bool, normalize: bool, inplace: bool, eps: float = 1e-08)[source]#

Bases: AbstractTensorPreprocessor

Preprocessor to keep track of the evolution of parameters between preprocessor calls by inspecting their updates.

Main usage is to inspect optimizer update step behaviour.

Computes (normalized) L2 norm of parameter updates. Optionally computes correlation between consecutive parameter differences for further investigation, normalized to fit into [-1, 1] range.

Parameters:

correlation (bool) – Indicator if correlation must be computed.
normalize (bool) – Indicator if the norm of the parameter update should be divided by the square root of the number of elements.
inplace (bool) – Flag indicating whether to collect data inplace using RunningMeanVar or to stack them into a list.

finish_sync() → None[source]#: Finishes synchronizing the data with the dst_rank.

process_tensor(name: str, param: Tensor) → None[source]#

Computes (normalized) L2 norm and optionally correlation with previous difference.

Parameters:

name (str) – Name of source of parameter.
param (torch.Tensor) – Parameter tensor to be processed.
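
The idea of measuring updates between successive calls can be sketched as follows. `UpdateTracker` is a hypothetical class operating on flat lists; in particular, skipping the very first call (when no previous value exists) is an assumption, not documented behaviour.

```python
import math


class UpdateTracker:
    """Hypothetical tracker of parameter updates between successive calls."""

    def __init__(self, normalize=False):
        self.normalize = normalize
        self._previous = {}   # last seen copy of each parameter
        self.norms = {}       # recorded update norms per parameter name

    def process(self, name, param):
        prev = self._previous.get(name)
        self._previous[name] = list(param)  # copy, the caller may mutate param
        if prev is None:
            return  # first call: there is no update to measure yet
        diff = [p - q for p, q in zip(param, prev)]
        norm = math.sqrt(sum(d * d for d in diff))
        if self.normalize:
            norm /= math.sqrt(len(diff))
        self.norms.setdefault(name, []).append(norm)


tracker = UpdateTracker()
weights = [1.0, 1.0]
tracker.process("fc.weight", weights)
weights[0] -= 0.3   # simulate an optimizer step modifying weights in place
weights[1] += 0.4
tracker.process("fc.weight", weights)
```

Storing a copy matters because optimizers usually update parameters in place; without it, the previous and current values would be the same object and every difference would be zero.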

reset() → None[source]#: See base class.

start_sync(dst_rank: int = 0) → None[source]#

Starts synchronizing the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: dict[str, Any]#: See base class.

class monitorch.preprocessor.ParameterNorm(attrs: list[str], normalize: bool, inplace: bool)[source]#

Bases: AbstractModulePreprocessor

Preprocessor computing norms of parameters.

Computes the norm of the parameters listed in attrs_ for every module that is passed to process_module().

Parameters:

attrs (list[str]) – List of attributes for which norm will be computed.
normalize (bool) – Flag indicating whether norm should be normalized by tensor size. If true computes RMS of tensor values, L2-norm otherwise.
inplace (bool) – Flag indicating if RunningMeanVar or list will be used.

attrs_#

List of attributes to compute norm for.

Type:: list[str]

finish_sync() → None[source]#

Finishes the synchronization of the data started by start_sync().

process_module(name: str, module)[source]#

Computes norms of all attrs_.

Uses torch.linalg.vector_norm to compute L2-norm of module’s attributes. If normalize is true, divides norm by a square root of number of elements in attributes.
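
The attribute lookup and normalization can be sketched without torch. In this hypothetical helper, flat lists stand in for parameter tensors and `math.sqrt` over the sum of squares plays the role of torch.linalg.vector_norm.

```python
import math
from types import SimpleNamespace


def attribute_norms(module, attrs, normalize=False):
    """L2 norm (or RMS when normalize is set) of each listed attribute."""
    result = {}
    for attr in attrs:
        values = getattr(module, attr)
        norm = math.sqrt(sum(v * v for v in values))
        if normalize:
            norm /= math.sqrt(len(values))
        result[attr] = norm
    return result


# Stand-in for a layer: flat lists replace the weight and bias tensors.
layer = SimpleNamespace(weight=[1.0, 2.0, 2.0, 4.0], bias=[0.0, 3.0])
norms = attribute_norms(layer, ["weight", "bias"])
rms = attribute_norms(layer, ["weight", "bias"], normalize=True)
```

Normalizing by the square root of the element count makes norms comparable between attributes of different sizes, such as a large weight matrix and a short bias vector.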

reset() → None[source]#: See base class

start_sync(dst_rank: int = 0) → None[source]#

Starts synchronization of the data with the dst_rank.

Parameters:: dst_rank (int = 0) – Master rank to gather data at.

property value: OrderedDict[str, Any]#: See base class
