rofunc.learning.utils.networks

`rofunc.learning.utils.networks`#

1. Module Contents#

1.1. Classes#

`SqueezeLayer`	Torch module that squeezes a B*1 tensor down into a size-B vector.
`BaseNorm`	Base class for layers that try to normalize the input to mean 0 and variance 1. Similar to BatchNorm, LayerNorm, etc. but whereas they only use statistics from the current batch at train time, we use statistics from all batches.
`RunningNorm`	Normalizes input to mean 0 and standard deviation 1 using a running average. Similar to BatchNorm, LayerNorm, etc. but whereas they only use statistics from the current batch at train time, we use statistics from all batches. This should closely replicate the common practice in RL of normalizing environment observations, such as using VecNormalize in Stable Baselines.
`EMANorm`	Similar to RunningNorm but uses an exponential weighting.

1.2. Functions#

`build_mlp`	Constructs a Torch MLP.
`build_cnn`	Constructs a Torch CNN.

1.3. API#

class rofunc.learning.utils.networks.SqueezeLayer(*args, **kwargs)[source]#

Bases: torch.nn.Module

Torch module that squeezes a B*1 tensor down into a size-B vector.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)[source]#
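The squeeze described above can be sketched with a small stand-in module. `SqueezeSketch` is a hypothetical name, and both the shape check and the choice to squeeze along dimension 1 are assumptions about the behaviour, not a copy of the library code.

```python
import torch
from torch import nn


class SqueezeSketch(nn.Module):
    """Hypothetical stand-in: turns a (B, 1) tensor into a (B,) vector."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: only 2-D inputs with a trailing singleton are accepted.
        if x.ndim != 2 or x.shape[1] != 1:
            raise ValueError(f"expected shape (B, 1), got {tuple(x.shape)}")
        return x.squeeze(1)


out = SqueezeSketch()(torch.zeros(4, 1))
```

Squeezing only dimension 1, rather than calling `squeeze()` with no argument, keeps a batch of size one from collapsing to a scalar.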

class rofunc.learning.utils.networks.BaseNorm(num_features: int, eps: float = 1e-05)[source]#

Bases: torch.nn.Module, abc.ABC

Base class for layers that try to normalize the input to mean 0 and variance 1. Similar to BatchNorm, LayerNorm, etc. but whereas they only use statistics from the current batch at train time, we use statistics from all batches.

Initialization

Builds the normalization layer.

Args:

num_features: Number of features; the length of the non-batch dimension.

eps: Small constant for numerical stability. Inputs are rescaled by 1 / sqrt(estimated_variance + eps).

running_mean: torch.Tensor = None#

running_var: torch.Tensor = None#

count: torch.Tensor = None#

reset_running_stats() → None[source]#

Resets running stats to defaults, yielding the identity transformation.

forward(x: torch.Tensor) → torch.Tensor[source]#

Updates statistics if in training mode. Returns normalized x.

abstract update_stats(batch: torch.Tensor) → None[source]#

Update self.running_mean, self.running_var and self.count.
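To make the rescaling rule concrete, here is a dependency-free sketch of the normalization step for a single sample. The function name `normalize` is hypothetical, and the reset defaults of mean 0 and variance 1 are an assumption inferred from the "identity transformation" wording above.

```python
import math


def normalize(x, running_mean, running_var, eps=1e-5):
    """Center each feature, then rescale by 1 / sqrt(running_var + eps)."""
    return [
        (xi - m) / math.sqrt(v + eps)
        for xi, m, v in zip(x, running_mean, running_var)
    ]


# With the assumed reset defaults (mean 0, variance 1) the map is the
# identity up to the small eps term.
identity_like = normalize([0.5, -2.0], [0.0, 0.0], [1.0, 1.0])

# A feature equal to its running mean maps to 0; one standard deviation
# above the mean maps to roughly 1.
centered = normalize([3.0, 10.0], [3.0, 6.0], [4.0, 16.0])
```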

class rofunc.learning.utils.networks.RunningNorm(num_features: int, eps: float = 1e-05)[source]#

Bases: rofunc.learning.utils.networks.BaseNorm

Normalizes input to mean 0 and standard deviation 1 using a running average. Similar to BatchNorm, LayerNorm, etc. but whereas they only use statistics from the current batch at train time, we use statistics from all batches. This should closely replicate the common practice in RL of normalizing environment observations, such as using VecNormalize in Stable Baselines.

Initialization

Builds RunningNorm.

Args:

num_features: Number of features; the length of the non-batch dimension.

eps: Small constant for numerical stability. Inputs are rescaled by 1 / sqrt(estimated_variance + eps).

update_stats(batch: torch.Tensor) → None[source]#

Update self.running_mean, self.running_var and self.count. Uses Chan et al. (1979), “Updating Formulae and a Pairwise Algorithm for Computing Sample Variances”, to update the running moments in a numerically stable fashion.

Args:

batch: A batch of data to use to update the running mean and variance.
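The pairwise update from Chan et al. can be illustrated for a single scalar feature using only the standard library. This is a sketch of the general technique, not the library's tensor code; `merge_moments` is a hypothetical helper, and it tracks the population (biased) variance as an assumption.

```python
import statistics


def merge_moments(mean, var, count, batch):
    """Fold one batch into running (mean, population variance, count)."""
    b_count = len(batch)
    b_mean = statistics.fmean(batch)
    b_var = statistics.pvariance(batch, mu=b_mean)

    total = count + b_count
    delta = b_mean - mean
    new_mean = mean + delta * b_count / total
    # Combine the two sums of squared deviations, plus a correction term
    # for the gap between the old mean and the batch mean.
    m2 = var * count + b_var * b_count + delta**2 * count * b_count / total
    return new_mean, m2 / total, total


batches = [[1.0, 2.0, 3.0], [10.0, 12.0], [4.0, 4.0, 5.0, 7.0]]
mean, var, count = 0.0, 1.0, 0  # initial var is irrelevant while count == 0
for b in batches:
    mean, var, count = merge_moments(mean, var, count, b)

flat = [x for b in batches for x in b]
```

After streaming all batches, `mean` and `var` agree with the statistics of `flat` computed in one pass, which is the property that lets the layer use statistics from all batches without storing them.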

class rofunc.learning.utils.networks.EMANorm(num_features: int, decay: float = 0.99, eps: float = 1e-05)[source]#

Bases: rofunc.learning.utils.networks.BaseNorm

Similar to RunningNorm but uses an exponential weighting.

Initialization

Builds EMANorm.

Args:

num_features: Number of features; the length of the non-batch dimension.

decay: How quickly the weight on past samples decays over time.

eps: Small constant for numerical stability.

Raises:

ValueError: if decay is out of range.

inv_learning_rate: torch.Tensor = None#

num_batches: torch.IntTensor = None#

reset_running_stats()[source]#

Reset the running stats of the normalization layer.

update_stats(batch: torch.Tensor) → None[source]#

Update self.running_mean and self.running_var in batch mode. Reference: Algorithm 3 from HumanCompatibleAI/imitation.

Args:

batch: A batch of data to use to update the running mean and variance.
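One plausible reading of a batch-mode exponential update, for a single scalar feature, is sketched below. The class `EMASketch` is hypothetical. Everything in it is an assumption rather than something confirmed above: the way `inv_learning_rate` and `num_batches` combine, the open interval (0, 1) as the valid `decay` range, and the initial values.

```python
import statistics


class EMASketch:
    """Hypothetical scalar EMA statistics tracker (one feature)."""

    def __init__(self, decay=0.99):
        # Assumption: the valid range is the open interval (0, 1).
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must be between 0 and 1")
        self.decay = decay
        self.running_mean, self.running_var = 0.0, 1.0
        self.inv_learning_rate, self.num_batches = 0.0, 0

    def update_stats(self, batch):
        # Step size 1 / (1 + decay + decay**2 + ...): the first batch gets
        # weight 1, later batches approach weight (1 - decay).
        self.inv_learning_rate += self.decay**self.num_batches
        lr = 1.0 / self.inv_learning_rate

        b_mean = statistics.fmean(batch)
        b_var = statistics.pvariance(batch, mu=b_mean)
        delta_mean = b_mean - self.running_mean
        self.running_mean += lr * delta_mean
        # Variance of the lr-weighted mixture of old statistics and batch.
        delta_var = b_var + (1.0 - lr) * delta_mean**2 - self.running_var
        self.running_var += lr * delta_var
        self.num_batches += 1


ema = EMASketch(decay=0.9)
ema.update_stats([1.0, 2.0, 3.0])
first = (ema.running_mean, ema.running_var)
ema.update_stats([5.0, 5.0])
```

Because the first step size is 1, the arbitrary initial mean and variance are overwritten by the first batch instead of biasing later estimates.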

rofunc.learning.utils.networks.build_mlp(in_size: int, hid_sizes: Iterable[int], out_size: int = 1, name: Optional[str] = None, activation: Type[torch.nn.Module] = nn.ReLU, dropout_prob: float = 0.0, squeeze_output: bool = False, flatten_input: bool = False, normalize_input_layer: Optional[Type[torch.nn.Module]] = None) → torch.nn.Module[source]#

Constructs a Torch MLP.

Args:

in_size: size of individual input vectors; input to the MLP will be of shape (batch_size, in_size).

hid_sizes: sizes of hidden layers. If this is an empty iterable, then we build a linear function approximator.

out_size: size of output vector.

name: name to use as a prefix for the layer IDs.

activation: activation to apply after hidden layers.

dropout_prob: dropout probability to use after each hidden layer. If 0, no dropout layers are added to the network.

squeeze_output: if out_size=1, then squeeze_output=True ensures that MLP output is of size (B,) instead of (B, 1).

flatten_input: should input be flattened along axes 1, 2, 3, …? Useful if you want to, e.g., process small image inputs with an MLP.

normalize_input_layer: if specified, module to use to normalize inputs; e.g. nn.BatchNorm1d or RunningNorm.

Returns:

nn.Module: an MLP mapping from inputs of size (batch_size, in_size) to (batch_size, out_size), unless out_size=1 and squeeze_output=True, in which case the output is of size (batch_size,).

Raises:

ValueError: if squeeze_output was supplied with out_size!=1.
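The shape contract above can be checked with a hand-rolled sketch built from plain `torch.nn` layers. `mlp_sketch` is a hypothetical helper that mirrors only the documented layout (hidden layers, final linear layer, optional squeeze); it is not the library function and omits naming, dropout, flattening and input normalization.

```python
import torch
from torch import nn


def mlp_sketch(in_size, hid_sizes, out_size=1, squeeze_output=False):
    """Hypothetical re-creation of the documented layer layout."""
    if squeeze_output and out_size != 1:
        raise ValueError("squeeze_output requires out_size == 1")
    layers = []
    prev = in_size
    for size in hid_sizes:
        layers += [nn.Linear(prev, size), nn.ReLU()]
        prev = size
    # With no hidden sizes this single layer is a linear approximator.
    layers.append(nn.Linear(prev, out_size))
    if squeeze_output:
        # Flattening from dim 0 turns (B, 1) into (B,).
        layers.append(nn.Flatten(start_dim=0))
    return nn.Sequential(*layers)


x = torch.randn(8, 5)
vec_out = mlp_sketch(5, [32, 32], out_size=3)(x)
scalar_out = mlp_sketch(5, [], squeeze_output=True)(x)
```

Here `vec_out` has shape (8, 3) and `scalar_out` has shape (8,), matching the Returns description.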

rofunc.learning.utils.networks.build_cnn(in_channels: int, hid_channels: Iterable[int], out_size: int = 1, name: Optional[str] = None, activation: Type[torch.nn.Module] = nn.ReLU, kernel_size: int = 3, stride: int = 1, padding: Union[int, str] = 'same', dropout_prob: float = 0.0, squeeze_output: bool = False) → torch.nn.Module[source]#

Constructs a Torch CNN.

Args:

in_channels: number of channels of individual inputs; input to the CNN will have shape (batch_size, in_channels, in_height, in_width).

hid_channels: number of channels of hidden layers. If this is an empty iterable, then we build a linear function approximator.

out_size: size of output vector.

name: name to use as a prefix for the layer IDs.

activation: activation to apply after hidden layers.

kernel_size: size of convolutional kernels.

stride: stride of convolutional kernels.

padding: padding of convolutional kernels.

dropout_prob: dropout probability to use after each hidden layer. If 0, no dropout layers are added to the network.

squeeze_output: if out_size=1, then squeeze_output=True ensures that CNN output is of size (B,) instead of (B, 1).

Returns:

nn.Module: a CNN mapping from inputs of size (batch_size, in_channels, in_height, in_width) to (batch_size, out_size), unless out_size=1 and squeeze_output=True, in which case the output is of size (batch_size,).

Raises:

ValueError: if squeeze_output was supplied with out_size!=1.
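A comparable sketch for the CNN case is shown below. `cnn_sketch` is a hypothetical helper, and the global average pool before the final linear layer is an assumption: the text above only states the input and output shapes, not how the spatial dimensions are reduced.

```python
import torch
from torch import nn


def cnn_sketch(in_channels, hid_channels, out_size=1, kernel_size=3):
    """Hypothetical layout: conv stack, global average pool, linear head."""
    layers = []
    prev = in_channels
    for ch in hid_channels:
        # Stride 1 with padding="same" keeps height and width unchanged.
        conv = nn.Conv2d(prev, ch, kernel_size, stride=1, padding="same")
        layers += [conv, nn.ReLU()]
        prev = ch
    # Assumption: spatial dimensions are averaged away, so the head sees
    # one value per channel regardless of image size.
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(prev, out_size)]
    return nn.Sequential(*layers)


net = cnn_sketch(in_channels=3, hid_channels=[16, 32], out_size=4)
small = net(torch.randn(2, 3, 28, 28))
large = net(torch.randn(2, 3, 64, 48))
```

Under this assumption the same network accepts images of different sizes and still returns a (batch_size, out_size) tensor.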
