rofunc.utils.visualab.segment.vlpart.swintransformer

1.  Module Contents

1.1.  Classes

LastLevelP6P7_P5

This module is used in RetinaNet to generate the extra layers P6 and P7 from the C5 feature.

Mlp

Multilayer perceptron.

WindowAttention

Window-based multi-head self-attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.

SwinTransformerBlock

Swin Transformer block.

PatchMerging

Patch merging layer.

BasicLayer

A basic Swin Transformer layer for one stage.

PatchEmbed

Image to patch embedding.

SwinTransformer

Swin Transformer backbone.
A PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows":

https://arxiv.org/pdf/2103.14030


1.2.  Functions

window_partition

Args:

x: (B, H, W, C)
window_size (int): Window size

Returns:

windows: (num_windows*B, window_size, window_size, C)

window_reverse

Args:

windows: (num_windows*B, window_size, window_size, C)
window_size (int): Window size
H (int): Height of image
W (int): Width of image

Returns:

x: (B, H, W, C)

build_swinbase_fpn_backbone

1.3.  Data

size2config

1.4.  API

class rofunc.utils.visualab.segment.vlpart.swintransformer.LastLevelP6P7_P5(in_channels, out_channels)

Bases: torch.nn.Module

This module is used in RetinaNet to generate the extra layers P6 and P7 from the C5 feature.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(c5)
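
To make the role of the extra levels concrete, the sketch below tracks only spatial sizes. It assumes, following the common detectron2-style `LastLevelP6P7` convention, that P6 and P7 are each produced by a 3x3 convolution with stride 2 and padding 1. Those kernel settings are an assumption (this page does not state them), and `conv_out_size` and `extra_level_sizes` are hypothetical helpers, not part of the module.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1


def extra_level_sizes(h, w):
    """Spatial sizes of P6 and P7 for an input feature of size (h, w).

    Assumption: P6 is computed from the input feature and P7 from P6,
    each with a stride-2 3x3 convolution (padding 1).
    """
    p6 = (conv_out_size(h), conv_out_size(w))
    p7 = (conv_out_size(p6[0]), conv_out_size(p6[1]))
    return {"p6": p6, "p7": p7}


sizes = extra_level_sizes(25, 34)
```

Under these assumptions each extra level roughly halves the resolution of the level before it.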
class rofunc.utils.visualab.segment.vlpart.swintransformer.Mlp(in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.0)

Bases: torch.nn.Module

Multilayer perceptron.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)
rofunc.utils.visualab.segment.vlpart.swintransformer.window_partition(x, window_size)

Args:

x: (B, H, W, C)
window_size (int): Window size

Returns:

windows: (num_windows*B, window_size, window_size, C)
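
The shape contract above can be illustrated without tensors. The following is a minimal pure-Python sketch on a single H x W grid (batch and channel dimensions omitted); `partition_grid` is a hypothetical helper, not the module's tensor implementation, and it assumes H and W are multiples of `window_size`.

```python
def partition_grid(grid, window_size):
    """Split an H x W grid (a list of rows) into non-overlapping square windows.

    Windows are returned in row-major order: left to right, then top to bottom.
    """
    H, W = len(grid), len(grid[0])
    if H % window_size or W % window_size:
        raise ValueError("H and W must be multiples of window_size")
    windows = []
    for top in range(0, H, window_size):
        for left in range(0, W, window_size):
            windows.append(
                [row[left:left + window_size] for row in grid[top:top + window_size]]
            )
    return windows


# A 4 x 4 grid whose entries are their own flat index.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = partition_grid(grid, 2)
```

A 4 x 4 grid with `window_size=2` yields four 2 x 2 windows, matching the `num_windows` factor in the documented output shape.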

rofunc.utils.visualab.segment.vlpart.swintransformer.window_reverse(windows, window_size, H, W)

Args:

windows: (num_windows*B, window_size, window_size, C)
window_size (int): Window size
H (int): Height of image
W (int): Width of image

Returns:

x: (B, H, W, C)
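
Reversing the partition requires two pieces of bookkeeping: recovering the batch size from the leading `num_windows*B` dimension, and mapping a position inside a window back to image coordinates. The sketch below shows both in plain Python. It assumes windows are stored in row-major order and that H and W are multiples of `window_size`; `recover_batch_size` and `window_to_pixel` are hypothetical helpers.

```python
def recover_batch_size(total_windows, window_size, H, W):
    """Infer B from the leading dimension num_windows*B."""
    windows_per_image = (H // window_size) * (W // window_size)
    if total_windows % windows_per_image:
        raise ValueError("window count is not a multiple of windows per image")
    return total_windows // windows_per_image


def window_to_pixel(window_index, i, j, window_size, W):
    """Map (window index within one image, row i, column j) to (h, w)."""
    windows_per_row = W // window_size
    win_row, win_col = divmod(window_index, windows_per_row)
    return win_row * window_size + i, win_col * window_size + j


batch = recover_batch_size(128, 7, 56, 56)
pixel = window_to_pixel(3, 1, 0, window_size=2, W=4)
```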

class rofunc.utils.visualab.segment.vlpart.swintransformer.WindowAttention(dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0.0, proj_drop=0.0)

Bases: torch.nn.Module

Window-based multi-head self-attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.

Args:

dim (int): Number of input channels.
window_size (tuple[int]): The height and width of the window.
num_heads (int): Number of attention heads.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True.
qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0.
proj_drop (float, optional): Dropout ratio of output. Default: 0.0.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x, mask=None)

Forward function.

Args:

x: Input features with shape of (num_windows*B, N, C)
mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None
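
The relative position bias is typically looked up through a fixed index table. The sketch below shows one way to build such an index in plain Python, assuming the construction used in the reference Swin Transformer implementation: offsets between every pair of positions in a Wh x Ww window are shifted to be non-negative and flattened into a single integer. `relative_position_index` is a hypothetical helper and is not confirmed to match this module line for line.

```python
def relative_position_index(wh, ww):
    """Return an (wh*ww) x (wh*ww) table of indices into a bias table.

    The bias table is assumed to have (2*wh - 1) * (2*ww - 1) rows,
    one per distinct relative offset.
    """
    coords = [(i, j) for i in range(wh) for j in range(ww)]
    index = []
    for i1, j1 in coords:
        row = []
        for i2, j2 in coords:
            dh = i1 - i2 + (wh - 1)  # shift so the offset starts at 0
            dw = j1 - j2 + (ww - 1)
            row.append(dh * (2 * ww - 1) + dw)
        index.append(row)
    return index


index = relative_position_index(2, 2)
table_size = (2 * 2 - 1) * (2 * 2 - 1)
```

Every pair of tokens with the same relative offset shares one index, which is why the diagonal (offset zero) holds a single repeated value.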

class rofunc.utils.visualab.segment.vlpart.swintransformer.SwinTransformerBlock(dim, num_heads, window_size=7, shift_size=0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=nn.GELU, norm_layer=nn.LayerNorm)

Bases: torch.nn.Module

Swin Transformer block.

Args:

dim (int): Number of input channels.
num_heads (int): Number of attention heads.
window_size (int): Window size.
shift_size (int): Shift size for SW-MSA.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True.
qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional): Dropout rate. Default: 0.0.
attn_drop (float, optional): Attention dropout rate. Default: 0.0.
drop_path (float, optional): Stochastic depth rate. Default: 0.0.
act_layer (nn.Module, optional): Activation layer. Default: nn.GELU.
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x, mask_matrix)

Forward function.

Args:

x: Input feature, tensor size (B, H*W, C).
H, W: Spatial resolution of the input feature.
mask_matrix: Attention mask for cyclic shift.
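
The attention mask for the cyclic shift keeps tokens that were not neighbours before the shift from attending to each other. The following pure-Python sketch builds such a mask under two stated assumptions: H and W are already multiples of `window_size`, and a large negative constant (here -100.0) stands in for -inf. `shifted_window_mask` is a hypothetical helper, not this module's code.

```python
def shifted_window_mask(H, W, window_size, shift_size, neg=-100.0):
    """Return one (window_size**2 x window_size**2) mask per window.

    Entries are 0.0 where two positions belong to the same region of the
    shifted image and `neg` where they do not.
    """
    def region(pos, size):
        # Three bands per axis: untouched, partly shifted, wrapped around.
        if pos < size - window_size:
            return 0
        if pos < size - shift_size:
            return 1
        return 2

    labels = [[3 * region(h, H) + region(w, W) for w in range(W)] for h in range(H)]
    masks = []
    for top in range(0, H, window_size):
        for left in range(0, W, window_size):
            flat = [
                labels[top + i][left + j]
                for i in range(window_size)
                for j in range(window_size)
            ]
            masks.append([[0.0 if a == b else neg for b in flat] for a in flat])
    return masks


masks = shifted_window_mask(H=4, W=4, window_size=2, shift_size=1)
```

The result has the documented layout (num_windows, Wh*Ww, Wh*Ww): the top-left window is unmasked, while windows touching the wrapped-around border block attention across region boundaries.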

class rofunc.utils.visualab.segment.vlpart.swintransformer.PatchMerging(dim, norm_layer=nn.LayerNorm)

Bases: torch.nn.Module

Patch merging layer.

Args:

dim (int): Number of input channels.
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x, H, W)

Forward function.

Args:

x: Input feature, tensor size (B, H*W, C).
H, W: Spatial resolution of the input feature.
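
The effect of patch merging on tensor shapes can be sketched as plain arithmetic. This assumes the standard Swin behaviour: odd spatial sizes are padded to even, each 2 x 2 neighbourhood is concatenated into 4C channels, and a linear layer reduces that to 2C. `patch_merging_shape` is a hypothetical helper for illustration only.

```python
def patch_merging_shape(B, H, W, C):
    """Output shape of patch merging for an input of shape (B, H*W, C)."""
    Hp, Wp = H + H % 2, W + W % 2  # pad odd sizes up to the next even number
    return (B, (Hp // 2) * (Wp // 2), 2 * C)


out_shape = patch_merging_shape(1, 7, 8, 96)
```

Under these assumptions each merge halves the spatial resolution and doubles the channel count.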

class rofunc.utils.visualab.segment.vlpart.swintransformer.BasicLayer(dim, depth, num_heads, window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_layer=nn.LayerNorm, downsample=None, use_checkpoint=False)

Bases: torch.nn.Module

A basic Swin Transformer layer for one stage.

Args:

dim (int): Number of feature channels.
depth (int): Depth of this stage.
num_heads (int): Number of attention heads.
window_size (int): Local window size. Default: 7.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True.
qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional): Dropout rate. Default: 0.0.
attn_drop (float, optional): Attention dropout rate. Default: 0.0.
drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0.
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm.
downsample (nn.Module | None, optional): Downsample layer at the end of the layer. Default: None.
use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x, H, W)

Forward function.

Args:

x: Input feature, tensor size (B, H*W, C).
H, W: Spatial resolution of the input feature.
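
Within one stage, Swin blocks usually alternate between regular and shifted windows, and the feature map is treated as if padded to a multiple of the window size. The sketch below shows that bookkeeping, assuming the reference Swin convention (even-indexed blocks use shift 0, odd-indexed blocks use `window_size // 2`). `block_shift_sizes` and `padded_resolution` are hypothetical helpers.

```python
import math


def block_shift_sizes(depth, window_size):
    """Shift size for each block in a stage: W-MSA, SW-MSA, W-MSA, ..."""
    return [0 if i % 2 == 0 else window_size // 2 for i in range(depth)]


def padded_resolution(H, W, window_size):
    """Round H and W up to the nearest multiple of window_size."""
    Hp = math.ceil(H / window_size) * window_size
    Wp = math.ceil(W / window_size) * window_size
    return Hp, Wp


shifts = block_shift_sizes(depth=6, window_size=7)
padded = padded_resolution(50, 56, window_size=7)
```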

class rofunc.utils.visualab.segment.vlpart.swintransformer.PatchEmbed(patch_size=4, in_chans=3, embed_dim=96, norm_layer=None)

Bases: torch.nn.Module

Image to patch embedding.

Args:

patch_size (int): Patch token size. Default: 4.
in_chans (int): Number of input image channels. Default: 3.
embed_dim (int): Number of linear projection output channels. Default: 96.
norm_layer (nn.Module, optional): Normalization layer. Default: None.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward function.
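
As a rough guide to what the patch embedding produces, the sketch below computes the output shape for an image batch. It assumes the usual Swin detection-backbone behaviour: the image is padded to a multiple of `patch_size` and projected by a strided convolution, giving a (B, embed_dim, H', W') feature map. `patch_embed_shape` is a hypothetical helper, and the output layout is an assumption rather than something this page states.

```python
def patch_embed_shape(B, H, W, patch_size=4, embed_dim=96):
    """Output shape of patch embedding for an input of shape (B, in_chans, H, W)."""
    Hp = -(-H // patch_size)  # ceiling division: padding rounds the size up
    Wp = -(-W // patch_size)
    return (B, embed_dim, Hp, Wp)


shape = patch_embed_shape(2, 224, 224)
odd_shape = patch_embed_shape(1, 225, 300)
```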

class rofunc.utils.visualab.segment.vlpart.swintransformer.SwinTransformer(pretrain_img_size=224, patch_size=4, in_chans=3, embed_dim=96, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, norm_layer=nn.LayerNorm, ape=False, patch_norm=True, out_indices=(0, 1, 2, 3), frozen_stages=-1, use_checkpoint=False)

Bases: detectron2.modeling.backbone.backbone.Backbone

Swin Transformer backbone.
A PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows":

https://arxiv.org/pdf/2103.14030

Args:

pretrain_img_size (int): Input image size for training the pretrained model, used in absolute position embedding. Default: 224.
patch_size (int | tuple(int)): Patch size. Default: 4.
in_chans (int): Number of input image channels. Default: 3.
embed_dim (int): Number of linear projection output channels. Default: 96.
depths (tuple[int]): Depths of each Swin Transformer stage.
num_heads (tuple[int]): Number of attention heads of each stage.
window_size (int): Window size. Default: 7.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4.
qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True.
qk_scale (float): Override default qk scale of head_dim ** -0.5 if set.
drop_rate (float): Dropout rate.
attn_drop_rate (float): Attention dropout rate. Default: 0.
drop_path_rate (float): Stochastic depth rate. Default: 0.2.
norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm.
ape (bool): If True, add absolute position embedding to the patch embedding. Default: False.
patch_norm (bool): If True, add normalization after patch embedding. Default: True.
out_indices (Sequence[int]): Output from which stages.
frozen_stages (int): Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.
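
The `drop_path_rate` and `depths` arguments interact: in the reference Swin implementation the stochastic depth rate grows linearly from 0 at the first block to `drop_path_rate` at the last, and each stage receives its own slice of that schedule. The sketch below reproduces that rule in plain Python as an assumption about this backbone; `stochastic_depth_schedule` is a hypothetical helper.

```python
def stochastic_depth_schedule(depths, drop_path_rate):
    """Per-stage lists of drop-path rates, increasing linearly over all blocks."""
    total = sum(depths)
    if total > 1:
        rates = [drop_path_rate * i / (total - 1) for i in range(total)]
    else:
        rates = [0.0] * total
    per_stage, start = [], 0
    for depth in depths:
        per_stage.append(rates[start:start + depth])
        start += depth
    return per_stage


schedule = stochastic_depth_schedule(depths=(2, 2, 6, 2), drop_path_rate=0.2)
```

With the default arguments, the twelve blocks receive rates from 0.0 up to 0.2, so deeper blocks are dropped more often during training.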

Initialization

init_weights(pretrained=None)

Initialize the weights in the backbone.

Args:

pretrained (str, optional): Path to pre-trained weights. Defaults to None.

forward(x)

Forward function.
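
To give a feel for what the forward pass returns, the sketch below estimates the channel count and spatial size of each selected stage output. It assumes the usual Swin layout: patch embedding reduces resolution by `patch_size`, each stage's output is taken before that stage's downsampling, and every downsampling step halves the resolution (rounding up) while doubling the channels. `stage_output_shapes` is a hypothetical helper, and the actual output names and container type are not stated on this page.

```python
def stage_output_shapes(H, W, embed_dim=96, patch_size=4, num_stages=4,
                        out_indices=(0, 1, 2, 3)):
    """Map stage index -> (channels, height, width) for the selected stages."""
    h, w = -(-H // patch_size), -(-W // patch_size)  # ceiling division
    shapes = {}
    for i in range(num_stages):
        if i in out_indices:
            shapes[i] = (embed_dim * 2 ** i, h, w)
        # Patch merging between stages: halve the resolution, rounding up.
        h, w = (h + 1) // 2, (w + 1) // 2
    return shapes


shapes = stage_output_shapes(224, 224)
```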

train(mode=True)

Convert the model into training mode while keeping the frozen layers frozen.

rofunc.utils.visualab.segment.vlpart.swintransformer.size2config = None

rofunc.utils.visualab.segment.vlpart.swintransformer.build_swinbase_fpn_backbone()