rofunc.utils.visualab.segment.vlpart.swintransformer#
1. Module Contents#
1.1. Classes#
LastLevelP6P7_P5 | This module is used in RetinaNet to generate extra layers, P6 and P7, from the C5 feature.
Mlp | Multilayer perceptron.
WindowAttention | Window-based multi-head self-attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.
SwinTransformerBlock | Swin Transformer block.
PatchMerging | Patch merging layer.
BasicLayer | A basic Swin Transformer layer for one stage.
PatchEmbed | Image to patch embedding.
SwinTransformer | Swin Transformer backbone.
1.2. Functions#
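window_partition | Split a (B, H, W, C) tensor into windows of shape (num_windows*B, window_size, window_size, C).
window_reverse | Merge windows of shape (num_windows*B, window_size, window_size, C) back into a (B, H, W, C) tensor.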
1.3. Data#
1.4. API#
- class rofunc.utils.visualab.segment.vlpart.swintransformer.LastLevelP6P7_P5(in_channels, out_channels)[source]#
Bases: torch.nn.Module
This module is used in RetinaNet to generate extra layers, P6 and P7, from the C5 feature.
Initialization
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- class rofunc.utils.visualab.segment.vlpart.swintransformer.Mlp(in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.0)[source]#
Bases: torch.nn.Module
Multilayer perceptron.
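The constructor signature above suggests a standard two-layer feed-forward block. The following is a minimal sketch under that assumption; `MlpSketch` is a hypothetical stand-in written for illustration, not the class shipped in this module, and its layer names are assumptions.

```python
import torch
from torch import nn


class MlpSketch(nn.Module):
    """Hypothetical two-layer MLP mirroring the documented constructor arguments."""

    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0.0):
        super().__init__()
        # Assumption: missing sizes fall back to the input width.
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        # Linear -> activation -> dropout -> linear -> dropout
        x = self.drop(self.act(self.fc1(x)))
        return self.drop(self.fc2(x))


tokens = torch.randn(2, 49, 96)            # (batch, tokens, channels)
out = MlpSketch(96, hidden_features=384)(tokens)
```

With `hidden_features=384` and `in_features=96` the sketch corresponds to an `mlp_ratio` of 4, and the output keeps the input shape.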
- rofunc.utils.visualab.segment.vlpart.swintransformer.window_partition(x, window_size)[source]#
- Args:
x: (B, H, W, C)
window_size (int): Window size
- Returns:
windows: (num_windows*B, window_size, window_size, C)
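The shape contract above can be illustrated with a small standalone sketch. `window_partition_sketch` is a hypothetical re-implementation that follows the documented shapes; it assumes H and W are multiples of `window_size` and is not guaranteed to match the module's own code line for line.

```python
import torch


def window_partition_sketch(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split (B, H, W, C) into (num_windows*B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    # Assumption: H and W are exact multiples of window_size.
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    # Bring the two window-grid axes together, then flatten them into the batch axis.
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)


x = torch.arange(2 * 8 * 8 * 3, dtype=torch.float32).view(2, 8, 8, 3)
windows = window_partition_sketch(x, window_size=4)
```

Each 8x8 image yields four 4x4 windows, so the leading dimension becomes `2 * 4 = 8`, ordered row by row within each image.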
- rofunc.utils.visualab.segment.vlpart.swintransformer.window_reverse(windows, window_size, H, W)[source]#
- Args:
windows: (num_windows*B, window_size, window_size, C)
window_size (int): Window size
H (int): Height of image
W (int): Width of image
- Returns:
x: (B, H, W, C)
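As a rough check of the inverse relationship between the two functions, the sketch below partitions a tensor inline and then merges it back. `window_reverse_sketch` is a hypothetical helper that follows the documented shapes and assumes H and W are multiples of `window_size`.

```python
import torch


def window_reverse_sketch(windows, window_size, H, W):
    """Merge (num_windows*B, ws, ws, C) windows back into (B, H, W, C)."""
    num_windows = (H // window_size) * (W // window_size)
    B = windows.shape[0] // num_windows
    x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1)
    # Interleave the window-grid axes with the in-window axes again.
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1)


B, H, W, C, ws = 2, 8, 8, 3, 4
x = torch.randn(B, H, W, C)
# Inline partition so that this example is self-contained.
windows = (
    x.view(B, H // ws, ws, W // ws, ws, C)
    .permute(0, 1, 3, 2, 4, 5)
    .contiguous()
    .view(-1, ws, ws, C)
)
restored = window_reverse_sketch(windows, ws, H, W)
```

Under these assumptions the round trip reproduces the original tensor exactly, since both steps only reorder elements.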
- class rofunc.utils.visualab.segment.vlpart.swintransformer.WindowAttention(dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0.0, proj_drop=0.0)[source]#
Bases: torch.nn.Module
Window-based multi-head self-attention (W-MSA) module with relative position bias. It supports both shifted and non-shifted windows.
Args:
dim (int): Number of input channels.
window_size (tuple[int]): The height and width of the window.
num_heads (int): Number of attention heads.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional): Override the default qk scale of head_dim ** -0.5 if set.
attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0
proj_drop (float, optional): Dropout ratio of output. Default: 0.0
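The "relative position bias" mentioned above is usually looked up through an index table over all pairs of positions in a window. The sketch below follows the scheme used in the reference Swin Transformer implementation; that this module builds its index the same way is an assumption, and `relative_position_index` is a hypothetical helper name.

```python
import torch


def relative_position_index(window_size):
    """Return an (N, N) table of bias indices, with N = Wh * Ww."""
    Wh, Ww = window_size
    coords = torch.stack(
        torch.meshgrid(torch.arange(Wh), torch.arange(Ww), indexing="ij")
    )                                      # (2, Wh, Ww)
    coords_flat = coords.flatten(1)        # (2, N)
    # Pairwise row/column offsets between every two positions in the window.
    rel = coords_flat[:, :, None] - coords_flat[:, None, :]   # (2, N, N)
    rel = rel.permute(1, 2, 0).contiguous()                   # (N, N, 2)
    rel[:, :, 0] += Wh - 1                 # shift row offsets to start at 0
    rel[:, :, 1] += Ww - 1                 # shift column offsets to start at 0
    rel[:, :, 0] *= 2 * Ww - 1             # row-major flattening of the 2-D offset
    return rel.sum(-1)


index = relative_position_index((7, 7))
num_bias_entries = (2 * 7 - 1) * (2 * 7 - 1)   # size of the bias table per head
```

For a 7x7 window there are 49 positions and 13 x 13 = 169 distinct relative offsets, so a learnable table with 169 entries per head would be indexed by this (49, 49) map.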
- class rofunc.utils.visualab.segment.vlpart.swintransformer.SwinTransformerBlock(dim, num_heads, window_size=7, shift_size=0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, act_layer=nn.GELU, norm_layer=nn.LayerNorm)[source]#
Bases: torch.nn.Module
Swin Transformer Block.
Args:
dim (int): Number of input channels.
num_heads (int): Number of attention heads.
window_size (int): Window size.
shift_size (int): Shift size for SW-MSA.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional): Dropout rate. Default: 0.0
attn_drop (float, optional): Attention dropout rate. Default: 0.0
drop_path (float, optional): Stochastic depth rate. Default: 0.0
act_layer (nn.Module, optional): Activation layer. Default: nn.GELU
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm
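The `shift_size` argument for SW-MSA is commonly realized as a cyclic shift of the feature map before window partitioning, undone afterwards. The sketch below shows only that shift with `torch.roll`; it follows the reference Swin Transformer approach, and whether this block does exactly the same is an assumption.

```python
import torch

window_size, shift_size = 4, 2             # illustrative values, with shift = window // 2
x = torch.arange(8 * 8, dtype=torch.float32).view(1, 8, 8, 1)   # (B, H, W, C)

# Shift up and to the left by shift_size; rows and columns wrap around.
shifted = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))

# ... windowed attention would run on `shifted` here ...

# Undo the shift so that tokens return to their original positions.
restored = torch.roll(shifted, shifts=(shift_size, shift_size), dims=(1, 2))
```

After the shift, the element originally at row 2, column 2 sits at the top-left corner, so windows in the shifted map straddle the boundaries of the unshifted windows.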
- class rofunc.utils.visualab.segment.vlpart.swintransformer.PatchMerging(dim, norm_layer=nn.LayerNorm)[source]#
Bases: torch.nn.Module
Patch Merging Layer
Args:
dim (int): Number of input channels.
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm
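Patch merging in Swin-style backbones typically concatenates each 2x2 neighbourhood along the channel axis and projects 4C channels down to 2C. The sketch below illustrates that idea on a (B, H, W, C) tensor with even H and W; `PatchMergingSketch`, its layer names, and its simplified `forward` signature are assumptions and may differ from the class documented here.

```python
import torch
from torch import nn


class PatchMergingSketch(nn.Module):
    """Hypothetical 2x2 patch merging: (B, H, W, C) -> (B, H/2, W/2, 2C)."""

    def __init__(self, dim, norm_layer=nn.LayerNorm):
        super().__init__()
        self.norm = norm_layer(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):
        # Assumption: H and W are even, so no padding is needed.
        x0 = x[:, 0::2, 0::2, :]   # top-left of each 2x2 block
        x1 = x[:, 1::2, 0::2, :]   # bottom-left
        x2 = x[:, 0::2, 1::2, :]   # top-right
        x3 = x[:, 1::2, 1::2, :]   # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)   # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))


merged = PatchMergingSketch(96)(torch.randn(2, 56, 56, 96))
```

The spatial resolution halves while the channel count doubles, which is what produces the hierarchical feature pyramid across stages.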
- class rofunc.utils.visualab.segment.vlpart.swintransformer.BasicLayer(dim, depth, num_heads, window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop=0.0, attn_drop=0.0, drop_path=0.0, norm_layer=nn.LayerNorm, downsample=None, use_checkpoint=False)[source]#
Bases: torch.nn.Module
A basic Swin Transformer layer for one stage.
Args:
dim (int): Number of feature channels.
depth (int): Depth of this stage.
num_heads (int): Number of attention heads.
window_size (int): Local window size. Default: 7.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4.
qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.
drop (float, optional): Dropout rate. Default: 0.0
attn_drop (float, optional): Attention dropout rate. Default: 0.0
drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0
norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm
downsample (nn.Module | None, optional): Downsample layer at the end of the layer. Default: None
use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.
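A stage of `depth` blocks usually alternates between non-shifted (W-MSA) and shifted (SW-MSA) attention. The sketch below computes the per-block shift sizes under that convention, taken from the reference Swin Transformer implementation; that this class assigns shifts the same way is an assumption, and `block_shift_sizes` is a hypothetical helper.

```python
def block_shift_sizes(depth, window_size=7):
    """Shift size per block: even-indexed blocks use 0, odd-indexed use window_size // 2."""
    return [0 if i % 2 == 0 else window_size // 2 for i in range(depth)]


shifts = block_shift_sizes(depth=6, window_size=7)
```

With the default `window_size=7`, every second block would shift by 3 positions, so information can cross window boundaries between consecutive blocks.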
- class rofunc.utils.visualab.segment.vlpart.swintransformer.PatchEmbed(patch_size=4, in_chans=3, embed_dim=96, norm_layer=None)[source]#
Bases: torch.nn.Module
Image to Patch Embedding
Args:
patch_size (int): Patch token size. Default: 4.
in_chans (int): Number of input image channels. Default: 3.
embed_dim (int): Number of linear projection output channels. Default: 96.
norm_layer (nn.Module, optional): Normalization layer. Default: None
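To get a feel for what `patch_size` implies, the sketch below computes the token grid produced for a given image size. It assumes the input is padded on the bottom and right up to a multiple of `patch_size`, as the reference Swin Transformer detection backbone does; whether this class pads the same way is an assumption, and `patch_grid` is a hypothetical helper.

```python
def patch_grid(height, width, patch_size=4):
    """Return the (rows, cols) of patch tokens after padding to a multiple of patch_size."""
    pad_h = (patch_size - height % patch_size) % patch_size
    pad_w = (patch_size - width % patch_size) % patch_size
    return (height + pad_h) // patch_size, (width + pad_w) // patch_size


grid_square = patch_grid(224, 224)      # image size divisible by the patch size
grid_odd = patch_grid(225, 230)         # image size that needs padding
```

Each grid cell corresponds to one token with `embed_dim` channels, so a 224x224 image with the default settings would give a 56x56 grid of 96-dimensional tokens.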
- class rofunc.utils.visualab.segment.vlpart.swintransformer.SwinTransformer(pretrain_img_size=224, patch_size=4, in_chans=3, embed_dim=96, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, norm_layer=nn.LayerNorm, ape=False, patch_norm=True, out_indices=(0, 1, 2, 3), frozen_stages=-1, use_checkpoint=False)[source]#
Bases: detectron2.modeling.backbone.backbone.Backbone
Swin Transformer backbone.
A PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
- Args:
pretrain_img_size (int): Input image size for training the pretrained model, used in absolute position embedding. Default: 224.
patch_size (int | tuple(int)): Patch size. Default: 4.
in_chans (int): Number of input image channels. Default: 3.
embed_dim (int): Number of linear projection output channels. Default: 96.
depths (tuple[int]): Depths of each Swin Transformer stage.
num_heads (tuple[int]): Number of attention heads in each stage.
window_size (int): Window size. Default: 7.
mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4.
qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True
qk_scale (float): Override default qk scale of head_dim ** -0.5 if set.
drop_rate (float): Dropout rate.
attn_drop_rate (float): Attention dropout rate. Default: 0.
drop_path_rate (float): Stochastic depth rate. Default: 0.2.
norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm.
ape (bool): If True, add absolute position embedding to the patch embedding. Default: False.
patch_norm (bool): If True, add normalization after patch embedding. Default: True.
out_indices (Sequence[int]): Output from which stages.
frozen_stages (int): Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.
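The default arguments in the signature imply a particular per-stage layout. The sketch below derives the channel width of each stage and a linearly increasing stochastic-depth schedule from `embed_dim`, `depths`, and `drop_path_rate`. The doubling of channels per stage and the linear schedule follow the reference Swin Transformer implementation and are assumptions about this class.

```python
embed_dim = 96
depths = (2, 2, 6, 2)
drop_path_rate = 0.2

# Assumption: channels double at every stage after patch merging.
num_features = [embed_dim * 2 ** i for i in range(len(depths))]

# Assumption: drop-path rate grows linearly from 0 to drop_path_rate over all blocks.
total_blocks = sum(depths)
dpr = [drop_path_rate * i / (total_blocks - 1) for i in range(total_blocks)]

# Slice the schedule so that each stage receives one rate per block.
per_stage_dpr = [
    dpr[sum(depths[:i]):sum(depths[:i + 1])] for i in range(len(depths))
]
```

Under these assumptions, the four stages selected by `out_indices=(0, 1, 2, 3)` would emit feature maps with 96, 192, 384, and 768 channels, and each stage's slice of `per_stage_dpr` could be passed as a tuple-valued `drop_path` to the corresponding `BasicLayer`.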
- rofunc.utils.visualab.segment.vlpart.swintransformer.size2config = None#