rofunc.utils.visualab.segment.vlpart.text_encoder

1.  Module Contents

1.1.  Classes

LayerNorm

Subclass torch’s LayerNorm to handle fp16.

QuickGELU

ResidualAttentionBlock

Transformer

CLIPTEXT

1.2.  Functions

build_text_encoder

1.3.  API

class rofunc.utils.visualab.segment.vlpart.text_encoder.LayerNorm(normalized_shape: torch.nn.modules.normalization._shape_t, eps: float = 1e-05, elementwise_affine: bool = True, bias: bool = True, device=None, dtype=None)

Bases: torch.nn.LayerNorm

Subclass torch’s LayerNorm to handle fp16.

Initialization

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: torch.Tensor)
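
The summary "handle fp16" can be illustrated with a minimal sketch. It assumes this class follows the common CLIP-style pattern of normalizing in float32 and casting the result back to the input dtype; the class name `FP16SafeLayerNorm` is hypothetical and is not part of this module.

```python
import torch
from torch import nn


class FP16SafeLayerNorm(nn.LayerNorm):
    """Hypothetical stand-in: normalize in float32, return the input dtype."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        orig_dtype = x.dtype
        # Compute mean and variance in float32, where they are less likely
        # to overflow or lose precision than in half precision.
        out = super().forward(x.float())
        # Cast back so downstream fp16 layers see the dtype they expect.
        return out.to(orig_dtype)


ln = FP16SafeLayerNorm(8)
x = torch.randn(2, 4, 8).half()  # half-precision input
y = ln(x)                        # same shape and dtype as x
```

The layer's own weight and bias stay in float32 in this sketch; only the activations are converted on the way in and out.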

class rofunc.utils.visualab.segment.vlpart.text_encoder.QuickGELU(*args, **kwargs)

Bases: torch.nn.Module

forward(x: torch.Tensor)
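
This page does not document what QuickGELU computes. In CLIP-derived code the name usually refers to the sigmoid approximation of GELU, `x * sigmoid(1.702 * x)`. The sketch below assumes that definition; `QuickGELUSketch` is a hypothetical name used only for illustration.

```python
import torch
from torch import nn


class QuickGELUSketch(nn.Module):
    """Hypothetical stand-in: sigmoid-based approximation of GELU."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1.702 is the constant commonly used for this approximation.
        return x * torch.sigmoid(1.702 * x)


x = torch.tensor([-1.0, 0.0, 1.0])
approx = QuickGELUSketch()(x)
exact = nn.functional.gelu(x)  # exact GELU, for comparison
```

On these inputs the two activations agree to within a few thousandths, which is why the approximation is often used as a cheaper substitute.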

class rofunc.utils.visualab.segment.vlpart.text_encoder.ResidualAttentionBlock(d_model: int, n_head: int, attn_mask: torch.Tensor = None)

Bases: torch.nn.Module

attention(x: torch.Tensor)
forward(x: torch.Tensor)
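
The constructor arguments suggest a standard pre-norm Transformer block: self-attention followed by an MLP, each wrapped in a residual connection. The sketch below shows one plausible layout under that assumption. The class name `ResidualBlockSketch`, the 4x MLP expansion, and the use of `nn.LayerNorm` and `nn.GELU` in place of this module's own `LayerNorm` and `QuickGELU` are all assumptions, not taken from the source.

```python
import torch
from torch import nn


class ResidualBlockSketch(nn.Module):
    """Hypothetical pre-norm block: attention and MLP, each with a residual."""

    def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_head)
        self.ln_1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model * 4),  # assumed 4x expansion
            nn.GELU(),                        # stand-in for QuickGELU
            nn.Linear(d_model * 4, d_model),
        )
        self.ln_2 = nn.LayerNorm(d_model)
        self.attn_mask = attn_mask

    def attention(self, x: torch.Tensor) -> torch.Tensor:
        mask = None
        if self.attn_mask is not None:
            # Match the mask to the activations' dtype and device.
            mask = self.attn_mask.to(dtype=x.dtype, device=x.device)
        # Self-attention: query, key, and value are all x.
        return self.attn(x, x, x, need_weights=False, attn_mask=mask)[0]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attention(self.ln_1(x))  # residual around attention
        x = x + self.mlp(self.ln_2(x))        # residual around the MLP
        return x


block = ResidualBlockSketch(d_model=16, n_head=4)
x = torch.randn(5, 2, 16)  # (sequence, batch, d_model), the default layout
y = block(x)               # output keeps the input shape
```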

class rofunc.utils.visualab.segment.vlpart.text_encoder.Transformer(width: int, layers: int, heads: int, attn_mask: torch.Tensor = None)

Bases: torch.nn.Module

forward(x: torch.Tensor)
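
The `attn_mask` argument is not described on this page. CLIP-style text encoders usually pass an additive causal mask, so that each token attends only to itself and to earlier tokens. The sketch below builds such a mask; the helper `build_causal_mask` is hypothetical and may differ from what `CLIPTEXT.build_attention_mask` actually returns.

```python
import torch


def build_causal_mask(context_length: int) -> torch.Tensor:
    """Hypothetical helper: additive mask that blocks attention to later tokens."""
    mask = torch.full((context_length, context_length), float("-inf"))
    # Keep -inf strictly above the diagonal; zero the diagonal and below.
    mask.triu_(1)
    return mask


mask = build_causal_mask(4)
# Row i holds 0.0 for positions <= i (allowed) and -inf for positions > i.
```

Because the mask is added to the attention scores before the softmax, the `-inf` entries receive zero attention weight.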

class rofunc.utils.visualab.segment.vlpart.text_encoder.CLIPTEXT(embed_dim=512, context_length=77, vocab_size=49408, transformer_width=512, transformer_heads=8, transformer_layers=12)

Bases: torch.nn.Module

initialize_parameters()
build_attention_mask()
property device
property dtype
tokenize(texts: Union[str, List[str]], context_length: int = 77) → torch.LongTensor
encode_text(text)
forward(captions)

captions: list of strings
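
The signatures above imply a two-step flow: `tokenize` turns strings into fixed-length ID tensors, and `encode_text` reduces the transformer output to one vector per caption. The sketch below illustrates both steps under the original CLIP conventions, which this class is assumed (not confirmed) to follow: sequences are wrapped in start and end tokens, zero-padded to `context_length`, and pooled at the end-token position. The helper `pad_tokens` and the token IDs 320 and 1929 are placeholders, and the random tensor stands in for real transformer output.

```python
import torch

# Assumed CLIP BPE special-token IDs for vocab_size=49408.
SOT_ID, EOT_ID = 49406, 49407


def pad_tokens(token_ids, context_length: int = 77) -> torch.Tensor:
    """Hypothetical helper: wrap IDs in SOT/EOT and zero-pad to a fixed length."""
    ids = [SOT_ID] + list(token_ids) + [EOT_ID]
    if len(ids) > context_length:
        raise ValueError("input is too long for the given context length")
    result = torch.zeros(context_length, dtype=torch.long)
    result[: len(ids)] = torch.tensor(ids)
    return result


# Two captions with arbitrary placeholder token IDs.
batch = torch.stack([pad_tokens([320, 1929]), pad_tokens([320])])

# EOT has the largest ID, so argmax locates it in each row.
eot_pos = batch.argmax(dim=-1)

# Stand-in for the transformer output: (batch, context_length, width).
hidden = torch.randn(2, 77, 512)
pooled = hidden[torch.arange(2), eot_pos]  # one vector per caption
```

Pooling at the end token works with a causal mask because that position is the only one that has attended to the whole caption.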

rofunc.utils.visualab.segment.vlpart.text_encoder.build_text_encoder(pretrain=True, visual_type='RN50')