Custom Layers and Utilities
This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
Most of these are only useful if you are studying the code of the models in the library.
class transformers.Conv1D
( nf, nx )
Parameters
nf (int) — The number of output features.
nx (int) — The number of input features.
1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).
Basically works like a linear layer but the weights are transposed.
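For example (a minimal sketch; the GPT-2-style sizes are illustrative):

```python
import torch
from transformers import Conv1D

# Works like nn.Linear, but the weight matrix is stored transposed:
# weight has shape (nx, nf) instead of (nf, nx).
layer = Conv1D(nf=2304, nx=768)
x = torch.randn(2, 10, 768)   # (batch, seq_len, nx)
print(layer(x).shape)         # torch.Size([2, 10, 2304])
```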
class transformers.modeling_utils.PoolerStartLogits
( config: PretrainedConfig )
Parameters
config (PretrainedConfig) — The config used by the model, will be used to grab the hidden_size of the model.
Compute SQuAD start logits from sequence hidden states.
forward
( hidden_states: FloatTensor, p_mask: typing.Optional[torch.FloatTensor] = None ) → torch.FloatTensor
Parameters
hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) — The final hidden states of the model.
p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) — Mask for tokens at invalid positions, such as query and special symbols (PAD, SEP, CLS). 1.0 means the token should be masked.
Returns
torch.FloatTensor
The start logits for SQuAD.
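A minimal usage sketch (the bare PretrainedConfig and all shapes are illustrative):

```python
import torch
from transformers import PretrainedConfig
from transformers.modeling_utils import PoolerStartLogits

# A bare config carrying just the attribute this layer reads.
config = PretrainedConfig(hidden_size=768)
pooler = PoolerStartLogits(config)

hidden_states = torch.randn(2, 384, 768)  # (batch_size, seq_len, hidden_size)
start_logits = pooler(hidden_states)
print(start_logits.shape)                 # torch.Size([2, 384])
```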
class transformers.modeling_utils.PoolerEndLogits
( config: PretrainedConfig )
Parameters
config (PretrainedConfig) — The config used by the model, will be used to grab the hidden_size of the model and the layer_norm_eps to use.
Compute SQuAD end logits from sequence hidden states.
forward
( hidden_states: FloatTensor, start_states: typing.Optional[torch.FloatTensor] = None, start_positions: typing.Optional[torch.LongTensor] = None, p_mask: typing.Optional[torch.FloatTensor] = None ) → torch.FloatTensor
Parameters
hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) — The final hidden states of the model.
start_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size), optional) — The hidden states of the first tokens for the labeled span.
start_positions (torch.LongTensor of shape (batch_size,), optional) — The position of the first token for the labeled span.
p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) — Mask for tokens at invalid positions, such as query and special symbols (PAD, SEP, CLS). 1.0 means the token should be masked.
Returns
torch.FloatTensor
The end logits for SQuAD.
One of start_states or start_positions should not be None. If both are set, start_positions overrides start_states.
class transformers.modeling_utils.PoolerAnswerClass
( config )
Parameters
config (PretrainedConfig) — The config used by the model, will be used to grab the hidden_size of the model.
Compute SQuAD 2.0 answer class from classification and start tokens hidden states.
forward
( hidden_states: FloatTensor, start_states: typing.Optional[torch.FloatTensor] = None, start_positions: typing.Optional[torch.LongTensor] = None, cls_index: typing.Optional[torch.LongTensor] = None ) → torch.FloatTensor
Parameters
hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) — The final hidden states of the model.
start_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size), optional) — The hidden states of the first tokens for the labeled span.
start_positions (torch.LongTensor of shape (batch_size,), optional) — The position of the first token for the labeled span.
cls_index (torch.LongTensor of shape (batch_size,), optional) — Position of the CLS token for each sentence in the batch. If None, takes the last token.
Returns
torch.FloatTensor
The SQuAD 2.0 answer class.
One of start_states or start_positions should not be None. If both are set, start_positions overrides start_states.
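A combined sketch of the end-logits and answer-class layers (config values, positions, and shapes are illustrative):

```python
import torch
from transformers import PretrainedConfig
from transformers.modeling_utils import PoolerAnswerClass, PoolerEndLogits

config = PretrainedConfig(hidden_size=768, layer_norm_eps=1e-12)
hidden_states = torch.randn(2, 384, 768)
start_positions = torch.tensor([10, 42])  # gold start token per example

# End logits conditioned on the gold start positions.
end_logits = PoolerEndLogits(config)(hidden_states, start_positions=start_positions)
print(end_logits.shape)  # torch.Size([2, 384])

# SQuAD 2.0 answerability score, one scalar per example.
cls_logits = PoolerAnswerClass(config)(hidden_states, start_positions=start_positions)
print(cls_logits.shape)  # torch.Size([2])
```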
class transformers.modeling_utils.SquadHeadOutput
( loss: typing.Optional[torch.FloatTensor] = None, start_top_log_probs: typing.Optional[torch.FloatTensor] = None, start_top_index: typing.Optional[torch.LongTensor] = None, end_top_log_probs: typing.Optional[torch.FloatTensor] = None, end_top_index: typing.Optional[torch.LongTensor] = None, cls_logits: typing.Optional[torch.FloatTensor] = None )
Parameters
loss (torch.FloatTensor of shape (1,), optional, returned if both start_positions and end_positions are provided) — Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
start_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top start token possibilities (beam-search).
start_top_index (torch.LongTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top start token possibilities (beam-search).
end_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
end_top_index (torch.LongTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
cls_logits (torch.FloatTensor of shape (batch_size,), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the is_impossible label of the answers.
Base class for outputs of question answering models using a SQuADHead.
class transformers.modeling_utils.SQuADHead
( config )
Parameters
config (PretrainedConfig) — The config used by the model, will be used to grab the hidden_size of the model and the layer_norm_eps to use.
A SQuAD head inspired by XLNet.
forward
( hidden_states: FloatTensor, start_positions: typing.Optional[torch.LongTensor] = None, end_positions: typing.Optional[torch.LongTensor] = None, cls_index: typing.Optional[torch.LongTensor] = None, is_impossible: typing.Optional[torch.LongTensor] = None, p_mask: typing.Optional[torch.FloatTensor] = None, return_dict: bool = False ) → transformers.modeling_utils.SquadHeadOutput or tuple(torch.FloatTensor)
Parameters
hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) — Final hidden states of the model on the sequence tokens.
start_positions (torch.LongTensor of shape (batch_size,), optional) — Positions of the first token for the labeled span.
end_positions (torch.LongTensor of shape (batch_size,), optional) — Positions of the last token for the labeled span.
cls_index (torch.LongTensor of shape (batch_size,), optional) — Position of the CLS token for each sentence in the batch. If None, takes the last token.
is_impossible (torch.LongTensor of shape (batch_size,), optional) — Whether the question has a possible answer in the paragraph or not.
p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) — Mask for tokens at invalid positions, such as query and special symbols (PAD, SEP, CLS). 1.0 means the token should be masked.
return_dict (bool, optional, defaults to False) — Whether or not to return a SquadHeadOutput instead of a plain tuple.
Returns
transformers.modeling_utils.SquadHeadOutput or tuple(torch.FloatTensor)
A SquadHeadOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PretrainedConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned if both start_positions and end_positions are provided) — Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.
start_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top start token possibilities (beam-search).
start_top_index (torch.LongTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top start token possibilities (beam-search).
end_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
end_top_index (torch.LongTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).
cls_logits (torch.FloatTensor of shape (batch_size,), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the is_impossible label of the answers.
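A usage sketch covering both modes (config values and shapes are illustrative):

```python
import torch
from transformers import PretrainedConfig
from transformers.modeling_utils import SQuADHead

# The head reads hidden_size, layer_norm_eps and the beam sizes
# start_n_top / end_n_top from its config (values illustrative).
config = PretrainedConfig(
    hidden_size=768, layer_norm_eps=1e-12, start_n_top=5, end_n_top=5
)
head = SQuADHead(config)
hidden_states = torch.randn(2, 384, 768)

# Inference: no positions given, so beam-search candidates are returned.
outputs = head(hidden_states, return_dict=True)
print(outputs.start_top_log_probs.shape)  # torch.Size([2, 5])

# Training: providing the labels returns the summed loss instead.
outputs = head(
    hidden_states,
    start_positions=torch.tensor([10, 42]),
    end_positions=torch.tensor([20, 50]),
    return_dict=True,
)
print(outputs.loss)
```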
class transformers.modeling_utils.SequenceSummary
( config: PretrainedConfig )
Parameters
config (PretrainedConfig) — The config used by the model. Relevant arguments in the config class of the model are (refer to the actual config class of your model for the default values it uses):
summary_type (str) — The method to use to make this summary. Accepted values are:
"last"
— Take the last token hidden state (like XLNet)
"first"
— Take the first token hidden state (like Bert)
"mean"
— Take the mean of all tokens hidden states
"cls_index"
— Supply a Tensor of classification token position (GPT/GPT-2)
"attn"
— Not implemented now, use multi-head attention
summary_use_proj (bool) — Add a projection after the vector extraction.
summary_proj_to_labels (bool) — If True, the projection outputs to config.num_labels classes (otherwise to config.hidden_size).
summary_activation (Optional[str]) — Set to "tanh" to add a tanh activation to the output, any other string or None will add no activation.
summary_first_dropout (float) — Optional dropout probability before the projection and activation.
summary_last_dropout (float) — Optional dropout probability after the projection and activation.
Compute a single vector summary of a sequence hidden states.
forward
( hidden_states: FloatTensor, cls_index: typing.Optional[torch.LongTensor] = None ) → torch.FloatTensor
Parameters
hidden_states (torch.FloatTensor of shape [batch_size, seq_len, hidden_size]) — The hidden states of the last layer.
cls_index (torch.LongTensor of shape [batch_size] or [batch_size, ...] where … are optional leading dimensions of hidden_states, optional) — Used if summary_type == "cls_index" and takes the last token of the sequence as classification token.
Returns
torch.FloatTensor
The summary of the sequence hidden states.
Compute a single vector summary of a sequence hidden states.
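A usage sketch (all config values are illustrative; the layer reads the summary_* attributes from the config):

```python
import torch
from transformers import PretrainedConfig
from transformers.modeling_utils import SequenceSummary

# Mean-pool the token states, then project to two labels with a tanh.
config = PretrainedConfig(
    hidden_size=768,
    summary_type="mean",
    summary_use_proj=True,
    summary_proj_to_labels=True,
    num_labels=2,
    summary_activation="tanh",
    summary_first_dropout=0.1,
    summary_last_dropout=0.1,
)
summary = SequenceSummary(config)
hidden_states = torch.randn(2, 384, 768)
print(summary(hidden_states).shape)  # torch.Size([2, 2])
```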
transformers.apply_chunking_to_forward
( forward_fn: typing.Callable[..., torch.Tensor], chunk_size: int, chunk_dim: int, *input_tensors ) → torch.Tensor
Parameters
forward_fn (Callable[..., torch.Tensor]) — The forward function of the model.
chunk_size (int) — The chunk size of a chunked tensor: num_chunks = len(input_tensors[0]) / chunk_size.
chunk_dim (int) — The dimension over which the input_tensors should be chunked.
input_tensors (Tuple[torch.Tensor]) — The input tensors of forward_fn which will be chunked.
Returns
torch.Tensor
A tensor with the same shape as the forward_fn would have given if applied.
This function chunks the input_tensors into smaller input tensor parts of size chunk_size over the dimension chunk_dim. It then applies a layer forward_fn to each chunk independently to save memory.
If the forward_fn is independent across the chunk_dim this function will yield the same result as directly applying forward_fn to input_tensors.
Examples:
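A minimal sketch (the FeedForward module and the chunk settings are illustrative):

```python
import torch
from transformers import apply_chunking_to_forward

# A feed-forward layer whose forward pass is applied in chunks over
# the sequence dimension to reduce peak memory usage.
class FeedForward(torch.nn.Module):
    def __init__(self, hidden_size=64, chunk_size=16, seq_len_dim=1):
        super().__init__()
        self.dense = torch.nn.Linear(hidden_size, hidden_size)
        self.chunk_size = chunk_size    # tokens processed per chunk
        self.seq_len_dim = seq_len_dim  # dimension to chunk over

    def forward_chunk(self, hidden_states):
        return self.dense(hidden_states)

    def forward(self, hidden_states):
        # Same result as self.forward_chunk(hidden_states), but peak
        # memory scales with chunk_size instead of the full seq_len.
        return apply_chunking_to_forward(
            self.forward_chunk, self.chunk_size, self.seq_len_dim, hidden_states
        )

hidden_states = torch.randn(2, 128, 64)  # (batch, seq_len, hidden)
print(FeedForward()(hidden_states).shape)  # torch.Size([2, 128, 64])
```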
transformers.pytorch_utils.find_pruneable_heads_and_indices
( heads: typing.List[int], n_heads: int, head_size: int, already_pruned_heads: typing.Set[int] ) → Tuple[Set[int], torch.LongTensor]
Parameters
heads (List[int]) — List of the indices of heads to prune.
n_heads (int) — The number of heads in the model.
head_size (int) — The size of each head.
already_pruned_heads (Set[int]) — A set of already pruned heads.
Returns
Tuple[Set[int], torch.LongTensor]
A tuple with the indices of heads to prune taking already_pruned_heads into account and the indices of rows/columns to keep in the layer weight.
Finds the heads and their indices taking already_pruned_heads into account.
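For instance (head counts and sizes are illustrative):

```python
from transformers.pytorch_utils import find_pruneable_heads_and_indices

# Prune heads 0 and 2 out of 12 heads of size 64; nothing pruned before.
heads, index = find_pruneable_heads_and_indices(
    heads=[0, 2], n_heads=12, head_size=64, already_pruned_heads=set()
)
print(heads)        # {0, 2}
print(index.shape)  # torch.Size([640]) -> (12 - 2) * 64 rows/columns kept
```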
transformers.prune_layer
( layer: typing.Union[torch.nn.modules.linear.Linear, transformers.pytorch_utils.Conv1D], index: LongTensor, dim: typing.Optional[int] = None ) → torch.nn.Linear or Conv1D
Parameters
layer (Union[torch.nn.Linear, Conv1D]) — The layer to prune.
index (torch.LongTensor) — The indices to keep in the layer.
dim (int, optional) — The dimension on which to keep the indices.
Returns
The pruned layer as a new layer with requires_grad=True.
Prune a Conv1D or linear layer to keep only entries in index.
Used to remove heads.
transformers.pytorch_utils.prune_conv1d_layer
( layer: Conv1D, index: LongTensor, dim: int = 1 ) → Conv1D
Parameters
layer (Conv1D) — The layer to prune.
index (torch.LongTensor) — The indices to keep in the layer.
dim (int, optional, defaults to 1) — The dimension on which to keep the indices.
Returns
The pruned layer as a new layer with requires_grad=True.
Prune a Conv1D layer to keep only entries in index. A Conv1D works like a linear layer (see e.g. BERT) but the weights are transposed.
Used to remove heads.
transformers.pytorch_utils.prune_linear_layer
( layer: Linear, index: LongTensor, dim: int = 0 ) → torch.nn.Linear
Parameters
layer (torch.nn.Linear) — The layer to prune.
index (torch.LongTensor) — The indices to keep in the layer.
dim (int, optional, defaults to 0) — The dimension on which to keep the indices.
Returns
torch.nn.Linear
The pruned layer as a new layer with requires_grad=True.
Prune a linear layer to keep only entries in index.
Used to remove heads.
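For example (sizes and the kept indices are illustrative):

```python
import torch
from transformers.pytorch_utils import prune_linear_layer

layer = torch.nn.Linear(768, 768)
index = torch.arange(640)  # e.g. keep 10 of 12 heads of size 64
pruned = prune_linear_layer(layer, index, dim=0)  # prune output rows
print(pruned.weight.shape)  # torch.Size([640, 768])
```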
class transformers.modeling_tf_utils.TFConv1D
( *args, **kwargs )
Parameters
nf (int) — The number of output features.
nx (int) — The number of input features.
initializer_range (float, optional, defaults to 0.02) — The standard deviation to use to initialize the weights.
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the __init__ of tf.keras.layers.Layer.
1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).
Basically works like a linear layer but the weights are transposed.
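For example (a minimal sketch with GPT-2-style sizes):

```python
import tensorflow as tf
from transformers.modeling_tf_utils import TFConv1D

layer = TFConv1D(nf=2304, nx=768)   # projection sizes are illustrative
x = tf.random.normal((2, 10, 768))  # (batch, seq_len, nx)
print(layer(x).shape)               # (2, 10, 2304)
```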
class transformers.modeling_tf_utils.TFSequenceSummary
( *args, **kwargs )
Parameters
config (PretrainedConfig) — The config used by the model. Relevant arguments in the config class of the model are (refer to the actual config class of your model for the default values it uses):
summary_type (str) — The method to use to make this summary. Accepted values are:
"last"
— Take the last token hidden state (like XLNet)
"first"
— Take the first token hidden state (like Bert)
"mean"
— Take the mean of all tokens hidden states
"cls_index"
— Supply a Tensor of classification token position (GPT/GPT-2)
"attn"
— Not implemented now, use multi-head attention
summary_use_proj (bool) — Add a projection after the vector extraction.
summary_proj_to_labels (bool) — If True, the projection outputs to config.num_labels classes (otherwise to config.hidden_size).
summary_activation (Optional[str]) — Set to "tanh" to add a tanh activation to the output, any other string or None will add no activation.
summary_first_dropout (float) — Optional dropout probability before the projection and activation.
summary_last_dropout (float) — Optional dropout probability after the projection and activation.
initializer_range (float, defaults to 0.02) — The standard deviation to use to initialize the weights.
kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the __init__ of tf.keras.layers.Layer.
Compute a single vector summary of a sequence hidden states.
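A minimal usage sketch (config values are illustrative):

```python
import tensorflow as tf
from transformers import PretrainedConfig
from transformers.modeling_tf_utils import TFSequenceSummary

# Take the last token's hidden state, with no projection.
config = PretrainedConfig(hidden_size=768, summary_type="last", summary_use_proj=False)
summary = TFSequenceSummary(config)
print(summary(tf.random.normal((2, 10, 768))).shape)  # (2, 768)
```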
class transformers.modeling_tf_utils.TFCausalLanguageModelingLoss
( )
Loss function suitable for causal language modeling (CLM), that is, the task of guessing the next token.
Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
class transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss
( )
Loss function suitable for masked language modeling (MLM), that is, the task of guessing the masked tokens.
Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
class transformers.modeling_tf_utils.TFMultipleChoiceLoss
( )
Loss function suitable for multiple choice tasks.
class transformers.modeling_tf_utils.TFQuestionAnsweringLoss
( )
Loss function suitable for question answering.
class transformers.modeling_tf_utils.TFSequenceClassificationLoss
( )
Loss function suitable for sequence classification.
class transformers.modeling_tf_utils.TFTokenClassificationLoss
( )
Loss function suitable for token classification.
Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
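These classes are mixins inherited by the TF models; the -100 convention they implement can be sketched as follows (a simplified illustration, not the library's exact implementation):

```python
import tensorflow as tf

# Simplified sketch of the -100 masking convention (illustrative).
def masked_lm_loss(labels, logits):
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    flat_labels = tf.reshape(labels, (-1,))
    flat_logits = tf.reshape(logits, (-1, logits.shape[-1]))
    active = tf.not_equal(flat_labels, -100)  # drop ignored positions
    return loss_fn(
        tf.boolean_mask(flat_labels, active),
        tf.boolean_mask(flat_logits, active),
    )

labels = tf.constant([[5, -100, 7]])   # the -100 label is ignored
logits = tf.random.normal((1, 3, 10))  # (batch, seq_len, vocab_size)
print(masked_lm_loss(labels, logits))  # one loss value per kept token
```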
transformers.modeling_tf_utils.get_initializer
( initializer_range: float = 0.02 ) → tf.keras.initializers.TruncatedNormal
Parameters
initializer_range (float, defaults to 0.02) — Standard deviation of the initializer range.
Returns
tf.keras.initializers.TruncatedNormal
The truncated normal initializer.
Creates a tf.keras.initializers.TruncatedNormal with the given range.
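For example:

```python
import tensorflow as tf
from transformers.modeling_tf_utils import get_initializer

# Use the initializer when building a custom layer (sizes illustrative).
initializer = get_initializer(initializer_range=0.02)
dense = tf.keras.layers.Dense(768, kernel_initializer=initializer)
```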
transformers.modeling_tf_utils.keras_serializable
( )
Parameters
cls (a tf.keras.layers.Layer subclass) — Typically a TF.MainLayer class in this project, in general must accept a config argument to its initializer.
Decorate a Keras Layer class to support Keras serialization.
This is done by:
Adding a transformers_config dict to the Keras config dictionary in get_config (called by Keras at serialization time).
Wrapping __init__ to accept that transformers_config dict (passed by Keras at deserialization time) and convert it to a config object for the actual layer initializer.
Registering the class as a custom object in Keras (if the TensorFlow version supports this), so that it does not need to be supplied in custom_objects in the call to tf.keras.models.load_model.
transformers.shape_list
( tensor: typing.Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray] ) → List[int]
Parameters
tensor (tf.Tensor or np.ndarray) — The tensor we want the shape of.
Returns
List[int]
The shape of the tensor as a list.
Deal with dynamic shapes in TensorFlow cleanly.
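For example:

```python
import tensorflow as tf
from transformers.modeling_tf_utils import shape_list

@tf.function  # in graph mode, some dimensions may be dynamic (None)
def flatten(hidden_states):
    # shape_list returns static ints where known, dynamic tensors otherwise.
    batch_size, seq_len, hidden_size = shape_list(hidden_states)
    return tf.reshape(hidden_states, (batch_size * seq_len, hidden_size))

print(flatten(tf.random.normal((2, 5, 8))).shape)  # (10, 8)
```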