Custom Layers and Utilities

Last updated 1 year ago

This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.

Most of those are only useful if you are studying the code of the models in the library.

PyTorch custom modules

class transformers.Conv1D

( nf, nx )

Parameters

  • nf (int) — The number of output features.

  • nx (int) — The number of input features.

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).

Basically works like a linear layer but the weights are transposed.
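The relationship to a linear layer can be sketched in plain Python. This only illustrates the weight layout and is not the library's implementation: the helper names `conv1d_forward`, `linear_forward` and `transpose` are hypothetical, and the sketch assumes the Conv1D weight is stored with shape (nx, nf) while a linear layer stores (nf, nx).

```python
def conv1d_forward(x, weight, bias):
    """x: rows of nx values; weight: nx rows of nf values; bias: nf values."""
    return [
        [sum(row[i] * weight[i][j] for i in range(len(row))) + bias[j]
         for j in range(len(bias))]
        for row in x
    ]


def linear_forward(x, weight, bias):
    """Same computation, but weight is laid out as nf rows of nx values."""
    return [
        [sum(row[i] * weight[j][i] for i in range(len(row))) + bias[j]
         for j in range(len(bias))]
        for row in x
    ]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


x = [[1.0, 2.0, 3.0]]                               # one token, nx = 3
w_conv = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # assumed layout (nx=3, nf=2)
b = [0.5, -0.5]

out_conv = conv1d_forward(x, w_conv, b)
out_linear = linear_forward(x, transpose(w_conv), b)  # linear layout is the transpose
```

Both calls produce the same output, which is why a Conv1D weight can be moved into a linear layer by transposing it.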

class transformers.modeling_utils.PoolerStartLogits

( config: PretrainedConfig )

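Parameters

  • config (PretrainedConfig) — The config used by the model, will be used to grab the hidden_size of the model.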

Compute SQuAD start logits from sequence hidden states.

forward

( hidden_states: FloatTensor, p_mask: typing.Optional[torch.FloatTensor] = None ) → torch.FloatTensor

Parameters

  • hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) — The final hidden states of the model.

  • p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) — Mask for tokens at invalid position, such as query and special symbols (PAD, SEP, CLS). 1.0 means token should be masked.

Returns

torch.FloatTensor

The start logits for SQuAD.
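The effect of p_mask can be sketched in plain Python. This is an illustration under assumptions, not the library code: the helper `apply_p_mask` is hypothetical, and the sketch assumes masked positions are pushed to a very large negative value (the constant 1e30 is chosen for illustration) so that they can never be selected as the start of a span.

```python
def apply_p_mask(logits, p_mask, neg=1e30):
    """Keep logits where p_mask is 0.0 and push them to -neg where p_mask is 1.0."""
    return [logit * (1.0 - mask) - neg * mask for logit, mask in zip(logits, p_mask)]


logits = [2.0, 0.5, 1.5, -1.0]
p_mask = [1.0, 0.0, 0.0, 1.0]   # first and last tokens are special symbols

masked = apply_p_mask(logits, p_mask)
best_start = max(range(len(masked)), key=masked.__getitem__)
```

Without the mask the first token would have the highest logit. With it, the best start position is the highest-scoring unmasked token.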

class transformers.modeling_utils.PoolerEndLogits

( config: PretrainedConfig )

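Parameters

  • config (PretrainedConfig) — The config used by the model, will be used to grab the hidden_size of the model and the layer_norm_eps to use.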

Compute SQuAD end logits from sequence hidden states.

forward

( hidden_states: FloatTensor, start_states: typing.Optional[torch.FloatTensor] = None, start_positions: typing.Optional[torch.LongTensor] = None, p_mask: typing.Optional[torch.FloatTensor] = None ) → torch.FloatTensor

Parameters

  • hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) — The final hidden states of the model.

  • start_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size), optional) — The hidden states of the first tokens for the labeled span.

  • start_positions (torch.LongTensor of shape (batch_size,), optional) — The position of the first token for the labeled span.

  • p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) — Mask for tokens at invalid position, such as query and special symbols (PAD, SEP, CLS). 1.0 means token should be masked.

Returns

torch.FloatTensor

The end logits for SQuAD.

At least one of start_states or start_positions must be provided. If both are set, start_positions overrides start_states.

class transformers.modeling_utils.PoolerAnswerClass

( config )

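Parameters

  • config (PretrainedConfig) — The config used by the model, will be used to grab the hidden_size of the model.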

Compute SQuAD 2.0 answer class from classification and start tokens hidden states.

forward

( hidden_states: FloatTensor, start_states: typing.Optional[torch.FloatTensor] = None, start_positions: typing.Optional[torch.LongTensor] = None, cls_index: typing.Optional[torch.LongTensor] = None ) → torch.FloatTensor

Parameters

  • hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) — The final hidden states of the model.

  • start_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size), optional) — The hidden states of the first tokens for the labeled span.

  • start_positions (torch.LongTensor of shape (batch_size,), optional) — The position of the first token for the labeled span.

  • cls_index (torch.LongTensor of shape (batch_size,), optional) — Position of the CLS token for each sentence in the batch. If None, takes the last token.

Returns

torch.FloatTensor

The SQuAD 2.0 answer class.

At least one of start_states or start_positions must be provided. If both are set, start_positions overrides start_states.

class transformers.modeling_utils.SquadHeadOutput

( loss: typing.Optional[torch.FloatTensor] = None, start_top_log_probs: typing.Optional[torch.FloatTensor] = None, start_top_index: typing.Optional[torch.LongTensor] = None, end_top_log_probs: typing.Optional[torch.FloatTensor] = None, end_top_index: typing.Optional[torch.LongTensor] = None, cls_logits: typing.Optional[torch.FloatTensor] = None )

Parameters

  • loss (torch.FloatTensor of shape (1,), optional, returned if both start_positions and end_positions are provided) — Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.

  • start_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top start token possibilities (beam-search).

  • start_top_index (torch.LongTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top start token possibilities (beam-search).

  • end_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).

  • end_top_index (torch.LongTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).

  • cls_logits (torch.FloatTensor of shape (batch_size,), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the is_impossible label of the answers.

Base class for outputs of question answering models using a SQuADHead.

class transformers.modeling_utils.SQuADHead

( config )

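Parameters

  • config (PretrainedConfig) — The config used by the model, will be used to grab the hidden_size of the model and the layer_norm_eps to use.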

A SQuAD head inspired by XLNet.

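forward

( hidden_states: FloatTensor, start_positions: typing.Optional[torch.LongTensor] = None, end_positions: typing.Optional[torch.LongTensor] = None, cls_index: typing.Optional[torch.LongTensor] = None, is_impossible: typing.Optional[torch.LongTensor] = None, p_mask: typing.Optional[torch.FloatTensor] = None, return_dict: bool = False ) → transformers.modeling_utils.SquadHeadOutput or tuple(torch.FloatTensor)

Parameters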

  • hidden_states (torch.FloatTensor of shape (batch_size, seq_len, hidden_size)) — Final hidden states of the model on the sequence tokens.

  • start_positions (torch.LongTensor of shape (batch_size,), optional) — Positions of the first token for the labeled span.

  • end_positions (torch.LongTensor of shape (batch_size,), optional) — Positions of the last token for the labeled span.

  • cls_index (torch.LongTensor of shape (batch_size,), optional) — Position of the CLS token for each sentence in the batch. If None, takes the last token.

  • is_impossible (torch.LongTensor of shape (batch_size,), optional) — Whether the question has a possible answer in the paragraph or not.

  • p_mask (torch.FloatTensor of shape (batch_size, seq_len), optional) — Mask for tokens at invalid position, such as query and special symbols (PAD, SEP, CLS). 1.0 means token should be masked.

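  • return_dict (bool, optional, defaults to False) — Whether or not to return a ModelOutput instead of a plain tuple.

Returns

transformers.modeling_utils.SquadHeadOutput or tuple(torch.FloatTensor)

A transformers.modeling_utils.SquadHeadOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PretrainedConfig) and inputs.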

  • loss (torch.FloatTensor of shape (1,), optional, returned if both start_positions and end_positions are provided) — Classification loss as the sum of start token, end token (and is_impossible if provided) classification losses.

  • start_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top start token possibilities (beam-search).

  • start_top_index (torch.LongTensor of shape (batch_size, config.start_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top start token possibilities (beam-search).

  • end_top_log_probs (torch.FloatTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).

  • end_top_index (torch.LongTensor of shape (batch_size, config.start_n_top * config.end_n_top), optional, returned if start_positions or end_positions is not provided) — Indices for the top config.start_n_top * config.end_n_top end token possibilities (beam-search).

  • cls_logits (torch.FloatTensor of shape (batch_size,), optional, returned if start_positions or end_positions is not provided) — Log probabilities for the is_impossible label of the answers.

class transformers.modeling_utils.SequenceSummary

( config: PretrainedConfig )

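Parameters

  • config (PretrainedConfig) — The config used by the model. Relevant arguments in the config class of the model are (refer to the actual config class of your model for the default values it uses):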

    • summary_type (str) — The method to use to make this summary. Accepted values are:

      • "last" — Take the last token hidden state (like XLNet)

      • "first" — Take the first token hidden state (like Bert)

      • "mean" — Take the mean of all tokens hidden states

      • "cls_index" — Supply a Tensor of classification token position (GPT/GPT-2)

      • "attn" — Not implemented now, use multi-head attention

    • summary_use_proj (bool) — Add a projection after the vector extraction.

    • summary_proj_to_labels (bool) — If True, the projection outputs to config.num_labels classes (otherwise to config.hidden_size).

    • summary_activation (Optional[str]) — Set to "tanh" to add a tanh activation to the output, another string or None will add no activation.

    • summary_first_dropout (float) — Optional dropout probability before the projection and activation.

    • summary_last_dropout (float) — Optional dropout probability after the projection and activation.

Compute a single vector summary of a sequence hidden states.

forward

( hidden_states: FloatTensor, cls_index: typing.Optional[torch.LongTensor] = None ) → torch.FloatTensor

Parameters

  • hidden_states (torch.FloatTensor of shape [batch_size, seq_len, hidden_size]) — The hidden states of the last layer.

  • cls_index (torch.LongTensor of shape [batch_size] or [batch_size, ...] where ... are optional leading dimensions of hidden_states, optional) — Used if summary_type == "cls_index". If None, the last token of the sequence is taken as the classification token.

Returns

torch.FloatTensor

The summary of the sequence hidden states.

Compute a single vector summary of a sequence hidden states.
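The summary_type options listed above can be sketched in plain Python for a single sequence. This is a simplified illustration, not the library module: the function `summarize` is hypothetical, it ignores batching, projection, activation and dropout, and it assumes that a missing cls_index falls back to the last token.

```python
def summarize(hidden_states, summary_type="last", cls_index=None):
    """hidden_states: seq_len vectors, each a list of hidden_size floats."""
    if summary_type == "last":
        return list(hidden_states[-1])
    if summary_type == "first":
        return list(hidden_states[0])
    if summary_type == "mean":
        n = len(hidden_states)
        return [sum(column) / n for column in zip(*hidden_states)]
    if summary_type == "cls_index":
        # Assumption: fall back to the last token when no index is given.
        position = len(hidden_states) - 1 if cls_index is None else cls_index
        return list(hidden_states[position])
    raise ValueError(f"unsupported summary_type: {summary_type}")


states = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]   # seq_len = 3, hidden_size = 2

last = summarize(states, "last")
first = summarize(states, "first")
mean = summarize(states, "mean")
picked = summarize(states, "cls_index", cls_index=1)
```

Each option reduces the (seq_len, hidden_size) states to a single hidden_size vector. They differ only in which tokens contribute to it.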

PyTorch Helper Functions

transformers.apply_chunking_to_forward

( forward_fn: typing.Callable[..., torch.Tensor], chunk_size: int, chunk_dim: int, *input_tensors ) → torch.Tensor

Parameters

  • forward_fn (Callable[..., torch.Tensor]) — The forward function of the model.

  • chunk_size (int) — The chunk size of a chunked tensor: num_chunks = input_tensors[0].shape[chunk_dim] / chunk_size.

  • chunk_dim (int) — The dimension over which the input_tensors should be chunked.

  • input_tensors (Tuple[torch.Tensor]) — The input tensors of forward_fn which will be chunked.

Returns

torch.Tensor

A tensor with the same shape as the one forward_fn would have returned if applied directly to input_tensors.

This function chunks the input_tensors into smaller input tensor parts of size chunk_size over the dimension chunk_dim. It then applies a layer forward_fn to each chunk independently to save memory.

If the forward_fn is independent across the chunk_dim this function will yield the same result as directly applying forward_fn to input_tensors.

Examples:

# rename the usual forward() fn to forward_chunk()
def forward_chunk(self, hidden_states):
    hidden_states = self.decoder(hidden_states)
    return hidden_states


# implement a chunked forward function
def forward(self, hidden_states):
    return apply_chunking_to_forward(self.forward_chunk, self.chunk_size_lm_head, self.seq_len_dim, hidden_states)
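
The equivalence between chunked and direct application can be checked with a plain-Python sketch that uses lists of rows instead of tensors. The helper `chunked_apply` is a hypothetical stand-in, not the library function, and it assumes the chunked dimension must be divisible by chunk_size and that a chunk_size of 0 means no chunking.

```python
def chunked_apply(forward_fn, chunk_size, rows):
    """Apply forward_fn to consecutive chunks of rows and concatenate the results."""
    if chunk_size <= 0:
        return forward_fn(rows)          # assumption: no chunking requested
    if len(rows) % chunk_size != 0:
        raise ValueError("the number of rows must be a multiple of chunk_size")
    output = []
    for start in range(0, len(rows), chunk_size):
        output.extend(forward_fn(rows[start:start + chunk_size]))
    return output


def double_each(rows):
    """A row-wise function, so it is independent across the chunked dimension."""
    return [[2 * value for value in row] for row in rows]


rows = [[1, 2], [3, 4], [5, 6], [7, 8]]
chunked = chunked_apply(double_each, 2, rows)
direct = double_each(rows)
```

Only two rows are processed at a time, which bounds the peak size of the intermediate results. A function that mixes information across rows, such as subtracting the mean over all rows, would not give the same result when chunked.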

transformers.pytorch_utils.find_pruneable_heads_and_indices

( heads: typing.List[int], n_heads: int, head_size: int, already_pruned_heads: typing.Set[int] ) → Tuple[Set[int], torch.LongTensor]

Parameters

  • heads (List[int]) — List of the indices of heads to prune.

  • n_heads (int) — The number of heads in the model.

  • head_size (int) — The size of each head.

  • already_pruned_heads (Set[int]) — A set of already pruned heads.

Returns

Tuple[Set[int], torch.LongTensor]

A tuple with the indices of heads to prune taking already_pruned_heads into account and the indices of rows/columns to keep in the layer weight.

Finds the heads and their indices taking already_pruned_heads into account.
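A plain-Python sketch of this bookkeeping follows. It is an illustration under assumptions rather than the library code: `find_heads_and_indices` is a hypothetical name, the returned index is a list instead of a torch.LongTensor, and n_heads is assumed to be the number of heads still present in the layer, so the position of each head is shifted down by the number of lower-numbered heads that were pruned earlier.

```python
def find_heads_and_indices(heads, n_heads, head_size, already_pruned_heads):
    """Return the heads that still need pruning and the flat indices to keep."""
    keep = [[True] * head_size for _ in range(n_heads)]
    heads = set(heads) - already_pruned_heads        # skip heads pruned earlier
    for head in heads:
        # Earlier pruning shifted the position of this head in the layer.
        shifted = head - sum(1 for h in already_pruned_heads if h < head)
        keep[shifted] = [False] * head_size
    flat = [flag for row in keep for flag in row]
    index = [i for i, flag in enumerate(flat) if flag]
    return heads, index


# Fresh layer: 4 heads of size 2, prune heads 0 and 2.
heads_a, index_a = find_heads_and_indices([0, 2], 4, 2, set())

# Head 1 was pruned earlier, so 3 heads remain; now also prune head 2.
heads_b, index_b = find_heads_and_indices([1, 2], 3, 2, {1})
```

The returned indices are the rows or columns to keep, which is the form the pruning helpers below expect as their index argument.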

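transformers.prune_layer

( layer: typing.Union[torch.nn.modules.linear.Linear, transformers.pytorch_utils.Conv1D], index: LongTensor, dim: typing.Optional[int] = None ) → torch.nn.Linear or Conv1D

Parameters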

  • layer (Union[torch.nn.Linear, Conv1D]) — The layer to prune.

  • index (torch.LongTensor) — The indices to keep in the layer.

  • dim (int, optional) — The dimension on which to keep the indices.

Returns

torch.nn.Linear or Conv1D

The pruned layer as a new layer with requires_grad=True.

Prune a Conv1D or linear layer to keep only entries in index.

Used to remove heads.

transformers.pytorch_utils.prune_conv1d_layer

( layer: Conv1D, index: LongTensor, dim: int = 1 ) → Conv1D

Parameters

  • layer (Conv1D) — The layer to prune.

  • index (torch.LongTensor) — The indices to keep in the layer.

  • dim (int, optional, defaults to 1) — The dimension on which to keep the indices.

Returns

Conv1D

The pruned layer as a new layer with requires_grad=True.

Prune a Conv1D layer to keep only entries in index. A Conv1D works like a Linear layer (see e.g. BERT) but the weights are transposed.

Used to remove heads.

transformers.pytorch_utils.prune_linear_layer

( layer: Linear, index: LongTensor, dim: int = 0 ) → torch.nn.Linear

Parameters

  • layer (torch.nn.Linear) — The layer to prune.

  • index (torch.LongTensor) — The indices to keep in the layer.

  • dim (int, optional, defaults to 0) — The dimension on which to keep the indices.

Returns

torch.nn.Linear

The pruned layer as a new layer with requires_grad=True.

Prune a linear layer to keep only entries in index.

Used to remove heads.
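What "keeping only entries in index" means for a linear layer can be sketched on nested lists. The helper `prune_linear` is hypothetical and is not the library function. It assumes a weight laid out as (out_features, in_features), so dim=0 keeps output rows together with the matching bias entries, while dim=1 keeps input columns and leaves the bias unchanged.

```python
def prune_linear(weight, bias, index, dim=0):
    """weight: out_features rows of in_features values; bias: out_features values."""
    if dim == 0:
        new_weight = [list(weight[i]) for i in index]     # keep output rows
        new_bias = [bias[i] for i in index]               # bias follows the outputs
    elif dim == 1:
        new_weight = [[row[j] for j in index] for row in weight]  # keep input columns
        new_bias = list(bias)
    else:
        raise ValueError("dim must be 0 or 1")
    return new_weight, new_bias


weight = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]   # 4 outputs, 3 inputs
bias = [0.1, 0.2, 0.3, 0.4]

rows_w, rows_b = prune_linear(weight, bias, [0, 3], dim=0)
cols_w, cols_b = prune_linear(weight, bias, [1, 2], dim=1)
```

When heads are removed from an attention block, the projections that produce the heads are typically pruned along their output dimension and the projection that consumes them along its input dimension.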

TensorFlow custom layers

class transformers.modeling_tf_utils.TFConv1D

( *args, **kwargs )

Parameters

  • nf (int) — The number of output features.

  • nx (int) — The number of input features.

  • initializer_range (float, optional, defaults to 0.02) — The standard deviation to use to initialize the weights.

  • kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the __init__ of tf.keras.layers.Layer.

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).

Basically works like a linear layer but the weights are transposed.

class transformers.TFSequenceSummary

( *args, **kwargs )

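Parameters

  • config (PretrainedConfig) — The config used by the model. Relevant arguments in the config class of the model are (refer to the actual config class of your model for the default values it uses):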

    • summary_type (str) — The method to use to make this summary. Accepted values are:

      • "last" — Take the last token hidden state (like XLNet)

      • "first" — Take the first token hidden state (like Bert)

      • "mean" — Take the mean of all tokens hidden states

      • "cls_index" — Supply a Tensor of classification token position (GPT/GPT-2)

      • "attn" — Not implemented now, use multi-head attention

    • summary_use_proj (bool) — Add a projection after the vector extraction.

    • summary_proj_to_labels (bool) — If True, the projection outputs to config.num_labels classes (otherwise to config.hidden_size).

    • summary_activation (Optional[str]) — Set to "tanh" to add a tanh activation to the output, another string or None will add no activation.

    • summary_first_dropout (float) — Optional dropout probability before the projection and activation.

    • summary_last_dropout (float) — Optional dropout probability after the projection and activation.

  • initializer_range (float, defaults to 0.02) — The standard deviation to use to initialize the weights.

  • kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the __init__ of tf.keras.layers.Layer.

Compute a single vector summary of a sequence hidden states.

TensorFlow loss functions

class transformers.modeling_tf_utils.TFCausalLanguageModelingLoss

( )

Loss function suitable for causal language modeling (CLM), that is, the task of guessing the next token.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.
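The effect of the -100 label can be sketched in plain Python. This is an illustration only: `masked_cross_entropy` is a hypothetical helper, not the TensorFlow loss class, and averaging over the remaining tokens is an assumption about the reduction.

```python
import math

IGNORE_INDEX = -100


def masked_cross_entropy(logits, labels):
    """Mean cross-entropy over the tokens whose label is not IGNORE_INDEX."""
    losses = []
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue                      # drop both the label and its logits
        log_normalizer = math.log(sum(math.exp(value) for value in row))
        losses.append(log_normalizer - row[label])
    return sum(losses) / len(losses) if losses else 0.0


logits = [[0.0, 0.0], [9.0, -9.0], [0.0, 0.0]]   # 3 tokens, vocabulary of 2
labels = [0, IGNORE_INDEX, 1]                    # the middle token is ignored

loss = masked_cross_entropy(logits, labels)
```

The middle token has very confident logits, yet it has no influence on the result. The loss equals log(2), the value for the two remaining tokens with uniform logits.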

class transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss

( )

Loss function suitable for masked language modeling (MLM), that is, the task of guessing the masked tokens.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

class transformers.modeling_tf_utils.TFMultipleChoiceLoss

( )

Loss function suitable for multiple choice tasks.

class transformers.modeling_tf_utils.TFQuestionAnsweringLoss

( )

Loss function suitable for question answering.

class transformers.modeling_tf_utils.TFSequenceClassificationLoss

( )

Loss function suitable for sequence classification.

class transformers.modeling_tf_utils.TFTokenClassificationLoss

( )

Loss function suitable for token classification.

Any label of -100 will be ignored (along with the corresponding logits) in the loss computation.

TensorFlow Helper Functions

transformers.modeling_tf_utils.get_initializer

( initializer_range: float = 0.02 ) → tf.keras.initializers.TruncatedNormal

Parameters

  • initializer_range (float, defaults to 0.02) — Standard deviation of the initializer range.

Returns

tf.keras.initializers.TruncatedNormal

The truncated normal initializer.

Creates a tf.keras.initializers.TruncatedNormal with the given range.

transformers.modeling_tf_utils.keras_serializable

( )

Parameters

  • cls (a tf.keras.layers.Layers subclass) — Typically a TF.MainLayer class in this project, in general must accept a config argument to its initializer.

Decorate a Keras Layer class to support Keras serialization.

This is done by:

  1. Adding a transformers_config dict to the Keras config dictionary in get_config (called by Keras at serialization time).

  2. Wrapping __init__ to accept that transformers_config dict (passed by Keras at deserialization time) and convert it to a config object for the actual layer initializer.

  3. Registering the class as a custom object in Keras (if the Tensorflow version supports this), so that it does not need to be supplied in custom_objects in the call to tf.keras.models.load_model.
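
The first two steps can be sketched without Keras as a plain class decorator. Everything in this sketch is hypothetical and serves only to illustrate the pattern: `ToyConfig`, `toy_serializable` and `ToyMainLayer` are not part of the library, and the registration in step 3 is left out because it depends on Keras.

```python
import functools


class ToyConfig:
    """Hypothetical stand-in for a model config."""

    def __init__(self, hidden_size=8):
        self.hidden_size = hidden_size

    def to_dict(self):
        return {"hidden_size": self.hidden_size}

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


def toy_serializable(cls):
    """Hypothetical decorator mirroring steps 1 and 2 of the list above."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, config=None, transformers_config=None, **kwargs):
        # Step 2: accept the serialized dict and rebuild the config object.
        if config is None and transformers_config is not None:
            config = ToyConfig.from_dict(transformers_config)
        original_init(self, config, **kwargs)
        self._config = config

    def get_config(self):
        # Step 1: expose the config as a plain dict at serialization time.
        return {"transformers_config": self._config.to_dict()}

    cls.__init__ = wrapped_init
    cls.get_config = get_config
    return cls


@toy_serializable
class ToyMainLayer:
    def __init__(self, config):
        self.hidden_size = config.hidden_size


layer = ToyMainLayer(ToyConfig(hidden_size=16))
saved = layer.get_config()
restored = ToyMainLayer(**saved)      # what deserialization would do
```

The round trip through a plain dict is what lets a serialization framework rebuild the layer without knowing about the config class.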

transformers.shape_list

( tensor: typing.Union[tensorflow.python.framework.ops.Tensor, numpy.ndarray] ) → List[int]

Parameters

  • tensor (tf.Tensor or np.ndarray) — The tensor we want the shape of.

Returns

List[int]

The shape of the tensor as a list.

Deal with dynamic shape in TensorFlow cleanly.
