
Create a custom architecture

An AutoClass automatically infers the model architecture and downloads pretrained configuration and weights. Generally, we recommend using an AutoClass to produce checkpoint-agnostic code. But users who want more control over specific model parameters can create a custom 🌍 Transformers model from just a few base classes. This could be particularly useful for anyone interested in studying, training or experimenting with a 🌍 Transformers model. In this guide, dive deeper into creating a custom model without an AutoClass. Learn how to:

  • Load and customize a model configuration.

  • Create a model architecture.

  • Create a slow and fast tokenizer for text.

  • Create an image processor for vision tasks.

  • Create a feature extractor for audio tasks.

  • Create a processor for multimodal tasks.

Configuration

A configuration refers to a model’s specific attributes. Each model configuration has different attributes; for instance, all NLP models share the hidden_size, num_attention_heads, num_hidden_layers and vocab_size attributes. These attributes specify properties such as the number of attention heads or hidden layers to construct a model with.

Get a closer look at DistilBERT by accessing DistilBertConfig to inspect its attributes:


>>> from transformers import DistilBertConfig

>>> config = DistilBertConfig()
>>> print(config)
DistilBertConfig {
  "activation": "gelu",
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "transformers_version": "4.16.2",
  "vocab_size": 30522
}

DistilBertConfig displays all the default attributes used to build a base DistilBertModel. All attributes are customizable, creating space for experimentation. For example, you can customize a default model to:

  • Try a different activation function with the activation parameter.

  • Use a higher dropout ratio for the attention probabilities with the attention_dropout parameter.

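For example, a configuration along these lines would work (the relu activation and 0.4 dropout are illustrative values, not recommendations):

>>> from transformers import DistilBertConfig

>>> # Build a config with a different activation function and a higher attention dropout
>>> my_config = DistilBertConfig(activation="relu", attention_dropout=0.4)
>>> print(my_config)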

Pretrained model attributes can be modified in the from_pretrained() function:

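A minimal sketch, assuming the distilbert-base-uncased checkpoint (an illustrative choice):

>>> from transformers import DistilBertConfig

>>> # Load the pretrained configuration and override selected attributes
>>> my_config = DistilBertConfig.from_pretrained("distilbert-base-uncased", activation="relu", attention_dropout=0.4)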

Once you are satisfied with your model configuration, you can save it with save_pretrained(). Your configuration file is stored as a JSON file in the specified save directory:

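For example (the directory name is an arbitrary placeholder):

>>> # Writes config.json to the chosen directory
>>> my_config.save_pretrained(save_directory="./your_model_save_path")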

To reuse the configuration file, load it with from_pretrained():

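For example, reusing the placeholder directory from above:

>>> from transformers import DistilBertConfig

>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")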

You can also save your configuration file as a dictionary or even just the difference between your custom configuration attributes and the default configuration attributes! See the configuration documentation for more details.

Model

The next step is to create a model. The model, also loosely referred to as the architecture, defines what each layer is doing and what operations are happening. Attributes like num_hidden_layers from the configuration are used to define the architecture. Every model shares the base class PreTrainedModel and a few common methods like resizing input embeddings and pruning self-attention heads. In addition, all models are also either a torch.nn.Module, tf.keras.Model or flax.linen.Module subclass. This means models are compatible with their respective framework’s standard usage.

PyTorch

Load your custom configuration attributes into the model:

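Something along these lines, reusing the configuration saved earlier (the path is a placeholder):

>>> from transformers import DistilBertConfig, DistilBertModel

>>> # Build a randomly initialized model from the custom configuration
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
>>> model = DistilBertModel(my_config)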

This creates a model with random values instead of pretrained weights. You won’t be able to use this model for anything useful until you train it. Training is a costly and time-consuming process. It is generally better to use a pretrained model to obtain better results faster, while using only a fraction of the resources required for training.

Create a pretrained model with from_pretrained():

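For example (distilbert-base-uncased is an illustrative checkpoint):

>>> from transformers import DistilBertModel

>>> model = DistilBertModel.from_pretrained("distilbert-base-uncased")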

When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by 🌍 Transformers. However, you can still replace some or all of the default model configuration attributes with your own if you’d like:

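For instance, passing a custom configuration alongside the checkpoint might look like this:

>>> from transformers import DistilBertConfig, DistilBertModel

>>> # Override attributes that do not change the weight shapes
>>> my_config = DistilBertConfig(activation="relu", attention_dropout=0.4)
>>> model = DistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)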

TensorFlow

Load your custom configuration attributes into the model:

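A minimal sketch, reusing the configuration saved earlier (the path is a placeholder):

>>> from transformers import DistilBertConfig, TFDistilBertModel

>>> # Build a randomly initialized TensorFlow model from the custom configuration
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
>>> tf_model = TFDistilBertModel(my_config)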

This creates a model with random values instead of pretrained weights. You won’t be able to use this model for anything useful until you train it. Training is a costly and time-consuming process. It is generally better to use a pretrained model to obtain better results faster, while using only a fraction of the resources required for training.

Create a pretrained model with from_pretrained():

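For example (again using the illustrative distilbert-base-uncased checkpoint):

>>> from transformers import TFDistilBertModel

>>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")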

When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by 🌍 Transformers. However, you can still replace some or all of the default model configuration attributes with your own if you’d like:

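For instance:

>>> from transformers import DistilBertConfig, TFDistilBertModel

>>> # Override attributes that do not change the weight shapes
>>> my_config = DistilBertConfig(activation="relu", attention_dropout=0.4)
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)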

Model heads

At this point, you have a base DistilBERT model which outputs the hidden states. The hidden states are passed as inputs to a model head to produce the final output. 🌍 Transformers provides a different model head for each task as long as a model supports the task (i.e., you can’t use DistilBERT for a sequence-to-sequence task like translation).

PyTorch

For example, DistilBertForSequenceClassification is a base DistilBERT model with a sequence classification head. The sequence classification head is a linear layer on top of the pooled outputs.

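For example (the checkpoint name is illustrative):

>>> from transformers import DistilBertForSequenceClassification

>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")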

Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the DistilBertForQuestionAnswering model head. The question answering head is similar to the sequence classification head except it is a linear layer on top of the hidden states output.

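For example:

>>> from transformers import DistilBertForQuestionAnswering

>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")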

TensorFlow

For example, TFDistilBertForSequenceClassification is a base DistilBERT model with a sequence classification head. The sequence classification head is a linear layer on top of the pooled outputs.

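For example (the checkpoint name is illustrative):

>>> from transformers import TFDistilBertForSequenceClassification

>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")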

Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the TFDistilBertForQuestionAnswering model head. The question answering head is similar to the sequence classification head except it is a linear layer on top of the hidden states output.

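For example:

>>> from transformers import TFDistilBertForQuestionAnswering

>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")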

Tokenizer

The last base class you need before using a model for textual data is a tokenizer to convert raw text to tensors. There are two types of tokenizers you can use with 🌍 Transformers:

  • PreTrainedTokenizer: a Python implementation of a tokenizer.

  • PreTrainedTokenizerFast: a tokenizer backed by the Rust-based Tokenizers library, which is significantly faster, especially during batch tokenization.

Both tokenizers support common methods such as encoding and decoding, adding new tokens, and managing special tokens.

Not every model supports a fast tokenizer. Take a look at this table to check if a model has fast tokenizer support.

If you trained your own tokenizer, you can create one from your vocabulary file:

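A sketch of what this might look like (the vocabulary file name and the do_lower_case/padding_side values are illustrative):

>>> from transformers import DistilBertTokenizer

>>> my_tokenizer = DistilBertTokenizer(vocab_file="my_vocab_file.txt", do_lower_case=False, padding_side="left")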

It is important to remember that the vocabulary from a custom tokenizer will be different from the vocabulary generated by a pretrained model’s tokenizer. You need to use a pretrained model’s vocabulary if you are using a pretrained model, otherwise the inputs won’t make sense. Create a tokenizer with a pretrained model’s vocabulary with the DistilBertTokenizer class:

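For example (the checkpoint name is illustrative):

>>> from transformers import DistilBertTokenizer

>>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")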

Create a fast tokenizer with the DistilBertTokenizerFast class:

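For example:

>>> from transformers import DistilBertTokenizerFast

>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")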

By default, AutoTokenizer will try to load a fast tokenizer. You can disable this behavior by setting use_fast=False in from_pretrained().
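A minimal sketch of the opt-out (the checkpoint name is illustrative):

>>> from transformers import AutoTokenizer

>>> # Force the slow, Python-based tokenizer
>>> slow_tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased", use_fast=False)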

Image Processor

An image processor processes vision inputs. It inherits from the base ImageProcessingMixin class.

To use one, create an image processor associated with the model you’re using. For example, create a default ViTImageProcessor if you are using ViT for image classification:

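A minimal sketch; printing the processor shows its default parameters:

>>> from transformers import ViTImageProcessor

>>> vit_extractor = ViTImageProcessor()
>>> print(vit_extractor)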

If you aren’t looking for any customization, just use the from_pretrained method to load a model’s default image processor parameters.

Modify any of the ViTImageProcessor parameters to create your custom image processor:

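For instance, something like this (the normalization statistics are illustrative values, not recommendations):

>>> from transformers import ViTImageProcessor

>>> # Use custom normalization statistics instead of the defaults
>>> my_vit_extractor = ViTImageProcessor(image_mean=[0.3, 0.3, 0.3], image_std=[0.3, 0.3, 0.3])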

Feature Extractor

A feature extractor processes audio inputs. It inherits from the base FeatureExtractionMixin class, and may also inherit from the SequenceFeatureExtractor class.

To use one, create a feature extractor associated with the model you’re using. For example, create a default Wav2Vec2FeatureExtractor if you are using Wav2Vec2 for audio classification:

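A minimal sketch; printing the feature extractor shows its default parameters:

>>> from transformers import Wav2Vec2FeatureExtractor

>>> w2v2_extractor = Wav2Vec2FeatureExtractor()
>>> print(w2v2_extractor)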

If you aren’t looking for any customization, just use the from_pretrained method to load a model’s default feature extractor parameters.

Modify any of the Wav2Vec2FeatureExtractor parameters to create your custom feature extractor:

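For instance (the overridden values are illustrative):

>>> from transformers import Wav2Vec2FeatureExtractor

>>> # Use a lower sampling rate and skip zero-mean/unit-variance normalization
>>> w2v2_extractor = Wav2Vec2FeatureExtractor(sampling_rate=8000, do_normalize=False)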

Processor

For models that support multimodal tasks, 🌍 Transformers offers a processor class that conveniently wraps processing classes such as a feature extractor and a tokenizer into a single object. For example, let’s use the Wav2Vec2Processor for automatic speech recognition (ASR). ASR transcribes audio to text, so you will need a feature extractor and a tokenizer.

Create a feature extractor to handle the audio inputs:

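A sketch of what this could look like (the padding_value of 1.0 is an illustrative choice):

>>> from transformers import Wav2Vec2FeatureExtractor

>>> feature_extractor = Wav2Vec2FeatureExtractor(padding_value=1.0, do_normalize=True)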

Create a tokenizer to handle the text inputs:

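For example, using Wav2Vec2CTCTokenizer with a vocabulary file of your own (the file name is a placeholder):

>>> from transformers import Wav2Vec2CTCTokenizer

>>> tokenizer = Wav2Vec2CTCTokenizer(vocab_file="my_vocab_file.txt")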

Combine the feature extractor and tokenizer in Wav2Vec2Processor:

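Something along these lines, reusing the feature_extractor and tokenizer created above:

>>> from transformers import Wav2Vec2Processor

>>> processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)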

With two basic classes (configuration and model) and an additional preprocessing class (tokenizer, image processor, feature extractor, or processor), you can create any of the models supported by 🌍 Transformers. Each of these base classes is configurable, allowing you to use the specific attributes you want. You can easily set up a model for training or modify an existing pretrained model to fine-tune.
