AutoModel


class transformers.AutoModel( *args, **kwargs )

This is a generic model class that will be instantiated as one of the base model classes of the library when created with the from_pretrained() class method or the from_config() class method.

This class cannot be instantiated directly using __init__() (throws an error).
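For instance, calling the constructor directly fails immediately (a minimal sketch; the exact error message may differ across library versions):

>>> from transformers import AutoModel
>>> AutoModel()  # raises EnvironmentError; use AutoModel.from_pretrained(...) or AutoModel.from_config(...) instead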

from_config( **kwargs )

Parameters

  • config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:

    • ASTConfig configuration class: ASTModel (Audio Spectrogram Transformer model)
    • AlbertConfig configuration class: AlbertModel (ALBERT model)
    • AlignConfig configuration class: AlignModel (ALIGN model)
    • AltCLIPConfig configuration class: AltCLIPModel (AltCLIP model)
    • AutoformerConfig configuration class: AutoformerModel (Autoformer model)
    • BarkConfig configuration class: BarkModel (Bark model)
    • BartConfig configuration class: BartModel (BART model)
    • BeitConfig configuration class: BeitModel (BEiT model)
    • BertConfig configuration class: BertModel (BERT model)
    • BertGenerationConfig configuration class: BertGenerationEncoder (Bert Generation model)
    • BigBirdConfig configuration class: BigBirdModel (BigBird model)
    • BigBirdPegasusConfig configuration class: BigBirdPegasusModel (BigBird-Pegasus model)
    • BioGptConfig configuration class: BioGptModel (BioGpt model)
    • BitConfig configuration class: BitModel (BiT model)
    • BlenderbotConfig configuration class: BlenderbotModel (Blenderbot model)
    • BlenderbotSmallConfig configuration class: BlenderbotSmallModel (BlenderbotSmall model)
    • Blip2Config configuration class: Blip2Model (BLIP-2 model)
    • BlipConfig configuration class: BlipModel (BLIP model)
    • BloomConfig configuration class: BloomModel (BLOOM model)
    • BridgeTowerConfig configuration class: BridgeTowerModel (BridgeTower model)
    • BrosConfig configuration class: BrosModel (BROS model)
    • CLIPConfig configuration class: CLIPModel (CLIP model)
    • CLIPSegConfig configuration class: CLIPSegModel (CLIPSeg model)
    • CTRLConfig configuration class: CTRLModel (CTRL model)
    • CamembertConfig configuration class: CamembertModel (CamemBERT model)
    • CanineConfig configuration class: CanineModel (CANINE model)
    • ChineseCLIPConfig configuration class: ChineseCLIPModel (Chinese-CLIP model)
    • ClapConfig configuration class: ClapModel (CLAP model)
    • CodeGenConfig configuration class: CodeGenModel (CodeGen model)
    • ConditionalDetrConfig configuration class: ConditionalDetrModel (Conditional DETR model)
    • ConvBertConfig configuration class: ConvBertModel (ConvBERT model)
    • ConvNextConfig configuration class: ConvNextModel (ConvNeXT model)
    • ConvNextV2Config configuration class: ConvNextV2Model (ConvNeXTV2 model)
    • CpmAntConfig configuration class: CpmAntModel (CPM-Ant model)
    • CvtConfig configuration class: CvtModel (CvT model)
    • DPRConfig configuration class: DPRQuestionEncoder (DPR model)
    • DPTConfig configuration class: DPTModel (DPT model)
    • Data2VecAudioConfig configuration class: Data2VecAudioModel (Data2VecAudio model)
    • Data2VecTextConfig configuration class: Data2VecTextModel (Data2VecText model)
    • Data2VecVisionConfig configuration class: Data2VecVisionModel (Data2VecVision model)
    • DebertaConfig configuration class: DebertaModel (DeBERTa model)
    • DebertaV2Config configuration class: DebertaV2Model (DeBERTa-v2 model)
    • DecisionTransformerConfig configuration class: DecisionTransformerModel (Decision Transformer model)
    • DeformableDetrConfig configuration class: DeformableDetrModel (Deformable DETR model)
    • DeiTConfig configuration class: DeiTModel (DeiT model)
    • DetaConfig configuration class: DetaModel (DETA model)
    • DetrConfig configuration class: DetrModel (DETR model)
    • DinatConfig configuration class: DinatModel (DiNAT model)
    • Dinov2Config configuration class: Dinov2Model (DINOv2 model)
    • DistilBertConfig configuration class: DistilBertModel (DistilBERT model)
    • DonutSwinConfig configuration class: DonutSwinModel (DonutSwin model)
    • EfficientFormerConfig configuration class: EfficientFormerModel (EfficientFormer model)
    • EfficientNetConfig configuration class: EfficientNetModel (EfficientNet model)
    • ElectraConfig configuration class: ElectraModel (ELECTRA model)
    • EncodecConfig configuration class: EncodecModel (EnCodec model)
    • ErnieConfig configuration class: ErnieModel (ERNIE model)
    • ErnieMConfig configuration class: ErnieMModel (ErnieM model)
    • EsmConfig configuration class: EsmModel (ESM model)
    • FNetConfig configuration class: FNetModel (FNet model)
    • FSMTConfig configuration class: FSMTModel (FairSeq Machine-Translation model)
    • FalconConfig configuration class: FalconModel (Falcon model)
    • FlaubertConfig configuration class: FlaubertModel (FlauBERT model)
    • FlavaConfig configuration class: FlavaModel (FLAVA model)
    • FocalNetConfig configuration class: FocalNetModel (FocalNet model)
    • FunnelConfig configuration class: FunnelModel or FunnelBaseModel (Funnel Transformer model)
    • GLPNConfig configuration class: GLPNModel (GLPN model)
    • GPT2Config configuration class: GPT2Model (OpenAI GPT-2 model)
    • GPTBigCodeConfig configuration class: GPTBigCodeModel (GPTBigCode model)
    • GPTJConfig configuration class: GPTJModel (GPT-J model)
    • GPTNeoConfig configuration class: GPTNeoModel (GPT Neo model)
    • GPTNeoXConfig configuration class: GPTNeoXModel (GPT NeoX model)
    • GPTNeoXJapaneseConfig configuration class: GPTNeoXJapaneseModel (GPT NeoX Japanese model)
    • GPTSanJapaneseConfig configuration class: GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
    • GitConfig configuration class: GitModel (GIT model)
    • GraphormerConfig configuration class: GraphormerModel (Graphormer model)
    • GroupViTConfig configuration class: GroupViTModel (GroupViT model)
    • HubertConfig configuration class: HubertModel (Hubert model)
    • IBertConfig configuration class: IBertModel (I-BERT model)
    • IdeficsConfig configuration class: IdeficsModel (IDEFICS model)
    • ImageGPTConfig configuration class: ImageGPTModel (ImageGPT model)
    • InformerConfig configuration class: InformerModel (Informer model)
    • JukeboxConfig configuration class: JukeboxModel (Jukebox model)
    • LEDConfig configuration class: LEDModel (LED model)
    • LayoutLMConfig configuration class: LayoutLMModel (LayoutLM model)
    • LayoutLMv2Config configuration class: LayoutLMv2Model (LayoutLMv2 model)
    • LayoutLMv3Config configuration class: LayoutLMv3Model (LayoutLMv3 model)
    • LevitConfig configuration class: LevitModel (LeViT model)
    • LiltConfig configuration class: LiltModel (LiLT model)
    • LlamaConfig configuration class: LlamaModel (LLaMA model)
    • LongT5Config configuration class: LongT5Model (LongT5 model)
    • LongformerConfig configuration class: LongformerModel (Longformer model)
    • LukeConfig configuration class: LukeModel (LUKE model)
    • LxmertConfig configuration class: LxmertModel (LXMERT model)
    • M2M100Config configuration class: M2M100Model (M2M100 model)
    • MBartConfig configuration class: MBartModel (mBART model)
    • MCTCTConfig configuration class: MCTCTModel (M-CTC-T model)
    • MPNetConfig configuration class: MPNetModel (MPNet model)
    • MT5Config configuration class: MT5Model (MT5 model)
    • MarianConfig configuration class: MarianModel (Marian model)
    • MarkupLMConfig configuration class: MarkupLMModel (MarkupLM model)
    • Mask2FormerConfig configuration class: Mask2FormerModel (Mask2Former model)
    • MaskFormerConfig configuration class: MaskFormerModel (MaskFormer model)
    • MaskFormerSwinConfig configuration class: MaskFormerSwinModel (MaskFormerSwin model)
    • MegaConfig configuration class: MegaModel (MEGA model)
    • MegatronBertConfig configuration class: MegatronBertModel (Megatron-BERT model)
    • MgpstrConfig configuration class: MgpstrForSceneTextRecognition (MGP-STR model)
    • MistralConfig configuration class: MistralModel (Mistral model)
    • MobileBertConfig configuration class: MobileBertModel (MobileBERT model)
    • MobileNetV1Config configuration class: MobileNetV1Model (MobileNetV1 model)
    • MobileNetV2Config configuration class: MobileNetV2Model (MobileNetV2 model)
    • MobileViTConfig configuration class: MobileViTModel (MobileViT model)
    • MobileViTV2Config configuration class: MobileViTV2Model (MobileViTV2 model)
    • MptConfig configuration class: MptModel (MPT model)
    • MraConfig configuration class: MraModel (MRA model)
    • MvpConfig configuration class: MvpModel (MVP model)
    • NatConfig configuration class: NatModel (NAT model)
    • NezhaConfig configuration class: NezhaModel (Nezha model)
    • NllbMoeConfig configuration class: NllbMoeModel (NLLB-MOE model)
    • NystromformerConfig configuration class: NystromformerModel (Nyströmformer model)
    • OPTConfig configuration class: OPTModel (OPT model)
    • OneFormerConfig configuration class: OneFormerModel (OneFormer model)
    • OpenAIGPTConfig configuration class: OpenAIGPTModel (OpenAI GPT model)
    • OpenLlamaConfig configuration class: OpenLlamaModel (OpenLlama model)
    • OwlViTConfig configuration class: OwlViTModel (OWL-ViT model)
    • PLBartConfig configuration class: PLBartModel (PLBart model)
    • PegasusConfig configuration class: PegasusModel (Pegasus model)
    • PegasusXConfig configuration class: PegasusXModel (PEGASUS-X model)
    • PerceiverConfig configuration class: PerceiverModel (Perceiver model)
    • PersimmonConfig configuration class: PersimmonModel (Persimmon model)
    • PoolFormerConfig configuration class: PoolFormerModel (PoolFormer model)
    • ProphetNetConfig configuration class: ProphetNetModel (ProphetNet model)
    • PvtConfig configuration class: PvtModel (PVT model)
    • QDQBertConfig configuration class: QDQBertModel (QDQBert model)
    • ReformerConfig configuration class: ReformerModel (Reformer model)
    • RegNetConfig configuration class: RegNetModel (RegNet model)
    • RemBertConfig configuration class: RemBertModel (RemBERT model)
    • ResNetConfig configuration class: ResNetModel (ResNet model)
    • RetriBertConfig configuration class: RetriBertModel (RetriBERT model)
    • RoCBertConfig configuration class: RoCBertModel (RoCBert model)
    • RoFormerConfig configuration class: RoFormerModel (RoFormer model)
    • RobertaConfig configuration class: RobertaModel (RoBERTa model)
    • RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
    • RwkvConfig configuration class: RwkvModel (RWKV model)
    • SEWConfig configuration class: SEWModel (SEW model)
    • SEWDConfig configuration class: SEWDModel (SEW-D model)
    • SamConfig configuration class: SamModel (SAM model)
    • SegformerConfig configuration class: SegformerModel (SegFormer model)
    • Speech2TextConfig configuration class: Speech2TextModel (Speech2Text model)
    • SpeechT5Config configuration class: SpeechT5Model (SpeechT5 model)
    • SplinterConfig configuration class: SplinterModel (Splinter model)
    • SqueezeBertConfig configuration class: SqueezeBertModel (SqueezeBERT model)
    • SwiftFormerConfig configuration class: SwiftFormerModel (SwiftFormer model)
    • Swin2SRConfig configuration class: Swin2SRModel (Swin2SR model)
    • SwinConfig configuration class: SwinModel (Swin Transformer model)
    • Swinv2Config configuration class: Swinv2Model (Swin Transformer V2 model)
    • SwitchTransformersConfig configuration class: SwitchTransformersModel (SwitchTransformers model)
    • T5Config configuration class: T5Model (T5 model)
    • TableTransformerConfig configuration class: TableTransformerModel (Table Transformer model)
    • TapasConfig configuration class: TapasModel (TAPAS model)
    • TimeSeriesTransformerConfig configuration class: TimeSeriesTransformerModel (Time Series Transformer model)
    • TimesformerConfig configuration class: TimesformerModel (TimeSformer model)
    • TimmBackboneConfig configuration class: TimmBackbone (TimmBackbone model)
    • TrajectoryTransformerConfig configuration class: TrajectoryTransformerModel (Trajectory Transformer model)
    • TransfoXLConfig configuration class: TransfoXLModel (Transformer-XL model)
    • TvltConfig configuration class: TvltModel (TVLT model)
    • UMT5Config configuration class: UMT5Model (UMT5 model)
    • UniSpeechConfig configuration class: UniSpeechModel (UniSpeech model)
    • UniSpeechSatConfig configuration class: UniSpeechSatModel (UniSpeechSat model)
    • VanConfig configuration class: VanModel (VAN model)
    • ViTConfig configuration class: ViTModel (ViT model)
    • ViTHybridConfig configuration class: ViTHybridModel (ViT Hybrid model)
    • ViTMAEConfig configuration class: ViTMAEModel (ViTMAE model)
    • ViTMSNConfig configuration class: ViTMSNModel (ViTMSN model)
    • VideoMAEConfig configuration class: VideoMAEModel (VideoMAE model)
    • ViltConfig configuration class: ViltModel (ViLT model)
    • VisionTextDualEncoderConfig configuration class: VisionTextDualEncoderModel (VisionTextDualEncoder model)
    • VisualBertConfig configuration class: VisualBertModel (VisualBERT model)
    • VitDetConfig configuration class: VitDetModel (VitDet model)
    • VitsConfig configuration class: VitsModel (VITS model)
    • VivitConfig configuration class: VivitModel (ViViT model)
    • Wav2Vec2Config configuration class: Wav2Vec2Model (Wav2Vec2 model)
    • Wav2Vec2ConformerConfig configuration class: Wav2Vec2ConformerModel (Wav2Vec2-Conformer model)
    • WavLMConfig configuration class: WavLMModel (WavLM model)
    • WhisperConfig configuration class: WhisperModel (Whisper model)
    • XCLIPConfig configuration class: XCLIPModel (X-CLIP model)
    • XGLMConfig configuration class: XGLMModel (XGLM model)
    • XLMConfig configuration class: XLMModel (XLM model)
    • XLMProphetNetConfig configuration class: XLMProphetNetModel (XLM-ProphetNet model)
    • XLMRobertaConfig configuration class: XLMRobertaModel (XLM-RoBERTa model)
    • XLMRobertaXLConfig configuration class: XLMRobertaXLModel (XLM-RoBERTa-XL model)
    • XLNetConfig configuration class: XLNetModel (XLNet model)
    • XmodConfig configuration class: XmodModel (X-MOD model)
    • YolosConfig configuration class: YolosModel (YOLOS model)
    • YosoConfig configuration class: YosoModel (YOSO model)

Instantiates one of the base model classes of the library from a configuration.

Note: Loading a model from its configuration file does not load the model weights. It only affects the model's configuration. Use from_pretrained() to load the model weights.

Examples:


>>> from transformers import AutoConfig, AutoModel

>>> # Download configuration from huggingface.co and cache.
>>> config = AutoConfig.from_pretrained("bert-base-cased")
>>> model = AutoModel.from_config(config)
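The configuration can also be customized before the model is built from it; for example (a minimal sketch relying on standard AutoConfig attribute overrides, producing randomly initialized weights as noted above):

>>> # Override a configuration attribute at load time, then build an untrained model from it.
>>> config = AutoConfig.from_pretrained("bert-base-cased", num_hidden_layers=6)
>>> model = AutoModel.from_config(config)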

from_pretrained( *model_args, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a pretrained model hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.

    • A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.

    • A path or url to a tensorflow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.

  • model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.

  • config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:

    • The model is a model provided by the library (loaded with the model id string of a pretrained model).

    • The model was saved using save_pretrained() and is reloaded by supplying the save directory.

    • The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.

  • state_dict (Dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.

  • from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see docstring of pretrained_model_name_or_path argument).

  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.

  • resume_download (bool, optional, defaults to False) — Whether or not to delete incompletely received files. Will attempt to resume the download if such a file exists.

  • proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.

  • output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.

  • local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).

  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.

  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.

  • code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.

  • kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:

    • If a configuration is provided with config, **kwargs will be directly passed to the underlying model's __init__ method (we assume all relevant updates to the configuration have already been done).

    • If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model's __init__ function.
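For instance, several of these arguments can be combined in one call (a minimal sketch; the exact keys in the loading report may vary across library versions):

>>> model, loading_info = AutoModel.from_pretrained(
...     "bert-base-cased",
...     revision="main",           # branch, tag, or commit id on the Hub
...     output_loading_info=True,  # also return the missing/unexpected key report
... )
>>> sorted(loading_info)
['error_msgs', 'mismatched_keys', 'missing_keys', 'unexpected_keys']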

Instantiate one of the base model classes of the library from a pretrained model.

The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

  • albert — AlbertModel (ALBERT model)
  • align — AlignModel (ALIGN model)
  • altclip — AltCLIPModel (AltCLIP model)
  • audio-spectrogram-transformer — ASTModel (Audio Spectrogram Transformer model)
  • autoformer — AutoformerModel (Autoformer model)
  • bark — BarkModel (Bark model)
  • bart — BartModel (BART model)
  • beit — BeitModel (BEiT model)
  • bert — BertModel (BERT model)
  • bert-generation — BertGenerationEncoder (Bert Generation model)
  • big_bird — BigBirdModel (BigBird model)
  • bigbird_pegasus — BigBirdPegasusModel (BigBird-Pegasus model)
  • biogpt — BioGptModel (BioGpt model)
  • bit — BitModel (BiT model)
  • blenderbot — BlenderbotModel (Blenderbot model)
  • blenderbot-small — BlenderbotSmallModel (BlenderbotSmall model)
  • blip — BlipModel (BLIP model)
  • blip-2 — Blip2Model (BLIP-2 model)
  • bloom — BloomModel (BLOOM model)
  • bridgetower — BridgeTowerModel (BridgeTower model)
  • bros — BrosModel (BROS model)
  • camembert — CamembertModel (CamemBERT model)
  • canine — CanineModel (CANINE model)
  • chinese_clip — ChineseCLIPModel (Chinese-CLIP model)
  • clap — ClapModel (CLAP model)
  • clip — CLIPModel (CLIP model)
  • clipseg — CLIPSegModel (CLIPSeg model)
  • code_llama — LlamaModel (CodeLlama model)
  • codegen — CodeGenModel (CodeGen model)
  • conditional_detr — ConditionalDetrModel (Conditional DETR model)
  • convbert — ConvBertModel (ConvBERT model)
  • convnext — ConvNextModel (ConvNeXT model)
  • convnextv2 — ConvNextV2Model (ConvNeXTV2 model)
  • cpmant — CpmAntModel (CPM-Ant model)
  • ctrl — CTRLModel (CTRL model)
  • cvt — CvtModel (CvT model)
  • data2vec-audio — Data2VecAudioModel (Data2VecAudio model)
  • data2vec-text — Data2VecTextModel (Data2VecText model)
  • data2vec-vision — Data2VecVisionModel (Data2VecVision model)
  • deberta — DebertaModel (DeBERTa model)
  • deberta-v2 — DebertaV2Model (DeBERTa-v2 model)
  • decision_transformer — DecisionTransformerModel (Decision Transformer model)
  • deformable_detr — DeformableDetrModel (Deformable DETR model)
  • deit — DeiTModel (DeiT model)
  • deta — DetaModel (DETA model)
  • detr — DetrModel (DETR model)
  • dinat — DinatModel (DiNAT model)
  • dinov2 — Dinov2Model (DINOv2 model)
  • distilbert — DistilBertModel (DistilBERT model)
  • donut-swin — DonutSwinModel (DonutSwin model)
  • dpr — DPRQuestionEncoder (DPR model)
  • dpt — DPTModel (DPT model)
  • efficientformer — EfficientFormerModel (EfficientFormer model)
  • efficientnet — EfficientNetModel (EfficientNet model)
  • electra — ElectraModel (ELECTRA model)
  • encodec — EncodecModel (EnCodec model)
  • ernie — ErnieModel (ERNIE model)
  • ernie_m — ErnieMModel (ErnieM model)
  • esm — EsmModel (ESM model)
  • falcon — FalconModel (Falcon model)
  • flaubert — FlaubertModel (FlauBERT model)
  • flava — FlavaModel (FLAVA model)
  • fnet — FNetModel (FNet model)
  • focalnet — FocalNetModel (FocalNet model)
  • fsmt — FSMTModel (FairSeq Machine-Translation model)
  • funnel — FunnelModel or FunnelBaseModel (Funnel Transformer model)
  • git — GitModel (GIT model)
  • glpn — GLPNModel (GLPN model)
  • gpt-sw3 — GPT2Model (GPT-Sw3 model)
  • gpt2 — GPT2Model (OpenAI GPT-2 model)
  • gpt_bigcode — GPTBigCodeModel (GPTBigCode model)
  • gpt_neo — GPTNeoModel (GPT Neo model)
  • gpt_neox — GPTNeoXModel (GPT NeoX model)
  • gpt_neox_japanese — GPTNeoXJapaneseModel (GPT NeoX Japanese model)
  • gptj — GPTJModel (GPT-J model)
  • gptsan-japanese — GPTSanJapaneseForConditionalGeneration (GPTSAN-japanese model)
  • graphormer — GraphormerModel (Graphormer model)
  • groupvit — GroupViTModel (GroupViT model)
  • hubert — HubertModel (Hubert model)
  • ibert — IBertModel (I-BERT model)
  • idefics — IdeficsModel (IDEFICS model)
  • imagegpt — ImageGPTModel (ImageGPT model)
  • informer — InformerModel (Informer model)
  • jukebox — JukeboxModel (Jukebox model)
  • layoutlm — LayoutLMModel (LayoutLM model)
  • layoutlmv2 — LayoutLMv2Model (LayoutLMv2 model)
  • layoutlmv3 — LayoutLMv3Model (LayoutLMv3 model)
  • led — LEDModel (LED model)
  • levit — LevitModel (LeViT model)
  • lilt — LiltModel (LiLT model)
  • llama — LlamaModel (LLaMA model)
  • longformer — LongformerModel (Longformer model)
  • longt5 — LongT5Model (LongT5 model)
  • luke — LukeModel (LUKE model)
  • lxmert — LxmertModel (LXMERT model)
  • m2m_100 — M2M100Model (M2M100 model)
  • marian — MarianModel (Marian model)
  • markuplm — MarkupLMModel (MarkupLM model)
  • mask2former — Mask2FormerModel (Mask2Former model)
  • maskformer — MaskFormerModel (MaskFormer model)
  • maskformer-swin — MaskFormerSwinModel (MaskFormerSwin model)
  • mbart — MBartModel (mBART model)
  • mctct — MCTCTModel (M-CTC-T model)
  • mega — MegaModel (MEGA model)
  • megatron-bert — MegatronBertModel (Megatron-BERT model)
  • mgp-str — MgpstrForSceneTextRecognition (MGP-STR model)
  • mistral — MistralModel (Mistral model)
  • mobilebert — MobileBertModel (MobileBERT model)
  • mobilenet_v1 — MobileNetV1Model (MobileNetV1 model)
  • mobilenet_v2 — MobileNetV2Model (MobileNetV2 model)
  • mobilevit — MobileViTModel (MobileViT model)
  • mobilevitv2 — MobileViTV2Model (MobileViTV2 model)
  • mpnet — MPNetModel (MPNet model)
  • mpt — MptModel (MPT model)
  • mra — MraModel (MRA model)
  • mt5 — MT5Model (MT5 model)
  • mvp — MvpModel (MVP model)
  • nat — NatModel (NAT model)
  • nezha — NezhaModel (Nezha model)
  • nllb-moe — NllbMoeModel (NLLB-MOE model)
  • nystromformer — NystromformerModel (Nyströmformer model)
  • oneformer — OneFormerModel (OneFormer model)
  • open-llama — OpenLlamaModel (OpenLlama model)
  • openai-gpt — OpenAIGPTModel (OpenAI GPT model)
  • opt — OPTModel (OPT model)
  • owlvit — OwlViTModel (OWL-ViT model)
  • pegasus — PegasusModel (Pegasus model)
  • pegasus_x — PegasusXModel (PEGASUS-X model)
  • perceiver — PerceiverModel (Perceiver model)
  • persimmon — PersimmonModel (Persimmon model)
  • plbart — PLBartModel (PLBart model)
  • poolformer — PoolFormerModel (PoolFormer model)
  • prophetnet — ProphetNetModel (ProphetNet model)
  • pvt — PvtModel (PVT model)
  • qdqbert — QDQBertModel (QDQBert model)
  • reformer — ReformerModel (Reformer model)
  • regnet — RegNetModel (RegNet model)
  • rembert — RemBertModel (RemBERT model)
  • resnet — ResNetModel (ResNet model)
  • retribert — RetriBertModel (RetriBERT model)
  • roberta — RobertaModel (RoBERTa model)
  • roberta-prelayernorm — RobertaPreLayerNormModel (RoBERTa-PreLayerNorm model)
  • roc_bert — RoCBertModel (RoCBert model)
  • roformer — RoFormerModel (RoFormer model)
  • rwkv — RwkvModel (RWKV model)
  • sam — SamModel (SAM model)
  • segformer — SegformerModel (SegFormer model)
  • sew — SEWModel (SEW model)
  • sew-d — SEWDModel (SEW-D model)
  • speech_to_text — Speech2TextModel (Speech2Text model)
  • speecht5 — SpeechT5Model (SpeechT5 model)
  • splinter — SplinterModel (Splinter model)
  • squeezebert — SqueezeBertModel (SqueezeBERT model)
  • swiftformer — SwiftFormerModel (SwiftFormer model)
  • swin — SwinModel (Swin Transformer model)
  • swin2sr — Swin2SRModel (Swin2SR model)
  • swinv2 — Swinv2Model (Swin Transformer V2 model)
  • switch_transformers — SwitchTransformersModel (SwitchTransformers model)
  • t5 — T5Model (T5 model)
  • table-transformer — TableTransformerModel (Table Transformer model)
  • tapas — TapasModel (TAPAS model)
  • time_series_transformer — TimeSeriesTransformerModel (Time Series Transformer model)
  • timesformer — TimesformerModel (TimeSformer model)
  • timm_backbone — TimmBackbone (TimmBackbone model)
  • trajectory_transformer — TrajectoryTransformerModel (Trajectory Transformer model)
  • transfo-xl — TransfoXLModel (Transformer-XL model)
  • tvlt — TvltModel (TVLT model)
  • umt5 — UMT5Model (UMT5 model)
  • unispeech — UniSpeechModel (UniSpeech model)
  • unispeech-sat — UniSpeechSatModel (UniSpeechSat model)
  • van — VanModel (VAN model)
  • videomae — VideoMAEModel (VideoMAE model)
  • vilt — ViltModel (ViLT model)
  • vision-text-dual-encoder — VisionTextDualEncoderModel (VisionTextDualEncoder model)
  • visual_bert — VisualBertModel (VisualBERT model)
  • vit — ViTModel (ViT model)
  • vit_hybrid — ViTHybridModel (ViT Hybrid model)
  • vit_mae — ViTMAEModel (ViTMAE model)
  • vit_msn — ViTMSNModel (ViTMSN model)
  • vitdet — VitDetModel (VitDet model)
  • vits — VitsModel (VITS model)
  • vivit — VivitModel (ViViT model)
  • wav2vec2 — Wav2Vec2Model (Wav2Vec2 model)
  • wav2vec2-conformer — Wav2Vec2ConformerModel (Wav2Vec2-Conformer model)
  • wavlm — WavLMModel (WavLM model)
  • whisper — WhisperModel (Whisper model)
  • xclip — XCLIPModel (X-CLIP model)
  • xglm — XGLMModel (XGLM model)
  • xlm — XLMModel (XLM model)
  • xlm-prophetnet — XLMProphetNetModel (XLM-ProphetNet model)
  • xlm-roberta — XLMRobertaModel (XLM-RoBERTa model)
  • xlm-roberta-xl — XLMRobertaXLModel (XLM-RoBERTa-XL model)
  • xlnet — XLNetModel (XLNet model)
  • xmod — XmodModel (X-MOD model)
  • yolos — YolosModel (YOLOS model)
  • yoso — YosoModel (YOSO model)
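In practice this means the same AutoModel call dispatches to a different architecture depending on the checkpoint, as in this small illustration (using two public Hub checkpoints):

>>> type(AutoModel.from_pretrained("bert-base-cased")).__name__
'BertModel'
>>> type(AutoModel.from_pretrained("gpt2")).__name__
'GPT2Model'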

The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
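A quick way to check and toggle this (a minimal sketch; training is a standard torch.nn.Module attribute):

>>> model = AutoModel.from_pretrained("bert-base-cased")
>>> model.training  # evaluation mode: dropout and similar layers are disabled
False
>>> model = model.train()  # switch back to training mode before fine-tuning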

Examples:


>>> from transformers import AutoConfig, AutoModel

>>> # Download model and configuration from huggingface.co and cache.
>>> model = AutoModel.from_pretrained("bert-base-cased")

>>> # Update configuration during loading
>>> model = AutoModel.from_pretrained("bert-base-cased", output_attentions=True)
>>> model.config.output_attentions
True

>>> # Loading from a TF checkpoint file instead of a PyTorch model (slower)
>>> config = AutoConfig.from_pretrained("./tf_model/bert_tf_model_config.json")
>>> model = AutoModel.from_pretrained(
...     "./tf_model/bert_tf_checkpoint.ckpt.index", from_tf=True, config=config
... )
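A loaded model can then be used directly for feature extraction together with its matching AutoTokenizer (a minimal sketch; the hidden size in the comment is specific to bert-base-cased):

>>> from transformers import AutoTokenizer, AutoModel
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
>>> model = AutoModel.from_pretrained("bert-base-cased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)
>>> outputs.last_hidden_state.shape  # (batch_size, sequence_length, hidden_size=768)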


