AutoTokenizer

class transformers.AutoTokenizer

( )

This is a generic tokenizer class that will be instantiated as one of the tokenizer classes of the library when created with the AutoTokenizer.from_pretrained() class method.

This class cannot be instantiated directly using __init__() (throws an error).
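For illustration (a minimal sketch; the exact error type and message may vary between library versions), constructing the class directly fails, while from_pretrained() returns one of the concrete tokenizer classes:

>>> from transformers import AutoTokenizer

>>> # AutoTokenizer()  # would raise an error; use from_pretrained() instead
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> type(tokenizer).__name__  # a concrete class, e.g. the fast BERT tokenizer
'BertTokenizerFast'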

from_pretrained

( pretrained_model_name_or_path, *inputs, **kwargs )

Parameters

  • pretrained_model_name_or_path (str or os.PathLike) — Can be either:

    • A string, the model id of a predefined tokenizer hosted inside a model repo on huggingface.co. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.

    • A path to a directory containing vocabulary files required by the tokenizer, for instance saved using the save_pretrained() method, e.g., ./my_model_directory/.

    • A path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (like Bert or XLNet), e.g.: ./my_model_directory/vocab.txt. (Not applicable to all derived classes)

  • inputs (additional positional arguments, optional) — Will be passed along to the Tokenizer __init__() method.

  • config (PretrainedConfig, optional) — The configuration object used to determine the tokenizer class to instantiate.

  • cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.

  • force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.

  • resume_download (bool, optional, defaults to False) — Whether or not to delete incompletely received files. Will attempt to resume the download if such a file exists.

  • proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.

  • revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id; since we use a git-based system for storing models and other artifacts on huggingface.co, revision can be any identifier allowed by git.

  • subfolder (str, optional) — In case the relevant files are located inside a subfolder of the model repo on huggingface.co (e.g. for facebook/rag-token-base), specify it here.

  • use_fast (bool, optional, defaults to True) — Use a fast Rust-based tokenizer if it is supported for a given model. If a fast tokenizer is not available for a given model, a normal Python-based tokenizer is returned instead.

  • tokenizer_type (str, optional) — Tokenizer type to be loaded.

  • trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.

  • kwargs (additional keyword arguments, optional) — Will be passed to the Tokenizer __init__() method. Can be used to set special tokens like bos_token, eos_token, unk_token, sep_token, pad_token, cls_token, mask_token, additional_special_tokens. See parameters in the __init__() for more details.

Instantiate one of the tokenizer classes of the library from a pretrained model vocabulary.

The tokenizer class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:

albert — AlbertTokenizer or AlbertTokenizerFast (ALBERT model)
align — BertTokenizer or BertTokenizerFast (ALIGN model)
bark — BertTokenizer or BertTokenizerFast (Bark model)
bart — BartTokenizer or BartTokenizerFast (BART model)
barthez — BarthezTokenizer or BarthezTokenizerFast (BARThez model)
bartpho — BartphoTokenizer (BARTpho model)
bert — BertTokenizer or BertTokenizerFast (BERT model)
bert-generation — BertGenerationTokenizer (Bert Generation model)
bert-japanese — BertJapaneseTokenizer (BertJapanese model)
bertweet — BertweetTokenizer (BERTweet model)
big_bird — BigBirdTokenizer or BigBirdTokenizerFast (BigBird model)
bigbird_pegasus — PegasusTokenizer or PegasusTokenizerFast (BigBird-Pegasus model)
biogpt — BioGptTokenizer (BioGpt model)
blenderbot — BlenderbotTokenizer or BlenderbotTokenizerFast (Blenderbot model)
blenderbot-small — BlenderbotSmallTokenizer (BlenderbotSmall model)
blip — BertTokenizer or BertTokenizerFast (BLIP model)
blip-2 — GPT2Tokenizer or GPT2TokenizerFast (BLIP-2 model)
bloom — BloomTokenizerFast (BLOOM model)
bridgetower — RobertaTokenizer or RobertaTokenizerFast (BridgeTower model)
bros — BertTokenizer or BertTokenizerFast (BROS model)
byt5 — ByT5Tokenizer (ByT5 model)
camembert — CamembertTokenizer or CamembertTokenizerFast (CamemBERT model)
canine — CanineTokenizer (CANINE model)
chinese_clip — BertTokenizer or BertTokenizerFast (Chinese-CLIP model)
clap — RobertaTokenizer or RobertaTokenizerFast (CLAP model)
clip — CLIPTokenizer or CLIPTokenizerFast (CLIP model)
clipseg — CLIPTokenizer or CLIPTokenizerFast (CLIPSeg model)
code_llama — CodeLlamaTokenizer or CodeLlamaTokenizerFast (CodeLlama model)
codegen — CodeGenTokenizer or CodeGenTokenizerFast (CodeGen model)
convbert — ConvBertTokenizer or ConvBertTokenizerFast (ConvBERT model)
cpm — CpmTokenizer or CpmTokenizerFast (CPM model)
cpmant — CpmAntTokenizer (CPM-Ant model)
ctrl — CTRLTokenizer (CTRL model)
data2vec-audio — Wav2Vec2CTCTokenizer (Data2VecAudio model)
data2vec-text — RobertaTokenizer or RobertaTokenizerFast (Data2VecText model)
deberta — DebertaTokenizer or DebertaTokenizerFast (DeBERTa model)
deberta-v2 — DebertaV2Tokenizer or DebertaV2TokenizerFast (DeBERTa-v2 model)
distilbert — DistilBertTokenizer or DistilBertTokenizerFast (DistilBERT model)
dpr — DPRQuestionEncoderTokenizer or DPRQuestionEncoderTokenizerFast (DPR model)
electra — ElectraTokenizer or ElectraTokenizerFast (ELECTRA model)
ernie — BertTokenizer or BertTokenizerFast (ERNIE model)
ernie_m — ErnieMTokenizer (ErnieM model)
esm — EsmTokenizer (ESM model)
flaubert — FlaubertTokenizer (FlauBERT model)
fnet — FNetTokenizer or FNetTokenizerFast (FNet model)
fsmt — FSMTTokenizer (FairSeq Machine-Translation model)
funnel — FunnelTokenizer or FunnelTokenizerFast (Funnel Transformer model)
git — BertTokenizer or BertTokenizerFast (GIT model)
gpt-sw3 — GPTSw3Tokenizer (GPT-Sw3 model)
gpt2 — GPT2Tokenizer or GPT2TokenizerFast (OpenAI GPT-2 model)
gpt_bigcode — GPT2Tokenizer or GPT2TokenizerFast (GPTBigCode model)
gpt_neo — GPT2Tokenizer or GPT2TokenizerFast (GPT Neo model)
gpt_neox — GPTNeoXTokenizerFast (GPT NeoX model)
gpt_neox_japanese — GPTNeoXJapaneseTokenizer (GPT NeoX Japanese model)
gptj — GPT2Tokenizer or GPT2TokenizerFast (GPT-J model)
gptsan-japanese — GPTSanJapaneseTokenizer (GPTSAN-japanese model)
groupvit — CLIPTokenizer or CLIPTokenizerFast (GroupViT model)
herbert — HerbertTokenizer or HerbertTokenizerFast (HerBERT model)
hubert — Wav2Vec2CTCTokenizer (Hubert model)
ibert — RobertaTokenizer or RobertaTokenizerFast (I-BERT model)
idefics — LlamaTokenizerFast (IDEFICS model)
instructblip — GPT2Tokenizer or GPT2TokenizerFast (InstructBLIP model)
jukebox — JukeboxTokenizer (Jukebox model)
layoutlm — LayoutLMTokenizer or LayoutLMTokenizerFast (LayoutLM model)
layoutlmv2 — LayoutLMv2Tokenizer or LayoutLMv2TokenizerFast (LayoutLMv2 model)
layoutlmv3 — LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LayoutLMv3 model)
layoutxlm — LayoutXLMTokenizer or LayoutXLMTokenizerFast (LayoutXLM model)
led — LEDTokenizer or LEDTokenizerFast (LED model)
lilt — LayoutLMv3Tokenizer or LayoutLMv3TokenizerFast (LiLT model)
llama — LlamaTokenizer or LlamaTokenizerFast (LLaMA model)
longformer — LongformerTokenizer or LongformerTokenizerFast (Longformer model)
longt5 — T5Tokenizer or T5TokenizerFast (LongT5 model)
luke — LukeTokenizer (LUKE model)
lxmert — LxmertTokenizer or LxmertTokenizerFast (LXMERT model)
m2m_100 — M2M100Tokenizer (M2M100 model)
marian — MarianTokenizer (Marian model)
mbart — MBartTokenizer or MBartTokenizerFast (mBART model)
mbart50 — MBart50Tokenizer or MBart50TokenizerFast (mBART-50 model)
mega — RobertaTokenizer or RobertaTokenizerFast (MEGA model)
megatron-bert — BertTokenizer or BertTokenizerFast (Megatron-BERT model)
mgp-str — MgpstrTokenizer (MGP-STR model)
mistral — LlamaTokenizer or LlamaTokenizerFast (Mistral model)
mluke — MLukeTokenizer (mLUKE model)
mobilebert — MobileBertTokenizer or MobileBertTokenizerFast (MobileBERT model)
mpnet — MPNetTokenizer or MPNetTokenizerFast (MPNet model)
mpt — GPTNeoXTokenizerFast (MPT model)
mra — RobertaTokenizer or RobertaTokenizerFast (MRA model)
mt5 — MT5Tokenizer or MT5TokenizerFast (MT5 model)
musicgen — T5Tokenizer or T5TokenizerFast (MusicGen model)
mvp — MvpTokenizer or MvpTokenizerFast (MVP model)
nezha — BertTokenizer or BertTokenizerFast (Nezha model)
nllb — NllbTokenizer or NllbTokenizerFast (NLLB model)
nllb-moe — NllbTokenizer or NllbTokenizerFast (NLLB-MOE model)
nystromformer — AlbertTokenizer or AlbertTokenizerFast (Nyströmformer model)
oneformer — CLIPTokenizer or CLIPTokenizerFast (OneFormer model)
openai-gpt — OpenAIGPTTokenizer or OpenAIGPTTokenizerFast (OpenAI GPT model)
opt — GPT2Tokenizer or GPT2TokenizerFast (OPT model)
owlvit — CLIPTokenizer or CLIPTokenizerFast (OWL-ViT model)
pegasus — PegasusTokenizer or PegasusTokenizerFast (Pegasus model)
pegasus_x — PegasusTokenizer or PegasusTokenizerFast (PEGASUS-X model)
perceiver — PerceiverTokenizer (Perceiver model)
persimmon — LlamaTokenizer or LlamaTokenizerFast (Persimmon model)
phobert — PhobertTokenizer (PhoBERT model)
pix2struct — T5Tokenizer or T5TokenizerFast (Pix2Struct model)
plbart — PLBartTokenizer (PLBart model)
prophetnet — ProphetNetTokenizer (ProphetNet model)
qdqbert — BertTokenizer or BertTokenizerFast (QDQBert model)
rag — RagTokenizer (RAG model)
realm — RealmTokenizer or RealmTokenizerFast (REALM model)
reformer — ReformerTokenizer or ReformerTokenizerFast (Reformer model)
rembert — RemBertTokenizer or RemBertTokenizerFast (RemBERT model)
retribert — RetriBertTokenizer or RetriBertTokenizerFast (RetriBERT model)
roberta — RobertaTokenizer or RobertaTokenizerFast (RoBERTa model)
roberta-prelayernorm — RobertaTokenizer or RobertaTokenizerFast (RoBERTa-PreLayerNorm model)
roc_bert — RoCBertTokenizer (RoCBert model)
roformer — RoFormerTokenizer or RoFormerTokenizerFast (RoFormer model)
rwkv — GPTNeoXTokenizerFast (RWKV model)
speech_to_text — Speech2TextTokenizer (Speech2Text model)
speech_to_text_2 — Speech2Text2Tokenizer (Speech2Text2 model)
speecht5 — SpeechT5Tokenizer (SpeechT5 model)
splinter — SplinterTokenizer or SplinterTokenizerFast (Splinter model)
squeezebert — SqueezeBertTokenizer or SqueezeBertTokenizerFast (SqueezeBERT model)
switch_transformers — T5Tokenizer or T5TokenizerFast (SwitchTransformers model)
t5 — T5Tokenizer or T5TokenizerFast (T5 model)
tapas — TapasTokenizer (TAPAS model)
tapex — TapexTokenizer (TAPEX model)
transfo-xl — TransfoXLTokenizer (Transformer-XL model)
umt5 — T5Tokenizer or T5TokenizerFast (UMT5 model)
vilt — BertTokenizer or BertTokenizerFast (ViLT model)
visual_bert — BertTokenizer or BertTokenizerFast (VisualBERT model)
vits — VitsTokenizer (VITS model)
wav2vec2 — Wav2Vec2CTCTokenizer (Wav2Vec2 model)
wav2vec2-conformer — Wav2Vec2CTCTokenizer (Wav2Vec2-Conformer model)
wav2vec2_phoneme — Wav2Vec2PhonemeCTCTokenizer (Wav2Vec2Phoneme model)
whisper — WhisperTokenizer or WhisperTokenizerFast (Whisper model)
xclip — CLIPTokenizer or CLIPTokenizerFast (X-CLIP model)
xglm — XGLMTokenizer or XGLMTokenizerFast (XGLM model)
xlm — XLMTokenizer (XLM model)
xlm-prophetnet — XLMProphetNetTokenizer (XLM-ProphetNet model)
xlm-roberta — XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa model)
xlm-roberta-xl — XLMRobertaTokenizer or XLMRobertaTokenizerFast (XLM-RoBERTa-XL model)
xlnet — XLNetTokenizer or XLNetTokenizerFast (XLNet model)
xmod — XLMRobertaTokenizer or XLMRobertaTokenizerFast (X-MOD model)
yoso — AlbertTokenizer or AlbertTokenizerFast (YOSO model)

Examples:

>>> from transformers import AutoTokenizer

>>> # Download vocabulary from huggingface.co and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

>>> # Download vocabulary from huggingface.co (user-uploaded) and cache.
>>> tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-cased")

>>> # If vocabulary files are in a directory (e.g. tokenizer was saved using *save_pretrained('./test/saved_model/')*)
>>> # tokenizer = AutoTokenizer.from_pretrained("./test/bert_saved_model/")

>>> # Download vocabulary from huggingface.co and define model-specific arguments
>>> tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
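The keyword arguments documented above can be combined as needed. A short sketch (the checkpoint names mirror the examples above; the cache path is a placeholder):

>>> # Force the slow (Python) tokenizer and pin a specific revision
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False, revision="main")

>>> # Cache downloaded files in a custom directory
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", cache_dir="./tokenizer_cache")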

register

( config_class, slow_tokenizer_class = None, fast_tokenizer_class = None, exist_ok = False )

Parameters

  • config_class (PretrainedConfig) — The configuration corresponding to the model to register.

  • slow_tokenizer_class (PretrainedTokenizer, optional) — The slow tokenizer to register.

  • fast_tokenizer_class (PretrainedTokenizerFast, optional) — The fast tokenizer to register.

Register a new tokenizer in this mapping.
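For example, a custom tokenizer can be mapped to a custom configuration class so that AutoTokenizer can resolve it later. A minimal sketch; MyConfig and MyTokenizer are hypothetical placeholder classes, not part of the library:

>>> from transformers import AutoConfig, AutoTokenizer, PretrainedConfig, PreTrainedTokenizer

>>> class MyConfig(PretrainedConfig):
...     model_type = "my-model"  # hypothetical model type

>>> class MyTokenizer(PreTrainedTokenizer):
...     pass  # a real tokenizer would implement the vocabulary and tokenization methods

>>> # Register the configuration for the new model type, then map it to the tokenizer class
>>> AutoConfig.register("my-model", MyConfig)
>>> AutoTokenizer.register(MyConfig, slow_tokenizer_class=MyTokenizer)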
