Transformers

State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. By BOINC AI team.

🌍 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as:

  • 📝 Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.

  • 🖼️ Computer Vision: image classification, object detection, and segmentation.

  • 🗣️ Audio: automatic speech recognition and audio classification.

  • 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
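The download-and-use workflow described above can be sketched as follows. This is a minimal sketch, assuming the library is installed as the `transformers` Python package with PyTorch available. So that it runs offline, it builds a tiny, randomly initialized BERT from a `BertConfig` (the sizes are arbitrary illustration values, not a real checkpoint), saves it with `save_pretrained`, and reloads it with `from_pretrained`. Loading a hosted pretrained checkpoint goes through the same `from_pretrained` call, with a checkpoint name in place of the local directory.

```python
import tempfile

import torch
from transformers import BertConfig, BertModel

# Tiny, randomly initialized model (illustrative sizes, NOT a pretrained checkpoint).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config).eval()  # eval() disables dropout so outputs are deterministic

input_ids = torch.tensor([[1, 5, 7, 2]])  # one sequence of four token IDs

with tempfile.TemporaryDirectory() as save_dir:
    # Write the config and weights to a directory, then load them back.
    model.save_pretrained(save_dir)
    reloaded = BertModel.from_pretrained(save_dir).eval()

    with torch.no_grad():
        original_out = model(input_ids).last_hidden_state
        reloaded_out = reloaded(input_ids).last_hidden_state

print(original_out.shape)  # torch.Size([1, 4, 32])
print(torch.allclose(original_out, reloaded_out))  # True
```

The matching outputs show that `save_pretrained` and `from_pretrained` round-trip both the configuration and the weights.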

🌍 Transformers supports framework interoperability between PyTorch, TensorFlow, and JAX. This provides the flexibility to use a different framework at each stage of a model's life: train a model in three lines of code in one framework, and load it for inference in another. Models can also be exported to formats like ONNX and TorchScript for deployment in production environments.
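The TorchScript export path mentioned above might look like the sketch below. It again assumes the `transformers` package with PyTorch, and uses a tiny, randomly initialized BERT (arbitrary sizes) so nothing is downloaded. Setting `torchscript=True` on the config makes the model return plain tuples, which `torch.jit.trace` can record; the traced module is then saved and reloaded with `torch.jit.save` and `torch.jit.load`, standing in for a deployment step. ONNX export relies on separate tooling and is not shown here.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertModel

# torchscript=True makes forward() return tuples instead of output objects,
# which is the form torch.jit.trace can record.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    torchscript=True,
)
model = BertModel(config).eval()

dummy_input_ids = torch.tensor([[1, 5, 7, 2]])  # example input used to record the graph

with torch.no_grad():
    traced = torch.jit.trace(model, (dummy_input_ids,))
    eager_hidden = model(dummy_input_ids)[0]  # first tuple element: last hidden state

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bert_traced.pt")  # hypothetical file name
    torch.jit.save(traced, path)
    loaded = torch.jit.load(path)  # no Python model class needed to run this
    with torch.no_grad():
        exported_hidden = loaded(dummy_input_ids)[0]

print(exported_hidden.shape)
print(torch.allclose(eager_hidden, exported_hidden, atol=1e-5))
```

A small tolerance is used in the comparison because the traced graph may take a slightly different numerical path than the eager model.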

Contents

The documentation is organized into five sections:

  • GET STARTED provides a quick tour of the library and installation instructions to get up and running.

  • TUTORIALS are a great place to start if you’re a beginner. This section will help you gain the basic skills you need to start using the library.

  • HOW-TO GUIDES show you how to achieve a specific goal, like fine-tuning a pretrained model for language modeling or writing and sharing a custom model.

  • CONCEPTUAL GUIDES offer more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🌍 Transformers.

  • API describes all classes and functions:

    • MAIN CLASSES details the most important classes like configuration, model, tokenizer, and pipeline.

    • MODELS details the classes and functions related to each model implemented in the library.

    • INTERNAL HELPERS details utility classes and functions used internally.

Supported models

  1. (from Google Research and the Toyota Technological Institute at Chicago) released with the paper by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.

  2. (from Google Research) released with the paper by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.

  3. (from BAAI) released with the paper by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.

  4. (from MIT) released with the paper by Yuan Gong, Yu-An Chung, James Glass.

  5. (from Tsinghua University) released with the paper by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.

  6. (from Suno) released in the repository by Suno AI team.

  7. (from Facebook) released with the paper by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.

  8. (from École polytechnique) released with the paper by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.

  9. (from VinAI Research) released with the paper by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

  10. (from Microsoft) released with the paper by Hangbo Bao, Li Dong, Furu Wei.

  11. (from Google) released with the paper by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.

  12. (from Google) released with the paper by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

  13. (from VinAI Research) released with the paper by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.

  14. (from Google Research) released with the paper by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.

  15. (from Google Research) released with the paper by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.

  16. (from Microsoft Research AI4Science) released with the paper by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu.

  17. (from Google AI) released with the paper by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby.

  18. (from Facebook) released with the paper by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.

  19. (from Facebook) released with the paper by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.

  20. (from Salesforce) released with the paper by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.

  21. (from Salesforce) released with the paper by Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi.

  22. (from BigScience workshop) released by the .

  23. (from Alexa) released with the paper by Adrian de Wynter and Daniel J. Perry.

  24. (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper by Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan.

  25. (from NAVER CLOVA) released with the paper by Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park.

  26. (from Google Research) released with the paper by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.

  27. (from Inria/Facebook/Sorbonne) released with the paper by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.

  28. (from Google Research) released with the paper by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.

  29. (from OFA-Sys) released with the paper by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.

  30. (from LAION-AI) released with the paper by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.

  31. (from OpenAI) released with the paper by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.

  32. (from University of Göttingen) released with the paper by Timo Lüddecke and Alexander Ecker.

  33. (from Salesforce) released with the paper by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.

  34. (from Meta AI) released with the paper by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.

  35. (from Microsoft Research Asia) released with the paper by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

  36. (from YituTech) released with the paper by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.

  37. (from Facebook AI) released with the paper by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.

  38. (from Facebook AI) released with the paper by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.

  39. (from Tsinghua University) released with the paper by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.

  40. (from OpenBMB) released by the .

  41. (from Salesforce) released with the paper by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong and Richard Socher.

  42. (from Microsoft) released with the paper by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.

  43. (from Facebook) released with the paper by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.

  44. (from Microsoft) released with the paper by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.

  45. (from Microsoft) released with the paper by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.

  46. (from Berkeley/Facebook/Google) released with the paper by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.

  47. (from SenseTime Research) released with the paper by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

  48. (from Facebook) released with the paper by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.

  49. (from Google AI) released with the paper by Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun.

  50. (from The University of Texas at Austin) released with the paper by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.

  51. (from Facebook) released with the paper by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.

  52. (from Microsoft Research) released with the paper by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.

  53. (from SHI Labs) released with the paper by Ali Hassani and Humphrey Shi.

  54. (from Meta AI) released with the paper by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.

  55. (from BOINC AI), released together with the paper by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into , RoBERTa into , Multilingual BERT into and a German version of DistilBERT.

  56. (from Microsoft Research) released with the paper by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.

  57. (from NAVER), released together with the paper by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.

  58. (from Facebook) released with the paper by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.

  59. (from Intel Labs) released with the paper by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.

  60. (from Snap Research) released with the paper by Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren.

  61. (from Google Brain) released with the paper by Mingxing Tan, Quoc V. Le.

  62. (from Google Research/Stanford University) released with the paper by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.

  63. (from Meta AI) released with the paper by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.

  64. (from Google Research) released with the paper by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

  65. (from Baidu) released with the paper by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.

  66. (from Baidu) released with the paper by Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.

  67. (from Meta AI) are transformer protein language models. ESM-1b was released with the paper by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 and ESMFold were released with the paper by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.

  68. (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.

  69. (from Google AI) released in the repository by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

  70. (from Google AI) released in the repository by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

  71. (from CNRS) released with the paper by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.

  72. (from Facebook AI) released with the paper by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.

  73. (from Google Research) released with the paper by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.

  74. (from Microsoft Research) released with the paper by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao.

  75. (from CMU/Google Brain) released with the paper by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.

  76. (from Microsoft Research) released with the paper by Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang.

  77. (from KAIST) released with the paper by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.

  78. (from OpenAI) released with the paper by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.

  79. (from EleutherAI) released in the repository by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.

  80. (from EleutherAI) released with the paper by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.

  81. (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori.

  82. (from OpenAI) released with the paper by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.

  83. (from EleutherAI) released in the repository by Ben Wang and Aran Komatsuzaki.

  84. (from AI-Sweden) released with the paper by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, Magnus Sahlgren.

  85. (from BigCode) released with the paper by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.

  86. released in the repository by Toshiyuki Sakamoto (tanreinama).

  87. (from Microsoft) released with the paper by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu.

  88. (from UCSD, NVIDIA) released with the paper by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.

  89. (from Allegro.pl, AGH University of Science and Technology) released with the paper by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.

  90. (from Facebook) released with the paper by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.

  91. (from Berkeley) released with the paper by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.

  92. (from BOINC AI) released with the paper by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.

  93. (from OpenAI) released with the paper by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.

  94. (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.

  95. (from Salesforce) released with the paper by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.

  96. (from OpenAI) released with the paper by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.

  97. (from Microsoft Research Asia) released with the paper by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.

  98. (from Microsoft Research Asia) released with the paper by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.

  99. (from Microsoft Research Asia) released with the paper by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.

  100. (from Microsoft Research Asia) released with the paper by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.

  101. (from AllenAI) released with the paper by Iz Beltagy, Matthew E. Peters, Arman Cohan.

  102. (from Meta AI) released with the paper by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.

  103. (from South China University of Technology) released with the paper by Jiapeng Wang, Lianwen Jin, Kai Ding.

  104. (from The FAIR team of Meta AI) released with the paper by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.

  105. (from The FAIR team of Meta AI) released with the paper by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.

  106. (from AllenAI) released with the paper by Iz Beltagy, Matthew E. Peters, Arman Cohan.

  107. (from Google AI) released with the paper by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.

  108. (from Studio Ousia) released with the paper by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.

  109. (from UNC Chapel Hill) released with the paper by Hao Tan and Mohit Bansal.

  110. (from Facebook) released with the paper by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.

  111. (from Facebook) released with the paper by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.

  112. Machine translation models trained using data by Jörg Tiedemann. The is being developed by the Microsoft Translator Team.

  113. (from Microsoft Research Asia) released with the paper by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.

  114. (from FAIR and UIUC) released with the paper by Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar.

  115. (from Meta and UIUC) released with the paper by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.

  116. (from Google AI) released with the paper by Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos.

  117. (from Facebook) released with the paper by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.

  118. (from Facebook) released with the paper by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.

  119. (from Meta/USC/CMU/SJTU) released with the paper by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.

  120. (from NVIDIA) released with the paper by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.

  121. (from NVIDIA) released with the paper by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.

  122. (from Alibaba Research) released with the paper by Peng Wang, Cheng Da, and Cong Yao.

  123. (from Mistral AI) by The team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.

  124. (from Studio Ousia) released with the paper by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.

  125. (from Facebook) released with the paper by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.

  126. (from CMU/Google Brain) released with the paper by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.

  127. (from Google Inc.) released with the paper by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.

  128. (from Google Inc.) released with the paper by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.

  129. (from Apple) released with the paper by Sachin Mehta and Mohammad Rastegari.

  130. (from Apple) released with the paper by Sachin Mehta and Mohammad Rastegari.

  131. (from Microsoft Research) released with the paper by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.

  132. (from MosaiML) released with the repository by the MosaicML NLP Team.

  133. (from the University of Wisconsin - Madison) released with the paper by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh.

  134. (from Google AI) released with the paper by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.

  135. (from Meta) released with the paper by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.

  136. (from RUC AI Box) released with the paper by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.

  137. (from SHI Labs) released with the paper by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.

  138. (from Huawei Noah’s Ark Lab) released with the paper by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.

  139. (from Meta) released with the paper by the NLLB team.

  140. (from Meta) released with the paper by the NLLB team.

  141. (from Meta AI) released with the paper by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.

  142. (from the University of Wisconsin - Madison) released with the paper by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.

  143. (from SHI Labs) released with the paper by Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.

  144. (from ) released in .

  145. (from Meta AI) released with the paper by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.

  146. (from Google AI) released with the paper by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.

  147. (from Google) released with the paper by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.

  148. (from Google) released with the paper by Jason Phang, Yao Zhao, and Peter J. Liu.

  149. (from Deepmind) released with the paper by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.

  150. (from ADEPT) released in a by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.

  151. (from VinAI Research) released with the paper by Dat Quoc Nguyen and Anh Tuan Nguyen.

  152. (from Google) released with the paper by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.

  153. (from UCLA NLP) released with the paper by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.

  154. (from Sea AI Labs) released with the paper by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.

  155. released with the paper by Jongho Choi and Kyogu Lee.

  156. (from Microsoft Research) released with the paper by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.

  157. (from Nanjing University, The University of Hong Kong etc.) released with the paper by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.

  158. (from NVIDIA) released with the paper by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.

  159. (from Facebook) released with the paper by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.

  160. (from Google Research) released with the paper by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.

  161. (from Google Research) released with the paper by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.

  162. (from META Platforms) released with the paper by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

  163. (from Google Research) released with the paper by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.

  164. (from Microsoft Research) released with the paper by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.

  165. (from Facebook), released together with the paper by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.

  166. (from Facebook) released with the paper by Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli.

  167. (from WeChatAI) released with the paper by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou.

  168. (from ZhuiyiTechnology), released together with the paper by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.

  169. (from Bo Peng), released on by Bo Peng.

  170. (from NVIDIA) released with the paper by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.

  171. (from Meta AI) released with the paper by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.

  172. (from ASAPP) released with the paper by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.

  173. (from ASAPP) released with the paper by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.

  174. (from Microsoft Research) released with the paper by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.

  175. (from Facebook), released together with the paper by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.

  176. (from Facebook), released together with the paper by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.

  177. (from Tel Aviv University), released together with the paper by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.

  178. (from Berkeley) released with the paper by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.

  179. (from MBZUAI) released with the paper by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan.

  180. (from Microsoft) released with the paper by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.

  181. (from Microsoft) released with the paper by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

  182. (from University of Würzburg) released with the paper by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.

  183. (from Google) released with the paper by William Fedus, Barret Zoph, Noam Shazeer.

  184. (from Google AI) released with the paper by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.

  185. (from Google AI) released in the repository by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.

  186. (from Microsoft Research) released with the paper by Brandon Smock, Rohith Pesala, Robin Abraham.

  187. (from Google AI) released with the paper by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.

  188. (from Microsoft Research) released with the paper by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.

  189. (from BOINC AI).

  190. (from Facebook) released with the paper by Gedas Bertasius, Heng Wang, Lorenzo Torresani.

  191. (from the University of California at Berkeley) released with the paper by Michael Janner, Qiyang Li, Sergey Levine.

  192. (from Google/CMU) released with the paper by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.

  193. (from Microsoft), released together with the paper by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.

  194. (from UNC Chapel Hill) released with the paper by Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal.

  195. (from Google Research) released with the paper by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler.

  196. (from Google Research) released with the paper by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.

  197. (from Microsoft Research) released with the paper by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.

  198. (from Microsoft Research) released with the paper by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.

  199. (from Peking University) released with the paper by Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.

  200. (from Tsinghua University and Nankai University) released with the paper by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.

  201. (from Multimedia Computing Group, Nanjing University) released with the paper by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.

  202. (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper by Wonjae Kim, Bokyung Son, Ildoo Kim.

  203. (from Google AI) released with the paper by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.

  204. (from UCLA NLP) released with the paper by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.

  205. (from Google AI) released with the paper by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.

  206. (from Meta AI) released with the paper by Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He.

  207. (from Meta AI) released with the paper by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.

  208. (from HUST-VL) released with the paper by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.

  209. (from Meta AI) released with the paper by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

  210. (from Kakao Enterprise) released with the paper by Jaehyeon Kim, Jungil Kong, Juhee Son.

  211. (from Google Research) released with the paper by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid.

  212. (from Facebook AI) released with the paper by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.

  213. (from Facebook AI) released with the paper by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.

  214. (from Facebook AI) released with the paper by Qiantong Xu, Alexei Baevski, Michael Auli.

  215. (from Microsoft Research) released with the paper by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.

  216. (from OpenAI) released with the paper by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

  217. (from Microsoft Research) released with the paper by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.

  218. (from Meta AI) released with the paper by Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe.

  219. (from Facebook AI) released with the paper by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.

  220. (from Facebook) released together with the paper by Guillaume Lample and Alexis Conneau.

  221. (from Microsoft Research) released with the paper by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.

  222. (from Facebook AI), released together with the paper by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.

  223. (from Facebook AI), released together with the paper by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.

  224. (from Meta AI) released with the paper by Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa.

  225. (from Google/CMU) released with the paper by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.

  226. (from Facebook AI) released with the paper by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.

  227. (from Facebook AI) released with the paper by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.

  228. (from Huazhong University of Science & Technology) released with the paper by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.

  229. (from the University of Wisconsin - Madison) released with the paper by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.

Supported frameworks

The table below shows the current support in the library for each of those models: whether it can be used with PyTorch, TensorFlow, and/or Jax (via Flax).

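| Model | PyTorch support | TensorFlow support | Flax support |
|---|---|---|---|
| ALBERT | ✅ | ✅ | ✅ |
| ALIGN | ✅ | ❌ | ❌ |
| AltCLIP | ✅ | ❌ | ❌ |
| Audio Spectrogram Transformer | ✅ | ❌ | ❌ |
| Autoformer | ✅ | ❌ | ❌ |
| Bark | ✅ | ❌ | ❌ |
| BART | ✅ | ✅ | ✅ |
| BEiT | ✅ | ❌ | ✅ |
| BERT | ✅ | ✅ | ✅ |
| Bert Generation | ✅ | ❌ | ❌ |
| BigBird | ✅ | ❌ | ✅ |
| BigBird-Pegasus | ✅ | ❌ | ❌ |
| BioGpt | ✅ | ❌ | ❌ |
| BiT | ✅ | ❌ | ❌ |
| Blenderbot | ✅ | ✅ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ |
| BLIP | ✅ | ✅ | ❌ |
| BLIP-2 | ✅ | ❌ | ❌ |
| BLOOM | ✅ | ❌ | ✅ |
| BridgeTower | ✅ | ❌ | ❌ |
| BROS | ✅ | ❌ | ❌ |
| CamemBERT | ✅ | ✅ | ❌ |
| CANINE | ✅ | ❌ | ❌ |
| Chinese-CLIP | ✅ | ❌ | ❌ |
| CLAP | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ |
| CLIPSeg | ✅ | ❌ | ❌ |
| CodeGen | ✅ | ❌ | ❌ |
| CodeLlama | ✅ | ❌ | ❌ |
| Conditional DETR | ✅ | ❌ | ❌ |
| ConvBERT | ✅ | ✅ | ❌ |
| ConvNeXT | ✅ | ✅ | ❌ |
| ConvNeXTV2 | ✅ | ❌ | ❌ |
| CPM-Ant | ✅ | ❌ | ❌ |
| CTRL | ✅ | ✅ | ❌ |
| CvT | ✅ | ✅ | ❌ |
| Data2VecAudio | ✅ | ❌ | ❌ |
| Data2VecText | ✅ | ❌ | ❌ |
| Data2VecVision | ✅ | ✅ | ❌ |
| DeBERTa | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | ✅ | ❌ |
| Decision Transformer | ✅ | ❌ | ❌ |
| Deformable DETR | ✅ | ❌ | ❌ |
| DeiT | ✅ | ✅ | ❌ |
| DETA | ✅ | ❌ | ❌ |
| DETR | ✅ | ❌ | ❌ |
| DiNAT | ✅ | ❌ | ❌ |
| DINOv2 | ✅ | ❌ | ❌ |
| DistilBERT | ✅ | ✅ | ✅ |
| DonutSwin | ✅ | ❌ | ❌ |
| DPR | ✅ | ✅ | ❌ |
| DPT | ✅ | ❌ | ❌ |
| EfficientFormer | ✅ | ✅ | ❌ |
| EfficientNet | ✅ | ❌ | ❌ |
| ELECTRA | ✅ | ✅ | ✅ |
| EnCodec | ✅ | ❌ | ❌ |
| Encoder decoder | ✅ | ✅ | ✅ |
| ERNIE | ✅ | ❌ | ❌ |
| ErnieM | ✅ | ❌ | ❌ |
| ESM | ✅ | ✅ | ❌ |
| FairSeq Machine-Translation | ✅ | ❌ | ❌ |
| Falcon | ✅ | ❌ | ❌ |
| FlauBERT | ✅ | ✅ | ❌ |
| FLAVA | ✅ | ❌ | ❌ |
| FNet | ✅ | ❌ | ❌ |
| FocalNet | ✅ | ❌ | ❌ |
| Funnel Transformer | ✅ | ✅ | ❌ |
| GIT | ✅ | ❌ | ❌ |
| GLPN | ✅ | ❌ | ❌ |
| GPT Neo | ✅ | ❌ | ✅ |
| GPT NeoX | ✅ | ❌ | ❌ |
| GPT NeoX Japanese | ✅ | ❌ | ❌ |
| GPT-J | ✅ | ✅ | ✅ |
| GPT-Sw3 | ✅ | ✅ | ✅ |
| GPTBigCode | ✅ | ❌ | ❌ |
| GPTSAN-japanese | ✅ | ❌ | ❌ |
| Graphormer | ✅ | ❌ | ❌ |
| GroupViT | ✅ | ✅ | ❌ |
| Hubert | ✅ | ✅ | ❌ |
| I-BERT | ✅ | ❌ | ❌ |
| IDEFICS | ✅ | ❌ | ❌ |
| ImageGPT | ✅ | ❌ | ❌ |
| Informer | ✅ | ❌ | ❌ |
| InstructBLIP | ✅ | ❌ | ❌ |
| Jukebox | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ❌ | ❌ |
| LayoutLMv3 | ✅ | ✅ | ❌ |
| LED | ✅ | ✅ | ❌ |
| LeViT | ✅ | ❌ | ❌ |
| LiLT | ✅ | ❌ | ❌ |
| LLaMA | ✅ | ❌ | ❌ |
| Longformer | ✅ | ✅ | ❌ |
| LongT5 | ✅ | ❌ | ✅ |
| LUKE | ✅ | ❌ | ❌ |
| LXMERT | ✅ | ✅ | ❌ |
| M-CTC-T | ✅ | ❌ | ❌ |
| M2M100 | ✅ | ❌ | ❌ |
| Marian | ✅ | ✅ | ✅ |
| MarkupLM | ✅ | ❌ | ❌ |
| Mask2Former | ✅ | ❌ | ❌ |
| MaskFormer | ✅ | ❌ | ❌ |
| mBART | ✅ | ✅ | ✅ |
| MEGA | ✅ | ❌ | ❌ |
| Megatron-BERT | ✅ | ❌ | ❌ |
| MGP-STR | ✅ | ❌ | ❌ |
| Mistral | ✅ | ❌ | ❌ |
| MobileBERT | ✅ | ✅ | ❌ |
| MobileNetV1 | ✅ | ❌ | ❌ |
| MobileNetV2 | ✅ | ❌ | ❌ |
| MobileViT | ✅ | ✅ | ❌ |
| MobileViTV2 | ✅ | ❌ | ❌ |
| MPNet | ✅ | ✅ | ❌ |
| MPT | ✅ | ❌ | ❌ |
| MRA | ✅ | ❌ | ❌ |
| MT5 | ✅ | ✅ | ✅ |
| MusicGen | ✅ | ❌ | ❌ |
| MVP | ✅ | ❌ | ❌ |
| NAT | ✅ | ❌ | ❌ |
| Nezha | ✅ | ❌ | ❌ |
| NLLB-MOE | ✅ | ❌ | ❌ |
| Nougat | ✅ | ✅ | ✅ |
| Nyströmformer | ✅ | ❌ | ❌ |
| OneFormer | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ |
| OpenLlama | ✅ | ❌ | ❌ |
| OPT | ✅ | ✅ | ✅ |
| OWL-ViT | ✅ | ❌ | ❌ |
| Pegasus | ✅ | ✅ | ✅ |
| PEGASUS-X | ✅ | ❌ | ❌ |
| Perceiver | ✅ | ❌ | ❌ |
| Persimmon | ✅ | ❌ | ❌ |
| Pix2Struct | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ❌ |
| PoolFormer | ✅ | ❌ | ❌ |
| Pop2Piano | ✅ | ❌ | ❌ |
| ProphetNet | ✅ | ❌ | ❌ |
| PVT | ✅ | ❌ | ❌ |
| QDQBert | ✅ | ❌ | ❌ |
| RAG | ✅ | ✅ | ❌ |
| REALM | ✅ | ❌ | ❌ |
| Reformer | ✅ | ❌ | ❌ |
| RegNet | ✅ | ✅ | ✅ |
| RemBERT | ✅ | ✅ | ❌ |
| ResNet | ✅ | ✅ | ✅ |
| RetriBERT | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ |
| RoBERTa-PreLayerNorm | ✅ | ✅ | ✅ |
| RoCBert | ✅ | ❌ | ❌ |
| RoFormer | ✅ | ✅ | ✅ |
| RWKV | ✅ | ❌ | ❌ |
| SAM | ✅ | ✅ | ❌ |
| SegFormer | ✅ | ✅ | ❌ |
| SEW | ✅ | ❌ | ❌ |
| SEW-D | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ✅ | ❌ | ✅ |
| Speech2Text | ✅ | ✅ | ❌ |
| Speech2Text2 | ❌ | ❌ | ❌ |
| SpeechT5 | ✅ | ❌ | ❌ |
| Splinter | ✅ | ❌ | ❌ |
| SqueezeBERT | ✅ | ❌ | ❌ |
| SwiftFormer | ✅ | ❌ | ❌ |
| Swin Transformer | ✅ | ✅ | ❌ |
| Swin Transformer V2 | ✅ | ❌ | ❌ |
| Swin2SR | ✅ | ❌ | ❌ |
| SwitchTransformers | ✅ | ❌ | ❌ |
| T5 | ✅ | ✅ | ✅ |
| Table Transformer | ✅ | ❌ | ❌ |
| TAPAS | ✅ | ✅ | ❌ |
| Time Series Transformer | ✅ | ❌ | ❌ |
| TimeSformer | ✅ | ❌ | ❌ |
| Trajectory Transformer | ✅ | ❌ | ❌ |
| Transformer-XL | ✅ | ✅ | ❌ |
| TrOCR | ✅ | ❌ | ❌ |
| TVLT | ✅ | ❌ | ❌ |
| UMT5 | ✅ | ❌ | ❌ |
| UniSpeech | ✅ | ❌ | ❌ |
| UniSpeechSat | ✅ | ❌ | ❌ |
| UPerNet | ✅ | ❌ | ❌ |
| VAN | ✅ | ❌ | ❌ |
| VideoMAE | ✅ | ❌ | ❌ |
| ViLT | ✅ | ❌ | ❌ |
| Vision Encoder decoder | ✅ | ✅ | ✅ |
| VisionTextDualEncoder | ✅ | ✅ | ✅ |
| VisualBERT | ✅ | ❌ | ❌ |
| ViT | ✅ | ✅ | ✅ |
| ViT Hybrid | ✅ | ❌ | ❌ |
| VitDet | ✅ | ❌ | ❌ |
| ViTMAE | ✅ | ✅ | ❌ |
| ViTMatte | ✅ | ❌ | ❌ |
| ViTMSN | ✅ | ❌ | ❌ |
| VITS | ✅ | ❌ | ❌ |
| ViViT | ✅ | ❌ | ❌ |
| Wav2Vec2 | ✅ | ✅ | ✅ |
| Wav2Vec2-Conformer | ✅ | ❌ | ❌ |
| WavLM | ✅ | ❌ | ❌ |
| Whisper | ✅ | ✅ | ✅ |
| X-CLIP | ✅ | ❌ | ❌ |
| X-MOD | ✅ | ❌ | ❌ |
| XGLM | ✅ | ✅ | ✅ |
| XLM | ✅ | ✅ | ❌ |
| XLM-ProphetNet | ✅ | ❌ | ❌ |
| XLM-RoBERTa | ✅ | ✅ | ✅ |
| XLM-RoBERTa-XL | ✅ | ❌ | ❌ |
| XLNet | ✅ | ✅ | ❌ |
| YOLOS | ✅ | ❌ | ❌ |
| YOSO | ✅ | ❌ | ❌ |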
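The sketch below shows one way to check the table from code before you pick a framework. It is a minimal example that copies a handful of rows from the framework table into a dictionary and returns the matching Auto class name for each framework. It relies on the library's naming convention: PyTorch classes have no prefix, TensorFlow classes start with `TF`, and Flax classes start with `Flax` (`AutoModel`, `TFAutoModel`, `FlaxAutoModel`). The `SUPPORT` dictionary and the `supported_frameworks` and `auto_class_for` helpers are hypothetical illustrations. They cover only the copied rows, and they do not check whether a specific checkpoint actually loads.

```python
from __future__ import annotations

# Hypothetical lookup: a few rows copied from the framework table.
# Each value is (PyTorch, TensorFlow, Flax) support, True for ✅ and False for ❌.
SUPPORT = {
    "BERT": (True, True, True),
    "BEiT": (True, False, True),
    "BLIP": (True, True, False),
    "LLaMA": (True, False, False),
    "Whisper": (True, True, True),
}

FRAMEWORKS = ("pytorch", "tensorflow", "flax")

# Auto class names following the no-prefix / TF / Flax naming convention.
AUTO_CLASS = {
    "pytorch": "AutoModel",
    "tensorflow": "TFAutoModel",
    "flax": "FlaxAutoModel",
}


def supported_frameworks(model: str) -> list[str]:
    """Return the frameworks marked ✅ for `model` in the copied rows."""
    flags = SUPPORT[model]
    return [fw for fw, ok in zip(FRAMEWORKS, flags) if ok]


def auto_class_for(model: str, framework: str) -> str:
    """Return the Auto class name for `framework`, or raise if the table shows ❌."""
    if framework not in supported_frameworks(model):
        raise ValueError(f"{model} has no {framework} implementation in the table")
    return AUTO_CLASS[framework]


if __name__ == "__main__":
    print(supported_frameworks("BEiT"))          # ['pytorch', 'flax']
    print(auto_class_for("BERT", "tensorflow"))  # TFAutoModel
```

Because the helper raises instead of silently falling back to PyTorch, an unsupported combination such as LLaMA with Flax fails early and explicitly.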

Model papers and sources

| Model | Paper or source |
|---|---|
| ALBERT | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
| ALIGN | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision |
| AltCLIP | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities |
| Audio Spectrogram Transformer | AST: Audio Spectrogram Transformer |
| Autoformer | Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting |
| Bark | suno-ai/bark |
| BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| BARThez | BARThez: a Skilled Pretrained French Sequence-to-Sequence Model |
| BARTpho | BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese |
| BEiT | BEiT: BERT Pre-Training of Image Transformers |
| BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| BERT For Sequence Generation | Leveraging Pre-trained Checkpoints for Sequence Generation Tasks |
| BERTweet | BERTweet: A pre-trained language model for English Tweets |
| BigBird-Pegasus | Big Bird: Transformers for Longer Sequences |
| BigBird-RoBERTa | Big Bird: Transformers for Longer Sequences |
| BioGpt | BioGPT: generative pre-trained transformer for biomedical text generation and mining |
| BiT | Big Transfer (BiT): General Visual Representation Learning |
| Blenderbot | Recipes for building an open-domain chatbot |
| BlenderbotSmall | Recipes for building an open-domain chatbot |
| BLIP | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation |
| BLIP-2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| BLOOM | BigScience Workshop |
| BORT | Optimal Subarchitecture Extraction For BERT |
| BridgeTower | BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning |
| BROS | BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents |
| ByT5 | ByT5: Towards a token-free future with pre-trained byte-to-byte models |
| CamemBERT | CamemBERT: a Tasty French Language Model |
| CANINE | CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation |
| Chinese-CLIP | Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese |
| CLAP | Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation |
| CLIP | Learning Transferable Visual Models From Natural Language Supervision |
| CLIPSeg | Image Segmentation Using Text and Image Prompts |
| CodeGen | A Conversational Paradigm for Program Synthesis |
| CodeLlama | Code Llama: Open Foundation Models for Code |
| Conditional DETR | Conditional DETR for Fast Training Convergence |
| ConvBERT | ConvBERT: Improving BERT with Span-based Dynamic Convolution |
| ConvNeXT | A ConvNet for the 2020s |
| ConvNeXTV2 | ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders |
| CPM | CPM: A Large-scale Generative Chinese Pre-trained Language Model |
| CPM-Ant | OpenBMB |
| CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| CvT | CvT: Introducing Convolutions to Vision Transformers |
| Data2Vec | Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language |
| DeBERTa | DeBERTa: Decoding-enhanced BERT with Disentangled Attention |
| DeBERTa-v2 | DeBERTa: Decoding-enhanced BERT with Disentangled Attention |
| Decision Transformer | Decision Transformer: Reinforcement Learning via Sequence Modeling |
| Deformable DETR | Deformable DETR: Deformable Transformers for End-to-End Object Detection |
| DeiT | Training data-efficient image transformers & distillation through attention |
| DePlot | DePlot: One-shot visual language reasoning by plot-to-table translation |
| DETA | NMS Strikes Back |
| DETR | End-to-End Object Detection with Transformers |
| DialoGPT | DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation |
| DiNAT | Dilated Neighborhood Attention Transformer |
| DINOv2 | DINOv2: Learning Robust Visual Features without Supervision |
| DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter |
| DistilGPT2 | |
| DistilRoBERTa | |
| DistilmBERT | |
| DiT | DiT: Self-supervised Pre-training for Document Image Transformer |
| Donut | OCR-free Document Understanding Transformer |
| DPR | Dense Passage Retrieval for Open-Domain Question Answering |
| DPT | Vision Transformers for Dense Prediction |
| EfficientFormer | EfficientFormer: Vision Transformers at MobileNet Speed |
| EfficientNet | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks |
| ELECTRA | ELECTRA: Pre-training text encoders as discriminators rather than generators |
| EnCodec | High Fidelity Neural Audio Compression |
| EncoderDecoder | Leveraging Pre-trained Checkpoints for Sequence Generation Tasks |
| ERNIE | ERNIE: Enhanced Representation through Knowledge Integration |
| ErnieM | ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora |
| ESM | Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences; Language models enable zero-shot prediction of the effects of mutations on protein function; Language models of protein sequences at the scale of evolution enable accurate structure prediction |
| Falcon | |
| FLAN-T5 | google-research/t5x |
| FLAN-UL2 | google-research/t5x |
| FlauBERT | FlauBERT: Unsupervised Language Model Pre-training for French |
| FLAVA | FLAVA: A Foundational Language And Vision Alignment Model |
| FNet | FNet: Mixing Tokens with Fourier Transforms |
| FocalNet | Focal Modulation Networks |
| Funnel Transformer | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing |
| GIT | GIT: A Generative Image-to-text Transformer for Vision and Language |
| GLPN | Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth |
| GPT | Improving Language Understanding by Generative Pre-Training |
| GPT Neo | EleutherAI/gpt-neo |
| GPT NeoX | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| GPT NeoX Japanese | |
| GPT-2 | Language Models are Unsupervised Multitask Learners |
| GPT-J | kingoflolz/mesh-transformer-jax |
| GPT-Sw3 | Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish |
| GPTBigCode | SantaCoder: don’t reach for the stars! |
| GPTSAN-japanese | tanreinama/GPTSAN |
| Graphormer | Do Transformers Really Perform Bad for Graph Representation? |
| GroupViT | GroupViT: Semantic Segmentation Emerges from Text Supervision |
| HerBERT | KLEJ: Comprehensive Benchmark for Polish Language Understanding |
| Hubert | HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units |
| I-BERT | I-BERT: Integer-only BERT Quantization |
| IDEFICS | OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents |
| ImageGPT | Generative Pretraining from Pixels |
| Informer | Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting |
| InstructBLIP | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
| Jukebox | Jukebox: A Generative Model for Music |
| LayoutLM | LayoutLM: Pre-training of Text and Layout for Document Image Understanding |
| LayoutLMv2 | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding |
| LayoutLMv3 | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking |
| LayoutXLM | LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding |
| LED | Longformer: The Long-Document Transformer |
| LeViT | LeViT: A Vision Transformer in ConvNet’s Clothing for Faster Inference |
| LiLT | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding |
| LLaMA | LLaMA: Open and Efficient Foundation Language Models |
| Llama2 | Llama2: Open Foundation and Fine-Tuned Chat Models |
| Longformer | Longformer: The Long-Document Transformer |
| LongT5 | LongT5: Efficient Text-To-Text Transformer for Long Sequences |
| LUKE | LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention |
| LXMERT | LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering |
| M-CTC-T | Pseudo-Labeling For Massively Multilingual Speech Recognition |
| M2M100 | Beyond English-Centric Multilingual Machine Translation |
| MarianMT | OPUS; Marian Framework |
| MarkupLM | MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding |
| Mask2Former | Masked-attention Mask Transformer for Universal Image Segmentation |
| MaskFormer | Per-Pixel Classification is Not All You Need for Semantic Segmentation |
| MatCha | MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering |
| mBART | Multilingual Denoising Pre-training for Neural Machine Translation |
| mBART-50 | Multilingual Translation with Extensible Multilingual Pretraining and Finetuning |
| MEGA | Mega: Moving Average Equipped Gated Attention |
| Megatron-BERT | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
| Megatron-GPT2 | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
| MGP-STR | Multi-Granularity Prediction for Scene Text Recognition |
| Mistral | Mistral AI |
| mLUKE | mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models |
| MMS | Scaling Speech Technology to 1,000+ Languages |
| MobileBERT | MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices |
| MobileNetV1 | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications |
| MobileNetV2 | MobileNetV2: Inverted Residuals and Linear Bottlenecks |
| MobileViT | MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer |
| MobileViTV2 | Separable Self-attention for Mobile Vision Transformers |
| MPNet | MPNet: Masked and Permuted Pre-training for Language Understanding |
| MPT | llm-foundry |
| MRA | Multi Resolution Analysis (MRA) for Approximate Self-Attention |
| MT5 | mT5: A massively multilingual pre-trained text-to-text transformer |
| MusicGen | Simple and Controllable Music Generation |
| MVP | MVP: Multi-task Supervised Pre-training for Natural Language Generation |
| NAT | Neighborhood Attention Transformer |
| Nezha | NEZHA: Neural Contextualized Representation for Chinese Language Understanding |
| NLLB | No Language Left Behind: Scaling Human-Centered Machine Translation |
| NLLB-MOE | No Language Left Behind: Scaling Human-Centered Machine Translation |
| Nougat | Nougat: Neural Optical Understanding for Academic Documents |
| Nyströmformer | Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention |
| OneFormer | OneFormer: One Transformer to Rule Universal Image Segmentation |
| OpenLlama | s-JoL; Open-Llama |
| OPT | OPT: Open Pre-trained Transformer Language Models |
| OWL-ViT | Simple Open-Vocabulary Object Detection with Vision Transformers |
| Pegasus | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization |
| PEGASUS-X | Investigating Efficiently Extending Transformers for Long Input Summarization |
| Perceiver IO | Perceiver IO: A General Architecture for Structured Inputs & Outputs |
| Persimmon | blog post |
| PhoBERT | PhoBERT: Pre-trained language models for Vietnamese |
| Pix2Struct | Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding |
| PLBart | Unified Pre-training for Program Understanding and Generation |
| PoolFormer | MetaFormer is Actually What You Need for Vision |
| Pop2Piano | Pop2Piano: Pop Audio-based Piano Cover Generation |
| ProphetNet | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training |
| PVT | Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions |
QDQBert
Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
RAG
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
REALM
REALM: Retrieval-Augmented Language Model Pre-Training
Reformer
Reformer: The Efficient Transformer
RegNet
Designing Network Design Spaces
RemBERT
Rethinking embedding coupling in pre-trained language models
ResNet
Deep Residual Learning for Image Recognition
RoBERTa
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa-PreLayerNorm
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
RoCBert
RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining
RoFormer
RoFormer: Enhanced Transformer with Rotary Position Embedding
RWKV
this repo
SegFormer
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Segment Anything
Segment Anything
SEW
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
SEW-D
Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
SpeechT5
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
SpeechToTextTransformer
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
SpeechToTextTransformer2
Large-Scale Self- and Semi-Supervised Learning for Speech Translation
Splinter
Few-Shot Question Answering by Pretraining Span Selection
SqueezeBERT
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
SwiftFormer
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
Swin Transformer
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer V2
Swin Transformer V2: Scaling Up Capacity and Resolution
Swin2SR
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration
SwitchTransformers
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
T5
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
T5v1.1
google-research/text-to-text-transfer-transformer
Table Transformer
PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents
TAPAS
TAPAS: Weakly Supervised Table Parsing via Pre-training
TAPEX
TAPEX: Table Pre-training via Learning a Neural SQL Executor
Time Series Transformer
TimeSformer
Is Space-Time Attention All You Need for Video Understanding?
Trajectory Transformer
Offline Reinforcement Learning as One Big Sequence Modeling Problem
Transformer-XL
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
TrOCR
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
TVLT
TVLT: Textless Vision-Language Transformer
UL2
Unifying Language Learning Paradigms
UMT5
UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining
UniSpeech
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
UniSpeechSat
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
UPerNet
Unified Perceptual Parsing for Scene Understanding
VAN
Visual Attention Network
VideoMAE
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
ViLT
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Vision Transformer (ViT)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
VisualBERT
VisualBERT: A Simple and Performant Baseline for Vision and Language
ViT Hybrid
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
VitDet
Exploring Plain Vision Transformer Backbones for Object Detection
ViTMAE
Masked Autoencoders Are Scalable Vision Learners
ViTMatte
ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers
ViTMSN
Masked Siamese Networks for Label-Efficient Learning
VITS
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ViViT
ViViT: A Video Vision Transformer
Wav2Vec2
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Wav2Vec2-Conformer
fairseq S2T: Fast Speech-to-Text Modeling with fairseq
Wav2Vec2Phoneme
Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
WavLM
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Whisper
Robust Speech Recognition via Large-Scale Weak Supervision
X-CLIP
Expanding Language-Image Pretrained Models for General Video Recognition
X-MOD
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
XGLM
Few-shot Learning with Multilingual Language Models
XLM
Cross-lingual Language Model Pretraining
XLM-ProphetNet
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
XLM-RoBERTa
Unsupervised Cross-lingual Representation Learning at Scale
XLM-RoBERTa-XL
Larger-Scale Transformers for Multilingual Masked Language Modeling
XLM-V
XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models
XLNet
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLS-R
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
XLSR-Wav2Vec2
Unsupervised Cross-Lingual Representation Learning For Speech Recognition
YOLOS
You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
YOSO
You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling