🌍 Transformers.js

State-of-the-art Machine Learning for the web. Run 🌍 Transformers directly in your browser, with no need for a server!

Transformers.js is designed to be functionally equivalent to BOINC AI's transformers Python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks across different modalities, such as:

  • 📝 Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.

  • 🖼️ Computer Vision: image classification, object detection, and segmentation.

  • 🗣️ Audio: automatic speech recognition and audio classification.

  • 🐙 Multimodal: zero-shot image classification.

Transformers.js uses ONNX Runtime to run models in the browser. The best part is that you can easily convert your pretrained PyTorch, TensorFlow, or JAX models to ONNX using 🌍 Optimum.
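For example, a typical export with the Optimum CLI looks something like the following (the model ID and output directory are placeholders):

pip install optimum[exporters]
optimum-cli export onnx --model bert-base-uncased onnx_output/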

For more information, check out the full documentation.

Quick tour

It’s super simple to translate from existing code! Just like the Python library, we support the pipeline API. Pipelines group together a pretrained model with preprocessing of inputs and postprocessing of outputs, making it the easiest way to run models with the library.

Python (original):

from transformers import pipeline

# Allocate a pipeline for sentiment analysis
classifier = pipeline('sentiment-analysis')

output = classifier('I love transformers!')
# [{'label': 'POSITIVE', 'score': 0.9998}]

JavaScript (ours):

import { pipeline } from '@xenova/transformers';

// Allocate a pipeline for sentiment analysis
let classifier = await pipeline('sentiment-analysis');

let output = await classifier('I love transformers!');
// [{ label: 'POSITIVE', score: 0.9998 }]
You can also use a different model by specifying the model id or path as the second argument to the pipeline function. For example:


// Use a different model for sentiment-analysis
let pipe = await pipeline('sentiment-analysis', 'nlptown/bert-base-multilingual-uncased-sentiment');
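You can then call the returned pipeline just like the default one. For this multilingual review model, the output might look something like the following (the label and score are illustrative):

let output = await pipe('I love transformers!');
// e.g. [{ label: '5 stars', score: 0.72 }]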

Contents

The documentation is organized into 4 sections:

  1. GET STARTED provides a quick tour of the library and installation instructions to get up and running.

  2. TUTORIALS are a great place to start if you’re a beginner! We also include sample applications for you to play around with!

  3. DEVELOPER GUIDES show you how to use the library to achieve a specific goal.

  4. API REFERENCE describes all classes and functions, as well as their available parameters and types.

Supported tasks/models

Here is the list of all tasks and architectures currently supported by Transformers.js. If you don’t see your task/model listed here or it is not yet supported, feel free to open up a feature request here.

To find compatible models on the Hub, select the “transformers.js” library tag in the filter menu (or visit this link). You can refine your search by selecting the task you’re interested in (e.g., text-classification).

Tasks

Natural Language Processing

  • conversational: Generating conversational text that is relevant, coherent, and knowledgeable given a prompt.

  • fill-mask: Masking some of the words in a sentence and predicting which words should replace those masks (see the example after this list).

  • question-answering: Retrieving the answer to a question from a given text.

  • sentence-similarity: Determining how similar two texts are.

  • summarization: Producing a shorter version of a document while preserving its important information.

  • table-question-answering: Answering a question about information from a given table.

  • text-classification or sentiment-analysis: Assigning a label or class to a given text.

  • text-generation: Producing new text by predicting the next word in a sequence.

  • text2text-generation: Converting one text sequence into another text sequence.

  • token-classification or ner: Assigning a label to each token in a text.

  • translation: Converting text from one language to another.

  • zero-shot-classification: Classifying text into classes that are unseen during training.
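For example, fill-mask uses the same pipeline API as in the quick tour (the model ID and output values are illustrative):

import { pipeline } from '@xenova/transformers';

// Predict the word hidden behind the [MASK] token
let unmasker = await pipeline('fill-mask', 'Xenova/bert-base-uncased');
let output = await unmasker('The goal of life is [MASK].');
// e.g. [{ score: 0.11, token_str: 'happiness', sequence: 'the goal of life is happiness.' }, ...]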

Vision

  • depth-estimation: Predicting the depth of objects present in an image.

  • image-classification: Assigning a label or class to an entire image.

  • image-segmentation: Dividing an image into segments where each pixel is mapped to an object. This task has multiple variants, such as instance segmentation, panoptic segmentation, and semantic segmentation.

  • image-to-image: Transforming a source image to match the characteristics of a target image or a target image domain.

  • mask-generation: Generating masks for the objects in an image.

  • object-detection: Identifying objects of certain defined classes within an image (see the example after this list).

  • Video classification (ID: n/a): Assigning a label or class to an entire video.

  • Unconditional image generation (ID: n/a): Generating images with no condition in any context (such as a prompt text or another image).
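Vision tasks follow the same pipeline pattern; for instance, object detection (the model ID, image URL, and output values are illustrative):

import { pipeline } from '@xenova/transformers';

// Detect objects in an image, keeping only confident predictions
let detector = await pipeline('object-detection', 'Xenova/detr-resnet-50');
let output = await detector('https://example.com/image.jpg', { threshold: 0.9 });
// e.g. [{ label: 'cat', score: 0.98, box: { xmin: 10, ymin: 20, xmax: 150, ymax: 200 } }, ...]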

Audio

  • audio-classification: Assigning a label or class to a given audio clip.

  • Audio-to-audio (ID: n/a): Generating audio from an input audio source.

  • automatic-speech-recognition: Transcribing a given audio clip into text (see the example after this list).

  • text-to-speech or text-to-audio: Generating natural-sounding speech given text input.
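For example, automatic speech recognition with a Whisper checkpoint (the model ID, audio URL, and transcript are illustrative):

import { pipeline } from '@xenova/transformers';

// Transcribe English speech to text
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber('https://example.com/audio.wav');
// e.g. { text: 'And so my fellow Americans, ask not what your country can do for you.' }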

Tabular

  • Tabular classification (ID: n/a): Classifying a target category (a group) based on a set of attributes.

  • Tabular regression (ID: n/a): Predicting a numerical value given a set of attributes.

Multimodal

  • document-question-answering: Answering questions on document images.

  • feature-extraction: Transforming raw data into numerical features that can be processed while preserving the information in the original dataset.

  • image-to-text: Generating text from a given image.

  • text-to-image: Generating images from input text.

  • visual-question-answering: Answering open-ended questions based on an image.

  • zero-shot-image-classification: Classifying images into classes that are unseen during training (see the example after this list).
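For example, zero-shot image classification takes an image together with a set of candidate labels (the model ID, URL, labels, and scores are illustrative):

import { pipeline } from '@xenova/transformers';

// Score an image against arbitrary text labels
let classifier = await pipeline('zero-shot-image-classification', 'Xenova/clip-vit-base-patch32');
let output = await classifier('https://example.com/cat.jpg', ['photo of a cat', 'photo of a dog']);
// e.g. [{ label: 'photo of a cat', score: 0.99 }, { label: 'photo of a dog', score: 0.01 }]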

Reinforcement Learning

  • Reinforcement learning (ID: n/a): Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback.

Models

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.

  2. BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.

  3. BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.

  4. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.

  5. Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.

  6. BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.

  7. BLOOM (from BigScience workshop) released by the BigScience Workshop.

  8. CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.

  9. CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.

  10. CodeGen (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.

  11. CodeLlama (from MetaAI) released with the paper Code Llama: Open Foundation Models for Code by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.

  12. DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.

  13. DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.

  14. DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.

  15. DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.

  16. DistilBERT (from BOINC AI), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.

  17. Donut (from NAVER), released together with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.

  18. FLAN-T5 (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

  19. GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.

  20. GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.

  21. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever.

  22. GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.

  23. GPTBigCode (from BigCode) released with the paper SantaCoder: don’t reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.

  24. HerBERT (from Allegro.pl, AGH University of Science and Technology) released with the paper KLEJ: Comprehensive Benchmark for Polish Language Understanding by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.

  25. LongT5 (from Google AI) released with the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.

  26. LLaMA (from The FAIR team of Meta AI) released with the paper LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.

  27. Llama2 (from The FAIR team of Meta AI) released with the paper Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.

  28. M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.

  29. MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.

  30. mBART (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.

  31. mBART-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.

  32. MMS (from Facebook) released with the paper Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.

  33. MobileBERT (from CMU/Google Brain) released with the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.

  34. MobileViT (from Apple) released with the paper MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari.

  35. MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.

  36. MPT (from MosaicML) released with the repository llm-foundry by the MosaicML NLP Team.

  37. MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.

  38. OPT (from Meta AI) released with the paper OPT: Open Pre-trained Transformer Language Models by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.

  39. ResNet (from Microsoft Research) released with the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.

  40. RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.

  41. SpeechT5 (from Microsoft Research) released with the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.

  42. SqueezeBERT (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.

  43. Swin Transformer (from Microsoft) released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.

  44. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.

  45. T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.

  46. Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.

  47. Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.

  48. WavLM (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.

  49. Whisper (from OpenAI) released with the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

  50. XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.

  51. XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.

  52. YOLOS (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
