GPT-J

Overview

The GPT-J model was released in the kingoflolz/mesh-transformer-jax repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like causal language model trained on the Pile dataset.

This model was contributed by Stella Biderman.

Tips:

To load GPT-J in float32 one would need at least 2x model size RAM: 1x for initial weights and another 1x to load the checkpoint. So for GPT-J it would take at least 48GB RAM to just load the model. To reduce the RAM usage there are a few options. The torch_dtype argument can be used to initialize the model in half-precision on a CUDA device only. There is also a fp16 branch which stores the fp16 weights, which could be used to further minimize the RAM usage:

Copied

>>> from transformers import GPTJForCausalLM
>>> import torch

>>> device = "cuda"
>>> model = GPTJForCausalLM.from_pretrained(
...     "EleutherAI/gpt-j-6B",
...     revision="float16",
...     torch_dtype=torch.float16,
... ).to(device)

The model should fit on 16GB GPU for inference. For training/fine-tuning it would take much more GPU RAM. Adam optimizer for example makes four copies of the model: model, gradients, average and squared average of the gradients. So it would need at least 4x model size GPU memory, even with mixed precision as gradient updates are in fp32. This is not including the activations and data batches, which would again require some more GPU RAM. So one should explore solutions such as DeepSpeed, to train/fine-tune the model. Another option is to use the original codebase to train/fine-tune the model on TPU and then convert the model to Transformers format for inference. Instructions for that could be found here
Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer. These extra tokens are added for the sake of efficiency on TPUs. To avoid the mismatch between embedding matrix size and vocab size, the tokenizer for GPT-J contains 143 extra tokens <|extratoken_1|>... <|extratoken_143|>, so the vocab_size of tokenizer also becomes 50400.

Generation

The generate() method can be used to generate text using GPT-J model.

Copied

>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

>>> prompt = (
...     "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
...     "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
...     "researchers was the fact that the unicorns spoke perfect English."
... )

>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids

>>> gen_tokens = model.generate(
...     input_ids,
...     do_sample=True,
...     temperature=0.9,
...     max_length=100,
... )
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]

…or in float16 precision:

Copied

>>> from transformers import GPTJForCausalLM, AutoTokenizer
>>> import torch

>>> device = "cuda"
>>> model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16).to(device)
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

>>> prompt = (
...     "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
...     "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
...     "researchers was the fact that the unicorns spoke perfect English."
... )

>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

>>> gen_tokens = model.generate(
...     input_ids,
...     do_sample=True,
...     temperature=0.9,
...     max_length=100,
... )
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]

Resources

A list of official BOINC AI and community (indicated by 🌎) resources to help you get started with GPT-J. If you’re interested in submitting a resource to be included here, please feel free to open a Pull Request and we’ll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

Text Generation

Description of GPT-J.
A blog on how to Deploy GPT-J 6B for inference using BOINC AI Transformers and Amazon SageMaker.
A blog on how to Accelerate GPT-J inference with DeepSpeed-Inference on GPUs.
A blog post introducing GPT-J-6B: 6B JAX-Based Transformer. 🌎
A notebook for GPT-J-6B Inference Demo. 🌎
Another notebook demonstrating Inference with GPT-J-6B.
Causal language modeling chapter of the 🌎 BOINC AI Course.
GPTJForCausalLM is supported by this causal language modeling example script, text generation example script, and notebook.
TFGPTJForCausalLM is supported by this causal language modeling example script and notebook.
FlaxGPTJForCausalLM is supported by this causal language modeling example script and notebook.

Documentation resources

hashtagGPT-J

hashtagOverview

hashtagGeneration

hashtagResources

hashtagGPTJConfig

hashtagclass transformers.GPTJConfig

hashtagGPTJModel

hashtagclass transformers.GPTJModel

hashtagGPTJForCausalLM

hashtagclass transformers.GPTJForCausalLM

hashtagGPTJForSequenceClassification

hashtagclass transformers.GPTJForSequenceClassification

hashtagGPTJForQuestionAnswering

hashtagclass transformers.GPTJForQuestionAnswering

hashtagTFGPTJModel

hashtagclass transformers.TFGPTJModel

hashtagTFGPTJForCausalLM

hashtagclass transformers.TFGPTJForCausalLM

hashtagTFGPTJForSequenceClassification

hashtagclass transformers.TFGPTJForSequenceClassification

hashtagTFGPTJForQuestionAnswering

hashtagclass transformers.TFGPTJForQuestionAnswering

hashtagFlaxGPTJModel

hashtagclass transformers.FlaxGPTJModel

hashtagFlaxGPTJForCausalLM

hashtagclass transformers.FlaxGPTJForCausalLM

GPT-J

Overview

Generation

Resources

GPTJConfig

class transformers.GPTJConfig

GPTJModel

class transformers.GPTJModel

GPTJForCausalLM

class transformers.GPTJForCausalLM

GPTJForSequenceClassification

class transformers.GPTJForSequenceClassification

GPTJForQuestionAnswering

class transformers.GPTJForQuestionAnswering

TFGPTJModel

class transformers.TFGPTJModel

TFGPTJForCausalLM

class transformers.TFGPTJForCausalLM

TFGPTJForSequenceClassification

class transformers.TFGPTJForSequenceClassification

TFGPTJForQuestionAnswering

class transformers.TFGPTJForQuestionAnswering

FlaxGPTJModel

class transformers.FlaxGPTJModel

FlaxGPTJForCausalLM

class transformers.FlaxGPTJForCausalLM