Prompt tuning for causal language modeling
Prompting helps guide language model behavior by adding some task-specific input text. Prompt tuning is an additive method that trains and updates only the newly added prompt tokens prepended to a pretrained model. This way, you can use one pretrained model whose weights are frozen, and train and update a smaller set of prompt parameters for each downstream task instead of fully finetuning a separate model. As models grow larger and larger, prompt tuning can be more efficient, and results are even better as model parameters scale.
💡 Read The Power of Scale for Parameter-Efficient Prompt Tuning to learn more about prompt tuning.
This guide will show you how to apply prompt tuning to train a bloomz-560m model on the twitter_complaints subset of the RAFT dataset.
Before you begin, make sure you have all the necessary libraries installed:
!pip install -q peft transformers datasets

Setup
Start by defining the model and tokenizer, the dataset and the dataset columns to train on, some training hyperparameters, and the PromptTuningConfig. The PromptTuningConfig contains information about the task type, the text to initialize the prompt embedding, the number of virtual tokens, and the tokenizer to use:
from transformers import AutoModelForCausalLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup
from peft import get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType
import torch
from datasets import load_dataset
import os
from torch.utils.data import DataLoader
from tqdm import tqdm
device = "cuda"
model_name_or_path = "bigscience/bloomz-560m"
tokenizer_name_or_path = "bigscience/bloomz-560m"
peft_config = PromptTuningConfig(
task_type=TaskType.CAUSAL_LM,
prompt_tuning_init=PromptTuningInit.TEXT,
num_virtual_tokens=8,
prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
tokenizer_name_or_path=model_name_or_path,
)
dataset_name = "twitter_complaints"
checkpoint_name = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace(
"/", "_"
)
text_column = "Tweet text"
label_column = "text_label"
max_length = 64
lr = 3e-2
num_epochs = 50
batch_size = 8

Load dataset
For this guide, you'll load the twitter_complaints subset of the RAFT dataset. This subset contains tweets that are labeled either complaint or no complaint:
To make the Label column more readable, replace the Label value with the corresponding label text and store them in a text_label column. You can use the map function to apply this change over the entire dataset in one step:
Preprocess dataset
Next, you'll set up a tokenizer, configure the appropriate padding token to use for padding sequences, and determine the maximum length of the tokenized labels:
Create a preprocess_function to:

1. Tokenize the input text and labels.
2. For each example in a batch, pad the labels with the tokenizer's pad_token_id.
3. Concatenate the input text and labels into the model_inputs.
4. Create a separate attention mask for labels and model_inputs.
5. Loop through each example in the batch again to pad the input ids, labels, and attention mask to the max_length and convert them to PyTorch tensors.
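One possible implementation of these steps, assuming `tokenizer`, `text_column`, `label_column`, and `max_length` from the earlier setup; positions belonging to the input prompt are set to -100 in the labels so they are ignored by the loss:

```python
def preprocess_function(examples):
    batch_size = len(examples[text_column])
    # Build a prompt like "Tweet text : <tweet> Label : " for each example
    inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
    targets = [str(x) for x in examples[label_column]]
    model_inputs = tokenizer(inputs)
    labels = tokenizer(targets)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        # Pad the label with the tokenizer's pad token
        label_input_ids = labels["input_ids"][i] + [tokenizer.pad_token_id]
        # Concatenate input text and label into the model inputs
        model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
        # Mask out the input portion of the labels with -100
        labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
        model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i]
        # Left-pad everything to max_length
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        labels["input_ids"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids
        # Truncate to max_length and convert to tensors
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
        labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```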
Use the map function to apply the preprocess_function to the entire dataset. You can remove the unprocessed columns since the model won't need them:
Create a DataLoader from the train and eval datasets. Set pin_memory=True to speed up the data transfer to the GPU during training if the samples in your dataset are on a CPU.
Train
You're almost ready to set up your model and start training!
Initialize a base model from AutoModelForCausalLM, and pass it and peft_config to the get_peft_model() function to create a PeftModel. You can print the new PeftModel's trainable parameters to see how much more efficient it is than training the full parameters of the original model!
Set up an optimizer and learning rate scheduler:
Move the model to the GPU, then write a training loop to start training!
Share model
You can store and share your model on the Hub if you'd like. Log in to your Hugging Face account and enter your token when prompted:
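In a notebook, you can log in with the huggingface_hub helper:

```python
from huggingface_hub import notebook_login

# Prompts for your access token
notebook_login()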
Use the push_to_hub function to upload your model to a model repository on the Hub:
Once the model is uploaded, you'll see the model file size is only 33.5kB! 🤗
Inference
Let's try the model on a sample input for inference. If you look at the repository you uploaded the model to, you'll see an adapter_config.json file. Load this file into PeftConfig to specify the peft_type and task_type. Then you can load the prompt-tuned model weights and the configuration into from_pretrained() to create the PeftModel:
Grab a tweet and tokenize it:
Put the model on a GPU and generate the predicted label: