P-tuning for sequence classification
It is challenging to finetune large language models for downstream tasks because they have so many parameters. To work around this, you can use prompts to steer the model toward a particular downstream task without fully finetuning a model. Typically, these prompts are handcrafted, which may be impractical because you need very large validation sets to find the best prompts. P-tuning is a method for automatically searching and optimizing for better prompts in a continuous space.
💡 Read GPT Understands, Too to learn more about p-tuning.
This guide will show you how to train a roberta-large model (but you can also use any of the GPT, OPT, or BLOOM models) with p-tuning on the mrpc configuration of the GLUE benchmark.
Before you begin, make sure you have all the necessary libraries installed:
!pip install -q peft transformers datasets evaluate
Setup
To get started, import 🤗 Transformers to create the base model, 🤗 Datasets to load a dataset, 🤗 Evaluate to load an evaluation metric, and 🤗 PEFT to create a PeftModel and set up the configuration for p-tuning.
Define the model, dataset, and some basic training hyperparameters:
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)
from peft import (
    get_peft_config,
    get_peft_model,
    get_peft_model_state_dict,
    set_peft_model_state_dict,
    PeftType,
    PromptEncoderConfig,
)
from datasets import load_dataset
import evaluate
import torch
model_name_or_path = "roberta-large"
task = "mrpc"
num_epochs = 20
lr = 1e-3
batch_size = 32
Load dataset and metric
Next, load the mrpc configuration (a corpus of sentence pairs labeled according to whether they're semantically equivalent or not) from the GLUE benchmark:
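The loading code is missing from this copy of the guide. A minimal sketch, assuming the data is fetched with `load_dataset("glue", "mrpc")` from Datasets; the helper name `load_mrpc` is illustrative, not part of any library:

```python
def load_mrpc(task="mrpc"):
    """Return the GLUE configuration `task` as a DatasetDict."""
    # Imported inside the function so the helper can be defined before the
    # library is installed; the first call downloads and caches the data.
    from datasets import load_dataset

    return load_dataset("glue", task)
```

Calling `dataset = load_mrpc()` should give `train`, `validation`, and `test` splits, where each example carries `sentence1`, `sentence2`, `label`, and `idx` fields.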
From 🤗 Evaluate, load a metric for evaluating the model's performance. The evaluation module returns the accuracy and F1 scores associated with this specific task.
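The metric-loading call is not shown here. One way it might look, assuming the GLUE metric is loaded through `evaluate.load` with the same task name; `load_glue_metric` is a hypothetical helper name:

```python
def load_glue_metric(task="mrpc"):
    """Return the Evaluate module that scores the given GLUE task."""
    # Imported lazily; the first call fetches the metric script.
    import evaluate

    return evaluate.load("glue", task)
```

For mrpc, `load_glue_metric().compute(predictions=..., references=...)` returns a dictionary with `accuracy` and `f1` keys.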
Now you can use the metric to write a function that computes the accuracy and F1 scores. The compute_metrics function calculates the scores from the model predictions and labels:
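A sketch of such a function, assuming `metric` is the GLUE metric object described above and that the Trainer hands the callback a `(logits, labels)` pair. The factory `make_compute_metrics` is a hypothetical wrapper used here only to avoid a global variable:

```python
def make_compute_metrics(metric):
    """Build the evaluation callback from a metric with a .compute() method."""

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        # Index of the highest-scoring class for each example; with NumPy
        # arrays this is equivalent to np.argmax(logits, axis=1).
        predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
        return metric.compute(predictions=predictions, references=labels)

    return compute_metrics
```

The returned `compute_metrics` is what you would later pass to the Trainer.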
Preprocess dataset
Initialize the tokenizer and configure the padding token to use. If you're using a GPT, OPT, or BLOOM model, you should set the padding_side to the left; otherwise it'll be set to the right. Tokenize the sentence pairs and truncate them to the maximum length.
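The tokenizer setup is missing here. A minimal sketch of the steps just described, assuming a simple substring check on the model name decides the padding side and that a missing padding token falls back to the end-of-sequence token. The helper names are illustrative:

```python
def choose_padding_side(model_name_or_path):
    """GPT, OPT, and BLOOM models are padded on the left; others on the right."""
    if any(k in model_name_or_path for k in ("gpt", "opt", "bloom")):
        return "left"
    return "right"


def load_tokenizer(model_name_or_path="roberta-large"):
    from transformers import AutoTokenizer  # imported lazily

    tokenizer = AutoTokenizer.from_pretrained(
        model_name_or_path, padding_side=choose_padding_side(model_name_or_path)
    )
    # Assumption: models without a padding token reuse the EOS token.
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer


def make_tokenize_function(tokenizer):
    def tokenize_function(examples):
        # max_length=None lets truncation use the model's own maximum length.
        return tokenizer(
            examples["sentence1"], examples["sentence2"], truncation=True, max_length=None
        )

    return tokenize_function
```

No padding happens at this stage; sequences are only truncated, and padding is left to the data collator.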
Use map to apply the tokenize_function to the dataset, and remove the unprocessed columns because the model won't need those. You should also rename the label column to labels because that is the expected name for the labels by models in the 🤗 Transformers library.
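As a sketch, assuming `dataset` is the GLUE mrpc `DatasetDict` and that its unprocessed columns are `idx`, `sentence1`, and `sentence2`; the function name `preprocess` is illustrative:

```python
def preprocess(dataset, tokenize_function):
    """Tokenize every split, drop the raw columns, and rename label -> labels."""
    tokenized = dataset.map(
        tokenize_function,
        batched=True,  # tokenize many examples per call
        remove_columns=["idx", "sentence1", "sentence2"],
    )
    return tokenized.rename_column("label", "labels")
```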
Create a collator function with DataCollatorWithPadding to pad the examples in the batches to the longest sequence in the batch:
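A minimal sketch, assuming `tokenizer` is the tokenizer initialized earlier; `build_collator` is a hypothetical helper name:

```python
def build_collator(tokenizer):
    """Collator that pads each batch to its own longest sequence."""
    from transformers import DataCollatorWithPadding  # imported lazily

    return DataCollatorWithPadding(tokenizer=tokenizer, padding="longest")
```

Padding per batch rather than to a fixed length keeps batches of short sentence pairs small.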
Train
P-tuning uses a prompt encoder to optimize the prompt parameters, so you'll need to initialize the PromptEncoderConfig with several arguments:
task_type: the type of task you're training on; in this case, it is sequence classification or SEQ_CLS
num_virtual_tokens: the number of virtual tokens to use, or in other words, the prompt
encoder_hidden_size: the hidden size of the encoder used to optimize the prompt parameters
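The configuration might be created as follows. The values 20 and 128 are illustrative assumptions, not taken from the text above, and `build_peft_config` is a hypothetical helper name:

```python
def build_peft_config(num_virtual_tokens=20, encoder_hidden_size=128):
    """P-tuning configuration for sequence classification."""
    from peft import PromptEncoderConfig  # imported lazily

    return PromptEncoderConfig(
        task_type="SEQ_CLS",
        num_virtual_tokens=num_virtual_tokens,    # assumed prompt length
        encoder_hidden_size=encoder_hidden_size,  # assumed encoder width
    )
```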
Create the base roberta-large model from AutoModelForSequenceClassification, and then wrap the base model and peft_config with get_peft_model() to create a PeftModel. If you're curious to see how many parameters you're actually training compared to training on all the model parameters, you can print it out with print_trainable_parameters():
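A sketch of this step, assuming `peft_config` is the PromptEncoderConfig from the previous step; `build_peft_model` is an illustrative name:

```python
def build_peft_model(peft_config, model_name_or_path="roberta-large"):
    """Load the base classifier and wrap it so only the prompt encoder trains."""
    from peft import get_peft_model
    from transformers import AutoModelForSequenceClassification

    base_model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path)
    model = get_peft_model(base_model, peft_config)
    # Prints the trainable and total parameter counts and their ratio.
    model.print_trainable_parameters()
    return model
```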
From the 🤗 Transformers library, set up the TrainingArguments class with the directory where you want to save the model, the training hyperparameters, how to evaluate the model, and when to save the checkpoints:
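A hedged sketch using the hyperparameters defined in the setup section. The weight decay of 0.01 and the per-epoch evaluate/save schedule are assumptions. Because the evaluation argument is named `evaluation_strategy` in older Transformers releases and `eval_strategy` in newer ones, the sketch picks whichever the installed version accepts; both helper names are illustrative:

```python
import inspect


def training_kwargs(output_dir, lr=1e-3, batch_size=32, num_epochs=20,
                    eval_key="eval_strategy"):
    """Keyword arguments for TrainingArguments, kept as a plain dict."""
    return {
        "output_dir": output_dir,
        "learning_rate": lr,
        "per_device_train_batch_size": batch_size,
        "per_device_eval_batch_size": batch_size,
        "num_train_epochs": num_epochs,
        "weight_decay": 0.01,           # assumed value
        eval_key: "epoch",              # evaluate after every epoch
        "save_strategy": "epoch",       # checkpoint after every epoch
        "load_best_model_at_end": True,
    }


def build_training_args(output_dir, **overrides):
    from transformers import TrainingArguments  # imported lazily

    fields = inspect.signature(TrainingArguments.__init__).parameters
    eval_key = "eval_strategy" if "eval_strategy" in fields else "evaluation_strategy"
    return TrainingArguments(**training_kwargs(output_dir, eval_key=eval_key, **overrides))
```

`load_best_model_at_end=True` requires the evaluation and save schedules to match, which is why both are set to `"epoch"`.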
Then pass the model, TrainingArguments, datasets, tokenizer, data collator, and evaluation function to the Trainer class, which will handle the entire training loop for you. Once you're ready, call train to start training!
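One way to wire these pieces together, as a sketch. It assumes evaluation runs on the `validation` split, since the GLUE test labels are not public. Newer Transformers releases take the tokenizer as `processing_class` instead of `tokenizer`, so the sketch checks which name is accepted. The `trainer_cls` parameter is a hypothetical hook that only exists to make the helper easy to exercise without loading a model:

```python
import inspect


def build_trainer(model, training_args, tokenized_datasets, tokenizer,
                  data_collator, compute_metrics, trainer_cls=None):
    if trainer_cls is None:
        from transformers import Trainer as trainer_cls  # imported lazily

    params = inspect.signature(trainer_cls.__init__).parameters
    tok_key = "processing_class" if "processing_class" in params else "tokenizer"
    return trainer_cls(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=data_collator,
        compute_metrics=compute_metrics,
        **{tok_key: tokenizer},
    )
```

With the returned object in hand, `trainer.train()` starts the run.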
Share model
You can store and share your model on the Hub if you'd like. Log in to your BOINC AI account and enter your token when prompted:
Upload the model to a specific model repository on the Hub with the push_to_hub function:
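A sketch of the login and upload steps, assuming the Hub client is the `huggingface_hub` package that Transformers and PEFT build on. The repository id in the comment is a placeholder, and both helper names are illustrative:

```python
def hub_login():
    """Prompt for an access token and store it locally."""
    from huggingface_hub import login  # imported lazily

    login()


def share_model(model, repo_id):
    """Upload the trained PEFT weights to `repo_id`."""
    # repo_id is a placeholder such as "your-name/roberta-large-peft-p-tuning".
    return model.push_to_hub(repo_id)
```

Only the small set of p-tuning weights and the adapter configuration are uploaded, not the full base model.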
Inference
Once the model has been uploaded to the Hub, anyone can easily use it for inference. Load the configuration and model:
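A minimal sketch, assuming `peft_model_id` is the repository the model was pushed to. The adapter configuration records which base model it was trained on, so that base model is loaded first and the p-tuning weights are attached to it. `load_for_inference` is an illustrative name:

```python
def load_for_inference(peft_model_id):
    """Return (model, tokenizer) ready for prediction."""
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    config = PeftConfig.from_pretrained(peft_model_id)
    base_model = AutoModelForSequenceClassification.from_pretrained(
        config.base_model_name_or_path
    )
    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    model = PeftModel.from_pretrained(base_model, peft_model_id)
    model.eval()  # disable dropout for inference
    return model, tokenizer
```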
Get some text and tokenize it:
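For example, with two made-up sentences (illustrative only) and assuming `tokenizer` is the tokenizer loaded for the base model; `encode_pair` is a hypothetical helper:

```python
# Illustrative sentence pair; any two sentences work.
SENTENCE_1 = "The cat sat on the mat."
SENTENCE_2 = "A cat was sitting on the mat."


def encode_pair(tokenizer, sentence1, sentence2):
    """Tokenize one sentence pair into PyTorch tensors."""
    return tokenizer(
        sentence1, sentence2, truncation=True, padding="longest", return_tensors="pt"
    )
```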
Pass the inputs to the model to classify the sentences:
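A sketch of the classification step, assuming `inputs` is the tokenized sentence pair and that class index 0 means "not equivalent" and index 1 means "equivalent", as in GLUE mrpc. The helper names are illustrative:

```python
CLASSES = ["not equivalent", "equivalent"]  # GLUE mrpc label ids 0 and 1


def format_scores(probs, classes=CLASSES):
    """Turn class probabilities into readable percentage strings."""
    return [f"{name}: {round(p * 100)}%" for name, p in zip(classes, probs)]


def classify(model, inputs):
    import torch  # imported lazily

    with torch.no_grad():  # no gradients needed at inference time
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=1)[0].tolist()
    return format_scores(probs)
```

`classify` returns one string per class, each pairing the class name with its softmax probability as a percentage.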