P-tuning for sequence classification


It is challenging to finetune large language models for downstream tasks because they have so many parameters. To work around this, you can use prompts to steer the model toward a particular downstream task without fully finetuning a model. Typically, these prompts are handcrafted, which may be impractical because you need very large validation sets to find the best prompts. P-tuning is a method for automatically searching and optimizing for better prompts in a continuous space.
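
To make this concrete, here is a minimal, self-contained sketch of the continuous-prompt idea (an illustration of the technique, not the 🌍 PEFT internals; all sizes below are illustrative assumptions): a small trainable prompt encoder maps virtual-token embeddings into continuous prompt vectors, which are prepended to the frozen model’s input embeddings.

Copied

import torch
import torch.nn as nn

num_virtual_tokens = 20      # length of the learned "soft" prompt
token_dim = 1024             # must match the base model's hidden size (e.g. roberta-large)
encoder_hidden_size = 128    # hidden size of the small prompt encoder

# Trainable embeddings for the virtual prompt tokens
prompt_embeddings = nn.Embedding(num_virtual_tokens, token_dim)

# A small MLP reparameterizes the raw embeddings into the final continuous prompts
prompt_encoder = nn.Sequential(
    nn.Linear(token_dim, encoder_hidden_size),
    nn.ReLU(),
    nn.Linear(encoder_hidden_size, token_dim),
)

virtual_token_ids = torch.arange(num_virtual_tokens)
prompts = prompt_encoder(prompt_embeddings(virtual_token_ids))  # shape: (20, 1024)

# During training only the prompt embeddings and the prompt encoder receive gradients;
# the resulting prompts are concatenated in front of the frozen model's input embeddings.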

💡 Read GPT Understands, Too to learn more about p-tuning.

This guide will show you how to train a roberta-large model (but you can also use any of the GPT, OPT, or BLOOM models) with p-tuning on the mrpc configuration of the GLUE benchmark.

Before you begin, make sure you have all the necessary libraries installed:

Copied

!pip install -q peft transformers datasets evaluate

Setup

To get started, import 🌍 Transformers to create the base model, 🌍 Datasets to load a dataset, 🌍 Evaluate to load an evaluation metric, and 🌍 PEFT to create a PeftModel and set up the configuration for p-tuning.

Define the model, dataset, and some basic training hyperparameters:

Copied

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)
from peft import (
    get_peft_config,
    get_peft_model,
    get_peft_model_state_dict,
    set_peft_model_state_dict,
    PeftType,
    PromptEncoderConfig,
)
from datasets import load_dataset
import evaluate
import torch

model_name_or_path = "roberta-large"
task = "mrpc"
num_epochs = 20
lr = 1e-3
batch_size = 32

Load dataset and metric

Next, load the mrpc configuration - a corpus of sentence pairs labeled according to whether they’re semantically equivalent or not - from the GLUE benchmark:

Copied

dataset = load_dataset("glue", task)
dataset["train"][0]
{
    "sentence1": 'Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .',
    "sentence2": 'Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .',
    "label": 1,
    "idx": 0,
}

From 🌍 Evaluate, load a metric for evaluating the model’s performance. The evaluation module returns the accuracy and F1 scores associated with this specific task.

Copied

metric = evaluate.load("glue", task)

Now you can use the metric to write a function that computes the accuracy and F1 scores. The compute_metrics function calculates the scores from the model predictions and labels:

Copied

import numpy as np


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

Preprocess dataset

Initialize the tokenizer and configure the padding token to use. If you’re using a GPT, OPT, or BLOOM model, you should set the padding_side to the left; otherwise it’ll be set to the right. Tokenize the sentence pairs and truncate them to the maximum length.

Copied

if any(k in model_name_or_path for k in ("gpt", "opt", "bloom")):
    padding_side = "left"
else:
    padding_side = "right"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side=padding_side)
if getattr(tokenizer, "pad_token_id") is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id


def tokenize_function(examples):
    # max_length=None => use the model max length (it's actually the default)
    outputs = tokenizer(examples["sentence1"], examples["sentence2"], truncation=True, max_length=None)
    return outputs

Use map to apply the tokenize_function to the dataset, and remove the unprocessed columns because the model won’t need those. You should also rename the label column to labels because that is the name the models in the 🌍 Transformers library expect for the labels.

Copied

tokenized_datasets = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["idx", "sentence1", "sentence2"],
)

tokenized_datasets = tokenized_datasets.rename_column("label", "labels")

Create a collator function with DataCollatorWithPadding to pad the examples in the batches to the longest sequence in the batch:

Copied

data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding="longest")

Train

P-tuning uses a prompt encoder to optimize the prompt parameters, so you’ll need to initialize the PromptEncoderConfig with several arguments:

  • task_type: the type of task you’re training on, in this case it is sequence classification or SEQ_CLS

  • num_virtual_tokens: the number of virtual tokens to use, or in other words, the prompt

  • encoder_hidden_size: the hidden size of the encoder used to optimize the prompt parameters

Copied

peft_config = PromptEncoderConfig(task_type="SEQ_CLS", num_virtual_tokens=20, encoder_hidden_size=128)

Create the base roberta-large model from AutoModelForSequenceClassification, and then wrap the base model and peft_config with get_peft_model() to create a PeftModel. If you’re curious to see how many parameters you’re actually training compared to training all the model parameters, you can print them out with print_trainable_parameters():

Copied

model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, return_dict=True)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 1351938 || all params: 355662082 || trainable%: 0.38011867680626127"

From the 🌍 Transformers library, set up the TrainingArguments class with the output directory where you want to save the model, the training hyperparameters, how to evaluate the model, and when to save the checkpoints:

Copied

training_args = TrainingArguments(
    output_dir="your-name/roberta-large-peft-p-tuning",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

Then pass the model, TrainingArguments, datasets, tokenizer, data collator, and evaluation function to the Trainer class, which will handle the entire training loop for you. Once you’re ready, call train() to start training!

Copied

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()
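
After training finishes, you can optionally check the final metrics on the evaluation split with the Trainer API; a minimal sketch:

Copied

# Runs a full pass over eval_dataset and returns a dict that includes the
# accuracy and F1 scores computed by `compute_metrics`.
eval_results = trainer.evaluate()
print(eval_results)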

Share model

You can store and share your model on the Hub if you’d like. Log in to your BOINC AI account and enter your token when prompted:

Copied

from boincai_hub import notebook_login

notebook_login()

Upload the model to a specific model repository on the Hub with the push_to_hub function:

Copied

model.push_to_hub("your-name/roberta-large-peft-p-tuning", use_auth_token=True)
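
If you’d rather keep the adapter locally instead of (or in addition to) pushing it to the Hub, you can save it to disk with save_pretrained; a minimal sketch, with the directory name chosen purely as an example:

Copied

# Saves only the p-tuning adapter weights and configuration; the frozen
# roberta-large base weights are not duplicated.
model.save_pretrained("roberta-large-peft-p-tuning")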

Inference

Once the model has been uploaded to the Hub, anyone can easily use it for inference. Load the configuration and model:

Copied

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer

peft_model_id = "smangrul/roberta-large-peft-p-tuning"
config = PeftConfig.from_pretrained(peft_model_id)
inference_model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(inference_model, peft_model_id)

Get some text and tokenize it:

Copied

classes = ["not equivalent", "equivalent"]

sentence1 = "Coast redwood trees are the tallest trees on the planet and can grow over 300 feet tall."
sentence2 = "The coast redwood trees, which can attain a height of over 300 feet, are the tallest trees on earth."

inputs = tokenizer(sentence1, sentence2, truncation=True, padding="longest", return_tensors="pt")

Pass the inputs to the model to classify the sentences:

Copied

with torch.no_grad():
    outputs = model(**inputs).logits
    print(outputs)

paraphrased_text = torch.softmax(outputs, dim=1).tolist()[0]
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrased_text[i] * 100))}%")
"not equivalent: 4%"
"equivalent: 96%"
