Working with custom models


Some fine-tuning techniques, such as prompt tuning, are specific to language models. That means in PEFT, it is assumed a Transformers model is being used. However, other fine-tuning techniques - like LoRA - are not restricted to specific model types.

In this guide, we will see how LoRA can be applied to a multilayer perceptron and a computer vision model from the timm library.

Multilayer perceptron

Let's assume that we want to fine-tune a multilayer perceptron with LoRA. Here is the definition:


from torch import nn


class MLP(nn.Module):
    def __init__(self, num_units_hidden=2000):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(20, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, 2),
            nn.LogSoftmax(dim=-1),
        )

    def forward(self, X):
        return self.seq(X)

This is a straightforward multilayer perceptron with an input layer, a hidden layer, and an output layer.

For this toy example, we choose an exceedingly large number of hidden units to highlight the efficiency gains from PEFT, but those gains are in line with more realistic examples.
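
As a quick smoke test (the dummy batch below is an illustration, not part of the original example), the model maps 20 input features to 2 log-probabilities:

import torch

# dummy batch of 8 samples with 20 features each
X = torch.randn(8, 20)
print(MLP()(X).shape)  # torch.Size([8, 2])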

There are a few linear layers in this model that could be tuned with LoRA. When working with common Transformers models, PEFT will know which layers to apply LoRA to, but in this case, it is up to us as the user to choose the layers. To determine the names of the layers to tune:


print([(n, type(m)) for n, m in MLP().named_modules()])

This should print:


[('', __main__.MLP),
 ('seq', torch.nn.modules.container.Sequential),
 ('seq.0', torch.nn.modules.linear.Linear),
 ('seq.1', torch.nn.modules.activation.ReLU),
 ('seq.2', torch.nn.modules.linear.Linear),
 ('seq.3', torch.nn.modules.activation.ReLU),
 ('seq.4', torch.nn.modules.linear.Linear),
 ('seq.5', torch.nn.modules.activation.LogSoftmax)]

Let's say we want to apply LoRA to the input layer and to the hidden layer; those are 'seq.0' and 'seq.2'. Moreover, let's assume we want to update the output layer without LoRA; that would be 'seq.4'. The corresponding config would be:


from peft import LoraConfig

config = LoraConfig(
    target_modules=["seq.0", "seq.2"],
    modules_to_save=["seq.4"],
)

With that, we can create our PEFT model and check the fraction of parameters trained:


from peft import get_peft_model

model = MLP()
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 56,164 || all params: 4,100,164 || trainable%: 1.369798866581922
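
If you want to see exactly where the trainable weights live (the LoRA matrices on 'seq.0' and 'seq.2', plus the trainable copy of 'seq.4'), you can list the parameter names that require gradients; the exact names are PEFT internals and may differ between versions:

# list the names of all parameters that will receive gradient updates
print([n for n, p in peft_model.named_parameters() if p.requires_grad])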

Finally, we can use any training framework we like, or write our own fit loop, to train the peft_model. For a complete example, check out this notebook.
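
As an illustration of the second option, a minimal fit loop could look like the following; the dummy data, optimizer choice, and hyperparameters are placeholders, not part of the original example:

import torch

# dummy classification data: 64 samples, 20 features, 2 classes
X = torch.randn(64, 20)
y = torch.randint(0, 2, (64,))

# only the trainable (LoRA and 'seq.4') parameters are passed to the optimizer
optimizer = torch.optim.Adam(
    [p for p in peft_model.parameters() if p.requires_grad], lr=2e-3
)
criterion = torch.nn.NLLLoss()  # the MLP ends in LogSoftmax, so NLLLoss fits

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(peft_model(X), y)
    loss.backward()
    optimizer.step()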

timm model

The timm library contains a large number of pretrained computer vision models. Those can also be fine-tuned with PEFT. Let's check out how this works in practice.

To start, ensure that timm is installed in the Python environment:


python -m pip install -U timm

Next we load a timm model for an image classification task:


import timm

num_classes = ...  # set this to the number of classes in your dataset
model_id = "timm/poolformer_m36.sail_in1k"
model = timm.create_model(model_id, pretrained=True, num_classes=num_classes)

Again, we need to make a decision about what layers to apply LoRA to. Since LoRA supports 2D conv layers, and since those are a major building block of this model, we should apply LoRA to the 2D conv layers. To identify the names of those layers, let's look at all the layer names:


print([(n, type(m)) for n, m in model.named_modules()])

This will print a very long list; we'll only show the first few entries:


[('', timm.models.metaformer.MetaFormer),
 ('stem', timm.models.metaformer.Stem),
 ('stem.conv', torch.nn.modules.conv.Conv2d),
 ('stem.norm', torch.nn.modules.linear.Identity),
 ('stages', torch.nn.modules.container.Sequential),
 ('stages.0', timm.models.metaformer.MetaFormerStage),
 ('stages.0.downsample', torch.nn.modules.linear.Identity),
 ('stages.0.blocks', torch.nn.modules.container.Sequential),
 ('stages.0.blocks.0', timm.models.metaformer.MetaFormerBlock),
 ('stages.0.blocks.0.norm1', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.0.token_mixer', timm.models.metaformer.Pooling),
 ('stages.0.blocks.0.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
 ('stages.0.blocks.0.drop_path1', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.layer_scale1', timm.models.metaformer.Scale),
 ('stages.0.blocks.0.res_scale1', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.norm2', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.0.mlp', timm.layers.mlp.Mlp),
 ('stages.0.blocks.0.mlp.fc1', torch.nn.modules.conv.Conv2d),
 ('stages.0.blocks.0.mlp.act', torch.nn.modules.activation.GELU),
 ('stages.0.blocks.0.mlp.drop1', torch.nn.modules.dropout.Dropout),
 ('stages.0.blocks.0.mlp.norm', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.mlp.fc2', torch.nn.modules.conv.Conv2d),
 ('stages.0.blocks.0.mlp.drop2', torch.nn.modules.dropout.Dropout),
 ('stages.0.blocks.0.drop_path2', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.layer_scale2', timm.models.metaformer.Scale),
 ('stages.0.blocks.0.res_scale2', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.1', timm.models.metaformer.MetaFormerBlock),
 ('stages.0.blocks.1.norm1', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.1.token_mixer', timm.models.metaformer.Pooling),
 ('stages.0.blocks.1.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
 ...
 ('head.global_pool.flatten', torch.nn.modules.linear.Identity),
 ('head.norm', timm.layers.norm.LayerNorm2d),
 ('head.flatten', torch.nn.modules.flatten.Flatten),
 ('head.drop', torch.nn.modules.linear.Identity),
 ('head.fc', torch.nn.modules.linear.Linear)]

Upon closer inspection, we see that the 2D conv layers have names such as "stages.0.blocks.0.mlp.fc1" and "stages.0.blocks.0.mlp.fc2". How can we match those layer names specifically? You can write a regular expression to match the layer names. For our case, the regex r".*\.mlp\.fc\d" should do the job.

Furthermore, as in the first example, we should ensure that the output layer, in this case the classification head, is also updated. Looking at the end of the list printed above, we can see that it's named 'head.fc'. With that in mind, here is our LoRA config:


config = LoraConfig(target_modules=r".*\.mlp\.fc\d", modules_to_save=["head.fc"])
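
If you want to verify which layer names the regex picks up before training, you can check it against the module names yourself. The snippet below uses re.fullmatch as an approximation of how a regex passed as target_modules is resolved:

import re

pattern = re.compile(r".*\.mlp\.fc\d")
matched = [name for name, _ in model.named_modules() if pattern.fullmatch(name)]
print(matched[:4])
# e.g. ['stages.0.blocks.0.mlp.fc1', 'stages.0.blocks.0.mlp.fc2',
#       'stages.0.blocks.1.mlp.fc1', 'stages.0.blocks.1.mlp.fc2']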

Then we only need to create the PEFT model by passing our base model and the config to get_peft_model:


peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 1,064,454 || all params: 56,467,974 || trainable%: 1.88505789139876

This shows us that we only need to train less than 2% of all parameters, which is a huge efficiency gain.
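
As an optional sanity check (the dummy batch and the assumed 224x224 input resolution for this checkpoint are illustrations, not part of the original example), we can run a forward pass through the adapted model:

import torch

# dummy image batch; 224x224 is assumed to be the input resolution of this checkpoint
x = torch.randn(1, 3, 224, 224)

peft_model.eval()
with torch.no_grad():
    logits = peft_model(x)
print(logits.shape)  # torch.Size([1, num_classes])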

For a complete example, check out this notebook.

