Low-Rank Adaptation of Large Language Models (LoRA)
This is an experimental feature. Its APIs can change in the future.
Low-Rank Adaptation of Large Language Models (LoRA) is a training method that accelerates the training of large models while consuming less memory. It adds pairs of rank-decomposition weight matrices (called update matrices) to existing weights, and only trains those newly added weights. This has a couple of advantages:
Previous pretrained weights are kept frozen, so the model is not as prone to catastrophic forgetting.
Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
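The following sketch makes these two advantages concrete for a single linear layer. It is a minimal pure-Python illustration, assuming the usual LoRA formulation W' = W + scale · (B A), where W is the frozen d_out × d_in weight, A is r × d_in, and B is d_out × r. The helper names (`matmul`, `lora_merged_weight`, `param_counts`) and the 768 × 768 layer size are hypothetical and are not part of the Diffusers API.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_merged_weight(w, a, b, scale=1.0):
    """Return W + scale * (B @ A) as a new matrix; W itself stays frozen."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def param_counts(d_out, d_in, rank):
    """Trainable parameters: full finetuning vs. one rank-`rank` LoRA pair."""
    full = d_out * d_in
    lora = rank * d_in + d_out * rank  # A: rank x d_in, B: d_out x rank
    return full, lora


# Toy 2 x 2 frozen weight adapted with a rank-1 update (made-up values).
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]      # 1 x 2
b = [[0.5], [0.0]]    # 2 x 1
print(lora_merged_weight(w, a, b))

# A hypothetical 768 x 768 attention projection adapted with rank 4.
print(param_counts(768, 768, 4))
```

Only A and B are trained and saved, so the portable artifact grows with r · (d_in + d_out) rather than d_in · d_out.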
LoRA matrices are generally added to the attention layers of the original model. 🧨 Diffusers provides a method to load the LoRA weights into a model's attention layers. You can control the extent to which the model is adapted toward new training images via a scale parameter.
The greater memory efficiency allows you to run fine-tuning on GPUs like the Tesla T4, RTX 3080, or even the RTX 2080 Ti! GPUs like the T4 are free and readily accessible in Kaggle or Google Colab notebooks.
💡 LoRA is not limited to attention layers. The authors found that amending the attention layers of a language model is sufficient to obtain good downstream performance with great efficiency. This is why it's common to add the LoRA weights only to the attention layers of a model. Check out the blog post for more information about how LoRA works!
was the first to try out LoRA training for Stable Diffusion in the popular GitHub repository. 🧨 Diffusers now supports finetuning with LoRA for text-to-image generation and DreamBooth. This guide shows you how to do both.
If you'd like to store or share your model with the community, log in to your BOINC AI account (create one if you don't have one already):
Finetuning a model like Stable Diffusion, which has billions of parameters, can be slow and difficult. With LoRA, it is much easier and faster to finetune a diffusion model. It can run on hardware with as little as 11GB of GPU RAM without resorting to tricks such as 8-bit optimizers.
The OUTPUT_DIR and HUB_MODEL_ID variables are optional and specify where to save the model locally and on the Hub:
There are some flags to be aware of before you start training:
--push_to_hub stores the trained LoRA weights on the Hub.
--learning_rate=1e-04: with LoRA, you can afford to use a higher learning rate than you normally would for full finetuning.
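A minimal sketch of how these environment variables and flags fit together. It only assembles the launch command as a list of strings and does not run anything. It assumes the Diffusers text-to-image LoRA example script is named `train_text_to_image_lora.py` and that `MODEL_NAME` and `DATASET_NAME` are set as described in the Pokémon example of this guide; `build_text_to_image_lora_cmd`, the placeholder paths, and the repository id are hypothetical.

```python
import os
import shlex


def build_text_to_image_lora_cmd(env=os.environ):
    """Assemble (but do not run) an `accelerate launch` command.

    Assumes MODEL_NAME and DATASET_NAME exist in `env`; OUTPUT_DIR and
    HUB_MODEL_ID are optional, mirroring the prose above.
    """
    cmd = [
        "accelerate", "launch", "train_text_to_image_lora.py",
        f"--pretrained_model_name_or_path={env['MODEL_NAME']}",
        f"--dataset_name={env['DATASET_NAME']}",
        "--learning_rate=1e-04",  # LoRA tolerates a higher learning rate
        "--report_to=wandb",      # log to Weights & Biases
    ]
    if "OUTPUT_DIR" in env:
        cmd.append(f"--output_dir={env['OUTPUT_DIR']}")
    if "HUB_MODEL_ID" in env:
        # Push the LoRA weights only when a Hub repository id is provided.
        cmd += ["--push_to_hub", f"--hub_model_id={env['HUB_MODEL_ID']}"]
    return cmd


example_env = {
    "MODEL_NAME": "path/to/base-model",      # placeholder
    "DATASET_NAME": "path/to/dataset",       # placeholder
    "OUTPUT_DIR": "path/to/output",          # placeholder
    "HUB_MODEL_ID": "your-username/my-lora", # hypothetical repository id
}
print(shlex.join(build_text_to_image_lora_cmd(example_env)))
```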
Load the LoRA weights from your finetuned model on top of the base model weights, and then move the pipeline to a GPU for faster inference. When you merge the LoRA weights with the frozen pretrained model weights, you can optionally adjust how much of the weights to merge with the scale parameter:
💡 A scale value of 0 is the same as not using your LoRA weights, so you're only using the base model weights, and a scale value of 1 means you're using the fully finetuned LoRA weights. Values between 0 and 1 interpolate between the two weights.
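To see this interpolation numerically, here is a toy sketch that treats a layer's weights as a flat list and computes base + scale · delta. For a linear layer, scaling the output of the LoRA branch is equivalent to scaling its weight update, which is what this sketch does. The weight values are made up, and `apply_lora_scale` is a hypothetical helper, not a Diffusers function.

```python
def apply_lora_scale(base, delta, scale):
    """Blend frozen weights with a LoRA update: base + scale * delta."""
    return [w + scale * d for w, d in zip(base, delta)]


base = [0.25, -0.5, 1.0]    # frozen pretrained weights (toy values)
delta = [0.5, 1.0, -0.25]   # LoRA update B @ A, flattened (toy values)
finetuned = [w + d for w, d in zip(base, delta)]

for scale in (0.0, 0.5, 1.0):
    print(scale, apply_lora_scale(base, delta, scale))
```

Scale 0 reproduces `base`, scale 1 reproduces `finetuned`, and 0.5 lands halfway between them.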
The OUTPUT_DIR variable is optional and specifies where to save the model to on the Hub:
There are some flags to be aware of before you start training:
--push_to_hub stores the trained LoRA weights on the Hub.
--learning_rate=1e-04: with LoRA, you can afford to use a higher learning rate than you normally would for full finetuning.
It's also possible to finetune the text encoder with LoRA as well. In most cases, this leads to better results at the cost of a slight increase in compute. To finetune the text encoder with LoRA, pass the --train_text_encoder flag when launching the train_dreambooth_lora.py script.
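As a minimal sketch of where that flag goes, the helper below assembles a DreamBooth LoRA launch command as a list of strings without running it. The flag names `--pretrained_model_name_or_path`, `--instance_data_dir`, `--instance_prompt`, and `--output_dir` are assumed from the Diffusers DreamBooth example script; `build_dreambooth_lora_cmd`, the paths, and the prompt are hypothetical placeholders.

```python
import shlex


def build_dreambooth_lora_cmd(model_name, instance_dir, output_dir,
                              instance_prompt, train_text_encoder=False):
    """Assemble (but do not run) an `accelerate launch` command."""
    cmd = [
        "accelerate", "launch", "train_dreambooth_lora.py",
        f"--pretrained_model_name_or_path={model_name}",
        f"--instance_data_dir={instance_dir}",
        f"--output_dir={output_dir}",
        f"--instance_prompt={instance_prompt}",
        "--learning_rate=1e-04",
        "--push_to_hub",
    ]
    if train_text_encoder:
        # Also add LoRA layers to the text encoder (slightly more compute).
        cmd.append("--train_text_encoder")
    return cmd


cmd = build_dreambooth_lora_cmd(
    "path/to/base-model", "path/to/dog-images", "path/to/output",
    "a photo of sks dog", train_text_encoder=True,
)
print(shlex.join(cmd))  # shlex quotes the prompt, which contains spaces
```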
Load the LoRA weights from your finetuned DreamBooth model on top of the base model weights, and then move the pipeline to a GPU for faster inference. When you merge the LoRA weights with the frozen pretrained model weights, you can optionally adjust how much of the weights to merge with the scale parameter:
💡 A scale value of 0 is the same as not using your LoRA weights, so you're only using the base model weights, and a scale value of 1 means you're using the fully finetuned LoRA weights. Values between 0 and 1 interpolate between the two weights.
If you used --train_text_encoder during training, then use pipe.load_lora_weights() to load the LoRA weights. For example:
If your LoRA parameters involve the UNet as well as the text encoder, then passing cross_attention_kwargs={"scale": 0.5} applies the scale value to both the UNet and the text encoder.
If you need to use scale when working with fuse_lora() to control the influence of the LoRA parameters on the outputs, specify lora_scale within fuse_lora(). Passing the scale parameter to cross_attention_kwargs when you call the pipeline won't work.
To use a different lora_scale with fuse_lora(), first call unfuse_lora() on the corresponding pipeline and then call fuse_lora() again with the expected lora_scale.
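The rule above follows from what fusing does: the LoRA update is baked into the weights once, with a fixed scale, so a runtime scale has nothing left to act on. The toy class below models that behavior on a flat list of weights. It is a hypothetical illustration, not the Diffusers implementation; in particular, raising an error on a second fuse is a choice made here to make the rule explicit.

```python
class ToyLoraLayer:
    """Toy LoRA-augmented layer illustrating fuse/unfuse semantics."""

    def __init__(self, base, delta):
        self.base = list(base)    # frozen pretrained weights
        self.delta = list(delta)  # LoRA update (B @ A), flattened
        self.weight = list(base)  # weights actually used at inference
        self.fused_scale = None

    def forward_weights(self, scale=1.0):
        """Effective weights for one call; `scale` only matters when unfused."""
        if self.fused_scale is not None:
            return list(self.weight)  # update already baked in; scale ignored
        return [w + scale * d for w, d in zip(self.base, self.delta)]

    def fuse(self, lora_scale=1.0):
        if self.fused_scale is not None:
            raise RuntimeError("already fused; call unfuse() first")
        self.weight = [w + lora_scale * d for w, d in zip(self.weight, self.delta)]
        self.fused_scale = lora_scale

    def unfuse(self):
        if self.fused_scale is None:
            return
        self.weight = [w - self.fused_scale * d for w, d in zip(self.weight, self.delta)]
        self.fused_scale = None


layer = ToyLoraLayer([0.25, -0.5], [0.5, 1.0])
layer.fuse(lora_scale=0.5)
print(layer.forward_weights(scale=1.0))  # runtime scale has no effect
layer.unfuse()                           # restore the base weights first
layer.fuse(lora_scale=1.0)               # then fuse again with a new scale
print(layer.weight)
```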
We then load the checkpoint downloaded from CivitAI:
If you're loading a checkpoint in the safetensors format, please ensure you have safetensors installed.
Then it's time to run inference:
Below is a comparison between the LoRA and the non-LoRA results:
Here are some example checkpoints we tried out:
SDXL 0.9:
SDXL 1.0:
Here is an example of how to perform inference with these checkpoints in diffusers:
If you look carefully, the inference UX is identical to what we presented in the sections above.
Known limitations specific to the Kohya LoRAs:
Here is an example:
Let's finetune on the dataset to generate your own Pokémon.
Specify the MODEL_NAME environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the argument. You'll also need to set the DATASET_NAME environment variable to the name of the dataset you want to train on. To use your own dataset, take a look at the guide.
--report_to=wandb reports and logs the training results to your Weights & Biases dashboard (as an example, take a look at this ).
Now you're ready to launch the training (you can find the full training script ). Training takes about 5 hours on a 2080 Ti GPU with 11GB of RAM, and it creates and saves model checkpoints and the pytorch_lora_weights in your repository.
Now you can use the model for inference by loading the base model in the and then the :
If you are loading the LoRA parameters from the Hub and the Hub repository has a base_model tag (such as ), then you can do:
DreamBooth is a finetuning technique for personalizing a text-to-image model like Stable Diffusion to generate photorealistic images of a subject in different contexts, given a few images of the subject. However, DreamBooth is very sensitive to hyperparameters, and it is easy to overfit. Important hyperparameters to consider include those that affect training time (learning rate, number of training steps) and inference time (number of steps, scheduler type).
💡 Take a look at the blog for an in-depth analysis of DreamBooth experiments and recommended settings.
Let's finetune with DreamBooth and LoRA using some 🐶 images. Download and save these images to a directory. To use your own dataset, take a look at the guide.
To start, specify the MODEL_NAME environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the argument. You'll also need to set INSTANCE_DIR to the path of the directory containing the images.
--report_to=wandb reports and logs the training results to your Weights & Biases dashboard (as an example, take a look at this ).
Now you're ready to launch the training (you can find the full training script ). The script creates and saves model checkpoints and the pytorch_lora_weights.bin file in your repository.
Now you can use the model for inference by loading the base model in the :
Note that the use of is preferred to for loading LoRA parameters. This is because can handle the following situations:
LoRA parameters that don't have separate identifiers for the UNet and the text encoder (such as ). So, you can just do:
LoRA parameters that have separate identifiers for the UNet and the text encoder such as: .
You can also provide a local directory path to as well as .
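The two situations above can be pictured as a routing step over the checkpoint's keys. The sketch below is a simplified, hypothetical illustration rather than the Diffusers loader: it assumes combined checkpoints mark their tensors with `unet.` and `text_encoder.` key prefixes and treats a checkpoint without either prefix as UNet-only. Strings stand in for tensors, and the key names are made up.

```python
def split_lora_state_dict(state_dict, unet_prefix="unet.",
                          text_encoder_prefix="text_encoder."):
    """Route LoRA entries to the UNet or the text encoder by key prefix."""
    prefixes = (unet_prefix, text_encoder_prefix)
    if not any(key.startswith(prefixes) for key in state_dict):
        # No separate identifiers: assume every entry targets the UNet.
        return dict(state_dict), {}
    unet, text_encoder = {}, {}
    for key, value in state_dict.items():
        if key.startswith(unet_prefix):
            unet[key[len(unet_prefix):]] = value
        elif key.startswith(text_encoder_prefix):
            text_encoder[key[len(text_encoder_prefix):]] = value
        # Keys with neither prefix are ignored in this simplified sketch.
    return unet, text_encoder


without_ids = {"down_blocks.0.lora.down.weight": "tensor-0"}
with_ids = {
    "unet.down_blocks.0.lora.down.weight": "tensor-0",
    "text_encoder.layers.0.lora.up.weight": "tensor-1",
}
print(split_lora_state_dict(without_ids))
print(split_lora_state_dict(with_ids))
```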
We support fine-tuning with . Please refer to the following docs:
You can call on a pipeline to unload the LoRA parameters.
You can call fuse_lora() on a pipeline to merge the LoRA parameters with the original parameters of the underlying model(s). This can reduce inference latency.
To undo fuse_lora(), call unfuse_lora() on the pipeline.
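A back-of-the-envelope count shows where the fused path saves work. Unfused, each input vector goes through W x plus the extra low-rank products A x and B (A x); fused, only W' x remains. This sketch counts multiply-accumulates only, ignoring the elementwise scale and add, and the layer size is hypothetical. Real latency also depends on kernel launches and memory traffic, which this arithmetic does not model.

```python
def lora_macs(d_out, d_in, rank, fused):
    """Multiply-accumulates for one input vector through a LoRA layer."""
    base = d_out * d_in  # W x (or W' x once fused)
    if fused:
        return base      # W' = W + scale * (B @ A) is precomputed once
    return base + rank * d_in + d_out * rank  # extra A x, then B (A x)


unfused = lora_macs(768, 768, 4, fused=False)
fused = lora_macs(768, 768, 4, fused=True)
print(unfused, fused, f"{(unfused - fused) / fused:.2%} extra when unfused")
```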
Diffusers supports loading checkpoints from popular LoRA trainers such as and . In this section, we outline the current API's details and limitations.
This support was made possible because of the amazing contributors: and .
We support loading Kohya LoRA checkpoints using . In this section, we explain how to load such a checkpoint from in Diffusers and perform inference with it.
First, download a checkpoint. We'll use for demonstration purposes.
Next, we initialize a :
If you have a similar checkpoint stored on the BOINC AI Hub, you can load it directly with like so:
After the release of , the community contributed some amazing LoRA checkpoints trained on top of it with the Kohya trainer.
Kamepan.safetensors comes from .
Thanks to for helping us on integrating this feature.
If images don't look similar to those from other UIs, such as ComfyUI, there can be multiple reasons, as explained .
We don't fully support . To the best of our knowledge, our current load_lora_weights() should support LyCORIS checkpoints that have LoRA and LoCon modules, but not others, such as Hada, LoKR, etc.