Convert Transformers models to use BetterTransformer


How to use Optimum and BetterTransformer?

Install dependencies

You can easily use the BetterTransformer integration with 🌍 Optimum. First, install the dependencies as follows:

pip install transformers accelerate optimum

Also, make sure to install the latest version of PyTorch by following the guidelines on the PyTorch official website. Note that the BetterTransformer API is only compatible with torch>=1.13, so make sure this version is installed in your environment before starting. If you want to benefit from the scaled_dot_product_attention function (for decoder-based models), make sure to use at least torch>=2.0.
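As a quick sanity check (a sketch, not a required step), you can verify the installed PyTorch version and whether scaled_dot_product_attention is available:

>>> import torch

>>> torch.__version__  # should be >= 1.13, and >= 2.0 for scaled_dot_product_attention
>>> hasattr(torch.nn.functional, "scaled_dot_product_attention")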

Step 1: Load your model

First, load your BOINC AI model using 🌍 Transformers. Make sure to use one of the models supported by the BetterTransformer API:

>>> from transformers import AutoModel

>>> model_id = "roberta-base"
>>> model = AutoModel.from_pretrained(model_id)

Sometimes you can load your model directly onto your GPU devices using the `accelerate` library, so you can optionally try out the following command:

>>> from transformers import AutoModel

>>> model_id = "roberta-base"
>>> model = AutoModel.from_pretrained(model_id, device_map="auto")
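If you go the device_map="auto" route, you can optionally inspect how accelerate dispatched the weights before moving on; the hf_device_map attribute is set by accelerate, and the mapping shown in the comment is only an illustrative example:

>>> model.hf_device_map  # e.g. {'': 0} when the whole model fits on a single GPU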

Step 2: Set your model on your preferred device

If you did not use device_map="auto" to load your model (or if your model does not support device_map="auto"), you can manually set your model to a GPU:

>>> model = model.to(0) # or model.to("cuda:0")

Step 3: Convert your model to BetterTransformer!

Now it is time to convert your model using the BetterTransformer API! You can run the commands below:

>>> from optimum.bettertransformer import BetterTransformer

>>> model = BetterTransformer.transform(model)

By default, BetterTransformer.transform will overwrite your model, which means that your previous native model cannot be used anymore. If you want to keep it for some reason, just add the flag keep_original_model=True!

>>> from optimum.bettertransformer import BetterTransformer

>>> model_bt = BetterTransformer.transform(model, keep_original_model=True)

If your model does not support the BetterTransformer API, an error trace will be displayed. Note also that decoder-based models (OPT, BLOOM, etc.) are not supported yet, but this is on PyTorch's roadmap for the future.
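As a minimal sketch (assuming only that an unsupported architecture raises an error during the call), you can guard the conversion so that you keep the native model as a fallback:

>>> from optimum.bettertransformer import BetterTransformer

>>> try:
...     model = BetterTransformer.transform(model)
... except Exception as err:  # raised when the architecture is not supported
...     print(f"Keeping the native model: {err}")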

Pipeline compatibility

Transformers' pipeline is also compatible with this integration, and you can use BetterTransformer as an accelerator for your pipelines. The code snippet below shows how:

>>> from optimum.pipelines import pipeline

>>> pipe = pipeline("fill-mask", "distilbert-base-uncased", accelerator="bettertransformer")
>>> pipe("I am a student at [MASK] University.")

If you want to run a pipeline on a GPU device, run:

>>> from optimum.pipelines import pipeline

>>> pipe = pipeline("fill-mask", "distilbert-base-uncased", accelerator="bettertransformer", device=0)
>>> ...

You can also use transformers.pipeline as usual and pass the converted model directly, together with its tokenizer:

>>> from transformers import pipeline

>>> pipe = pipeline("fill-mask", model=model_bt, tokenizer=tokenizer, device=0)
>>> ...
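The snippet above assumes the matching tokenizer has already been loaded; for example, reusing the model_id checkpoint from Step 1:

>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained(model_id)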

Please refer to the official documentation of pipeline for further usage. If you run into any issue, do not hesitate to open an issue on GitHub!

Training compatibility

You can now benefit from the BetterTransformer API for your training scripts. Just make sure to convert your model back to its original version by calling BetterTransformer.reverse before saving it. The code snippet below shows how:

import torch
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForCausalLM

# load the model directly on the GPU in half precision
with torch.device("cuda"):
    model = AutoModelForCausalLM.from_pretrained("gpt2-large", torch_dtype=torch.float16)

model = BetterTransformer.transform(model)

# do your inference or training here

# if training and want to save the model
model = BetterTransformer.reverse(model)
model.save_pretrained("fine_tuned_model")
model.push_to_hub("fine_tuned_model")
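To make the "do your inference or training here" placeholder concrete, here is a minimal inference sketch under the same setup; the tokenizer, prompt, and generation parameters are illustrative assumptions rather than part of the original snippet:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated[0], skip_special_tokens=True))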
