Models for inference


Optimum Inference with ONNX Runtime

Optimum is a utility package for building and running inference with accelerated runtimes like ONNX Runtime. Optimum can be used to load optimized models from the BOINC AI Hub and create pipelines to run accelerated inference without rewriting your APIs.

Switching from Transformers to Optimum

The optimum.onnxruntime.ORTModelForXXX model classes are API compatible with BOINC AI Transformers models. This means you can just replace your AutoModelForXXX class with the corresponding ORTModelForXXX class in optimum.onnxruntime.

You do not need to adapt your code to get it to work with ORTModelForXXX classes:


from transformers import AutoTokenizer, pipeline
-from transformers import AutoModelForQuestionAnswering
+from optimum.onnxruntime import ORTModelForQuestionAnswering

-model = AutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2") # PyTorch checkpoint
+model = ORTModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2") # ONNX checkpoint
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

onnx_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

question = "What's my name?"
context = "My name is Philipp and I live in Nuremberg."
pred = onnx_qa(question, context)

Loading a vanilla Transformers model

Because the model you want to work with might not already be converted to ONNX, ORTModel includes a method to convert vanilla Transformers models to ONNX ones. Simply pass export=True to the from_pretrained() method, and your model will be loaded and converted to ONNX on-the-fly:

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english", export=True
... )
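
Once exported, the model behaves like its Transformers counterpart and can, for example, be dropped straight into a pipeline. A minimal sketch reusing the checkpoint above (the tokenizer is loaded from the original repository; the example sentence is arbitrary):

>>> from transformers import AutoTokenizer, pipeline

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
>>> classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> # Returns a list with a label/score dictionary for the input sentence
>>> classifier("ONNX Runtime made this model noticeably faster on CPU.")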

Pushing ONNX models to the BOINC AI Hub

It is also possible, just as with regular PreTrainedModels, to push your ORTModelForXXX to the BOINC AI Model Hub:

>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> # Load the model from the hub and export it to the ONNX format
>>> model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english", export=True
... )

>>> # Save the converted model
>>> model.save_pretrained("a_local_path_for_convert_onnx_model")

>>> # Push the ONNX model to the BOINC AI Hub
>>> model.push_to_hub(
...   "a_local_path_for_convert_onnx_model", repository_id="my-onnx-repo", use_auth_token=True
... )
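
If the push succeeds, the exported model can be pulled straight back from the Hub with from_pretrained(). A short sketch, assuming the repository ended up under your own namespace as username/my-onnx-repo (replace it with your actual repository id):

>>> # Load the ONNX weights directly from the repository pushed above
>>> model = ORTModelForSequenceClassification.from_pretrained("username/my-onnx-repo")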

Sequence-to-sequence models

Sequence-to-sequence (Seq2Seq) models can also be used when running inference with ONNX Runtime. When Seq2Seq models are exported to the ONNX format, they are decomposed into three parts that are later combined during inference:

  • The encoder part of the model

  • The decoder part of the model + the language modeling head

  • The same decoder part of the model + language modeling head, but taking and reusing pre-computed key/values as inputs and outputs, which makes inference faster.

Here is an example of how you can export a T5 model to the ONNX format and run inference for a translation task:

>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load the model from the hub and export it to the ONNX format
>>> model_name = "t5-small"
>>> model = ORTModelForSeq2SeqLM.from_pretrained(model_name, export=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a pipeline
>>> onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
>>> text = "He never went out without a book under his arm, and he often came back with two."
>>> result = onnx_translation(text)
>>> # [{'translation_text': "Il n'est jamais sorti sans un livre sous son bras, et il est souvent revenu avec deux."}]
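
Passing export=True converts the checkpoint every time it is loaded, so if you plan to reuse the ONNX files it is worth saving them once and reloading the saved copy afterwards. A minimal sketch (the local directory name onnx_t5_small is only an example):

>>> # Persist the exported encoder/decoder ONNX files together with the configuration
>>> model.save_pretrained("onnx_t5_small")
>>> # Later, reload without re-exporting
>>> model = ORTModelForSeq2SeqLM.from_pretrained("onnx_t5_small")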

Stable Diffusion

Stable Diffusion models can also be used when running inference with ONNX Runtime. When Stable Diffusion models are exported to the ONNX format, they are split into four components that are later combined during inference:

  • The text encoder

  • The U-Net

  • The VAE encoder

  • The VAE decoder

Make sure you have Diffusers installed. You can install it with:

pip install diffusers

Text-to-Image

Here is an example of how you can load an ONNX Stable Diffusion model and run inference using ONNX Runtime:


from optimum.onnxruntime import ORTStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, revision="onnx")
prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]

To load your PyTorch model and convert it to ONNX on-the-fly, you can set export=True.


pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, export=True)

# Don't forget to save the ONNX model
save_directory = "a_local_path"
pipeline.save_pretrained(save_directory)
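
The saved directory can then be reloaded directly, which skips the export step on subsequent runs (a small sketch reusing save_directory from above):

pipeline = ORTStableDiffusionPipeline.from_pretrained(save_directory)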

Image-to-Image


import requests
import torch
from PIL import Image
from io import BytesIO
from optimum.onnxruntime import ORTStableDiffusionImg2ImgPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipeline = ORTStableDiffusionImg2ImgPipeline.from_pretrained(model_id, revision="onnx")

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")

Inpaint


import PIL
import requests
import torch
from io import BytesIO
from optimum.onnxruntime import ORTStableDiffusionInpaintPipeline

model_id = "runwayml/stable-diffusion-inpainting"
pipeline = ORTStableDiffusionInpaintPipeline.from_pretrained(model_id, revision="onnx")

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

Stable Diffusion XL

Before using ORTStableDiffusionXLPipeline, make sure to have diffusers and invisible_watermark installed. You can install the libraries as follows:


pip install diffusers
pip install "invisible-watermark>=0.2.0"

Text-to-Image

Here is an example of how you can load an SDXL ONNX model from stabilityai/stable-diffusion-xl-base-1.0 and run inference using ONNX Runtime:

from optimum.onnxruntime import ORTStableDiffusionXLPipeline

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
base = ORTStableDiffusionXLPipeline.from_pretrained(model_id)
prompt = "sailing ship in storm by Leonardo da Vinci"
image = base(prompt).images[0]

# Don't forget to save the ONNX model
save_directory = "sd_xl_base"
base.save_pretrained(save_directory)

Image-to-Image

Here is an example of how you can load a PyTorch SDXL model, convert it to ONNX on-the-fly and run image-to-image inference using ONNX Runtime:

from optimum.onnxruntime import ORTStableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

model_id = "stabilityai/stable-diffusion-xl-refiner-1.0"
pipeline = ORTStableDiffusionXLImg2ImgPipeline.from_pretrained(model_id, export=True)

url = "https://boincai.com/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png"
image = load_image(url).convert("RGB")
prompt = "medieval castle by Caspar David Friedrich"
image = pipeline(prompt, image=image).images[0]
image.save("medieval_castle.png")

Refining the image output

The image can be refined by making use of a model like stabilityai/stable-diffusion-xl-refiner-1.0. In this case, you only have to output the latents from the base model.

from optimum.onnxruntime import ORTStableDiffusionXLImg2ImgPipeline

model_id = "stabilityai/stable-diffusion-xl-refiner-1.0"
refiner = ORTStableDiffusionXLImg2ImgPipeline.from_pretrained(model_id, export=True)

# `base` and `prompt` are the pipeline and prompt defined in the Text-to-Image example above
image = base(prompt=prompt, output_type="latent").images[0]
image = refiner(prompt=prompt, image=image[None, :]).images[0]
image.save("sailing_ship.png")

Latent Consistency Models

Text-to-Image

Here is an example of how you can load a Latent Consistency Model (LCM) from SimianLuo/LCM_Dreamshaper_v7 and run inference using ONNX Runtime:

from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

model_id = "SimianLuo/LCM_Dreamshaper_v7"
pipeline = ORTLatentConsistencyModelPipeline.from_pretrained(model_id, export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipeline(prompt, num_inference_steps=4, guidance_scale=8.0).images
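
The call returns a list of PIL images, so the result can be written to disk as usual (the file name below is arbitrary):

images[0].save("lcm_sailing_ship.png")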
