Export a model to ONNX

Export a model to ONNX with optimum.exporters.onnx

Summary

Exporting a model to ONNX is as simple as

optimum-cli export onnx --model gpt2 gpt2_onnx/

Check out the help for more options:

optimum-cli export onnx --help

Why use ONNX?

If you need to deploy 🌍 Transformers or 🌍 Diffusers models in production environments, we recommend exporting them to a serialized format that can be loaded and executed on specialized runtimes and hardware. In this guide, we’ll show you how to export these models to ONNX (Open Neural Network eXchange).

ONNX is an open standard that defines a common set of operators and a common file format to represent deep learning models in a wide variety of frameworks, including PyTorch and TensorFlow. When a model is exported to the ONNX format, these operators are used to construct a computational graph (often called an intermediate representation) which represents the flow of data through the neural network.

By exposing a graph with standardized operators and data types, ONNX makes it easy to switch between frameworks. For example, a model trained in PyTorch can be exported to ONNX format and then imported in TensorRT or OpenVINO.

Once exported, a model can be optimized for inference via techniques such as graph optimization and quantization. Check the optimum.onnxruntime subpackage to optimize and run ONNX models!

🌍 Optimum supports the ONNX export through configuration objects. These configuration objects come ready-made for a number of model architectures and are designed to be easily extended to other architectures.

To check the supported architectures, go to the configuration reference page.

Exporting a model to ONNX using the CLI

To export a 🌍 Transformers or 🌍 Diffusers model to ONNX, you’ll first need to install some extra dependencies, for example with pip install optimum[exporters].

The Optimum ONNX export can be used through the Optimum command line, optimum-cli export onnx; pass --help to list the available options.

Exporting a checkpoint follows the pattern shown in the summary: pass the checkpoint to --model, followed by the output directory.

You should see the following logs (along with potential logs from PyTorch / TensorFlow that were hidden here for clarity):

This exports an ONNX graph of the checkpoint defined by the --model argument. As you can see, the task was automatically detected. This was possible because the model was on the Hub.

For local models, the --task argument must be provided; otherwise the export defaults to the model architecture without any task-specific head.

Note that providing the --task argument for a model on the Hub will disable the automatic task detection.
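
The command-line usage described above can be sketched from Python as follows. This is only an illustration: build_export_command is a hypothetical helper, "./my_local_model" is a made-up path, and the only CLI pieces it relies on are the --model and --task arguments and the positional output directory mentioned in this guide.

```python
import shlex
from typing import List, Optional


def build_export_command(model: str, output_dir: str, task: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `optimum-cli export onnx` (hypothetical helper)."""
    cmd = ["optimum-cli", "export", "onnx", "--model", model]
    if task is not None:
        # Passing a task explicitly disables the automatic task detection.
        cmd += ["--task", task]
    cmd.append(output_dir)
    return cmd


# Checkpoint on the Hub: the task is detected automatically.
hub_cmd = build_export_command("gpt2", "gpt2_onnx/")

# Local checkpoint (made-up path): the task is given explicitly.
local_cmd = build_export_command("./my_local_model", "my_model_onnx/", task="question-answering")

print(shlex.join(hub_cmd))
print(shlex.join(local_cmd))
```

To actually run the export, the resulting list can be handed to subprocess.run(cmd, check=True) in an environment where optimum-cli is installed.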

The resulting model.onnx file can then be run on one of the many accelerators that support the ONNX standard. For example, we can load and run the model with ONNX Runtime using the optimum.onnxruntime package.
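
A minimal sketch of loading and running an exported question-answering model is shown below. It assumes that the export directory also contains the tokenizer files, and that transformers, optimum and onnxruntime are installed; run_question_answering and the directory name in the comment are hypothetical.

```python
from typing import Any


def run_question_answering(model_dir: str, question: str, context: str) -> Any:
    """Load an exported QA model from `model_dir` and run a single forward pass."""
    # Imported lazily so that defining this function does not require optimum.
    from transformers import AutoTokenizer
    from optimum.onnxruntime import ORTModelForQuestionAnswering

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = ORTModelForQuestionAnswering.from_pretrained(model_dir)
    inputs = tokenizer(question, context, return_tensors="pt")
    # The returned object exposes start_logits and end_logits.
    return model(**inputs)


# Example call (hypothetical export directory):
# outputs = run_question_answering("my_model_onnx/", "What am I using?", "Using ONNX Runtime!")
```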

Printing the outputs shows a QuestionAnsweringModelOutput containing the start_logits and end_logits tensors.

As you can see, converting a model to ONNX does not mean leaving the BOINC AI ecosystem. You end up with a similar API as regular 🌍 Transformers models!

It is also possible to export the model to ONNX directly from the ORTModelForQuestionAnswering class by passing export=True to from_pretrained().
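
As a sketch of this direct export, assuming a recent Optimum version in which from_pretrained() accepts export=True (export_and_load is a hypothetical helper, and saving the result is optional):

```python
from typing import Any


def export_and_load(model_id: str, save_dir: str) -> Any:
    """Export a checkpoint to ONNX on the fly and save the exported model to `save_dir`."""
    # Imported lazily so that defining this function does not require optimum.
    from optimum.onnxruntime import ORTModelForQuestionAnswering

    # export=True converts the checkpoint to ONNX while loading it.
    model = ORTModelForQuestionAnswering.from_pretrained(model_id, export=True)
    model.save_pretrained(save_dir)
    return model
```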

For more information, check the optimum.onnxruntime documentation page on this topic.

The process is identical for TensorFlow checkpoints on the Hub. For example, a pure TensorFlow checkpoint from the Keras organization can be exported with the same command.

Exporting a model to be used with Optimum’s ORTModel

Models exported through optimum-cli export onnx can be used directly in ORTModel. This is especially useful for encoder-decoder models: the export splits the encoder and decoder into two .onnx files, because the encoder is usually run only once while the decoder may be run several times in autoregressive generation tasks.
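
One quick way to see this split is to list the .onnx files in the export directory. The helper below is a hypothetical sketch that uses only the standard library; the file names in the comment are assumptions based on the submodel names used elsewhere in this guide.

```python
from pathlib import Path
from typing import List


def list_onnx_files(export_dir: str) -> List[str]:
    """Return the sorted names of the .onnx files found directly inside `export_dir`."""
    return sorted(path.name for path in Path(export_dir).glob("*.onnx"))


# For an encoder-decoder export one would expect something along the lines of
# encoder_model.onnx and decoder_model.onnx (assumed names):
# print(list_onnx_files("my_seq2seq_onnx/"))
```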

Exporting a model using past keys/values in the decoder

When exporting a decoder model used for generation, it can be useful to encapsulate the reuse of past keys and values in the exported ONNX model. This avoids recomputing the same intermediate activations during generation.

In the ONNX export, past keys/values are reused by default. This behavior corresponds to --task text2text-generation-with-past, --task text-generation-with-past, or --task automatic-speech-recognition-with-past. To disable the export with past keys/values reuse, you must explicitly pass the task text2text-generation, text-generation, or automatic-speech-recognition to optimum-cli export onnx.
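
The naming rule for these tasks can be summarized in a few lines of Python. resolve_task is a hypothetical helper, not part of Optimum; it only encodes the task names listed above.

```python
# Tasks that, according to this guide, have a "-with-past" variant.
PAST_CAPABLE_TASKS = ("text2text-generation", "text-generation", "automatic-speech-recognition")


def resolve_task(base_task: str, use_past: bool = True) -> str:
    """Return the value to pass to --task for a generation task, with or without past reuse."""
    if base_task not in PAST_CAPABLE_TASKS:
        raise ValueError(f"{base_task!r} has no '-with-past' variant in this sketch")
    return f"{base_task}-with-past" if use_past else base_task


print(resolve_task("text-generation"))                  # text-generation-with-past
print(resolve_task("text-generation", use_past=False))  # text-generation
```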

A model exported with past keys/values can be used directly with Optimum’s ORTModel.
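
For instance, a decoder exported to a directory such as gpt2_onnx/ (as in the summary) could be loaded for generation roughly as follows. This is a sketch that assumes the export directory also holds the tokenizer files; generate_text and its max_new_tokens default are illustrative choices.

```python
from typing import List


def generate_text(export_dir: str, prompt: str, max_new_tokens: int = 20) -> List[str]:
    """Generate a continuation of `prompt` with an exported decoder model."""
    # Imported lazily so that defining this function does not require optimum.
    from transformers import AutoTokenizer
    from optimum.onnxruntime import ORTModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(export_dir)
    model = ORTModelForCausalLM.from_pretrained(export_dir)
    inputs = tokenizer(prompt, return_tensors="pt")
    gen_tokens = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(gen_tokens)


# Example call:
# print(generate_text("gpt2_onnx/", "My name is Arthur and I live in"))
```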

Selecting a task

Specifying a --task should not be necessary in most cases when exporting from a model on the BOINC AI Hub.

However, if you need to check which tasks the ONNX export supports for a given model architecture, we’ve got you covered. First, you can check the list of supported tasks for both PyTorch and TensorFlow here.

For each model architecture, you can find the list of supported tasks via the TasksManager, for example for DistilBERT with the ONNX export.
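
A sketch of such a lookup is given below. It assumes that TasksManager.get_supported_tasks_for_model_type takes the model type and the exporter name and returns a mapping keyed by task; the exact signature may differ between Optimum versions, and supported_onnx_tasks is a hypothetical wrapper.

```python
from typing import List


def supported_onnx_tasks(model_type: str) -> List[str]:
    """List the tasks that the ONNX export supports for `model_type`."""
    # Imported lazily so that defining this function does not require optimum.
    from optimum.exporters.tasks import TasksManager

    tasks = TasksManager.get_supported_tasks_for_model_type(model_type, "onnx")
    return list(tasks.keys())


# Example call:
# print(supported_onnx_tasks("distilbert"))
```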

You can then pass one of these tasks to the --task argument in the optimum-cli export onnx command, as mentioned above.

Custom export of Transformers models

Customize the export of official Transformers models

Optimum gives advanced users finer-grained control over the configuration of the ONNX export. This is especially useful if you would like to export models with different keyword arguments, for example using output_attentions=True or output_hidden_states=True.

To support these use cases, main_export() supports two arguments, model_kwargs and custom_onnx_configs, which are used as follows:

  • model_kwargs overrides some of the default arguments to the model’s forward; in practice, the model is called as model(**reference_model_inputs, **model_kwargs).

  • custom_onnx_configs should be a Dict[str, OnnxConfig] mapping the submodel name (usually model, encoder_model, decoder_model, or decoder_model_with_past - reference) to a custom ONNX configuration for that submodel.
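
How model_kwargs reaches the forward call can be illustrated with plain Python keyword unpacking. The toy_forward function and the input values below are made up for the illustration; they stand in for a real model and its reference inputs.

```python
from typing import Any, Dict, List, Optional


def toy_forward(
    input_ids: List[int],
    attention_mask: Optional[List[int]] = None,
    output_attentions: bool = False,
    output_hidden_states: bool = False,
) -> Dict[str, Any]:
    """Hypothetical stand-in for a model's forward; it only echoes the flags it received."""
    return {
        "num_tokens": len(input_ids),
        "output_attentions": output_attentions,
        "output_hidden_states": output_hidden_states,
    }


reference_model_inputs = {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1]}
model_kwargs = {"output_attentions": True}

# Same call shape as model(**reference_model_inputs, **model_kwargs).
result = toy_forward(**reference_model_inputs, **model_kwargs)
print(result)
```

Note that with this call shape, a key present in both dictionaries raises a TypeError in Python, so model_kwargs should only carry arguments that are not already among the reference inputs.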

Together, these arguments make it possible to export models with output_attentions=True.
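
The call itself might look like the sketch below. It is not a complete example: it assumes that the caller has already built custom ONNX configurations that declare the extra attention outputs, and export_with_attentions is a hypothetical wrapper around main_export().

```python
from typing import Any, Dict


def export_with_attentions(model_id: str, output_dir: str, custom_onnx_configs: Dict[str, Any]) -> None:
    """Export `model_id` to `output_dir` with attention outputs enabled."""
    # Imported lazily so that defining this function does not require optimum.
    from optimum.exporters.onnx import main_export

    main_export(
        model_id,
        output=output_dir,
        # Forwarded to the model's forward during the export.
        model_kwargs={"output_attentions": True},
        # Maps each submodel name to the custom ONNX config prepared by the caller.
        custom_onnx_configs=custom_onnx_configs,
    )
```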

For tasks that require only a single ONNX file (e.g. encoder-only), an exported model with custom inputs/outputs can then be used with the class optimum.onnxruntime.ORTModelForCustomTasks for inference with ONNX Runtime on CPU or GPU.

Customize the export of Transformers models with custom modeling

Optimum supports the export of Transformers models with custom modeling that use trust_remote_code=True. These models are not officially supported in the Transformers library but are usable with its functionality, such as pipelines and generation.

Examples of such models are THUDM/chatglm2-6b and mosaicml/mpt-30b.

To export custom models, a dictionary custom_onnx_configs needs to be passed to main_export(), with the ONNX config definition for all the subparts of the model to export (for example, encoder and decoder subparts). The mosaicml/mpt-7b model, for example, can be exported this way.
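
The arguments for such an export can be gathered as in the sketch below. custom_model_export_kwargs is a hypothetical helper, the "mpt_onnx" directory and the default task are assumptions, and the ONNX configurations themselves still have to be written for the model at hand.

```python
from typing import Any, Dict


def custom_model_export_kwargs(
    output_dir: str,
    custom_onnx_configs: Dict[str, Any],
    task: str = "text-generation-with-past",
) -> Dict[str, Any]:
    """Collect keyword arguments for main_export() when exporting a custom-modeling checkpoint."""
    if not custom_onnx_configs:
        raise ValueError("custom_onnx_configs must cover every submodel to export")
    return {
        "output": output_dir,
        "task": task,
        # Custom modeling code is loaded from the checkpoint repository.
        "trust_remote_code": True,
        "custom_onnx_configs": custom_onnx_configs,
    }


# Usage sketch, with `configs` built by the caller for each submodel:
# from optimum.exporters.onnx import main_export
# main_export("mosaicml/mpt-7b", **custom_model_export_kwargs("mpt_onnx", configs))
```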

Moreover, the advanced argument fn_get_submodels of main_export() lets you customize how the submodels are extracted in case the model needs to be exported in several submodels. Examples of such functions can be [consulted here](link to utils.py relevant code once merged).
