Add support for exporting an architecture to ONNX
If you wish to export a model whose architecture is not already supported by the library, these are the main steps to follow:
Implement a custom ONNX configuration.
Register the ONNX configuration in the TasksManager.
Export the model to ONNX.
Validate the outputs of the original and exported models.
In this section, we'll look at how BERT was implemented to show what's involved with each step.
Let's start with the ONNX configuration object. We provide a 3-level class hierarchy, and to add support for a model, inheriting from the right middle-end class will be the way to go most of the time. You might have to implement a middle-end class yourself if you are adding an architecture that handles a modality and/or a case never seen before.
A good way to implement a custom ONNX configuration is to look at the existing configuration implementations in the optimum/exporters/onnx/model_configs.py file.
Also, if the architecture you are trying to add is (very) similar to an architecture that is already supported (for instance, adding support for ALBERT when BERT is already supported), simply inheriting from the existing configuration class might work.
When inheriting from a middle-end class, look for the one handling the same modality / category of models as the one you are trying to support.
Since BERT is an encoder-based model for text, its configuration inherits from the middle-end class TextEncoderOnnxConfig. In optimum/exporters/onnx/model_configs.py:
Then comes the model-specific class, BertOnnxConfig. Two class attributes are specified here:
ATOL_FOR_VALIDATION: used when validating the exported model against the original one; this is the acceptable absolute tolerance for the difference between output values.
Once you have implemented an ONNX configuration, you can instantiate it by providing the base model's configuration as follows:
The resulting object has several useful properties. For example, you can view the ONNX operator set that will be used during the export:
You can also view the outputs associated with the model as follows:
Notice that the outputs property follows the same structure as the inputs; it returns an OrderedDict of named outputs and their shapes. The output structure is linked to the choice of task that the configuration is initialized with. By default, the ONNX configuration is initialized with the "default" task, which corresponds to exporting a model loaded with the AutoModel class. If you want to export a model for another task, just provide a different task to the task argument when you initialize the ONNX configuration. For example, if we wished to export BERT with a sequence classification head, we could use:
Check out BartOnnxConfig for an advanced example.
To do that, add an entry to the _SUPPORTED_MODEL_TYPE attribute:
If the model is already supported for backends other than ONNX, it will already have an entry, so you will only need to add an onnx key specifying the name of the configuration class.
Otherwise, you will have to add the whole entry.
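The two registration cases described above can be pictured with a small stand-alone sketch. It uses a plain nested dictionary (model type, then backend name, then configuration class name) as a simplified stand-in for the real attribute, whose entries may hold more than this. The helper register_onnx_config, the "other-backend" key, and the "my-new-model" names are hypothetical and exist only for illustration.

```python
from typing import Dict

# Toy registry, NOT the library's real data structure:
# model type -> backend name -> configuration class name.
SUPPORTED_MODEL_TYPE: Dict[str, Dict[str, str]] = {
    # Case 1: the model type is already known to another (hypothetical) backend.
    "bert": {"other-backend": "BertOtherBackendConfig"},
}


def register_onnx_config(
    registry: Dict[str, Dict[str, str]], model_type: str, config_class_name: str
) -> None:
    """Add the "onnx" key, creating the whole entry if the model type is new."""
    entry = registry.setdefault(model_type, {})
    entry["onnx"] = config_class_name


# Existing entry: only the "onnx" key is added, the other backend is untouched.
register_onnx_config(SUPPORTED_MODEL_TYPE, "bert", "BertOnnxConfig")
# Case 2: no entry yet, so the whole entry is created (hypothetical model).
register_onnx_config(SUPPORTED_MODEL_TYPE, "my-new-model", "MyNewModelOnnxConfig")

print(SUPPORTED_MODEL_TYPE)
```

The configuration is referenced by class name rather than by the class object, which mirrors the wording above ("specifying the name of the configuration class").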
For BERT, it looks as follows:
Once you have implemented the ONNX configuration, the next step is to export the model. Here we can use the export() function provided by the optimum.exporters.onnx package. This function expects the ONNX configuration, along with the base model, and the path to save the exported file:
The final step is to validate that the outputs from the base and exported model agree within some absolute tolerance. Here we can use the validate_model_outputs() function provided by the optimum.exporters.onnx package:
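To make "agree within some absolute tolerance" concrete, here is a minimal stand-alone sketch of such a check on plain Python lists. It is not the implementation of validate_model_outputs(); the function names, the output name, the numbers, and the tolerance values are all made up for illustration.

```python
from typing import Dict, List, Sequence


def max_abs_difference(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flattened outputs."""
    if len(reference) != len(candidate):
        raise ValueError("Outputs must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def find_mismatched_outputs(
    reference_outputs: Dict[str, Sequence[float]],
    exported_outputs: Dict[str, Sequence[float]],
    atol: float,
) -> List[str]:
    """Return the names of the outputs whose difference exceeds atol."""
    mismatched = []
    for name, reference in reference_outputs.items():
        if max_abs_difference(reference, exported_outputs[name]) > atol:
            mismatched.append(name)
    return mismatched


# Illustrative values only: the largest difference here is about 4e-5.
reference = {"last_hidden_state": [0.12345, -1.5, 2.0]}
exported = {"last_hidden_state": [0.12349, -1.50002, 2.0]}

print(find_mismatched_outputs(reference, exported, atol=1e-4))  # nothing exceeds 1e-4
print(find_mismatched_outputs(reference, exported, atol=1e-5))  # the output exceeds 1e-5
```

The same pair of outputs passes with the looser tolerance and fails with the stricter one, which is why the tolerance is stored per configuration (ATOL_FOR_VALIDATION) rather than fixed globally.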
Now that support for the architecture has been implemented and validated, there are two things left:
Add your model architecture to the tests in tests/exporters/test_onnx_export.py
Thanks for your contribution!
First let's explain what TextEncoderOnnxConfig is all about. While most of the features are already implemented in OnnxConfig, this class is modality-agnostic, meaning that it does not know what kind of inputs it should handle. Input generation is handled via the DUMMY_INPUT_GENERATOR_CLASSES attribute, which is a tuple of dummy input generator classes. Here we are making a modality-aware configuration inheriting from OnnxConfig by specifying DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator,).
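The pattern described here, a modality-agnostic base class that becomes modality-aware through a tuple of generator classes, can be sketched without the library. Everything below is a toy stand-in: ToyTextInputGenerator, ToyOnnxConfig, ToyTextEncoderConfig, the INPUT_NAMES attribute, and the default sizes are assumptions made for this illustration and are not Optimum's classes.

```python
import random


class ToyTextInputGenerator:
    """Stand-in for a text input generator (hypothetical, not the library class)."""

    SUPPORTED_INPUT_NAMES = ("input_ids", "attention_mask", "token_type_ids")

    def __init__(self, batch_size=2, sequence_length=8, vocab_size=100):
        self.batch_size = batch_size
        self.sequence_length = sequence_length
        self.vocab_size = vocab_size

    def supports_input(self, input_name):
        return input_name in self.SUPPORTED_INPUT_NAMES

    def generate(self, input_name):
        # Token ids are random; the attention mask is all ones, token types all zeros.
        rows = []
        for _ in range(self.batch_size):
            if input_name == "input_ids":
                row = [random.randrange(self.vocab_size) for _ in range(self.sequence_length)]
            elif input_name == "attention_mask":
                row = [1] * self.sequence_length
            else:
                row = [0] * self.sequence_length
            rows.append(row)
        return rows


class ToyOnnxConfig:
    """Modality-agnostic base: it loops over generators but declares none itself."""

    DUMMY_INPUT_GENERATOR_CLASSES = ()
    INPUT_NAMES = ()

    def generate_dummy_inputs(self):
        generators = [cls() for cls in self.DUMMY_INPUT_GENERATOR_CLASSES]
        dummy_inputs = {}
        for name in self.INPUT_NAMES:
            for generator in generators:
                if generator.supports_input(name):
                    dummy_inputs[name] = generator.generate(name)
                    break
            else:
                raise ValueError(f"No generator can produce the input {name!r}")
        return dummy_inputs


class ToyTextEncoderConfig(ToyOnnxConfig):
    """Modality-aware subclass: only the class attributes change."""

    DUMMY_INPUT_GENERATOR_CLASSES = (ToyTextInputGenerator,)
    INPUT_NAMES = ("input_ids", "attention_mask", "token_type_ids")


dummy = ToyTextEncoderConfig().generate_dummy_inputs()
print({name: (len(rows), len(rows[0])) for name, rows in dummy.items()})
```

The point of the design is that supporting a new modality means supplying a different generator tuple, while the generation loop in the base class stays unchanged.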
NORMALIZED_CONFIG_CLASS: this must be a normalized configuration class; it allows the input generator to access the model config attributes in a generic way.
Every configuration object must implement the inputs property and return a mapping, where each key corresponds to an input name, and each value indicates the axes in that input that are dynamic. For BERT, we can see that three inputs are required: input_ids, attention_mask and token_type_ids. These inputs have the same shape of (batch_size, sequence_length) (except for the multiple-choice task), which is why we see the same axes used in the configuration.
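As a rough sketch of the mapping described above, the function below returns, for each of the three BERT inputs, a dictionary from axis index to axis name. It is written as a free function rather than a property so that it runs on its own; the function name is hypothetical, and the axis layout used for the multiple-choice task (an extra num_choices axis) is an assumption.

```python
from typing import Dict


def bert_like_inputs(task: str) -> Dict[str, Dict[int, str]]:
    """Map each input name to its dynamic axes ({axis index: axis name})."""
    if task == "multiple-choice":
        # Assumed layout: one extra axis for the candidate answers.
        dynamic_axes = {0: "batch_size", 1: "num_choices", 2: "sequence_length"}
    else:
        dynamic_axes = {0: "batch_size", 1: "sequence_length"}
    # All three inputs share the same shape, hence the same dynamic axes.
    return {
        name: dict(dynamic_axes)
        for name in ("input_ids", "attention_mask", "token_type_ids")
    }


print(bert_like_inputs("default"))
print(bert_like_inputs("multiple-choice"))
```

Axes listed in the mapping may vary between calls to the exported model; any axis left out is treated as fixed.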
The TasksManager is the main entry point to load a model given a name and a task, and to get the proper configuration for a given (architecture, backend) pair. When adding support for the export to ONNX, registering the configuration in the TasksManager will make the export available in the command line tool.
The onnx_inputs and onnx_outputs returned by the export() function are lists of the keys defined in the inputs and outputs properties of the configuration. Once the model is exported, you can test that the model is well formed as follows:
If your model is larger than 2GB, you will see that many additional files are created during the export. This is expected because ONNX uses Protocol Buffers to store the model, and these have a size limit of 2GB. See the ONNX documentation for instructions on how to load models with external data.
Create a PR on the