ONNX configurations


Configuration classes for ONNX exports

Exporting a model to ONNX involves specifying:

  1. The input names.

  2. The output names.

  3. The dynamic axes. These refer to the input dimensions that can be changed dynamically at runtime (e.g. batch size or sequence length). All other axes are treated as static, and hence fixed at runtime.

  4. Dummy inputs to trace the model. This is needed in PyTorch to record the computational graph and convert it to ONNX.

Since this data depends on the choice of model and task, we represent it in terms of configuration classes. Each configuration class is associated with a specific model architecture, and follows the naming convention ArchitectureNameOnnxConfig. For instance, the configuration which specifies the ONNX export of BERT models is BertOnnxConfig.
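
As a minimal sketch of what this looks like in practice (assuming optimum and transformers are installed; the import path below matches the Optimum source at the time of writing, but may vary across versions):

from transformers import AutoConfig
from optimum.exporters.onnx.model_configs import BertOnnxConfig

# Build the ONNX export configuration from a regular model configuration.
config = AutoConfig.from_pretrained("bert-base-uncased")
onnx_config = BertOnnxConfig(config, task="feature-extraction")

# The input/output names and their dynamic axes come from the configuration,
# e.g. {'input_ids': {0: 'batch_size', 1: 'sequence_length'}, ...}.
print(onnx_config.inputs)
print(onnx_config.outputs)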

Since many architectures share similar properties for their ONNX configuration, 🌍 Optimum adopts a 3-level class hierarchy:

  1. Abstract and generic base classes. These handle all the fundamental features, while being agnostic to the modality (text, image, audio, etc.).

  2. Middle-end classes. These are aware of the modality, but several can exist for the same modality, depending on the inputs they support. They specify which input generators should be used for the dummy inputs, but remain model-agnostic.

  3. Model-specific classes like the BertOnnxConfig mentioned above. These are the ones actually used to export models; the sketch after this list shows where such a class sits in the hierarchy.
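
To illustrate, the following sketch walks the method resolution order of BertOnnxConfig, which surfaces all three levels (the model-specific class, the TextEncoderOnnxConfig middle-end, and the OnnxConfig base; exact module paths may differ between Optimum versions):

from optimum.exporters.onnx.model_configs import BertOnnxConfig

# Expected to list BertOnnxConfig -> TextEncoderOnnxConfig -> OnnxConfig, among others.
for cls in BertOnnxConfig.__mro__:
    print(f"{cls.__module__}.{cls.__name__}")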

Base classes

class optimum.exporters.onnx.OnnxConfig

( config: PretrainedConfig, task: str = 'feature-extraction', preprocessors: typing.Optional[typing.List[typing.Any]] = None, int_dtype: str = 'int64', float_dtype: str = 'fp32', legacy: bool = False )

Parameters

  • config (transformers.PretrainedConfig) — The model configuration.

  • task (str, defaults to "feature-extraction") — The task the model should be exported for.

  • int_dtype (str, defaults to "int64") — The data type of integer tensors; one of "int64", "int32" or "int8".

  • float_dtype (str, defaults to "fp32") — The data type of float tensors; one of "fp32", "fp16" or "bf16".

Base class for ONNX-exportable models, describing metadata on how to export the model through the ONNX format.

Class attributes:

  • NORMALIZED_CONFIG_CLASS (Type) — A class derived from NormalizedConfig specifying how to normalize the model config.

  • DUMMY_INPUT_GENERATOR_CLASSES (Tuple[Type]) — A tuple of classes derived from DummyInputGenerator specifying how to create dummy inputs.

  • ATOL_FOR_VALIDATION (Union[float, Dict[str, float]]) — A float or a dictionary mapping task names to floats, where the float values represent the absolute tolerance to use during model conversion validation.

  • DEFAULT_ONNX_OPSET (int, defaults to 11) — The default ONNX opset to use for the ONNX export.

  • MIN_TORCH_VERSION (packaging.version.Version, defaults to ~optimum.exporters.onnx.utils.TORCH_MINIMUM_VERSION) — The minimum torch version supporting the export of the model to ONNX.

  • MIN_TRANSFORMERS_VERSION (packaging.version.Version, defaults to ~optimum.exporters.onnx.utils.TRANSFORMERS_MINIMUM_VERSION) — The minimum transformers version supporting the export of the model to ONNX. Not always up-to-date or accurate; this is mostly for internal use.

  • PATCHING_SPECS (Optional[List[PatchingSpec]], defaults to None) — Specifies which operators/modules should be patched before performing the export, and how. This is useful, for instance, when an operator is not supported in ONNX.

inputs

( ) β†’ Dict[str, Dict[int, str]]

Returns

Dict[str, Dict[int, str]]

A mapping of each input name to a mapping of axis position to the axis symbolic name.

Dict containing the axis definition of the input tensors to provide to the model.

outputs

( ) β†’ Dict[str, Dict[int, str]]

Returns

Dict[str, Dict[int, str]]

A mapping of each output name to a mapping of axis position to the axis symbolic name.

Dict containing the axis definition of the output tensors to provide to the model.

generate_dummy_inputs

( framework: str = 'pt', **kwargs ) → Dict

Parameters

  • framework (str, defaults to "pt") — The framework for which to create the dummy inputs.

  • batch_size (int, defaults to 2) — The batch size to use in the dummy inputs.

  • sequence_length (int, defaults to 16) — The sequence length to use in the dummy inputs.

  • num_choices (int, defaults to 4) — The number of candidate answers provided for the multiple-choice task.

  • image_width (int, defaults to 64) — The width to use in the dummy inputs for vision tasks.

  • image_height (int, defaults to 64) — The height to use in the dummy inputs for vision tasks.

  • num_channels (int, defaults to 3) — The number of channels to use in the dummy inputs for vision tasks.

  • feature_size (int, defaults to 80) — The number of features to use in the dummy inputs for audio tasks, in case the input is not raw audio. This is, for example, the number of STFT or mel bins.

  • nb_max_frames (int, defaults to 3000) — The number of frames to use in the dummy inputs for audio tasks, in case the input is not raw audio.

  • audio_sequence_length (int, defaults to 16000) — The audio sequence length to use in the dummy inputs for audio tasks, in case the input is raw audio.

Returns

Dict

A dictionary mapping the input names to dummy tensors in the proper framework format.

Generates the dummy inputs necessary for tracing the model. If not explicitly specified, default input shapes are used.
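
A usage sketch (requires PyTorch; BertOnnxConfig as in the earlier example, with the shape overrides taken from the keyword arguments documented above):

from transformers import AutoConfig
from optimum.exporters.onnx.model_configs import BertOnnxConfig

onnx_config = BertOnnxConfig(AutoConfig.from_pretrained("bert-base-uncased"))

# Returns a dict of PyTorch tensors ("pt") shaped according to the overrides.
dummy_inputs = onnx_config.generate_dummy_inputs(framework="pt", batch_size=2, sequence_length=16)
print({name: tuple(tensor.shape) for name, tensor in dummy_inputs.items()})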

class optimum.exporters.onnx.OnnxConfigWithPast

( config: PretrainedConfig, task: str = 'feature-extraction', int_dtype: str = 'int64', float_dtype: str = 'fp32', use_past: bool = False, use_past_in_inputs: bool = False, preprocessors: typing.Optional[typing.List[typing.Any]] = None, legacy: bool = False )

add_past_key_values

( inputs_or_outputs: typing.Dict[str, typing.Dict[int, str]], direction: str )

Parameters

  • inputs_or_outputs (Dict[str, Dict[int, str]]) — The mapping to fill.

  • direction (str) — Either "inputs" or "outputs"; specifies whether inputs_or_outputs is the input mapping or the output mapping. This is important for axes naming.

Fills the inputs_or_outputs mapping with past_key_values dynamic axes, taking the direction into account.
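
To see these axes in practice, here is a sketch using GPT2OnnxConfig, a decoder configuration from the Optimum source (constructor arguments as documented for OnnxConfigWithPast above; model name illustrative):

from transformers import AutoConfig
from optimum.exporters.onnx.model_configs import GPT2OnnxConfig

config = AutoConfig.from_pretrained("gpt2")
onnx_config = GPT2OnnxConfig(config, task="text-generation", use_past=True, use_past_in_inputs=True)

# With use_past_in_inputs=True, past_key_values.* entries are expected to appear
# among the inputs, with dynamic batch and past-sequence axes.
print(onnx_config.inputs)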

class optimum.exporters.onnx.OnnxSeq2SeqConfigWithPast

( config: PretrainedConfig, task: str = 'feature-extraction', int_dtype: str = 'int64', float_dtype: str = 'fp32', use_past: bool = False, use_past_in_inputs: bool = False, behavior: ConfigBehavior = <ConfigBehavior.MONOLITH: 'monolith'>, preprocessors: typing.Optional[typing.List[typing.Any]] = None, legacy: bool = False )

with_behavior

( behavior: typing.Union[str, optimum.exporters.onnx.base.ConfigBehavior], use_past: bool = False, use_past_in_inputs: bool = False ) → OnnxSeq2SeqConfigWithPast

Parameters

  • behavior (ConfigBehavior) — The behavior to use for the new instance.

  • use_past (bool, defaults to False) — Whether or not the ONNX config to instantiate is for a model using a KV cache.

  • use_past_in_inputs (bool, defaults to False) — Whether the KV cache is to be passed as an input to the ONNX model.

Returns

OnnxSeq2SeqConfigWithPast

Creates a copy of the current OnnxConfig but with a different ConfigBehavior and use_past value.
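
For instance, deriving the configuration for the decoder part of a seq2seq model with KV cache might look like this sketch (T5OnnxConfig is assumed here; task and model names are illustrative):

from transformers import AutoConfig
from optimum.exporters.onnx.model_configs import T5OnnxConfig

config = AutoConfig.from_pretrained("t5-small")
onnx_config = T5OnnxConfig(config, task="text2text-generation")

# Derive a decoder-only configuration that consumes and produces the KV cache.
decoder_config = onnx_config.with_behavior("decoder", use_past=True, use_past_in_inputs=True)
print(decoder_config.inputs)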

Middle-end classes

Text

class optimum.exporters.onnx.TextEncoderOnnxConfig

( config: PretrainedConfig, task: str = 'feature-extraction', preprocessors: typing.Optional[typing.List[typing.Any]] = None, int_dtype: str = 'int64', float_dtype: str = 'fp32', legacy: bool = False )

Handles encoder-based text architectures.

class optimum.exporters.onnx.TextDecoderOnnxConfig

( config: PretrainedConfig, task: str = 'feature-extraction', int_dtype: str = 'int64', float_dtype: str = 'fp32', use_past: bool = False, use_past_in_inputs: bool = False, preprocessors: typing.Optional[typing.List[typing.Any]] = None, legacy: bool = False )

Inherits from OnnxConfigWithPast. Handles decoder-based text architectures.

class optimum.exporters.onnx.TextSeq2SeqOnnxConfig

( config: PretrainedConfig, task: str = 'feature-extraction', int_dtype: str = 'int64', float_dtype: str = 'fp32', use_past: bool = False, use_past_in_inputs: bool = False, behavior: ConfigBehavior = <ConfigBehavior.MONOLITH: 'monolith'>, preprocessors: typing.Optional[typing.List[typing.Any]] = None, legacy: bool = False )

Inherits from OnnxSeq2SeqConfigWithPast. Handles encoder-decoder-based text architectures.

Vision

class optimum.exporters.onnx.config.VisionOnnxConfig

( config: PretrainedConfig, task: str = 'feature-extraction', preprocessors: typing.Optional[typing.List[typing.Any]] = None, int_dtype: str = 'int64', float_dtype: str = 'fp32', legacy: bool = False )

Handles vision architectures.

Multi-modal

class optimum.exporters.onnx.config.TextAndVisionOnnxConfig

( config: PretrainedConfig, task: str = 'feature-extraction', preprocessors: typing.Optional[typing.List[typing.Any]] = None, int_dtype: str = 'int64', float_dtype: str = 'fp32', legacy: bool = False )

Handles multi-modal text and vision architectures.
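
Model-specific configurations are typically built by subclassing one of these middle-end classes and filling in the NORMALIZED_CONFIG_CLASS and DUMMY_INPUT_GENERATOR_CLASSES attributes described earlier. A minimal sketch of a hypothetical MyModelOnnxConfig (NormalizedTextConfig and DummyTextInputGenerator exist in optimum.utils, but the attributes a real architecture needs may differ):

from optimum.exporters.onnx import TextEncoderOnnxConfig
from optimum.utils import NormalizedTextConfig, DummyTextInputGenerator

class MyModelOnnxConfig(TextEncoderOnnxConfig):
    # How to read hidden size, number of layers, etc. from the model config.
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfig
    # Which generators produce the dummy inputs used for tracing.
    DUMMY_INPUT_GENERATOR_CLASSES = (DummyTextInputGenerator,)
    DEFAULT_ONNX_OPSET = 11

    @property
    def inputs(self):
        # Input names mapped to {axis position: symbolic axis name}.
        return {
            "input_ids": {0: "batch_size", 1: "sequence_length"},
            "attention_mask": {0: "batch_size", 1: "sequence_length"},
        }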
