Quantization

ORTQuantizer

class optimum.onnxruntime.ORTQuantizer

( onnx_model_path: Path, config: typing.Optional[PretrainedConfig] = None )

Handles the ONNX Runtime quantization process for models shared on boincai.com/models.

compute_ranges

( )

Computes the quantization ranges.

fit

( dataset: Dataset, calibration_config: CalibrationConfig, onnx_augmented_model_name: typing.Union[str, pathlib.Path] = 'augmented_model.onnx', operators_to_quantize: typing.Optional[typing.List[str]] = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )

Parameters

  • dataset (Dataset) — The dataset to use when performing the calibration step.

  • calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.

  • onnx_augmented_model_name (Union[str, Path], defaults to "augmented_model.onnx") — The path used to save the augmented model used to collect the quantization ranges.

  • operators_to_quantize (Optional[List[str]], defaults to None) — List of the operator types to quantize.

  • batch_size (int, defaults to 1) — The batch size to use when collecting the quantization ranges values.

  • use_external_data_format (bool, defaults to False) — Whether to use the external data format to store models whose size is >= 2GB.

  • use_gpu (bool, defaults to False) — Whether to use the GPU when collecting the quantization ranges values.

  • force_symmetric_range (bool, defaults to False) — Whether to make the quantization ranges symmetric.

Performs the calibration step and computes the quantization ranges.
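
Below is a minimal sketch of a full calibration pass with fit, assuming a DistilBERT SST-2 checkpoint, the GLUE/SST-2 dataset, and the AVX-512 VNNI configuration purely for illustration:

```python
from functools import partial

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig

# Illustrative checkpoint; any exportable Transformers model works the same way.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
quantizer = ORTQuantizer.from_pretrained(model)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Static quantization needs calibration ranges for the activations.
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=True, per_channel=False)

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    dataset_split="train",
    num_samples=50,
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
)

# fit() runs the augmented model over the calibration data and
# returns the computed quantization ranges.
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    operators_to_quantize=qconfig.operators_to_quantize,
)
```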

from_pretrained

( model_or_path: typing.Union[ORTModel, str, pathlib.Path], file_name: typing.Optional[str] = None )

Parameters

  • model_or_path (Union[ORTModel, str, Path]) — Can be either:

    • A path to a saved exported ONNX Intermediate Representation (IR) model, e.g., `./my_model_directory/`.

    • Or an ORTModelForXX class, e.g., ORTModelForQuestionAnswering.

  • file_name (Optional[str], defaults to None) — Overwrites the default model file name from "model.onnx" to file_name. This allows you to load different model files from the same repository or directory.

Instantiates an ORTQuantizer from an ONNX model file or an ORTModel.
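
For instance (checkpoint and directory names below are illustrative):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer

# From an ORTModelForXXX instance (here exported on the fly from a
# Transformers checkpoint; the model name is illustrative).
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)
quantizer = ORTQuantizer.from_pretrained(model)

# Or from a local directory that already contains an exported ONNX file;
# file_name disambiguates when the directory holds several .onnx files.
model.save_pretrained("./my_model_directory")
quantizer = ORTQuantizer.from_pretrained("./my_model_directory", file_name="model.onnx")
```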

get_calibration_dataset

( dataset_name: str, num_samples: int = 100, dataset_config_name: typing.Optional[str] = None, dataset_split: typing.Optional[str] = None, preprocess_function: typing.Optional[typing.Callable] = None, preprocess_batch: bool = True, seed: int = 2016, use_auth_token: bool = False )

Parameters

  • dataset_name (str) — The dataset repository name on the BOINC AI Hub or path to a local directory containing data files to load to use for the calibration step.

  • num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.

  • dataset_config_name (Optional[str], defaults to None) — The name of the dataset configuration.

  • dataset_split (Optional[str], defaults to None) — Which split of the dataset to use to perform the calibration step.

  • preprocess_function (Optional[Callable], defaults to None) — Processing function to apply to each example after loading the dataset.

  • preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.

  • seed (int, defaults to 2016) — The random seed to use when shuffling the calibration dataset.

  • use_auth_token (bool, defaults to False) — Whether to use the token generated when running transformers-cli login (necessary for some datasets like ImageNet).

Creates the calibration `datasets.Dataset` to use for the post-training static quantization calibration step.
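
A sketch of building a calibration set, assuming the GLUE/SST-2 dataset and a hypothetical preprocess_fn tokenization helper:

```python
from functools import partial

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
quantizer = ORTQuantizer.from_pretrained(
    ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess_fn(examples, tokenizer):
    # Called on batches of examples (preprocess_batch=True by default);
    # the produced keys must match the model's input names.
    return tokenizer(examples["sentence"], padding="max_length", truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    dataset_split="train",
    num_samples=100,
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
)
```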

partial_fit

( dataset: Dataset, calibration_config: CalibrationConfig, onnx_augmented_model_name: typing.Union[str, pathlib.Path] = 'augmented_model.onnx', operators_to_quantize: typing.Optional[typing.List[str]] = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )

Parameters

  • dataset (Dataset) — The dataset to use when performing the calibration step.

  • calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.

  • onnx_augmented_model_name (Union[str, Path], defaults to "augmented_model.onnx") — The path used to save the augmented model used to collect the quantization ranges.

  • operators_to_quantize (Optional[List[str]], defaults to None) — List of the operator types to quantize.

  • batch_size (int, defaults to 1) — The batch size to use when collecting the quantization ranges values.

  • use_external_data_format (bool, defaults to False) — Whether to use the external data format to store models whose size is >= 2GB.

  • use_gpu (bool, defaults to False) — Whether to use the GPU when collecting the quantization ranges values.

  • force_symmetric_range (bool, defaults to False) — Whether to make the quantization ranges symmetric.

Performs the calibration step and collects the quantization ranges without computing them.
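
When the calibration set is too large for a single pass, it can be processed in shards: call partial_fit once per shard, then compute_ranges() to obtain the final ranges. A sketch under the same illustrative setup as the fit example above:

```python
from functools import partial

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
quantizer = ORTQuantizer.from_pretrained(
    ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=True, per_channel=False)

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    dataset_split="train",
    num_samples=200,
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
)
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)

# Feed the calibration data shard by shard, then compute the final ranges.
num_shards = 4
for i in range(num_shards):
    shard = calibration_dataset.shard(num_shards, i)
    quantizer.partial_fit(
        dataset=shard,
        calibration_config=calibration_config,
        operators_to_quantize=qconfig.operators_to_quantize,
    )
ranges = quantizer.compute_ranges()
```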

quantize

( quantization_config: QuantizationConfig, save_dir: typing.Union[str, pathlib.Path], file_suffix: typing.Optional[str] = 'quantized', calibration_tensors_range: typing.Optional[typing.Dict[str, typing.Tuple[float, float]]] = None, use_external_data_format: bool = False, preprocessor: typing.Optional[optimum.onnxruntime.preprocessors.quantization.QuantizationPreprocessor] = None )

Parameters

  • quantization_config (QuantizationConfig) — The configuration containing the parameters related to quantization.

  • save_dir (Union[str, Path]) — The directory where the quantized model should be saved.

  • file_suffix (Optional[str], defaults to "quantized") — The file_suffix used to save the quantized model.

  • calibration_tensors_range (Optional[Dict[str, Tuple[float, float]]], defaults to None) — The dictionary mapping node names to their quantization ranges, required only when applying static quantization.

  • use_external_data_format (bool, defaults to False) — Whether to use external data format to store model which size is >= 2Gb.

  • preprocessor (Optional[QuantizationPreprocessor], defaults to None) — The preprocessor to use to collect the nodes to include or exclude from quantization.

Quantizes a model given the quantization specifications defined in quantization_config.
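
A minimal dynamic-quantization sketch (no calibration required); the model and directory names are illustrative, and the commented-out static variant shows where the calibration ranges plug in:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True  # illustrative
)
quantizer = ORTQuantizer.from_pretrained(model)

# Dynamic quantization: weights are quantized ahead of time,
# activations on the fly at inference, so no calibration is needed.
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

quantizer.quantize(
    save_dir="./quantized_model",  # illustrative output directory
    quantization_config=dqconfig,
)

# Static variant: pass the ranges returned by fit() or compute_ranges().
# quantizer.quantize(
#     save_dir="./quantized_model",
#     quantization_config=static_qconfig,
#     calibration_tensors_range=ranges,
# )
```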
