Quantization
ORTQuantizer
class optimum.onnxruntime.ORTQuantizer
( onnx_model_path: Path, config: typing.Optional['PretrainedConfig'] = None )
Handles the ONNX Runtime quantization process for models shared on boincai.com/models.
compute_ranges
( )
Computes the quantization ranges.
fit
( dataset: Dataset, calibration_config: CalibrationConfig, onnx_augmented_model_name: typing.Union[str, pathlib.Path] = 'augmented_model.onnx', operators_to_quantize: typing.Optional[typing.List[str]] = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )
Parameters
dataset (`Dataset`) — The dataset to use when performing the calibration step.
calibration_config (`CalibrationConfig`) — The configuration containing the parameters related to the calibration step.
onnx_augmented_model_name (`Union[str, Path]`, defaults to `"augmented_model.onnx"`) — The path used to save the augmented model used to collect the quantization ranges.
operators_to_quantize (`Optional[List[str]]`, defaults to `None`) — List of the operator types to quantize.
batch_size (`int`, defaults to 1) — The batch size to use when collecting the quantization range values.
use_external_data_format (`bool`, defaults to `False`) — Whether to use the external data format to store models whose size is >= 2GB.
use_gpu (`bool`, defaults to `False`) — Whether to use the GPU when collecting the quantization range values.
force_symmetric_range (`bool`, defaults to `False`) — Whether to make the quantization ranges symmetric.
Performs the calibration step and computes the quantization ranges.
from_pretrained
( model_or_path: typing.Union['ORTModel', str, pathlib.Path], file_name: typing.Optional[str] = None )
Parameters
model_or_path (`Union[ORTModel, str, Path]`) — Can be either:
A path to a saved exported ONNX Intermediate Representation (IR) model, e.g., `./my_model_directory/`.
Or an `ORTModelForXX` class, e.g., `ORTModelForQuestionAnswering`.
file_name (`Optional[str]`, defaults to `None`) — Overwrites the default model file name from `"model.onnx"` to `file_name`. This allows you to load different model files from the same repository or directory.
Instantiates an `ORTQuantizer` from an ONNX model file or an `ORTModel`.
get_calibration_dataset
( dataset_name: str, num_samples: int = 100, dataset_config_name: typing.Optional[str] = None, dataset_split: typing.Optional[str] = None, preprocess_function: typing.Optional[typing.Callable] = None, preprocess_batch: bool = True, seed: int = 2016, use_auth_token: bool = False )
Parameters
dataset_name (`str`) — The dataset repository name on the BOINC AI Hub, or the path to a local directory containing the data files to use for the calibration step.
num_samples (`int`, defaults to 100) — The maximum number of samples composing the calibration dataset.
dataset_config_name (`Optional[str]`, defaults to `None`) — The name of the dataset configuration.
dataset_split (`Optional[str]`, defaults to `None`) — Which split of the dataset to use to perform the calibration step.
preprocess_function (`Optional[Callable]`, defaults to `None`) — Processing function to apply to each example after loading the dataset.
preprocess_batch (`bool`, defaults to `True`) — Whether the `preprocess_function` should be batched.
seed (`int`, defaults to 2016) — The random seed to use when shuffling the calibration dataset.
use_auth_token (`bool`, defaults to `False`) — Whether to use the token generated when running `transformers-cli login` (necessary for some datasets like ImageNet).
Creates the calibration `datasets.Dataset` to use for the post-training static quantization calibration step.
partial_fit
( dataset: Dataset, calibration_config: CalibrationConfig, onnx_augmented_model_name: typing.Union[str, pathlib.Path] = 'augmented_model.onnx', operators_to_quantize: typing.Optional[typing.List[str]] = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )
Parameters
dataset (`Dataset`) — The dataset to use when performing the calibration step.
calibration_config (`CalibrationConfig`) — The configuration containing the parameters related to the calibration step.
onnx_augmented_model_name (`Union[str, Path]`, defaults to `"augmented_model.onnx"`) — The path used to save the augmented model used to collect the quantization ranges.
operators_to_quantize (`Optional[List[str]]`, defaults to `None`) — List of the operator types to quantize.
batch_size (`int`, defaults to 1) — The batch size to use when collecting the quantization range values.
use_external_data_format (`bool`, defaults to `False`) — Whether to use the external data format to store models whose size is >= 2GB.
use_gpu (`bool`, defaults to `False`) — Whether to use the GPU when collecting the quantization range values.
force_symmetric_range (`bool`, defaults to `False`) — Whether to make the quantization ranges symmetric.
Performs the calibration step and collects the quantization ranges without computing them.
quantize
( quantization_config: QuantizationConfig, save_dir: typing.Union[str, pathlib.Path], file_suffix: typing.Optional[str] = 'quantized', calibration_tensors_range: typing.Union[typing.Dict[str, typing.Tuple[float, float]], NoneType] = None, use_external_data_format: bool = False, preprocessor: typing.Optional[optimum.onnxruntime.preprocessors.quantization.QuantizationPreprocessor] = None )
Parameters
quantization_config (`QuantizationConfig`) — The configuration containing the parameters related to quantization.
save_dir (`Union[str, Path]`) — The directory where the quantized model should be saved.
file_suffix (`Optional[str]`, defaults to `"quantized"`) — The file suffix used to save the quantized model.
calibration_tensors_range (`Optional[Dict[str, Tuple[float, float]]]`, defaults to `None`) — The dictionary mapping the node names to their quantization ranges; used and required only when applying static quantization.
use_external_data_format (`bool`, defaults to `False`) — Whether to use the external data format to store models whose size is >= 2GB.
preprocessor (`Optional[QuantizationPreprocessor]`, defaults to `None`) — The preprocessor to use to collect the nodes to include in or exclude from quantization.
Quantizes a model given the optimization specifications defined in `quantization_config`.