Quantization

Last updated 1 year ago

🌍 Optimum provides an optimum.furiosa package that enables you to apply quantization to many models hosted on the BOINC AI Hub using the Furiosa quantization tool.

The quantization process is abstracted via the FuriosaAIConfig and the FuriosaAIQuantizer classes. The former lets you specify how quantization should be done, while the latter performs the quantization.
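
As background, 8-bit quantization is commonly expressed as an affine mapping from a floating-point range to integers, defined by a scale and a zero point. The sketch below illustrates that general idea in plain Python. It is not the Furiosa tool's implementation, and the function names (affine_params, quantize, dequantize) are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Sequence, Tuple


def affine_params(x_min: float, x_max: float,
                  qmin: int = -128, qmax: int = 127) -> Tuple[float, int]:
    """Return (scale, zero_point) mapping [x_min, x_max] onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range; any positive scale works
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point


def quantize(values: Sequence[float], scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> List[int]:
    """Map floats to integers, clamping to the representable interval."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q_values: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]
```

In static quantization, the range passed to a function like affine_params is fixed ahead of time from calibration data rather than computed for each input at inference time.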

Static Quantization example

The FuriosaAIQuantizer class can be used to statically quantize your ONNX model. Below is an end-to-end example of how to statically quantize eugenecamus/resnet-50-base-beans-demo.

>>> from functools import partial
>>> from pathlib import Path
>>> from transformers import AutoFeatureExtractor
>>> from optimum.furiosa import FuriosaAIQuantizer, FuriosaAIModelForImageClassification
>>> from optimum.furiosa.configuration import AutoCalibrationConfig, QuantizationConfig
>>> from optimum.furiosa.utils import export_model_to_onnx

>>> model_id = "eugenecamus/resnet-50-base-beans-demo"

# Convert the PyTorch model to ONNX, then create the quantizer and set up the quantization config

>>> feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)

>>> batch_size = 1
>>> image_size = feature_extractor.size["shortest_edge"]
>>> num_labels = 3
>>> onnx_model_name = "model.onnx"
>>> output_dir = "output"
>>> onnx_model_path = Path(output_dir) / onnx_model_name

>>> export_model_to_onnx(
...    model_id,
...    save_dir=output_dir,
...    input_shape_dict={"pixel_values": [batch_size, 3, image_size, image_size]},
...    output_shape_dict={"logits": [batch_size, num_labels]},
...    file_name=onnx_model_name,
... )
>>> quantizer = FuriosaAIQuantizer.from_pretrained(output_dir, file_name=onnx_model_name)
>>> qconfig = QuantizationConfig()

# Create the calibration dataset
>>> def preprocess_fn(ex, feature_extractor):
...     return feature_extractor(ex["image"])

>>> calibration_dataset = quantizer.get_calibration_dataset(
...     "beans",
...     preprocess_function=partial(preprocess_fn, feature_extractor=feature_extractor),
...     num_samples=50,
...     dataset_split="train",
... )

# Create the calibration configuration containing the parameters related to calibration.
>>> calibration_config = AutoCalibrationConfig.mse_asym(calibration_dataset)

# Perform the calibration step: computes the activations quantization ranges
>>> ranges = quantizer.fit(
...     dataset=calibration_dataset,
...     calibration_config=calibration_config,
... )

# Apply static quantization on the model
>>> model_quantized_path = quantizer.quantize(
...     save_dir=output_dir,
...     calibration_tensors_range=ranges,
...     quantization_config=qconfig,
... )
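
To make the calibration step more concrete, here is a minimal, self-contained sketch of one way an asymmetric range could be chosen by minimizing mean squared error. Judging by its name, `mse_asym` selects ranges along these lines, but that is an assumption. The code below is not the Furiosa calibrator, and all function names in it are hypothetical.

```python
from typing import List, Sequence, Tuple


def fake_quantize(values: Sequence[float], lo: float, hi: float,
                  levels: int = 256) -> List[float]:
    """Clip to [lo, hi], snap to `levels` evenly spaced points, return floats."""
    step = (hi - lo) / (levels - 1)
    if step == 0.0:
        return [lo] * len(values)
    return [lo + round((min(max(v, lo), hi) - lo) / step) * step for v in values]


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def calibrate_mse_asym(samples: Sequence[float], grid: int = 20,
                       levels: int = 256) -> Tuple[float, float]:
    """Grid-search an asymmetric range (lo, hi) that minimizes quantization MSE.

    The lower and upper bounds shrink independently from the observed
    minimum and maximum, so the chosen range need not be symmetric.
    """
    s_min, s_max = min(samples), max(samples)
    span = s_max - s_min
    best_range, best_err = (s_min, s_max), float("inf")
    for i in range(grid):
        lo = s_min + 0.45 * span * i / grid
        for j in range(grid):
            hi = s_max - 0.45 * span * j / grid
            err = mean_squared_error(samples, fake_quantize(samples, lo, hi, levels))
            if err < best_err:
                best_range, best_err = (lo, hi), err
    return best_range


# Hypothetical activation values: a dense cluster plus one outlier.
samples = [i / 100 for i in range(-100, 101)] + [8.0]
lo, hi = calibrate_mse_asym(samples, grid=10, levels=16)
print(f"chosen range: [{lo:.3f}, {hi:.3f}]")
```

Because the plain min/max range is the first candidate tried, the range returned by this search never has a higher error on the calibration samples than min/max calibration does.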