Quantization

FuriosaAIQuantizer

class optimum.furiosa.FuriosaAIQuantizer

( model_path: Path, config: Optional = None )

Handles the FuriosaAI quantization process for models shared on huggingface.co/models.

compute_ranges

( )

Computes the quantization ranges.

fit

( dataset: Dataset, calibration_config: CalibrationConfig, batch_size: int = 1 )

Parameters

  • dataset (Dataset) — The dataset to use when performing the calibration step.

  • calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.

  • batch_size (int, optional, defaults to 1) — The batch size to use when collecting the quantization range values.

Performs the calibration step and computes the quantization ranges.
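
For illustration, a minimal sketch of the calibration step. The AutoCalibrationConfig import path is an assumption mirroring the analogous ONNX Runtime quantizer API, and quantizer and calibration_dataset are created with from_pretrained and get_calibration_dataset, documented below:

from optimum.furiosa.configuration import AutoCalibrationConfig  # assumed import path

# Build a min-max calibration configuration from the calibration dataset
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)

# Run calibration and compute the quantization ranges in one call.
# We assume fit returns the computed ranges, as the analogous
# ONNX Runtime quantizer does.
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    batch_size=1,
)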

from_pretrained

( model_or_path: Union[FuriosaAIModel, str, Path], file_name: Optional[str] = None )

Parameters

  • model_or_path (Union[FuriosaAIModel, str, Path]) — Can be either:

    • A path to a saved, exported ONNX Intermediate Representation (IR) model, e.g., `./my_model_directory/`.

    • Or a FuriosaAIModelForXX class, e.g., FuriosaAIModelForImageClassification.

  • file_name (Optional[str], optional) — Overwrites the default model file name from "model.onnx" to file_name. This allows you to load different model files from the same repository or directory.

Instantiates a FuriosaAIQuantizer from a model path.
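
For example, a minimal sketch instantiating the quantizer from a local model directory (the directory name is the placeholder used above):

from optimum.furiosa import FuriosaAIQuantizer

# Load the quantizer from a directory containing an exported ONNX IR model
quantizer = FuriosaAIQuantizer.from_pretrained(
    "./my_model_directory/",
    file_name="model.onnx",  # the default, shown here for clarity
)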

get_calibration_dataset

( dataset_name: str, num_samples: int = 100, dataset_config_name: Optional[str] = None, dataset_split: Optional[str] = None, preprocess_function: Optional[Callable] = None, preprocess_batch: bool = True, seed: int = 2016, use_auth_token: bool = False )

Parameters

  • dataset_name (str) — The name of a dataset repository on the Hugging Face Hub, or the path to a local directory containing the data files to use for the calibration step.

  • num_samples (int, optional, defaults to 100) — The maximum number of samples composing the calibration dataset.

  • dataset_config_name (Optional[str], optional) — The name of the dataset configuration.

  • dataset_split (Optional[str], optional) — Which split of the dataset to use to perform the calibration step.

  • preprocess_function (Optional[Callable], optional) — Processing function to apply to each example after loading the dataset.

  • preprocess_batch (bool, optional, defaults to True) — Whether the preprocess_function should be batched.

  • seed (int, optional, defaults to 2016) — The random seed to use when shuffling the calibration dataset.

  • use_auth_token (bool, optional, defaults to False) — Whether to use the token generated when running transformers-cli login (necessary for some datasets like ImageNet).

Creates the calibration `datasets.Dataset` to use for the post-training static quantization calibration step.
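
A minimal sketch of building a calibration set; the model and dataset names and the feature-extractor-based preprocessing are illustrative assumptions, and quantizer comes from from_pretrained above:

from functools import partial
from transformers import AutoFeatureExtractor

# Hypothetical model name, used purely for illustration
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")

def preprocess_fn(example, feature_extractor):
    # Turn a raw image example into model inputs
    return feature_extractor(example["image"])

calibration_dataset = quantizer.get_calibration_dataset(
    "beans",  # hypothetical dataset repository on the Hub
    num_samples=100,
    dataset_split="train",
    preprocess_function=partial(preprocess_fn, feature_extractor=feature_extractor),
    preprocess_batch=False,  # preprocess_fn operates on single examples
)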

partial_fit

( dataset: Dataset, calibration_config: CalibrationConfig, batch_size: int = 1 )

Parameters

  • dataset (Dataset) — The dataset to use when performing the calibration step.

  • calibration_config (CalibrationConfig) — The configuration containing the parameters related to the calibration step.

  • batch_size (int, optional, defaults to 1) — The batch size to use when collecting the quantization range values.

Performs the calibration step and collects the quantization ranges without computing them.
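
When the calibration data is too large for a single pass, partial_fit can be called on successive shards, with compute_ranges (documented above) producing the final ranges. A sketch under the same assumptions as the fit example:

# Collect calibration statistics shard by shard
num_shards = 4
for i in range(num_shards):
    shard = calibration_dataset.shard(num_shards=num_shards, index=i)
    quantizer.partial_fit(
        dataset=shard,
        calibration_config=calibration_config,
        batch_size=1,
    )

# Turn the collected statistics into the final quantization ranges
ranges = quantizer.compute_ranges()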

quantize

( quantization_config: QuantizationConfig, save_dir: Union[str, Path], file_suffix: Optional[str] = 'quantized', calibration_tensors_range: Optional[Dict[NodeName, Tuple[float, float]]] = None )

Parameters

  • quantization_config (QuantizationConfig) — The configuration containing the parameters related to quantization.

  • save_dir (Union[str, Path]) — The directory where the quantized model should be saved.

  • file_suffix (Optional[str], optional, defaults to "quantized") — The file_suffix used to save the quantized model.

  • calibration_tensors_range (Optional[Dict[NodeName, Tuple[float, float]]], optional) — The dictionary mapping node names to their quantization ranges; used and required only when applying static quantization.

Quantizes a model given the optimization specifications defined in quantization_config.
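
Putting the pieces together, a minimal sketch of the final quantization step. Constructing the QuantizationConfig is hardware-specific and elided here (see the Configuration reference page); ranges are the calibration ranges computed above:

# `quantization_config` is a QuantizationConfig instance (see the
# Configuration reference page); its construction is elided here.
quantizer.quantize(
    quantization_config=quantization_config,
    save_dir="quantized_model",
    calibration_tensors_range=ranges,  # required for static quantization
)
# The quantized model is saved under quantized_model/ with the
# default "quantized" file suffix.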
