Reference

INCQuantizer

class optimum.intel.INCQuantizer

( model: Module, eval_fn: typing.Union[typing.Callable[[transformers.modeling_utils.PreTrainedModel], int], NoneType] = None, calibration_fn: typing.Union[typing.Callable[[transformers.modeling_utils.PreTrainedModel], int], NoneType] = None, task: typing.Optional[str] = None, seed: int = 42 )

Handle the Neural Compressor quantization process.

get_calibration_dataset

( dataset_name: str, num_samples: int = 100, dataset_config_name: typing.Optional[str] = None, dataset_split: str = 'train', preprocess_function: typing.Optional[typing.Callable] = None, preprocess_batch: bool = True, use_auth_token: bool = False )

Parameters

  • dataset_name (str) — The dataset repository name on the BOINC AI Hub or path to a local directory containing data files in generic formats and optionally a dataset script, if it requires some code to read the data files.

  • num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.

  • dataset_config_name (str, optional) — The name of the dataset configuration.

  • dataset_split (str, defaults to "train") — Which split of the dataset to use to perform the calibration step.

  • preprocess_function (Callable, optional) — Processing function to apply to each example after loading the dataset.

  • preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.

  • use_auth_token (bool, defaults to False) — Whether to use the token generated when running transformers-cli login.

Create the calibration dataset (a datasets.Dataset) used for the calibration step of post-training static quantization.
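
To make the parameters above concrete, here is a minimal pure-Python sketch of the selection and preprocessing steps. It is not the library's implementation: the helper name build_calibration_subset, the seeded shuffle before selection, and the list-of-dicts stand-in for a datasets.Dataset are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Optional

Example = Dict[str, object]


def build_calibration_subset(
    examples: List[Example],
    num_samples: int = 100,
    preprocess_function: Optional[Callable] = None,
    preprocess_batch: bool = True,
    seed: int = 42,
) -> List[Example]:
    """Hypothetical helper: keep at most `num_samples` examples, then preprocess them."""
    # num_samples is an upper bound, so never select more than is available.
    count = min(num_samples, len(examples))
    # Assumption: shuffle a copy with a fixed seed so the subset is reproducible.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    subset = shuffled[:count]

    if preprocess_function is None or not subset:
        return subset
    if preprocess_batch:
        # Batched mode: the function maps a dict of columns to a dict of columns.
        columns = {key: [ex[key] for ex in subset] for key in subset[0]}
        processed = preprocess_function(columns)
        length = len(next(iter(processed.values())))
        return [{key: values[i] for key, values in processed.items()} for i in range(length)]
    # Per-example mode: the function maps one example to one example.
    return [preprocess_function(ex) for ex in subset]


def lowercase_batch(batch: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Toy batched preprocessing function used only for this sketch."""
    return {"text": [text.lower() for text in batch["text"]]}


data = [{"text": f"Sample {i}"} for i in range(250)]
calibration = build_calibration_subset(data, num_samples=100, preprocess_function=lowercase_batch)
print(len(calibration))
```

With preprocess_batch=False the same helper would call the function once per example instead of once per dict of columns, which mirrors the distinction the preprocess_batch flag describes.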

quantize

( quantization_config: PostTrainingQuantConfig, save_directory: typing.Union[str, pathlib.Path], calibration_dataset: Dataset = None, batch_size: int = 8, data_collator: typing.Optional[DataCollator] = None, remove_unused_columns: bool = True, file_name: str = None, weight_only: bool = False, **kwargs )

Parameters

  • quantization_config (PostTrainingQuantConfig) — The configuration containing the parameters related to quantization.

  • save_directory (Union[str, Path]) — The directory where the quantized model should be saved.

  • calibration_dataset (datasets.Dataset, defaults to None) — The dataset to use for the calibration step, needed for post-training static quantization.

  • batch_size (int, defaults to 8) — The number of calibration samples to load per batch.

  • data_collator (DataCollator, defaults to None) — The function to use to form a batch from a list of elements of the calibration dataset.

  • remove_unused_columns (bool, defaults to True) — Whether or not to remove the columns unused by the model forward method.

  • weight_only (bool, defaults to False) — Whether to compress weights to integer precision (4-bit by default) while keeping activations in floating point. Best suited to reducing the memory footprint of LLMs and accelerating inference.

Quantize a model given the optimization specifications defined in quantization_config.
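
As a rough illustration of what the weight_only option refers to, the sketch below rounds a list of floating-point weights to signed 4-bit integers with a single scale and then restores approximate values. It is a simplified round-to-nearest scheme written for this page, not the algorithm Neural Compressor applies; the helper names quantize_weights_symmetric and dequantize are hypothetical, and real schemes are typically more elaborate.

```python
from typing import List, Tuple


def quantize_weights_symmetric(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Hypothetical helper: map float weights to signed integers sharing one scale."""
    qmax = 2 ** (bits - 1) - 1   # largest representable integer (7 for 4-bit)
    qmin = -(2 ** (bits - 1))    # smallest representable integer (-8 for 4-bit)
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The scale maps the largest magnitude onto qmax; fall back to 1.0 for all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights; activations are never quantized here."""
    return [q * scale for q in quantized]


weights = [0.7, -0.3, 0.1, 0.0]
q_weights, scale = quantize_weights_symmetric(weights, bits=4)
restored = dequantize(q_weights, scale)
print(q_weights, restored)
```

Each restored weight differs from the original by at most half a quantization step (scale / 2), which is the precision given up in exchange for the smaller footprint.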

INCTrainer

class optimum.intel.INCTrainer

( model: typing.Union[transformers.modeling_utils.PreTrainedModel, torch.nn.modules.module.Module] = None, args: TrainingArguments = None, data_collator: typing.Optional[DataCollator] = None, train_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None, eval_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None, tokenizer: typing.Optional[transformers.tokenization_utils_base.PreTrainedTokenizerBase] = None, model_init: typing.Callable[[], transformers.modeling_utils.PreTrainedModel] = None, compute_metrics: typing.Union[typing.Callable[[transformers.trainer_utils.EvalPrediction], typing.Dict], NoneType] = None, callbacks: typing.Optional[typing.List[transformers.trainer_callback.TrainerCallback]] = None, optimizers: typing.Tuple[torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler.LambdaLR] = (None, None), preprocess_logits_for_metrics: typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None, quantization_config: typing.Optional[neural_compressor.conf.pythonic_config._BaseQuantizationConfig] = None, pruning_config: typing.Optional[neural_compressor.conf.pythonic_config._BaseQuantizationConfig] = None, distillation_config: typing.Optional[neural_compressor.conf.pythonic_config._BaseQuantizationConfig] = None, task: typing.Optional[str] = None, save_onnx_model: bool = False )

INCTrainer enables Intel Neural Compressor quantization-aware training, pruning, and distillation.

compute_distillation_loss

( student_outputs, teacher_outputs )

How the distillation loss is computed given the student and teacher outputs.
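
For readers unfamiliar with distillation losses, the sketch below shows one widely used formulation: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. This is an assumption chosen for illustration and may differ from what compute_distillation_loss does; the function names and the default temperature are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """Hypothetical helper: KL(teacher || student) on softened distributions."""
    log_teacher = log_softmax(teacher_logits, temperature)
    log_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_teacher, log_student))
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
matching_student = [4.0, 1.0, 0.5]
diverging_student = [0.5, 1.0, 4.0]
print(distillation_loss(matching_student, teacher))
print(distillation_loss(diverging_student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.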

compute_loss

( model, inputs, return_outputs = False )

How the loss is computed by Trainer. By default, all models return the loss in the first element.
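
The "loss in the first element" convention can be sketched as follows. This is a simplified stand-in rather than the Trainer code itself: the toy_model callable and its inputs are hypothetical, and concerns such as label smoothing are left out.

```python
from typing import Any, Callable, Dict, List, Tuple


def compute_loss(model: Callable[..., Any], inputs: Dict[str, Any], return_outputs: bool = False):
    """Simplified sketch: run the model and read the loss from its outputs."""
    outputs = model(**inputs)
    # Dict-like outputs expose the loss by key; tuple outputs carry it first.
    loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
    return (loss, outputs) if return_outputs else loss


def toy_model(values: List[float], labels: List[float]) -> Tuple[float, List[float]]:
    """Hypothetical model: doubles each input and returns (mean squared error, predictions)."""
    predictions = [2.0 * v for v in values]
    loss = sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)
    return loss, predictions


batch = {"values": [1.0, 2.0], "labels": [2.0, 5.0]}
loss = compute_loss(toy_model, batch)
loss_and_outputs = compute_loss(toy_model, batch, return_outputs=True)
print(loss, loss_and_outputs)
```

Passing return_outputs=True returns the full model outputs alongside the loss, which is useful when the caller also needs the predictions.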

save_model

( output_dir: typing.Optional[str] = None, _internal_call: bool = False, save_onnx_model: typing.Optional[bool] = None )

Save the model so that it can be reloaded with from_pretrained(). The model is saved only from the main process.

INCModel

class optimum.intel.INCModel

( model, config: PretrainedConfig = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, q_config: typing.Dict = None, inc_config: typing.Dict = None, **kwargs )

INCModelForSequenceClassification

class optimum.intel.INCModelForSequenceClassification

( model, config: PretrainedConfig = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, q_config: typing.Dict = None, inc_config: typing.Dict = None, **kwargs )

INCModelForQuestionAnswering

class optimum.intel.INCModelForQuestionAnswering

( model, config: PretrainedConfig = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, q_config: typing.Dict = None, inc_config: typing.Dict = None, **kwargs )

INCModelForTokenClassification

class optimum.intel.INCModelForTokenClassification

( model, config: PretrainedConfig = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, q_config: typing.Dict = None, inc_config: typing.Dict = None, **kwargs )

INCModelForMultipleChoice

class optimum.intel.INCModelForMultipleChoice

( model, config: PretrainedConfig = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, q_config: typing.Dict = None, inc_config: typing.Dict = None, **kwargs )

INCModelForMaskedLM

class optimum.intel.INCModelForMaskedLM

( model, config: PretrainedConfig = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, q_config: typing.Dict = None, inc_config: typing.Dict = None, **kwargs )

INCModelForCausalLM

class optimum.intel.INCModelForCausalLM

( model, config: PretrainedConfig = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, q_config: typing.Dict = None, inc_config: typing.Dict = None, use_cache: bool = True, **kwargs )

Parameters

  • model (PyTorch model) — The PyTorch model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class, holding all the parameters of the model.

  • device (str, defaults to "cpu") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

Neural-compressor Model with a causal language modeling head on top (linear layer with weights tied to the input embeddings).

Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
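
The phrase "weights tied to the input embeddings" in the description above means the output projection reuses the embedding table instead of holding its own parameters. The toy sketch below illustrates that idea with plain Python lists; the matrix values and the helper names embed and lm_head are made up for this example and do not reflect the model's actual code.

```python
from typing import List

Matrix = List[List[float]]

# Embedding table: one row per vocabulary token (vocab_size x hidden_size).
embedding: Matrix = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def embed(token_id: int) -> List[float]:
    """Input side: look up the hidden vector for a token."""
    return embedding[token_id]


def lm_head(hidden: List[float]) -> List[float]:
    """Output side: score every token by a dot product with its embedding row.

    Because the same `embedding` object is used, no separate head weights exist.
    """
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding]


logits = lm_head(embed(2))
print(logits)
```

Because both functions read the same table, any update to an embedding row changes the input lookup and the output scores together.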

INCModelForSeq2SeqLM

class optimum.intel.INCModelForSeq2SeqLM

( model, config: PretrainedConfig = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, q_config: typing.Dict = None, inc_config: typing.Dict = None, **kwargs )
