# Reference

### INCQuantizer

#### class optimum.intel.INCQuantizer

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/quantization.py#L92)

( model: Module eval\_fn: typing.Union\[typing.Callable\[\[transformers.modeling\_utils.PreTrainedModel], int], NoneType] = None calibration\_fn: typing.Union\[typing.Callable\[\[transformers.modeling\_utils.PreTrainedModel], int], NoneType] = None task: typing.Optional\[str] = None seed: int = 42 )

Handle the Neural Compressor quantization process.

**get\_calibration\_dataset**

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/quantization.py#L388)

( dataset\_name: str num\_samples: int = 100 dataset\_config\_name: typing.Optional\[str] = None dataset\_split: str = 'train' preprocess\_function: typing.Optional\[typing.Callable] = None preprocess\_batch: bool = True use\_auth\_token: bool = False )

Parameters

* **dataset\_name** (`str`) — The dataset repository name on the Hugging Face Hub or path to a local directory containing data files in generic formats and optionally a dataset script, if it requires some code to read the data files.
* **num\_samples** (`int`, defaults to 100) — The maximum number of samples composing the calibration dataset.
* **dataset\_config\_name** (`str`, *optional*) — The name of the dataset configuration.
* **dataset\_split** (`str`, defaults to `"train"`) — Which split of the dataset to use to perform the calibration step.
* **preprocess\_function** (`Callable`, *optional*) — Processing function to apply to each example after loading the dataset.
* **preprocess\_batch** (`bool`, defaults to `True`) — Whether the `preprocess_function` should be batched.
* **use\_auth\_token** (`bool`, defaults to `False`) — Whether to use the token generated when running `transformers-cli login`.

Create the calibration `datasets.Dataset` to use for the post-training static quantization calibration step.
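
For illustration, a minimal sketch of building a calibration dataset for static quantization, assuming the GLUE/SST-2 dataset (whose text lives in a `sentence` column) and an example DistilBERT checkpoint:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel import INCQuantizer

# Example checkpoint; substitute your own fine-tuned model.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantizer = INCQuantizer.from_pretrained(model)

# Tokenize a batch of examples; "sentence" is the text column of GLUE/SST-2.
def preprocess_function(examples):
    return tokenizer(examples["sentence"], padding="max_length", max_length=128, truncation=True)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    dataset_split="train",
    preprocess_function=preprocess_function,
    num_samples=100,
)
```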

**quantize**

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/quantization.py#L134)

( quantization\_config: PostTrainingQuantConfig save\_directory: typing.Union\[str, pathlib.Path] calibration\_dataset: Dataset = None batch\_size: int = 8 data\_collator: typing.Optional\[DataCollator] = None remove\_unused\_columns: bool = True file\_name: str = None weight\_only: bool = False \*\*kwargs )

Parameters

* **quantization\_config** (`PostTrainingQuantConfig`) — The configuration containing the parameters related to quantization.
* **save\_directory** (`Union[str, Path]`) — The directory where the quantized model should be saved.
* **calibration\_dataset** (`datasets.Dataset`, defaults to `None`) — The dataset to use for the calibration step, needed for post-training static quantization.
* **batch\_size** (`int`, defaults to 8) — The number of calibration samples to load per batch.
* **data\_collator** (`DataCollator`, defaults to `None`) — The function to use to form a batch from a list of elements of the calibration dataset.
* **remove\_unused\_columns** (`bool`, defaults to `True`) — Whether or not to remove the columns unused by the model forward method.
* **weight\_only** (`bool`, defaults to `False`) — Whether to compress weights to integer precision (4-bit by default) while keeping activations in floating point. Best suited for reducing the memory footprint and speeding up inference of large language models.

Quantize a model given the optimization specifications defined in `quantization_config`.
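
As a sketch, assuming the `quantizer` and `calibration_dataset` from the example above are in scope and with a hypothetical `quantized_model` output directory, static post-training quantization looks like:

```python
from neural_compressor.config import PostTrainingQuantConfig

# Static quantization consumes a calibration dataset such as the one built above.
quantization_config = PostTrainingQuantConfig(approach="static")
quantizer.quantize(
    quantization_config=quantization_config,
    calibration_dataset=calibration_dataset,
    save_directory="quantized_model",  # hypothetical output directory
)

# Dynamic quantization computes activation ranges at runtime,
# so no calibration dataset is needed:
# quantizer.quantize(
#     quantization_config=PostTrainingQuantConfig(approach="dynamic"),
#     save_directory="quantized_model",
# )
```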

### INCTrainer

#### class optimum.intel.INCTrainer

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/trainer.py#L88)

( model: typing.Union\[transformers.modeling\_utils.PreTrainedModel, torch.nn.modules.module.Module] = None args: TrainingArguments = None data\_collator: typing.Optional\[DataCollator] = None train\_dataset: typing.Optional\[torch.utils.data.dataset.Dataset] = None eval\_dataset: typing.Optional\[torch.utils.data.dataset.Dataset] = None tokenizer: typing.Optional\[transformers.tokenization\_utils\_base.PreTrainedTokenizerBase] = None model\_init: typing.Callable\[\[], transformers.modeling\_utils.PreTrainedModel] = None compute\_metrics: typing.Union\[typing.Callable\[\[transformers.trainer\_utils.EvalPrediction], typing.Dict], NoneType] = None callbacks: typing.Optional\[typing.List\[transformers.trainer\_callback.TrainerCallback]] = None optimizers: typing.Tuple\[torch.optim.optimizer.Optimizer, torch.optim.lr\_scheduler.LambdaLR] = (None, None) preprocess\_logits\_for\_metrics: typing.Callable\[\[torch.Tensor, torch.Tensor], torch.Tensor] = None quantization\_config: typing.Optional\[neural\_compressor.conf.pythonic\_config.\_BaseQuantizationConfig] = None pruning\_config: typing.Optional\[neural\_compressor.conf.pythonic\_config.\_BaseQuantizationConfig] = None distillation\_config: typing.Optional\[neural\_compressor.conf.pythonic\_config.\_BaseQuantizationConfig] = None task: typing.Optional\[str] = None save\_onnx\_model: bool = False )

INCTrainer enables Intel Neural Compressor quantization-aware training, pruning and distillation.
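
A minimal quantization-aware training sketch, assuming the GLUE/SST-2 dataset and an example DistilBERT checkpoint; `qat_output` is a hypothetical output directory and the small training subset is only there to keep the sketch fast:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    default_data_collator,
)
from neural_compressor import QuantizationAwareTrainingConfig
from optimum.intel import INCTrainer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tokenize GLUE/SST-2; its text lives in the "sentence" column.
dataset = load_dataset("glue", "sst2").map(
    lambda e: tokenizer(e["sentence"], padding="max_length", max_length=128, truncation=True),
    batched=True,
)

trainer = INCTrainer(
    model=model,
    quantization_config=QuantizationAwareTrainingConfig(),
    args=TrainingArguments("qat_output", num_train_epochs=1, do_train=True),
    train_dataset=dataset["train"].select(range(300)),  # small subset for the sketch
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
    data_collator=default_data_collator,
)
trainer.train()
trainer.save_model()
```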

**compute\_distillation\_loss**

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/trainer.py#L772)

( student\_outputs teacher\_outputs )

How the distillation loss is computed given the student and teacher outputs.

**compute\_loss**

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/trainer.py#L696)

( model inputs return\_outputs = False )

How the loss is computed by Trainer. By default, all models return the loss in the first element.

**save\_model**

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/trainer.py#L546)

( output\_dir: typing.Optional\[str] = None \_internal\_call: bool = False save\_onnx\_model: typing.Optional\[bool] = None )

Will save the model, so you can reload it using `from_pretrained()`. Will only save from the main process.
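
For example, with the trainer from the sketch above, the quantized model can be saved along with an ONNX export:

```python
# Save the quantized model; save_onnx_model=True additionally exports an ONNX copy.
trainer.save_model(save_onnx_model=True)
```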

### INCModel

#### class optimum.intel.INCModel

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/modeling_base.py#L64)

( model config: PretrainedConfig = None model\_save\_dir: typing.Union\[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q\_config: typing.Dict = None inc\_config: typing.Dict = None \*\*kwargs )

### INCModelForSequenceClassification

#### class optimum.intel.INCModelForSequenceClassification

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/modeling_base.py#L228)

( model config: PretrainedConfig = None model\_save\_dir: typing.Union\[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q\_config: typing.Dict = None inc\_config: typing.Dict = None \*\*kwargs )
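
As a sketch, loading a quantized sequence classification model goes through the usual `from_pretrained` entry point; the checkpoint name below is an example of a pre-quantized model published by Intel, and a local directory produced by `INCQuantizer` or `INCTrainer` works the same way:

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel import INCModelForSequenceClassification

# Example pre-quantized checkpoint on the Hub.
model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Quantization made this model smaller and faster."))
```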

### INCModelForQuestionAnswering

#### class optimum.intel.INCModelForQuestionAnswering

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/modeling_base.py#L223)

( model config: PretrainedConfig = None model\_save\_dir: typing.Union\[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q\_config: typing.Dict = None inc\_config: typing.Dict = None \*\*kwargs )

### INCModelForTokenClassification

#### class optimum.intel.INCModelForTokenClassification

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/modeling_base.py#L233)

( model config: PretrainedConfig = None model\_save\_dir: typing.Union\[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q\_config: typing.Dict = None inc\_config: typing.Dict = None \*\*kwargs )

### INCModelForMultipleChoice

#### class optimum.intel.INCModelForMultipleChoice

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/modeling_base.py#L238)

( model config: PretrainedConfig = None model\_save\_dir: typing.Union\[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q\_config: typing.Dict = None inc\_config: typing.Dict = None \*\*kwargs )

### INCModelForMaskedLM

#### class optimum.intel.INCModelForMaskedLM

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/modeling_base.py#L248)

( model config: PretrainedConfig = None model\_save\_dir: typing.Union\[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q\_config: typing.Dict = None inc\_config: typing.Dict = None \*\*kwargs )

### INCModelForCausalLM

#### class optimum.intel.INCModelForCausalLM

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/modeling_decoder.py#L38)

( model config: PretrainedConfig = None model\_save\_dir: typing.Union\[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q\_config: typing.Dict = None inc\_config: typing.Dict = None use\_cache: bool = True \*\*kwargs )

Parameters

* **model** (`PyTorch model`) — The PyTorch model used to run inference.
* **config** (`transformers.PretrainedConfig`) — [PretrainedConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.PretrainedConfig) is the model configuration class with all the parameters of the model.
* **device** (`str`, defaults to `"cpu"`) — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

Neural Compressor model with a causal language modeling head on top (linear layer with weights tied to the input embeddings).

Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
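
A generation sketch, assuming `quantized_model` is a hypothetical local directory previously produced by `INCQuantizer.quantize` or `INCTrainer.save_model`:

```python
from transformers import AutoTokenizer
from optimum.intel import INCModelForCausalLM

# Hypothetical local directory containing a quantized causal language model.
model = INCModelForCausalLM.from_pretrained("quantized_model")
tokenizer = AutoTokenizer.from_pretrained("quantized_model")

inputs = tokenizer("Quantization reduces the memory footprint of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```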

### INCModelForSeq2SeqLM

#### class optimum.intel.INCModelForSeq2SeqLM

[\<source>](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/neural_compressor/modeling_base.py#L243)

( model config: PretrainedConfig = None model\_save\_dir: typing.Union\[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q\_config: typing.Dict = None inc\_config: typing.Dict = None \*\*kwargs )

