Reference
INCQuantizer
class optimum.intel.INCQuantizer
( model: Module eval_fn: typing.Union[typing.Callable[[transformers.modeling_utils.PreTrainedModel], int], NoneType] = None calibration_fn: typing.Union[typing.Callable[[transformers.modeling_utils.PreTrainedModel], int], NoneType] = None task: typing.Optional[str] = None seed: int = 42 )
Handle the Neural Compressor quantization process.
get_calibration_dataset
( dataset_name: str num_samples: int = 100 dataset_config_name: typing.Optional[str] = None dataset_split: str = 'train' preprocess_function: typing.Optional[typing.Callable] = None preprocess_batch: bool = True use_auth_token: bool = False )
Parameters
dataset_name (str) — The dataset repository name on the BOINC AI Hub or path to a local directory containing data files in generic formats and optionally a dataset script, if it requires some code to read the data files.
num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.
dataset_config_name (str, optional) — The name of the dataset configuration.
dataset_split (str, defaults to "train") — Which split of the dataset to use to perform the calibration step.
preprocess_function (Callable, optional) — Processing function to apply to each example after loading dataset.
preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.
use_auth_token (bool, defaults to False) — Whether to use the token generated when running transformers-cli login.
Create the calibration datasets.Dataset to use for the post-training static quantization calibration step.
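To make the roles of num_samples, preprocess_function and preprocess_batch concrete, here is a minimal standard-library sketch. It is not the library's implementation: build_calibration_subset, add_length and add_length_batched are hypothetical names, the seeded shuffle is an assumption, and batched mode is simplified to pass a list of examples (datasets.Dataset.map passes a dict of columns instead).

```python
import random
from typing import Callable, Dict, List, Optional

Example = Dict[str, object]


def build_calibration_subset(
    examples: List[Example],
    num_samples: int = 100,
    preprocess_function: Optional[Callable] = None,
    preprocess_batch: bool = True,
    seed: int = 42,
) -> List[Example]:
    """Hypothetical stand-in that mimics the documented parameters."""
    # Assumption: shuffle deterministically, then cap at num_samples.
    # num_samples is a maximum, so a smaller dataset is returned whole.
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    subset = shuffled[: min(num_samples, len(shuffled))]

    if preprocess_function is None:
        return subset
    if preprocess_batch:
        # Batched mode: the function sees all selected examples at once.
        return preprocess_function(subset)
    # Per-example mode: the function sees one example at a time.
    return [preprocess_function(example) for example in subset]


def add_length(example: Example) -> Example:
    # Toy preprocessing step: record the text length.
    return {**example, "length": len(str(example["text"]))}


def add_length_batched(batch: List[Example]) -> List[Example]:
    return [add_length(example) for example in batch]


raw = [{"text": "sample " * i} for i in range(1, 251)]
per_example = build_calibration_subset(
    raw, num_samples=100, preprocess_function=add_length, preprocess_batch=False
)
batched = build_calibration_subset(
    raw, num_samples=100, preprocess_function=add_length_batched, preprocess_batch=True
)
```

With the same seed, both calls select the same 100 examples; only the way the preprocessing function is invoked differs.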
quantize
( quantization_config: PostTrainingQuantConfig save_directory: typing.Union[str, pathlib.Path] calibration_dataset: Dataset = None batch_size: int = 8 data_collator: typing.Optional[DataCollator] = None remove_unused_columns: bool = True file_name: str = None weight_only: bool = False **kwargs )
Parameters
quantization_config (PostTrainingQuantConfig) — The configuration containing the parameters related to quantization.
save_directory (Union[str, Path]) — The directory where the quantized model should be saved.
calibration_dataset (datasets.Dataset, defaults to None) — The dataset to use for the calibration step, needed for post-training static quantization.
batch_size (int, defaults to 8) — The number of calibration samples to load per batch.
data_collator (DataCollator, defaults to None) — The function to use to form a batch from a list of elements of the calibration dataset.
remove_unused_columns (bool, defaults to True) — Whether or not to remove the columns unused by the model forward method.
weight_only (bool, defaults to False) — Whether to compress weights to integer precision (4-bit by default) while keeping activations in floating point. Best suited to reducing the memory footprint of LLMs and accelerating their inference.
Quantize a model given the optimization specifications defined in quantization_config.
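The weight_only option can be illustrated with a small standard-library sketch: weights are rounded to 4-bit signed integers with a single scale, while activations stay in floating point. This is a generic symmetric round-to-nearest scheme chosen for illustration, not the algorithm the library applies; quantize_weights_symmetric, dequantize and linear are hypothetical helper names, and real implementations typically use per-group scales.

```python
from typing import List, Tuple


def quantize_weights_symmetric(
    weights: List[float], bits: int = 4
) -> Tuple[List[int], float]:
    """Round weights to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    # Recover approximate floating-point weights.
    return [q * scale for q in quantized]


def linear(activations: List[float], quantized: List[int], scale: float) -> float:
    # Activations remain floating point; only the weights were stored as integers.
    weights = dequantize(quantized, scale)
    return sum(a * w for a, w in zip(activations, weights))


weights = [0.7, -0.3, 0.1, 0.0]
q_weights, scale = quantize_weights_symmetric(weights, bits=4)
output = linear([1.0, 2.0, 3.0, 4.0], q_weights, scale)
```

Storing q_weights plus one scale takes far less memory than the original floats, which is why this style of compression targets model footprint.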
INCTrainer
class optimum.intel.INCTrainer
( model: typing.Union[transformers.modeling_utils.PreTrainedModel, torch.nn.modules.module.Module] = None args: TrainingArguments = None data_collator: typing.Optional[DataCollator] = None train_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None eval_dataset: typing.Optional[torch.utils.data.dataset.Dataset] = None tokenizer: typing.Optional[transformers.tokenization_utils_base.PreTrainedTokenizerBase] = None model_init: typing.Callable[[], transformers.modeling_utils.PreTrainedModel] = None compute_metrics: typing.Union[typing.Callable[[transformers.trainer_utils.EvalPrediction], typing.Dict], NoneType] = None callbacks: typing.Optional[typing.List[transformers.trainer_callback.TrainerCallback]] = None optimizers: typing.Tuple[torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler.LambdaLR] = (None, None) preprocess_logits_for_metrics: typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor] = None quantization_config: typing.Optional[neural_compressor.conf.pythonic_config._BaseQuantizationConfig] = None pruning_config: typing.Optional[neural_compressor.conf.pythonic_config._BaseQuantizationConfig] = None distillation_config: typing.Optional[neural_compressor.conf.pythonic_config._BaseQuantizationConfig] = None task: typing.Optional[str] = None save_onnx_model: bool = False )
INCTrainer enables Intel Neural Compressor quantization-aware training, pruning and distillation.
compute_distillation_loss
( student_outputs teacher_outputs )
How the distillation loss is computed given the student and teacher outputs.
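This page does not state which formula compute_distillation_loss uses. As a point of reference only, the sketch below shows one common formulation: the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. The function names and the temperature value are assumptions made for illustration.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return kl * temperature ** 2


same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when student and teacher logits agree and grows as their softened distributions diverge.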
compute_loss
( model inputs return_outputs = False )
How the loss is computed by Trainer. By default, all models return the loss in the first element.
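The "loss in the first element" convention can be sketched as follows. This is a simplified stand-in rather than the Trainer code itself: extract_loss and compute_loss_sketch are hypothetical helpers, and plain tuples and dicts stand in for real model outputs.

```python
from typing import Mapping, Sequence, Union

Outputs = Union[Mapping[str, object], Sequence[object]]


def extract_loss(outputs: Outputs) -> object:
    # Dict-like outputs expose the loss under the "loss" key;
    # tuple-like outputs carry it as their first element.
    if isinstance(outputs, Mapping):
        return outputs["loss"]
    return outputs[0]


def compute_loss_sketch(outputs: Outputs, return_outputs: bool = False):
    # Mirrors the return_outputs flag in the signature above:
    # return either the loss alone or the (loss, outputs) pair.
    loss = extract_loss(outputs)
    return (loss, outputs) if return_outputs else loss


dict_outputs = {"loss": 0.5, "logits": [0.1, 0.9]}
tuple_outputs = (0.25, [0.1, 0.9])

loss_from_dict = compute_loss_sketch(dict_outputs)
loss_from_tuple, returned = compute_loss_sketch(tuple_outputs, return_outputs=True)
```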
save_model
( output_dir: typing.Optional[str] = None _internal_call: bool = False save_onnx_model: typing.Optional[bool] = None )
Will save the model, so you can reload it using from_pretrained(). Will only save from the main process.
INCModel
class optimum.intel.INCModel
( model config: PretrainedConfig = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q_config: typing.Dict = None inc_config: typing.Dict = None **kwargs )
INCModelForSequenceClassification
class optimum.intel.INCModelForSequenceClassification
( model config: PretrainedConfig = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q_config: typing.Dict = None inc_config: typing.Dict = None **kwargs )
INCModelForQuestionAnswering
class optimum.intel.INCModelForQuestionAnswering
( model config: PretrainedConfig = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q_config: typing.Dict = None inc_config: typing.Dict = None **kwargs )
INCModelForTokenClassification
class optimum.intel.INCModelForTokenClassification
( model config: PretrainedConfig = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q_config: typing.Dict = None inc_config: typing.Dict = None **kwargs )
INCModelForMultipleChoice
class optimum.intel.INCModelForMultipleChoice
( model config: PretrainedConfig = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q_config: typing.Dict = None inc_config: typing.Dict = None **kwargs )
INCModelForMaskedLM
class optimum.intel.INCModelForMaskedLM
( model config: PretrainedConfig = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q_config: typing.Dict = None inc_config: typing.Dict = None **kwargs )
INCModelForCausalLM
class optimum.intel.INCModelForCausalLM
( model config: PretrainedConfig = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q_config: typing.Dict = None inc_config: typing.Dict = None use_cache: bool = True **kwargs )
Parameters
model (PyTorch model) — The main class used to run inference.
config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model.
device (str, defaults to "cpu") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.
Neural-compressor Model with a causal language modeling head on top (linear layer with weights tied to the input embeddings).
Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
INCModelForSeq2SeqLM
class optimum.intel.INCModelForSeq2SeqLM
( model config: PretrainedConfig = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None q_config: typing.Dict = None inc_config: typing.Dict = None **kwargs )