Reference


OVModelForFeatureExtraction

class optimum.intel.OVModelForFeatureExtraction

( model = None config = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a BaseModelOutput for feature extraction tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).
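
The compile and dynamic_shapes arguments can be combined to statically reshape a model before compiling it. Below is a minimal sketch, assuming a fixed batch size and sequence length chosen for illustration; the ov_config value shown is an illustrative OpenVINO compilation option, not a required setting:

>>> from optimum.intel import OVModelForFeatureExtraction

>>> # Skip compilation at loading time so the model can still be reshaped
>>> model = OVModelForFeatureExtraction.from_pretrained(
...     "sentence-transformers/all-MiniLM-L6-v2",
...     export=True,
...     compile=False,
...     ov_config={"PERFORMANCE_HINT": "LATENCY"},  # illustrative compilation option
... )
>>> model.reshape(batch_size=1, sequence_length=128)  # fix the input shapes (example values)
>>> model.compile()  # compile the statically reshaped model for the target device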

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

Parameters

  • input_ids (torch.Tensor) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer.

  • attention_mask (torch.Tensor, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

  • token_type_ids (torch.Tensor, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:

    • 1 for tokens that are sentence A,

    • 0 for tokens that are sentence B.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of feature extraction using transformers.pipelines:


>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.intel import OVModelForFeatureExtraction

>>> tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
>>> model = OVModelForFeatureExtraction.from_pretrained("sentence-transformers/all-MiniLM-L6-v2", export=True)
>>> pipe = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
>>> outputs = pipe("My Name is Peter and I live in New York.")

OVModelForMaskedLM

class optimum.intel.OVModelForMaskedLM

( model = None config = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a MaskedLMOutput for masked language modeling tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

Parameters

  • input_ids (torch.Tensor) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer.

  • attention_mask (torch.Tensor, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

  • token_type_ids (torch.Tensor, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:

    • 1 for tokens that are sentence A,

    • 0 for tokens that are sentence B.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of masked language modeling using transformers.pipelines:


>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.intel import OVModelForMaskedLM

>>> tokenizer = AutoTokenizer.from_pretrained("roberta-base")
>>> model = OVModelForMaskedLM.from_pretrained("roberta-base", export=True)
>>> mask_token = tokenizer.mask_token
>>> pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)
>>> outputs = pipe("The goal of life is " + mask_token)

OVModelForQuestionAnswering

class optimum.intel.OVModelForQuestionAnswering

( model = None config = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a QuestionAnsweringModelOutput for extractive question-answering tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

Parameters

  • input_ids (torch.Tensor) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer.

  • attention_mask (torch.Tensor, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

  • token_type_ids (torch.Tensor, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:

    • 1 for tokens that are sentence A,

    • 0 for tokens that are sentence B.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of question answering using transformers.pipeline:


>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.intel import OVModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
>>> model = OVModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad", export=True)
>>> pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> outputs = pipe(question, text)

OVModelForSequenceClassification

class optimum.intel.OVModelForSequenceClassification

( model = None config = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a SequenceClassifierOutput for sequence classification tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

Parameters

  • input_ids (torch.Tensor) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer.

  • attention_mask (torch.Tensor, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

  • token_type_ids (torch.Tensor, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:

    • 1 for tokens that are sentence A,

    • 0 for tokens that are sentence B.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of sequence classification using transformers.pipeline:


>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.intel import OVModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
>>> model = OVModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", export=True)
>>> pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
>>> outputs = pipe("Hello, my dog is cute")

OVModelForTokenClassification

class optimum.intel.OVModelForTokenClassification

( model = None config = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a TokenClassifierOutput for token classification tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_ids: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray] token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

Parameters

  • input_ids (torch.Tensor) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer.

  • attention_mask (torch.Tensor, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

  • token_type_ids (torch.Tensor, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:

    • 1 for tokens that are sentence A,

    • 0 for tokens that are sentence B.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of token classification using transformers.pipelines:


>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.intel import OVModelForTokenClassification

>>> tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
>>> model = OVModelForTokenClassification.from_pretrained("dslim/bert-base-NER", export=True)
>>> pipe = pipeline("token-classification", model=model, tokenizer=tokenizer)
>>> outputs = pipe("My Name is Peter and I live in New York.")

OVModelForAudioClassification

class optimum.intel.OVModelForAudioClassification

( model = None config = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a SequenceClassifierOutput for audio classification tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( input_values: typing.Union[torch.Tensor, numpy.ndarray] attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

Parameters

  • input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoFeatureExtractor.

  • attention_mask (torch.Tensor, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of audio classification using transformers.pipelines:


>>> from datasets import load_dataset
>>> from transformers import AutoFeatureExtractor, pipeline
>>> from optimum.intel import OVModelForAudioClassification

>>> preprocessor = AutoFeatureExtractor.from_pretrained("superb/hubert-base-superb-er")
>>> model = OVModelForAudioClassification.from_pretrained("superb/hubert-base-superb-er", export=True)
>>> pipe = pipeline("audio-classification", model=model, feature_extractor=preprocessor)
>>> dataset = load_dataset("superb", "ks", split="test")
>>> audio_file = dataset[3]["audio"]["array"]
>>> outputs = pipe(audio_file)

OVModelForAudioFrameClassification

class optimum.intel.OVModelForAudioFrameClassification

( model: Model config: PretrainedConfig = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a frame classification head on top for tasks like Speaker Diarization.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Audio Frame Classification model for OpenVINO.

forward

( input_values: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None **kwargs )

Parameters

  • input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoFeatureExtractor.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of audio frame classification:


>>> from transformers import AutoFeatureExtractor
>>> from optimum.intel import OVModelForAudioFrameClassification
>>> from datasets import load_dataset
>>> import torch

>>> dataset = load_dataset("ba-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("anton-l/wav2vec2-base-superb-sd")
>>> model =  OVModelForAudioFrameClassification.from_pretrained("anton-l/wav2vec2-base-superb-sd", export=True)

>>> inputs = feature_extractor(dataset[0]["audio"]["array"], return_tensors="pt", sampling_rate=sampling_rate)
>>> logits = model(**inputs).logits

>>> probabilities = torch.sigmoid(torch.as_tensor(logits)[0])
>>> labels = (probabilities > 0.5).long()
>>> labels[0].tolist()

OVModelForCTC

class optimum.intel.OVModelForCTC

( model: Model config: PretrainedConfig = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a language modeling head on top for Connectionist Temporal Classification (CTC).

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

CTC model for OpenVINO.

forward

( input_values: typing.Optional[torch.Tensor] = None attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None **kwargs )

Parameters

  • input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoFeatureExtractor.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of CTC:


>>> from transformers import AutoFeatureExtractor
>>> from optimum.intel import OVModelForCTC
>>> from datasets import load_dataset
>>> import numpy as np

>>> dataset = load_dataset("ba-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate

>>> processor = AutoFeatureExtractor.from_pretrained("facebook/hubert-large-ls960-ft")
>>> model = OVModelForCTC.from_pretrained("facebook/hubert-large-ls960-ft", export=True)

>>> # audio file is decoded on the fly
>>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="np")
>>> logits = model(**inputs).logits
>>> predicted_ids = np.argmax(logits, axis=-1)

>>> transcription = processor.batch_decode(predicted_ids)

OVModelForAudioXVector

class optimum.intel.OVModelForAudioXVector

( model: Model config: PretrainedConfig = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with an XVector feature extraction head on top for tasks like Speaker Verification.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Audio XVector model for OpenVINO.

forward

( input_values: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None **kwargs )

Parameters

  • input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using AutoFeatureExtractor.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of Audio XVector:


>>> from transformers import AutoFeatureExtractor
>>> from optimum.intel import OVModelForAudioXVector
>>> from datasets import load_dataset
>>> import torch

>>> dataset = load_dataset("ba-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate

>>> feature_extractor = AutoFeatureExtractor.from_pretrained("anton-l/wav2vec2-base-superb-sv")
>>> model = OVModelForAudioXVector.from_pretrained("anton-l/wav2vec2-base-superb-sv", export=True)

>>> # audio file is decoded on the fly
>>> inputs = feature_extractor(
...     [d["array"] for d in dataset[:2]["audio"]], sampling_rate=sampling_rate, return_tensors="pt", padding=True
... )
>>> embeddings = model(**inputs).embeddings

>>> embeddings = torch.nn.functional.normalize(embeddings, dim=-1).cpu()

>>> cosine_sim = torch.nn.CosineSimilarity(dim=-1)
>>> similarity = cosine_sim(embeddings[0], embeddings[1])
>>> threshold = 0.7
>>> if similarity < threshold:
...     print("Speakers are not the same!")
>>> round(similarity.item(), 2)

OVModelForImageClassification

class optimum.intel.OVModelForImageClassification

( model = None config = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with an ImageClassifierOutput for image classification tasks.

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

forward

( pixel_values: typing.Union[torch.Tensor, numpy.ndarray] **kwargs )

Parameters

  • pixel_values (torch.Tensor) — Pixel values corresponding to the images in the current batch. Pixel values can be obtained from encoded images using AutoFeatureExtractor.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of image classification using transformers.pipelines:


>>> from transformers import AutoFeatureExtractor, pipeline
>>> from optimum.intel import OVModelForImageClassification

>>> preprocessor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
>>> model = OVModelForImageClassification.from_pretrained("google/vit-base-patch16-224", export=True)
>>> model.reshape(batch_size=1, sequence_length=3, height=224, width=224)
>>> pipe = pipeline("image-classification", model=model, feature_extractor=preprocessor)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> outputs = pipe(url)

This class can also be used with [timm](https://github.com/boincai/pytorch-image-models) models hosted on the BOINC AI Hub. Example:


>>> from transformers import pipeline
>>> from optimum.intel.openvino.modeling_timm import TimmImageProcessor
>>> from optimum.intel import OVModelForImageClassification

>>> model_id = "timm/vit_tiny_patch16_224.augreg_in21k"
>>> preprocessor = TimmImageProcessor.from_pretrained(model_id)
>>> model = OVModelForImageClassification.from_pretrained(model_id, export=True)
>>> pipe = pipeline("image-classification", model=model, feature_extractor=preprocessor)
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> outputs = pipe(url)

OVModelForCausalLM

class optimum.intel.OVModelForCausalLM

( model: Model config: PretrainedConfig = None device: str = 'CPU' dynamic_shapes: bool = True ov_config: typing.Union[typing.Dict[str, str], NoneType] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None **kwargs )

Parameters

  • model (openvino.runtime.Model) — The OpenVINO Runtime model used to run inference.

  • config (transformers.PretrainedConfig) — The model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the ~intel.openvino.modeling.OVBaseModel.from_pretrained method to load the model weights.

  • device (str, defaults to "CPU") — The device type for which the model will be optimized. The resulting compiled model will contain nodes specific to this device.

  • dynamic_shapes (bool, defaults to True) — All the model's dimensions will be set to dynamic when set to True. Should be set to False for the model not to be dynamically reshaped by default.

  • ov_config (Optional[Dict], defaults to None) — The dictionary containing the information related to model compilation.

  • compile (bool, defaults to True) — Disable the model compilation during the loading step when set to False. This can be useful to avoid unnecessary compilation, for example when the model needs to be statically reshaped, the device changed, or FP16 conversion enabled.

OpenVINO Model with a causal language modeling head on top (linear layer with weights tied to the input embeddings).

This model inherits from optimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

can_generate

( )

Returns True, confirming that the model can be used with GenerationMixin.generate().

forward

( input_ids: LongTensor attention_mask: typing.Optional[torch.LongTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.FloatTensor]]] = None **kwargs )

Parameters

  • input_ids (torch.Tensor) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer.

  • attention_mask (torch.Tensor, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

    • 1 for tokens that are not masked,

    • 0 for tokens that are masked.

  • past_key_values (tuple(tuple(torch.FloatTensor)), optional) — Contains the precomputed key and value hidden states of the attention blocks, used to speed up sequential decoding.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of text generation:


>>> from transformers import AutoTokenizer
>>> from optimum.intel import OVModelForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
>>> inputs = tokenizer("I love this story because", return_tensors="pt")
>>> gen_tokens = model.generate(**inputs, do_sample=True, temperature=0.9, min_length=20, max_length=20)
>>> tokenizer.batch_decode(gen_tokens)

Example using transformers.pipelines:


>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.intel import OVModelForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
>>> gen_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)
>>> text = "I love this story because"
>>> gen = gen_pipeline(text)

OVModelForSeq2SeqLM

class optimum.intel.OVModelForSeq2SeqLM

( encoder: Model decoder: Model decoder_with_past: Model = None config: PretrainedConfig = None **kwargs )

Parameters

  • encoder (openvino.runtime.Model) — The OpenVINO Runtime model associated with the encoder.

  • decoder (openvino.runtime.Model) — The OpenVINO Runtime model associated with the decoder.

  • decoder_with_past (openvino.runtime.Model) — The OpenVINO Runtime model associated with the decoder with past key values.

  • config (transformers.PretrainedConfig) — An instance of the configuration associated with the model. Initializing with a config file does not load the weights associated with the model, only the configuration.

Sequence-to-sequence model with a language modeling head for OpenVINO inference.

forward

( input_ids: LongTensor = None attention_mask: typing.Optional[torch.FloatTensor] = None decoder_input_ids: typing.Optional[torch.LongTensor] = None encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None **kwargs )

Parameters

  • input_ids (torch.LongTensor) — Indices of input sequence tokens in the vocabulary of shape (batch_size, encoder_sequence_length).

  • attention_mask (torch.LongTensor) — Mask to avoid performing attention on padding token indices, of shape (batch_size, encoder_sequence_length). Mask values selected in [0, 1].

  • decoder_input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary of shape (batch_size, decoder_sequence_length).

  • encoder_outputs (torch.FloatTensor) — The encoder last_hidden_state of shape (batch_size, encoder_sequence_length, hidden_size).

  • past_key_values (tuple(tuple(torch.FloatTensor)), optional) — Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, decoder_sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of text generation:


>>> from transformers import AutoTokenizer
>>> from optimum.intel import OVModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("echarlaix/t5-small-openvino")
>>> model = OVModelForSeq2SeqLM.from_pretrained("echarlaix/t5-small-openvino")
>>> text = "He never went out without a book under his arm, and he often came back with two."
>>> inputs = tokenizer(text, return_tensors="pt")
>>> gen_tokens = model.generate(**inputs)
>>> outputs = tokenizer.batch_decode(gen_tokens)

Example using transformers.pipeline:


>>> from transformers import AutoTokenizer, pipeline
>>> from optimum.intel import OVModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("echarlaix/t5-small-openvino")
>>> model = OVModelForSeq2SeqLM.from_pretrained("echarlaix/t5-small-openvino")
>>> pipe = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
>>> text = "He never went out without a book under his arm, and he often came back with two."
>>> outputs = pipe(text)

OVQuantizer

class optimum.intel.OVQuantizer

( model: PreTrainedModel task: typing.Optional[str] = None seed: int = 42 **kwargs )

Handle the NNCF quantization process.

get_calibration_dataset

( dataset_name: str num_samples: int = 100 dataset_config_name: typing.Optional[str] = None dataset_split: str = 'train' preprocess_function: typing.Optional[typing.Callable] = None preprocess_batch: bool = True use_auth_token: bool = False cache_dir: typing.Optional[str] = None )

Parameters

  • dataset_name (str) — The dataset repository name on the BOINC AI Hub or path to a local directory containing data files in generic formats and optionally a dataset script, if it requires some code to read the data files.

  • num_samples (int, defaults to 100) — The maximum number of samples composing the calibration dataset.

  • dataset_config_name (str, optional) — The name of the dataset configuration.

  • dataset_split (str, defaults to "train") — Which split of the dataset to use to perform the calibration step.

  • preprocess_function (Callable, optional) — Processing function to apply to each example after loading the dataset.

  • preprocess_batch (bool, defaults to True) — Whether the preprocess_function should be batched.

  • use_auth_token (bool, defaults to False) — Whether to use the token generated when running transformers-cli login.

  • cache_dir (str, optional) — Caching directory for a calibration dataset.

Create the calibration datasets.Dataset to use for the post-training static quantization calibration step.
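
For example, a calibration set can be built from a dataset hosted on the BOINC AI Hub. Below is a minimal sketch, assuming quantizer and tokenizer were created beforehand; the dataset, configuration name and text column used here are illustrative choices:

>>> from functools import partial

>>> def preprocess_fn(examples, tokenizer):
...     # Tokenize the text column of each calibration example
...     return tokenizer(examples["sentence"], padding="max_length", max_length=128, truncation=True)

>>> calibration_dataset = quantizer.get_calibration_dataset(
...     "glue",
...     dataset_config_name="sst2",
...     preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
...     num_samples=300,
...     dataset_split="train",
... )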

quantize

( calibration_dataset: Dataset = None save_directory: typing.Union[str, pathlib.Path] = None quantization_config: OVConfig = None file_name: typing.Optional[str] = None batch_size: int = 1 data_collator: typing.Optional[DataCollator] = None remove_unused_columns: bool = True weights_only: bool = False **kwargs )

Parameters

  • calibration_dataset (datasets.Dataset) — The dataset to use for the calibration step.

  • save_directory (Union[str, Path]) — The directory where the quantized model should be saved.

  • quantization_config (OVConfig, optional) — The configuration containing the parameters related to quantization.

  • file_name (str, optional) — The model file name to use when saving the model. Overwrites the default file name "model.onnx".

  • batch_size (int, defaults to 1) — The number of calibration samples to load per batch.

  • data_collator (DataCollator, optional) — The function to use to form a batch from a list of elements of the calibration dataset.

  • remove_unused_columns (bool, defaults to True) — Whether or not to remove the columns unused by the model forward method.

  • weights_only (bool, defaults to False) — Compress weights to integer precision (8-bit by default) while keeping activations in floating point. Best suited for reducing the memory footprint and accelerating inference of large language models.

Quantize a model given the optimization specifications defined in quantization_config.

Examples:


>>> from optimum.intel.openvino import OVQuantizer, OVModelForSequenceClassification
>>> from transformers import AutoModelForSequenceClassification
>>> model = OVModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", export=True)
>>> # or
>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
>>> quantizer = OVQuantizer.from_pretrained(model, task="text-classification")
>>> # calibration_dataset created beforehand, e.g. with quantizer.get_calibration_dataset(...)
>>> quantizer.quantize(calibration_dataset=calibration_dataset, save_directory="./quantized_model")
>>> optimized_model = OVModelForSequenceClassification.from_pretrained("./quantized_model")


>>> from optimum.intel.openvino import OVQuantizer, OVModelForCausalLM
>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
>>> quantizer = OVQuantizer.from_pretrained(model, task="text-generation")
>>> quantizer.quantize(save_directory="./quantized_model", weights_only=True)
>>> optimized_model = OVModelForCausalLM.from_pretrained("./quantized_model")
