ONNX Runtime Models
The following ORT classes are available for instantiating a base model class without a specific head.
ORTModel
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
Base class for implementing models using ONNX Runtime.
The ORTModel implements generic methods for interacting with the BOINC AI Hub as well as exporting vanilla transformers models to ONNX using the optimum.exporters.onnx toolchain.
Class attributes:
model_type (str, optional, defaults to "onnx_model") — The name of the model type to use when registering the ORTModel classes.
auto_model_class (Type, optional, defaults to AutoModel) — The "AutoModel" class represented by the current ORTModel class.
Common attributes:
model (ort.InferenceSession) — The ONNX Runtime InferenceSession that is running the model.
config (PretrainedConfig) — The configuration of the model.
use_io_binding (bool, optional, defaults to True) — Whether to use I/O binding with ONNX Runtime with the CUDAExecutionProvider; this can significantly speed up inference depending on the task.
model_save_dir (Path) — The directory where the model exported to ONNX is saved. By default, if the loaded model is local, the directory of the original model is used. Otherwise, the cache directory is used.
providers (List[str]) — The list of execution providers available to ONNX Runtime.
from_pretrained
( model_id: typing.Union[str, pathlib.Path], export: bool = False, force_download: bool = False, use_auth_token: typing.Optional[str] = None, cache_dir: typing.Optional[str] = None, subfolder: str = '', config: typing.Optional[ForwardRef('PretrainedConfig')] = None, local_files_only: bool = False, provider: str = 'CPUExecutionProvider', session_options: typing.Optional[onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions] = None, provider_options: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, use_io_binding: typing.Optional[bool] = None, **kwargs ) → ORTModel
Parameters
model_id (Union[str, Path]) — Can be either:
A string, the model id of a pretrained model hosted inside a model repo on boincai.com. Valid model ids can be located at the root-level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
A path to a directory containing a model saved using ~OptimizedModel.save_pretrained, e.g., ./my_model_directory/.
export (bool, defaults to False) — Defines whether the provided model_id contains a vanilla Transformers checkpoint.
force_download (bool, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
use_auth_token (Optional[str], defaults to None) — The token to use as HTTP bearer authorization for remote files. If True, will use the token generated when running transformers-cli login (stored in ~/.boincai).
cache_dir (Optional[str], defaults to None) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
subfolder (str, defaults to "") — In case the relevant files are located inside a subfolder of the model repo, either locally or on boincai.com, you can specify the folder name here.
config (Optional[transformers.PretrainedConfig], defaults to None) — The model configuration.
local_files_only (Optional[bool], defaults to False) — Whether or not to only look at local files (i.e., do not try to download the model).
trust_remote_code (bool, defaults to False) — Whether or not to allow custom code defined on the Hub in its own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
session_options (Optional[onnxruntime.SessionOptions], defaults to None) — ONNX Runtime session options to use for loading the model.
use_io_binding (Optional[bool], defaults to None) — Whether to use IOBinding during inference to avoid memory copies between the host and device, or between numpy/torch tensors and ONNX Runtime ORTValue. Defaults to True if the execution provider is CUDAExecutionProvider. For ORTModelForCausalLM, defaults to True on CPUExecutionProvider; in all other cases defaults to False.
kwargs (Dict[str, Any]) — Will be passed to the underlying model loading methods.
Parameters for decoder models (ORTModelForCausalLM, ORTModelForSeq2SeqLM, ORTModelForSpeechSeq2Seq, ORTModelForVision2Seq)
use_cache (Optional[bool], defaults to True) — Whether or not the past key/values cache should be used.
Parameters for ORTModelForCausalLM
use_merged (Optional[bool], defaults to None) — Whether or not to use a single ONNX file that handles both the decoding without and with past key values reuse. This option defaults to True if loading from a local repository and a merged decoder is found. When exporting with export=True, it defaults to False. This option should be set to True to minimize memory usage.
Returns
ORTModel
The loaded ORTModel.
Instantiate a pretrained model from a pre-trained model configuration.
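For illustration, a minimal sketch of the main loading modes; the checkpoint names and the CUDA provider are assumptions for the example:
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load a repository that already contains an exported ONNX model
model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")

# Or export a vanilla Transformers checkpoint to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", export=True
)

# Optionally select another execution provider (requires onnxruntime-gpu for CUDA)
model = ORTModelForSequenceClassification.from_pretrained(
    "optimum/distilbert-base-uncased-finetuned-sst-2-english", provider="CUDAExecutionProvider"
)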
load_model
( path: typing.Union[str, pathlib.Path], provider: str = 'CPUExecutionProvider', session_options: typing.Optional[onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions] = None, provider_options: typing.Union[typing.Dict[str, typing.Any], NoneType] = None )
Parameters
path (Union[str, Path]) — Path of the ONNX model.
provider (str, defaults to "CPUExecutionProvider") — ONNX Runtime provider to use for loading the model.
session_options (Optional[onnxruntime.SessionOptions], defaults to None) — ONNX Runtime session options to use for loading the model.
Loads an ONNX Inference session with a given provider. The default provider is CPUExecutionProvider, to match the default behaviour in PyTorch/TensorFlow/JAX.
raise_on_numpy_input_io_binding
( use_torch: bool )
Parameters
use_torch (bool) — Whether the tensors used during inference are of type torch.Tensor or not.
Raises an error if IO Binding is requested although the tensors used are numpy arrays.
shared_attributes_init
( model: InferenceSession, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
Initializes attributes that may be shared among several ONNX Runtime inference sessions.
to
( device: typing.Union[torch.device, str, int] ) → ORTModel
Parameters
device (torch.device or str or int) — Device ordinal for CPU/GPU support. Setting this to -1 will leverage CPU, a positive integer will run the model on the associated CUDA device id. You can pass a native torch.device or a str too.
Returns
ORTModel
the model placed on the requested device.
Changes the ONNX Runtime provider according to the device.
The following ORT classes are available for the following natural language processing tasks.
ORTModelForCausalLM
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None, use_cache: typing.Optional[bool] = None, **kwargs )
ONNX model with a causal language modeling head for ONNX Runtime inference. This class officially supports bloom, codegen, falcon, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gptj, llama.
forward
( input_ids: LongTensor, attention_mask: typing.Optional[torch.FloatTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, labels: typing.Optional[torch.LongTensor] = None, use_cache_branch: bool = None, **kwargs )
Parameters
input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, sequence_length).
attention_mask (torch.LongTensor) — Mask to avoid performing attention on padding token indices, of shape (batch_size, sequence_length). Mask values selected in [0, 1].
past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head).
The ORTModelForCausalLM forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation, both by calling generate directly and through transformers.pipelines:
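A minimal sketch, assuming the gpt2 checkpoint; the prompt and generation settings are illustrative:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)  # export the PyTorch checkpoint to ONNX on the fly

# Calling generate directly
inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")
gen_tokens = model.generate(**inputs, do_sample=True, max_new_tokens=20)
print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))

# Through a transformers pipeline
onnx_gen = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(onnx_gen("My name is Arthur and I live in"))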
ORTModelForMaskedLM
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model with a MaskedLMOutput for masked language modeling tasks. This class officially supports albert, bert, camembert, convbert, data2vec_text, deberta, deberta_v2, distilbert, electra, flaubert, ibert, mobilebert, roberta, roformer, squeezebert, xlm, xlm_roberta.
forward
( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, **kwargs )
Parameters
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 1 for tokens that are sentence A, 0 for tokens that are sentence B.
The ORTModelForMaskedLM forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of masked language modeling, both by calling the model directly and through transformers.pipeline:
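An illustrative sketch; the distilbert checkpoint and the fill-mask input are assumptions:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = ORTModelForMaskedLM.from_pretrained("distilbert-base-uncased", export=True)

# Calling the model directly
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
logits = model(**inputs).logits

# Through a transformers pipeline
fill_masker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_masker("The capital of France is [MASK]."))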
ORTModelForSeq2SeqLM
( encoder_session: InferenceSession, decoder_session: InferenceSession, config: PretrainedConfig, onnx_paths: typing.List[str], decoder_with_past_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None, use_cache: bool = True, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None, **kwargs )
Sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports bart, blenderbot, blenderbot_small, longt5, m2m_100, marian, mbart, mt5, pegasus, t5.
forward
( input_ids: LongTensor = None, attention_mask: typing.Optional[torch.FloatTensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, labels: typing.Optional[torch.LongTensor] = None, **kwargs )
Parameters
input_ids (torch.LongTensor) — Indices of input sequence tokens in the vocabulary, of shape (batch_size, encoder_sequence_length).
attention_mask (torch.LongTensor) — Mask to avoid performing attention on padding token indices, of shape (batch_size, encoder_sequence_length). Mask values selected in [0, 1].
decoder_input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, decoder_sequence_length).
encoder_outputs (torch.FloatTensor) — The encoder last_hidden_state of shape (batch_size, encoder_sequence_length, hidden_size).
past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, decoder_sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
The ORTModelForSeq2SeqLM forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation, both by calling generate directly and through transformers.pipeline:
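A minimal sketch, assuming the t5-small checkpoint; the translation prompt is illustrative:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", export=True)

# Calling generate directly
inputs = tokenizer("translate English to French: My name is Eustache.", return_tensors="pt")
gen_tokens = model.generate(**inputs)
print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))

# Through a transformers pipeline
translator = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
print(translator("My name is Eustache."))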
ORTModelForSequenceClassification
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks. This class officially supports albert, bart, bert, camembert, convbert, data2vec_text, deberta, deberta_v2, distilbert, electra, flaubert, ibert, mbart, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.
forward
( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, **kwargs )
Parameters
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 1 for tokens that are sentence A, 0 for tokens that are sentence B.
The ORTModelForSequenceClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of single-label classification, both by calling the model directly and through transformers.pipelines (including zero-shot-classification):
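An illustrative sketch; the SST-2 checkpoint and the input sentence are assumptions:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = ORTModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", export=True)

# Calling the model directly
inputs = tokenizer("I loved this movie!", return_tensors="pt")
logits = model(**inputs).logits

# Through a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("I loved this movie!"))

# A zero-shot-classification pipeline works the same way when loading an NLI checkpoint (e.g. an MNLI fine-tune)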
ORTModelForTokenClassification
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks. This class officially supports albert, bert, bloom, camembert, convbert, data2vec_text, deberta, deberta_v2, distilbert, electra, flaubert, gpt2, ibert, mobilebert, roberta, roformer, squeezebert, xlm, xlm_roberta.
forward
( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, **kwargs )
Parameters
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 1 for tokens that are sentence A, 0 for tokens that are sentence B.
The ORTModelForTokenClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of token classification, both by calling the model directly and through transformers.pipelines:
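A minimal sketch; the NER checkpoint and the sentence are illustrative:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = ORTModelForTokenClassification.from_pretrained("dslim/bert-base-NER", export=True)

# Calling the model directly: one row of label logits per token
inputs = tokenizer("My name is Clara and I live in Berkeley.", return_tensors="pt")
logits = model(**inputs).logits

# Through a token-classification pipeline
ner = pipeline("token-classification", model=model, tokenizer=tokenizer)
print(ner("My name is Clara and I live in Berkeley."))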
ORTModelForMultipleChoice
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks. This class officially supports albert, bert, camembert, convbert, data2vec_text, deberta_v2, distilbert, electra, flaubert, ibert, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.
forward
( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, **kwargs )
Parameters
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 1 for tokens that are sentence A, 0 for tokens that are sentence B.
The ORTModelForMultipleChoice forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of multiple choice:
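A minimal sketch, assuming a bert-base-uncased export; the prompt, the candidate answers and the (batch_size, num_choices, sequence_length) reshaping are illustrative:
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ORTModelForMultipleChoice.from_pretrained("bert-base-uncased", export=True)

prompt = "The sky is clear tonight, so we will probably"
candidates = ["see plenty of stars.", "need an umbrella."]

# Encode each (prompt, candidate) pair, then add the batch dimension
inputs = tokenizer([[prompt, c] for c in candidates], return_tensors="pt", padding=True)
inputs = {k: v.unsqueeze(0) for k, v in inputs.items()}

logits = model(**inputs).logits  # shape (1, num_choices)
print(logits.argmax(dim=-1))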
ORTModelForQuestionAnswering
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model with a QuestionAnsweringModelOutput for extractive question-answering tasks like SQuAD. This class officially supports albert, bart, bert, camembert, convbert, data2vec_text, deberta, deberta_v2, distilbert, electra, flaubert, gptj, ibert, mbart, mobilebert, nystromformer, roberta, roformer, squeezebert, xlm, xlm_roberta.
forward
( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, **kwargs )
Parameters
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 1 for tokens that are sentence A, 0 for tokens that are sentence B.
The ORTModelForQuestionAnswering forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of question answering, both by calling the model directly and through transformers.pipeline:
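An illustrative sketch; the SQuAD checkpoint, question and context are assumptions:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = ORTModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad", export=True)

# Calling the model directly
inputs = tokenizer("Where do I live?", "My name is Clara and I live in Berkeley.", return_tensors="pt")
outputs = model(**inputs)
start_logits, end_logits = outputs.start_logits, outputs.end_logits

# Through a question-answering pipeline
qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(qa(question="Where do I live?", context="My name is Clara and I live in Berkeley."))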
The following ORT classes are available for the following computer vision tasks.
ORTModelForImageClassification
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model for image-classification tasks. This class officially supports beit, convnext, data2vec_vision, deit, levit, mobilenet_v1, mobilenet_v2, mobilevit, poolformer, resnet, segformer, swin, vit.
forward
( pixel_values: typing.Union[torch.Tensor, numpy.ndarray], **kwargs )
Parameters
pixel_values (Union[torch.Tensor, np.ndarray] of shape (batch_size, num_channels, height, width)) — Pixel values corresponding to the images in the current batch.
The ORTModelForImageClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of image classification, both by calling the model directly and through transformers.pipeline:
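A minimal sketch; the ViT checkpoint and the image URL are illustrative:
from transformers import AutoImageProcessor, pipeline
from optimum.onnxruntime import ORTModelForImageClassification
from PIL import Image
import requests

preprocessor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", export=True)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Calling the model directly
inputs = preprocessor(images=image, return_tensors="pt")
logits = model(**inputs).logits

# Through an image-classification pipeline
classifier = pipeline("image-classification", model=model, feature_extractor=preprocessor)
print(classifier(url))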
ORTModelForSemanticSegmentation
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model for semantic-segmentation, with an all-MLP decode head on top e.g. for ADE20k, CityScapes. This class officially supports segformer.
forward
( **kwargs )
Parameters
The ORTModelForSemanticSegmentation forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of semantic segmentation, both by calling the model directly and through transformers.pipeline:
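An illustrative sketch, assuming a SegFormer checkpoint; the image URL is a placeholder:
from transformers import AutoImageProcessor, pipeline
from optimum.onnxruntime import ORTModelForSemanticSegmentation
from PIL import Image
import requests

preprocessor = AutoImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
model = ORTModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512", export=True)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Calling the model directly: per-pixel class logits
inputs = preprocessor(images=image, return_tensors="pt")
logits = model(**inputs).logits

# Through an image-segmentation pipeline
segmenter = pipeline("image-segmentation", model=model, feature_extractor=preprocessor)
print(segmenter(url))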
The following ORT classes are available for the following audio tasks.
ORTModelForAudioClassification
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model for audio-classification, with a sequence classification head on top (a linear layer over the pooled output) for tasks like SUPERB Keyword Spotting. This class officially supports audio_spectrogram_transformer, data2vec_audio, hubert, sew, sew_d, unispeech, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
forward
( input_values: typing.Optional[torch.Tensor] = None, attenton_mask: typing.Optional[torch.Tensor] = None, **kwargs )
Parameters
The ORTModelForAudioClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of audio classification, both by calling the model directly and through transformers.pipeline:
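A minimal sketch; the keyword-spotting checkpoint and the dummy waveform are assumptions:
from transformers import AutoFeatureExtractor, pipeline
from optimum.onnxruntime import ORTModelForAudioClassification
import numpy as np

feature_extractor = AutoFeatureExtractor.from_pretrained("superb/hubert-base-superb-ks")
model = ORTModelForAudioClassification.from_pretrained("superb/hubert-base-superb-ks", export=True)

# One second of silence standing in for a real 16 kHz waveform
waveform = np.zeros(16000, dtype=np.float32)

# Calling the model directly
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
logits = model(**inputs).logits

# Through an audio-classification pipeline (also accepts a path to an audio file)
classifier = pipeline("audio-classification", model=model, feature_extractor=feature_extractor)
print(classifier(waveform))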
ORTModelForAudioFrameClassification
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model with a frame classification head on top for tasks like Speaker Diarization. This class officially supports data2vec_audio, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
forward
( input_values: typing.Optional[torch.Tensor] = None**kwargs )
Parameters
The ORTModelForAudioFrameClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of audio frame classification:
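An illustrative sketch, assuming a speaker-diarization checkpoint; the dummy waveform stands in for real audio:
from transformers import AutoFeatureExtractor
from optimum.onnxruntime import ORTModelForAudioFrameClassification
import numpy as np

feature_extractor = AutoFeatureExtractor.from_pretrained("anton-l/wav2vec2-base-superb-sd")
model = ORTModelForAudioFrameClassification.from_pretrained("anton-l/wav2vec2-base-superb-sd", export=True)

waveform = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

logits = model(**inputs).logits  # one row of per-class logits for every audio frame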
ORTModelForCTC
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model with a language modeling head on top for Connectionist Temporal Classification (CTC). This class officially supports data2vec_audio, hubert, sew, sew_d, unispeech, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
forward
( input_values: typing.Optional[torch.Tensor] = None**kwargs )
Parameters
The ORTModelForCTC forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of CTC:
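A minimal sketch, assuming a wav2vec2 CTC checkpoint; the dummy waveform stands in for a real recording:
from transformers import AutoProcessor
from optimum.onnxruntime import ORTModelForCTC
import numpy as np
import torch

processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
model = ORTModelForCTC.from_pretrained("facebook/wav2vec2-base-960h", export=True)

waveform = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")

logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))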
ORTModelForSpeechSeq2Seq
( encoder_session: InferenceSession, decoder_session: InferenceSession, config: PretrainedConfig, onnx_paths: typing.List[str], decoder_with_past_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None, use_cache: bool = True, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None, **kwargs )
Speech Sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports whisper, speech_to_text.
forward
( input_features: typing.Optional[torch.FloatTensor] = None, attention_mask: typing.Optional[torch.LongTensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, labels: typing.Optional[torch.LongTensor] = None, **kwargs )
Parameters
input_features (torch.FloatTensor) — Mel features extracted from the raw speech waveform, of shape (batch_size, feature_size, encoder_sequence_length).
decoder_input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, decoder_sequence_length).
encoder_outputs (torch.FloatTensor) — The encoder last_hidden_state of shape (batch_size, encoder_sequence_length, hidden_size).
past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, decoder_sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
The ORTModelForSpeechSeq2Seq forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation through transformers.pipeline:
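An illustrative sketch, assuming the openai/whisper-tiny.en checkpoint; the audio file path is a placeholder:
from transformers import AutoProcessor, pipeline
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("openai/whisper-tiny.en")
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny.en", export=True)

# Through an automatic-speech-recognition pipeline
asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
)
print(asr("sample.wav"))  # path to a local audio file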
ORTModelForAudioXVector
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model with an XVector feature extraction head on top for tasks like Speaker Verification. This class officially supports data2vec_audio, unispeech_sat, wavlm, wav2vec2, wav2vec2-conformer.
forward
( input_values: typing.Optional[torch.Tensor] = None**kwargs )
Parameters
The ORTModelForAudioXVector forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of Audio XVector:
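A minimal sketch, assuming a speaker-verification checkpoint; the two dummy waveforms stand in for real recordings:
from transformers import AutoFeatureExtractor
from optimum.onnxruntime import ORTModelForAudioXVector
import numpy as np
import torch

feature_extractor = AutoFeatureExtractor.from_pretrained("anton-l/wav2vec2-base-superb-sv")
model = ORTModelForAudioXVector.from_pretrained("anton-l/wav2vec2-base-superb-sv", export=True)

# Two one-second dummy waveforms at 16 kHz
waveforms = [np.zeros(16000, dtype=np.float32), np.zeros(16000, dtype=np.float32)]
inputs = feature_extractor(waveforms, sampling_rate=16000, return_tensors="pt", padding=True)

embeddings = model(**inputs).embeddings  # one speaker embedding per input
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=-1)
print(similarity)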
The following ORT classes are available for the following multimodal tasks.
ORTModelForVision2Seq
( encoder_session: InferenceSession, decoder_session: InferenceSession, config: PretrainedConfig, onnx_paths: typing.List[str], decoder_with_past_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None, use_cache: bool = True, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None, **kwargs )
VisionEncoderDecoder Sequence-to-sequence model with a language modeling head for ONNX Runtime inference. This class officially supports trocr and vision-encoder-decoder.
forward
( pixel_values: typing.Optional[torch.FloatTensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, labels: typing.Optional[torch.LongTensor] = None, **kwargs )
Parameters
pixel_values (torch.FloatTensor) — Features extracted from an image. This tensor should be of shape (batch_size, num_channels, height, width).
decoder_input_ids (torch.LongTensor) — Indices of decoder input sequence tokens in the vocabulary, of shape (batch_size, decoder_sequence_length).
encoder_outputs (torch.FloatTensor) — The encoder last_hidden_state of shape (batch_size, encoder_sequence_length, hidden_size).
past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, decoder_sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
The ORTModelForVision2Seq forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of text generation, both by calling generate directly and through transformers.pipeline:
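An illustrative sketch, assuming an image-captioning VisionEncoderDecoder checkpoint; the image URL is a placeholder:
from transformers import AutoImageProcessor, AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForVision2Seq
from PIL import Image
import requests

processor = AutoImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
model = ORTModelForVision2Seq.from_pretrained("nlpconnect/vit-gpt2-image-captioning", export=True)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Calling generate directly
pixel_values = processor(images=image, return_tensors="pt").pixel_values
gen_tokens = model.generate(pixel_values)
print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))

# Through an image-to-text pipeline
captioner = pipeline("image-to-text", model=model, tokenizer=tokenizer, feature_extractor=processor)
print(captioner(url))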
ORTModelForPix2Struct
( encoder_session: InferenceSession, decoder_session: InferenceSession, config: PretrainedConfig, onnx_paths: typing.List[str], decoder_with_past_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None, use_cache: bool = True, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, generation_config: typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None, **kwargs )
Pix2struct model with a language modeling head for ONNX Runtime inference. This class officially supports pix2struct.
forward
( flattened_patches: typing.Optional[torch.FloatTensor] = None, attention_mask: typing.Optional[torch.LongTensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, decoder_attention_mask: typing.Optional[torch.BoolTensor] = None, encoder_outputs: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None, labels: typing.Optional[torch.LongTensor] = None, **kwargs )
Parameters
flattened_patches (torch.FloatTensor of shape (batch_size, seq_length, hidden_size)) — Flattened pixel patches. The hidden_size is obtained by the following formula: hidden_size = num_channels * patch_size * patch_size. The process of flattening the pixel patches is done by Pix2StructProcessor.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices.
decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Indices of decoder input sequence tokens in the vocabulary. Pix2StructText uses the pad_token_id as the starting token for decoder_input_ids generation. If past_key_values is used, optionally only the last decoder_input_ids have to be input (see past_key_values).
decoder_attention_mask (torch.BoolTensor of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
encoder_outputs (tuple(tuple(torch.FloatTensor)), optional) — Tuple consists of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state of shape (batch_size, sequence_length, hidden_size) is a sequence of hidden states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
past_key_values (tuple(tuple(torch.FloatTensor)), optional, defaults to None) — Contains the precomputed key and value hidden states of the attention blocks used to speed up decoding. The tuple is of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, decoder_sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).
The ORTModelForPix2Struct forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of pix2struct:
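A minimal sketch, assuming a pix2struct checkpoint fine-tuned for visual question answering; the image path and question are placeholders:
from transformers import Pix2StructProcessor
from optimum.onnxruntime import ORTModelForPix2Struct
from PIL import Image

processor = Pix2StructProcessor.from_pretrained("google/pix2struct-ai2d-base")
model = ORTModelForPix2Struct.from_pretrained("google/pix2struct-ai2d-base", export=True)

image = Image.open("diagram.png")  # a local image of the diagram or document to query
question = "What does the label 15 represent?"

inputs = processor(images=image, text=question, return_tensors="pt")
gen_tokens = model.generate(**inputs)
print(processor.batch_decode(gen_tokens, skip_special_tokens=True))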
The following ORT classes are available for the following custom tasks.
ORTModelForCustomTasks
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model for any custom tasks. It can be used to leverage the inference acceleration for any single-file ONNX model, that may use custom inputs and outputs.
forward
( **kwargs )
The ORTModelForCustomTasks forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of custom tasks (e.g. a sentence transformers model taking pooler_output as output), including usage with transformers.pipelines (only if the task is supported):
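An illustrative sketch; the repository below, assumed to ship an ONNX model exposing a pooler_output, is a placeholder for any single-file ONNX model with custom outputs:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCustomTasks

model_id = "optimum/sbert-all-MiniLM-L6-with-pooler"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCustomTasks.from_pretrained(model_id)

# Calling the model directly: all ONNX outputs are returned as-is
inputs = tokenizer("I love burritos!", return_tensors="pt")
outputs = model(**inputs)
pooler_output = outputs["pooler_output"]

# Through a feature-extraction pipeline (only if the task is supported)
extractor = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
features = extractor("I love burritos!")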
ORTModelForFeatureExtraction
( model: InferenceSession, config: PretrainedConfig, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None, preprocessors: typing.Optional[typing.List] = None, **kwargs )
ONNX Model for feature-extraction task.
forward
( input_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, attention_mask: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None, **kwargs )
Parameters
attention_mask (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
token_type_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]: 1 for tokens that are sentence A, 0 for tokens that are sentence B.
The ORTModelForFeatureExtraction forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of feature extraction, both by calling the model directly and through transformers.pipeline:
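A minimal sketch; the sentence-transformers checkpoint and the input sentence are illustrative:
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForFeatureExtraction

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = ORTModelForFeatureExtraction.from_pretrained("sentence-transformers/all-MiniLM-L6-v2", export=True)

# Calling the model directly
inputs = tokenizer("My name is Philipp and I live in Germany.", return_tensors="pt")
last_hidden_state = model(**inputs).last_hidden_state

# Through a feature-extraction pipeline
extractor = pipeline("feature-extraction", model=model, tokenizer=tokenizer)
features = extractor("My name is Philipp and I live in Germany.")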
ORTStableDiffusionPipeline
( vae_decoder_session: InferenceSessiontext_encoder_session: InferenceSessionunet_session: InferenceSessionconfig: typing.Dict[str, typing.Any]tokenizer: CLIPTokenizerscheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler]feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = Nonevae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetext_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = Noneuse_io_binding: typing.Optional[bool] = Nonemodel_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None )
__call__
( prompt: typing.Union[str, typing.List[str], NoneType] = Noneheight: typing.Optional[int] = Nonewidth: typing.Optional[int] = Nonenum_inference_steps: int = 50guidance_scale: float = 7.5negative_prompt: typing.Union[str, typing.List[str], NoneType] = Nonenum_images_per_prompt: int = 1eta: float = 0.0generator: typing.Optional[numpy.random.mtrand.RandomState] = Nonelatents: typing.Optional[numpy.ndarray] = Noneprompt_embeds: typing.Optional[numpy.ndarray] = Nonenegative_prompt_embeds: typing.Optional[numpy.ndarray] = Noneoutput_type: str = 'pil'return_dict: bool = Truecallback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = Nonecallback_steps: int = 1guidance_rescale: float = 0.0 ) → ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
prompt (Optional[Union[str, List[str]]]
, defaults to None) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds
. instead.
height (Optional[int]
, defaults to None) — The height in pixels of the generated image.
width (Optional[int]
, defaults to None) — The width in pixels of the generated image.
num_inference_steps (int
, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
negative_prompt (Optional[Union[str, list]]
) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds
. instead. Ignored when not using guidance (i.e., ignored if guidance_scale
is less than 1
).
num_images_per_prompt (int
, defaults to 1) — The number of images to generate per prompt.
generator (Optional[np.random.RandomState], defaults to None) — A np.random.RandomState to make generation deterministic.
latents (Optional[np.ndarray], defaults to None) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
prompt_embeds (Optional[np.ndarray]
, defaults to None
) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt
input argument.
negative_prompt_embeds (Optional[np.ndarray]
, defaults to None
) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt
input argument.
return_dict (bool
, defaults to True
) — Whether or not to return a ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
instead of a plain tuple.
callback (Optional[Callable], defaults to None
) — A function that will be called every callback_steps
steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor)
.
callback_steps (int
, defaults to 1) — The frequency at which the callback
function will be called. If not specified, the callback will be called at every step.
Returns
~pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
~pipelines.stable_diffusion.StableDiffusionPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.
Function invoked when calling the pipeline for generation.
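For illustration, a minimal text-to-image sketch; the Stable Diffusion checkpoint and prompt are assumptions:
from optimum.onnxruntime import ORTStableDiffusionPipeline

pipe = ORTStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipe(prompt).images[0]
image.save("ship.png")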
ORTStableDiffusionImg2ImgPipeline
( vae_decoder_session: InferenceSessiontext_encoder_session: InferenceSessionunet_session: InferenceSessionconfig: typing.Dict[str, typing.Any]tokenizer: CLIPTokenizerscheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler]feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = Nonevae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetext_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = Noneuse_io_binding: typing.Optional[bool] = Nonemodel_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None )
__call__
( prompt: typing.Union[str, typing.List[str], NoneType] = Noneimage: typing.Union[numpy.ndarray, PIL.Image.Image] = Nonestrength: float = 0.8num_inference_steps: int = 50guidance_scale: float = 7.5negative_prompt: typing.Union[str, typing.List[str], NoneType] = Nonenum_images_per_prompt: int = 1eta: float = 0.0generator: typing.Optional[numpy.random.mtrand.RandomState] = Noneprompt_embeds: typing.Optional[numpy.ndarray] = Nonenegative_prompt_embeds: typing.Optional[numpy.ndarray] = Noneoutput_type: str = 'pil'return_dict: bool = Truecallback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = Nonecallback_steps: int = 1 ) → ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
prompt (Optional[Union[str, List[str]]]
, defaults to None) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds
. instead.
image (Union[np.ndarray, PIL.Image.Image]) — Image, or tensor representing an image batch, which will be used as the starting point for the process.
strength (float
, defaults to 0.8) — Conceptually, indicates how much to transform the reference image
. Must be between 0 and 1. image
will be used as a starting point, adding more noise to it the larger the strength
. The number of denoising steps depends on the amount of noise initially added. When strength
is 1, added noise will be maximum and the denoising process will run for the full number of iterations specified in num_inference_steps
. A value of 1, therefore, essentially ignores image
.
num_inference_steps (int
, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
negative_prompt (Optional[Union[str, list]]
) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds
. instead. Ignored when not using guidance (i.e., ignored if guidance_scale
is less than 1
).
num_images_per_prompt (int
, defaults to 1) — The number of images to generate per prompt.
generator (Optional[np.random.RandomState], defaults to None) — A np.random.RandomState to make generation deterministic.
prompt_embeds (Optional[np.ndarray]
, defaults to None
) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt
input argument.
negative_prompt_embeds (Optional[np.ndarray]
, defaults to None
) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt
input argument.
return_dict (bool
, defaults to True
) — Whether or not to return a ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
instead of a plain tuple.
callback (Optional[Callable], defaults to None
) — A function that will be called every callback_steps
steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor)
.
callback_steps (int
, defaults to 1) — The frequency at which the callback
function will be called. If not specified, the callback will be called at every step.
Returns
~pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
~pipelines.stable_diffusion.StableDiffusionPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.
Function invoked when calling the pipeline for generation.
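An illustrative image-to-image sketch; the checkpoint, input image and prompt are assumptions:
from optimum.onnxruntime import ORTStableDiffusionImg2ImgPipeline
from PIL import Image

pipe = ORTStableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)
init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))
prompt = "A fantasy landscape, trending on artstation"
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("fantasy_landscape.png")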
ORTStableDiffusionInpaintPipeline
( vae_decoder_session: InferenceSessiontext_encoder_session: InferenceSessionunet_session: InferenceSessionconfig: typing.Dict[str, typing.Any]tokenizer: CLIPTokenizerscheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler]feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = Nonevae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetext_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = Noneuse_io_binding: typing.Optional[bool] = Nonemodel_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None )
__call__
( prompt: typing.Union[str, typing.List[str]]image: Imagemask_image: Imageheight: typing.Optional[int] = Nonewidth: typing.Optional[int] = Nonenum_inference_steps: int = 50guidance_scale: float = 7.5negative_prompt: typing.Union[str, typing.List[str], NoneType] = Nonenum_images_per_prompt: int = 1eta: float = 0.0generator: typing.Optional[numpy.random.mtrand.RandomState] = Nonelatents: typing.Optional[numpy.ndarray] = Noneprompt_embeds: typing.Optional[numpy.ndarray] = Nonenegative_prompt_embeds: typing.Optional[numpy.ndarray] = Noneoutput_type: str = 'pil'return_dict: bool = Truecallback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = Nonecallback_steps: int = 1 ) → ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
prompt (Union[str, List[str]]
) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds
. instead.
image (PIL.Image.Image) — Image, or tensor representing an image batch, to be inpainted (parts of the image masked out with mask_image and repainted according to prompt).
mask_image (PIL.Image.Image) — Image, or tensor representing a mask of the image batch; white pixels in the mask will be repainted, while black pixels will be preserved.
height (Optional[int]
, defaults to None) — The height in pixels of the generated image.
width (Optional[int]
, defaults to None) — The width in pixels of the generated image.
num_inference_steps (int
, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
negative_prompt (Optional[Union[str, list]]
) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds
. instead. Ignored when not using guidance (i.e., ignored if guidance_scale
is less than 1
).
num_images_per_prompt (int
, defaults to 1) — The number of images to generate per prompt.
generator (Optional[np.random.RandomState], defaults to None) — A np.random.RandomState to make generation deterministic.
latents (Optional[np.ndarray], defaults to None) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
prompt_embeds (Optional[np.ndarray]
, defaults to None
) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt
input argument.
negative_prompt_embeds (Optional[np.ndarray]
, defaults to None
) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt
input argument.
return_dict (bool
, defaults to True
) — Whether or not to return a ~pipelines.stable_diffusion.StableDiffusionPipelineOutput
instead of a plain tuple.
callback (Optional[Callable], defaults to None
) — A function that will be called every callback_steps
steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor)
.
callback_steps (int
, defaults to 1) — The frequency at which the callback
function will be called. If not specified, the callback will be called at every step.
Returns
~pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
~pipelines.stable_diffusion.StableDiffusionPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.
Function invoked when calling the pipeline for generation.
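An illustrative inpainting sketch; the checkpoint, images and prompt are assumptions:
from optimum.onnxruntime import ORTStableDiffusionInpaintPipeline
from PIL import Image

pipe = ORTStableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting", export=True)
init_image = Image.open("photo.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))  # white pixels are repainted
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("inpainted.png")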
ORTStableDiffusionXLPipeline
( vae_decoder_session: InferenceSessiontext_encoder_session: InferenceSessionunet_session: InferenceSessionconfig: typing.Dict[str, typing.Any]tokenizer: CLIPTokenizerscheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler]feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = Nonevae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetext_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = Noneuse_io_binding: typing.Optional[bool] = Nonemodel_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = Noneadd_watermarker: typing.Optional[bool] = None )
__call__
( prompt: typing.Union[str, typing.List[str], NoneType] = Noneheight: typing.Optional[int] = Nonewidth: typing.Optional[int] = Nonenum_inference_steps: int = 50guidance_scale: float = 5.0negative_prompt: typing.Union[str, typing.List[str], NoneType] = Nonenum_images_per_prompt: int = 1eta: float = 0.0generator: typing.Optional[numpy.random.mtrand.RandomState] = Nonelatents: typing.Optional[numpy.ndarray] = Noneprompt_embeds: typing.Optional[numpy.ndarray] = Nonenegative_prompt_embeds: typing.Optional[numpy.ndarray] = Nonepooled_prompt_embeds: typing.Optional[numpy.ndarray] = Nonenegative_pooled_prompt_embeds: typing.Optional[numpy.ndarray] = Noneoutput_type: str = 'pil'return_dict: bool = Truecallback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = Nonecallback_steps: int = 1cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = Noneguidance_rescale: float = 0.0original_size: typing.Union[typing.Tuple[int, int], NoneType] = Nonecrops_coords_top_left: typing.Tuple[int, int] = (0, 0)target_size: typing.Union[typing.Tuple[int, int], NoneType] = None ) → ~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput
or tuple
Parameters
prompt (Optional[Union[str, List[str]]]
, defaults to None) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds
. instead.
height (Optional[int]
, defaults to None) — The height in pixels of the generated image.
width (Optional[int]
, defaults to None) — The width in pixels of the generated image.
num_inference_steps (int
, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
negative_prompt (Optional[Union[str, list]]
) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds
. instead. Ignored when not using guidance (i.e., ignored if guidance_scale
is less than 1
).
num_images_per_prompt (int
, defaults to 1) — The number of images to generate per prompt.
generator (Optional[np.random.RandomState], defaults to None) — A np.random.RandomState to make generation deterministic.
latents (Optional[np.ndarray], defaults to None) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
prompt_embeds (Optional[np.ndarray]
, defaults to None
) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt
input argument.
negative_prompt_embeds (Optional[np.ndarray]
, defaults to None
) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt
input argument.
return_dict (bool
, defaults to True
) — Whether or not to return a ~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput
instead of a plain tuple.
callback (Optional[Callable], defaults to None
) — A function that will be called every callback_steps
steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor)
.
callback_steps (int
, defaults to 1) — The frequency at which the callback
function will be called. If not specified, the callback will be called at every step.
Returns
~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput or tuple
~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.
Function invoked when calling the pipeline for generation.
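A minimal SDXL text-to-image sketch; the checkpoint and prompt are assumptions:
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

pipe = ORTStableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipe(prompt).images[0]
image.save("ship_xl.png")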
ORTStableDiffusionXLImg2ImgPipeline
( vae_decoder_session: InferenceSessiontext_encoder_session: InferenceSessionunet_session: InferenceSessionconfig: typing.Dict[str, typing.Any]tokenizer: CLIPTokenizerscheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler]feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = Nonevae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetext_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = Nonetokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = Noneuse_io_binding: typing.Optional[bool] = Nonemodel_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = Noneadd_watermarker: typing.Optional[bool] = None )
__call__
( prompt: typing.Union[str, typing.List[str], NoneType] = None, image: typing.Union[numpy.ndarray, PIL.Image.Image] = None, strength: float = 0.3, num_inference_steps: int = 50, guidance_scale: float = 5.0, negative_prompt: typing.Union[str, typing.List[str], NoneType] = None, num_images_per_prompt: int = 1, eta: float = 0.0, generator: typing.Optional[numpy.random.mtrand.RandomState] = None, latents: typing.Optional[numpy.ndarray] = None, prompt_embeds: typing.Optional[numpy.ndarray] = None, negative_prompt_embeds: typing.Optional[numpy.ndarray] = None, pooled_prompt_embeds: typing.Optional[numpy.ndarray] = None, negative_pooled_prompt_embeds: typing.Optional[numpy.ndarray] = None, output_type: str = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = None, callback_steps: int = 1, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, guidance_rescale: float = 0.0, original_size: typing.Union[typing.Tuple[int, int], NoneType] = None, crops_coords_top_left: typing.Tuple[int, int] = (0, 0), target_size: typing.Union[typing.Tuple[int, int], NoneType] = None, aesthetic_score: float = 6.0, negative_aesthetic_score: float = 2.5 ) → ~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput or tuple
Parameters
prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.
image (Union[np.ndarray, PIL.Image.Image]) — Image, or tensor representing an image batch, to be used as the starting point for the image-to-image generation.
strength (float, defaults to 0.3) — Conceptually, indicates how much to transform the reference image. Must be between 0 and 1. image will be used as a starting point, adding more noise to it the larger the strength. The number of denoising steps depends on the amount of noise initially added. When strength is 1, added noise will be maximum and the denoising process will run for the full number of iterations specified in num_inference_steps. A value of 1, therefore, essentially ignores image.
num_inference_steps (int, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
negative_prompt (Optional[Union[str, list]]) — The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, defaults to 1) — The number of images to generate per prompt.
generator (Optional[np.random.RandomState], defaults to None) — A np.random.RandomState to make generation deterministic.
latents (Optional[np.ndarray], defaults to None) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
prompt_embeds (Optional[np.ndarray], defaults to None) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
negative_prompt_embeds (Optional[np.ndarray], defaults to None) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
return_dict (bool, defaults to True) — Whether or not to return a ~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput instead of a plain tuple.
callback (Optional[Callable], defaults to None) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
Returns
~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput or tuple
~pipelines.stable_diffusion.StableDiffusionXLPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.
Function invoked when calling the pipeline for generation.
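A minimal sketch of an image-to-image call with ORTStableDiffusionXLImg2ImgPipeline (the refiner checkpoint is an illustrative choice, and the blank input image merely stands in for a real photo):

```python
from PIL import Image
from optimum.onnxruntime import ORTStableDiffusionXLImg2ImgPipeline

# Illustrative checkpoint; the SDXL refiner is a common choice for image-to-image.
pipeline = ORTStableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", export=True
)

# Placeholder input: in practice, load a real image with PIL.Image.open(...).
init_image = Image.new("RGB", (768, 768))

# strength controls how strongly the reference image is transformed (see above).
image = pipeline(
    prompt="a medieval castle at sunset, highly detailed",
    image=init_image,
    strength=0.3,
    num_inference_steps=50,
    guidance_scale=5.0,
).images[0]
image.save("sdxl_img2img.png")
```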
ORTLatentConsistencyModelPipeline
( vae_decoder_session: InferenceSession, text_encoder_session: InferenceSession, unet_session: InferenceSession, config: typing.Dict[str, typing.Any], tokenizer: CLIPTokenizer, scheduler: typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_pndm.PNDMScheduler, diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler], feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None, vae_encoder_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None, text_encoder_2_session: typing.Optional[onnxruntime.capi.onnxruntime_inference_collection.InferenceSession] = None, tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None, use_io_binding: typing.Optional[bool] = None, model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None )
__call__
( prompt: typing.Union[str, typing.List[str], NoneType] = None, height: typing.Optional[int] = None, width: typing.Optional[int] = None, num_inference_steps: int = 4, original_inference_steps: int = None, guidance_scale: float = 8.5, num_images_per_prompt: int = 1, generator: typing.Optional[numpy.random.mtrand.RandomState] = None, latents: typing.Optional[numpy.ndarray] = None, prompt_embeds: typing.Optional[numpy.ndarray] = None, output_type: str = 'pil', return_dict: bool = True, callback: typing.Union[typing.Callable[[int, int, numpy.ndarray], NoneType], NoneType] = None, callback_steps: int = 1 ) → ~pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
Parameters
prompt (Optional[Union[str, List[str]]], defaults to None) — The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.
height (Optional[int], defaults to None) — The height in pixels of the generated image.
width (Optional[int], defaults to None) — The width in pixels of the generated image.
num_inference_steps (int, defaults to 4) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
num_images_per_prompt (int, defaults to 1) — The number of images to generate per prompt.
generator (Optional[np.random.RandomState], defaults to None) — A np.random.RandomState to make generation deterministic.
latents (Optional[np.ndarray], defaults to None) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
prompt_embeds (Optional[np.ndarray], defaults to None) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
return_dict (bool, defaults to True) — Whether or not to return a ~pipelines.stable_diffusion.StableDiffusionPipelineOutput instead of a plain tuple.
callback (Optional[Callable], defaults to None) — A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, defaults to 1) — The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
Returns
~pipelines.stable_diffusion.StableDiffusionPipelineOutput or tuple
~pipelines.stable_diffusion.StableDiffusionPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.
Function invoked when calling the pipeline for generation.
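A minimal sketch for ORTLatentConsistencyModelPipeline, which needs only a few denoising steps (the checkpoint is an illustrative latent-consistency model):

```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

# Illustrative checkpoint; any LCM checkpoint supported by the ONNX export should work.
pipeline = ORTLatentConsistencyModelPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", export=True
)

# Latent consistency models converge in very few steps (the __call__ default is 4).
images = pipeline(
    prompt="an astronaut riding a horse on mars, photorealistic",
    num_inference_steps=4,
    guidance_scale=8.5,
).images
images[0].save("lcm_text2img.png")
```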
provider (str, defaults to "CPUExecutionProvider") — ONNX Runtime provider to use for loading the model. See the ONNX Runtime documentation on execution providers for the possible providers.
provider_options (Optional[Dict[str, Any]], defaults to None) — Provider option dictionary corresponding to the provider used. See the ONNX Runtime documentation for the options available for each provider.
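The provider and provider_options arguments above select the ONNX Runtime execution provider at load time. A minimal sketch (the model id and the provider options are illustrative):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load (exporting to ONNX if needed) and bind the model to the CUDA execution provider.
# The model id is only an example.
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    provider="CUDAExecutionProvider",
    provider_options={"device_id": 0},  # options specific to the chosen provider
)

# Execution providers available to the underlying InferenceSession.
print(model.providers)
```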
This model inherits from ORTModel; check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the from_pretrained() method.
input_ids (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, sequence_length), defaults to None) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details.
attention_mask — Mask to avoid performing attention on padding token indices: 1 for tokens that are attended to, 0 for tokens that are masked.
token_type_ids — Segment token indices to indicate first and second portions of the inputs: 1 for tokens that are sentence A, 0 for tokens that are sentence B.
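A sketch of how input_ids, attention_mask and token_type_ids are typically produced with a tokenizer and fed to an ONNX Runtime model (the model id is illustrative):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

# The tokenizer builds input_ids and attention_mask (plus token_type_ids for models that use them).
inputs = tokenizer("ONNX Runtime makes inference fast.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)
```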
pixel_values (Union[torch.Tensor, np.ndarray, None] of shape (batch_size, num_channels, height, width), defaults to None) — Pixel values corresponding to the images in the current batch. Pixel values can be obtained from encoded images using an image processor or feature extractor (e.g. AutoImageProcessor).
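A sketch of how pixel_values are usually built with an image processor and passed to an ONNX Runtime vision model (the model id is illustrative, and the blank image stands in for a real photo):

```python
from PIL import Image
from transformers import AutoImageProcessor
from optimum.onnxruntime import ORTModelForImageClassification

model_id = "google/vit-base-patch16-224"  # illustrative
processor = AutoImageProcessor.from_pretrained(model_id)
model = ORTModelForImageClassification.from_pretrained(model_id, export=True)

# Dummy image; use a real photo in practice.
image = Image.new("RGB", (224, 224))
inputs = processor(images=image, return_tensors="pt")  # pixel_values of shape (batch, channels, height, width)
outputs = model(**inputs)
print(outputs.logits.argmax(-1))
```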
input_values (torch.Tensor of shape (batch_size, sequence_length)) — Float values of the input raw speech waveform. Input values can be obtained from an audio file loaded into an array using a feature extractor (e.g. AutoFeatureExtractor).
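A sketch of how input_values are obtained with a feature extractor and passed to an ONNX Runtime audio model (the model id is illustrative, and the silent waveform stands in for real audio):

```python
import numpy as np
from transformers import AutoFeatureExtractor
from optimum.onnxruntime import ORTModelForAudioClassification

model_id = "superb/hubert-base-superb-ks"  # illustrative audio-classification checkpoint
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = ORTModelForAudioClassification.from_pretrained(model_id, export=True)

# One second of silence at 16 kHz stands in for audio loaded from a file.
waveform = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")  # yields input_values
outputs = model(**inputs)
print(outputs.logits.shape)
```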
ONNX Runtime-powered stable diffusion pipeline corresponding to the equivalent diffusers pipeline.
This model inherits from ORTModel; check its documentation for the generic methods the library implements for all its models (such as downloading or saving).
This class should be initialized using the from_pretrained() method.
guidance_scale (float, defaults to 7.5) — Guidance scale as defined in the classifier-free diffusion guidance paper, where guidance_scale corresponds to w in equation 2. Guidance scale is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
eta (float, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper. Only applies to schedulers.DDIMScheduler and will be ignored for other schedulers.
output_type (str, defaults to "pil") — The output format of the generated image. Choose between PIL.Image.Image or np.array.
guidance_rescale (float, defaults to 0.0) — Guidance rescale factor proposed in "Common Diffusion Noise Schedules and Sample Steps are Flawed", defined as φ in equation 16 of that paper. Guidance rescale should fix overexposure when using zero terminal SNR.
Note that some of the pipelines use different defaults for these parameters (e.g. a guidance_scale of 5 and a guidance_rescale of 0.7).