# Neuron Models

## Models

### Generic model classes

#### NeuronBaseModel

The `NeuronBaseModel` class is available for instantiating a base Neuron model without a specific head. It is used as the base class for all tasks but text generation.

#### class optimum.neuron.NeuronBaseModel

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_base.py#L49)

( model: ScriptModule config: PretrainedConfig model\_save\_dir: Union = None model\_file\_name: Optional = None preprocessors: Optional = None neuron\_config: Optional = None \*\*kwargs )

Base class running compiled and optimized models on Neuron devices.

It implements generic methods for interacting with the Hugging Face Hub, as well as for compiling vanilla transformers models into neuron-optimized TorchScript modules and exporting them using the `optimum.exporters.neuron` toolchain.

Class attributes:

* model\_type (`str`, *optional*, defaults to `"neuron_model"`) — The name of the model type to use when registering the NeuronBaseModel classes.
* auto\_model\_class (`Type`, *optional*, defaults to `AutoModel`) — The `AutoModel` class to be represented by the current NeuronBaseModel class.

Common attributes:

* model (`torch.jit._script.ScriptModule`) — The loaded `ScriptModule` compiled for neuron devices.
* config ([PretrainedConfig](https://huggingface.co/docs/transformers/main/en/main_classes/configuration#transformers.PretrainedConfig)) — The configuration of the model.
* model\_save\_dir (`Path`) — The directory where a neuron compiled model is saved. By default, if the loaded model is local, the directory of the original model is used. Otherwise, the cache directory is used.

**get\_input\_static\_shapes**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_base.py#L423)

( neuron\_config: NeuronConfig )

Gets a dictionary of inputs with their valid static shapes.

**load\_model**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_base.py#L92)

( path: Union )

Parameters

* **path** (`Union[str, Path]`) — Path of the compiled model.

Loads a TorchScript module compiled by the neuron(x)-cc compiler. It is first loaded onto the CPU and then moved to one or multiple [NeuronCore](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/neuron-hardware/neuroncores-arch.html).

**remove\_padding**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_base.py#L506)

( outputs: List dims: List indices: List )

Parameters

* **outputs** (`List[torch.Tensor]`) — List of torch tensors which are inference output.
* **dims** (`List[int]`) — List of dimensions in which we slice a tensor.
* **indices** (`List[int]`) — List of indices in which we slice a tensor along an axis.

Removes padding from output tensors.
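As a rough illustration of what removing padding involves, the sketch below trims nested Python lists instead of `torch.Tensor` objects. It assumes that padding sits at the end of each axis and that each entry of `indices` is the number of elements to keep along the matching entry of `dims`. Both are assumptions, and `take_first` and `remove_padding_sketch` are hypothetical names that are not part of the library.

```python
def take_first(nested, dim, count):
    """Keep the first `count` entries along axis `dim` of a nested list."""
    if dim == 0:
        return nested[:count]
    return [take_first(item, dim - 1, count) for item in nested]


def remove_padding_sketch(outputs, dims, indices):
    """Trim every output along each (dim, count) pair. Hypothetical helper."""
    if len(dims) != len(indices):
        raise ValueError("dims and indices must have the same length")
    for dim, count in zip(dims, indices):
        outputs = [take_first(output, dim, count) for output in outputs]
    return outputs


# One output of shape (2, 2, 4): the batch was padded from 1 to 2 rows
# and the last axis from 2 to 4 entries (assumed right padding).
padded = [
    [[1, 2, 0, 0], [3, 4, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0]],
]
trimmed = remove_padding_sketch([padded], dims=[0, 2], indices=[1, 2])
```

Here `trimmed[0]` has shape `(1, 2, 2)`: the padded batch row and the two padded columns are gone.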

#### NeuronDecoderModel

The `NeuronDecoderModel` class is the base class for text generation models.

#### class optimum.neuron.NeuronDecoderModel

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_decoder.py#L52)

( model: Module config: PretrainedConfig model\_path: Union generation\_config: Optional = None )

Base class to convert and run pre-trained transformers decoder models on Neuron devices.

It implements the methods to convert a pre-trained transformers decoder model into a Neuron transformer model by:

* transferring the checkpoint weights of the original into an optimized neuron graph,
* compiling the resulting graph using the Neuron compiler.

Common attributes:

* model (`torch.nn.Module`) — The decoder model with a graph optimized for neuron devices.
* config ([PretrainedConfig](https://huggingface.co/docs/transformers/main/en/main_classes/configuration#transformers.PretrainedConfig)) — The configuration of the original model.
* generation\_config ([GenerationConfig](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationConfig)) — The generation configuration used by default when calling `generate()`.

### Natural Language Processing

The following Neuron model classes are available for natural language processing tasks.

#### NeuronModelForFeatureExtraction

#### class optimum.neuron.NeuronModelForFeatureExtraction

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L115)

( model: ScriptModule config: PretrainedConfig model\_save\_dir: Union = None model\_file\_name: Optional = None preprocessors: Optional = None neuron\_config: Optional = None \*\*kwargs )

Parameters

* **config** (`transformers.PretrainedConfig`) — [PretrainedConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.PretrainedConfig) is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the `optimum.neuron.modeling.NeuronBaseModel.from_pretrained` method to load the model weights.
* **model** (`torch.jit._script.ScriptModule`) — [torch.jit.\_script.ScriptModule](https://pytorch.org/docs/stable/generated/torch.jit.ScriptModule.html) is the TorchScript graph compiled by the neuron(x) compiler.

Neuron Model with a BaseModelOutput for feature-extraction tasks.

This model inherits from `~neuron.modeling.NeuronBaseModel`. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Feature Extraction model on Neuron devices.

**forward**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L128)

( input\_ids: Tensor attention\_mask: Tensor token\_type\_ids: Optional = None \*\*kwargs )

Parameters

* **input\_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer). See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details. [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
* **attention\_mask** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  * 1 for tokens that are **not masked**,
  * 0 for tokens that are **masked**. [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
* **token\_type\_ids** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  * 0 for tokens that are **sentence A**,
  * 1 for tokens that are **sentence B**. [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)
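To make the role of `attention_mask` concrete, here is a minimal sketch of how a short input could be padded to a fixed length with a matching mask. The function name `pad_to_static_length`, the token ids, and the pad id `0` are illustrative assumptions. In practice the tokenizer produces these values.

```python
def pad_to_static_length(token_ids, sequence_length, pad_token_id=0):
    """Right-pad token ids to a fixed length and build the matching mask."""
    if len(token_ids) > sequence_length:
        raise ValueError("input is longer than the static sequence length")
    padding = sequence_length - len(token_ids)
    input_ids = list(token_ids) + [pad_token_id] * padding
    # 1 marks a real token, 0 marks a padding position to be ignored.
    attention_mask = [1] * len(token_ids) + [0] * padding
    return input_ids, attention_mask


# Hypothetical ids for a 5-token sentence, padded to a length of 8.
input_ids, attention_mask = pad_to_static_length([101, 2023, 2003, 2204, 102], 8)
```

The mask lets the model ignore the three trailing pad positions while the tensor shape stays fixed.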

The [NeuronModelForFeatureExtraction](https://huggingface.co/docs/optimum.neuron/main/en/package_reference/modeling#optimum.neuron.NeuronModelForFeatureExtraction) forward method overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
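The difference between calling the instance and calling `forward` directly can be pictured with a toy class. This is only a sketch of the idea and not how `torch.nn.Module` is implemented: `ToyModule`, `preprocess`, and `postprocess` are hypothetical names.

```python
class ToyModule:
    """Toy stand-in for a module whose __call__ wraps forward()."""

    def __init__(self):
        self.calls = []  # records which steps actually ran

    def preprocess(self, x):
        self.calls.append("pre")
        return x

    def forward(self, x):
        self.calls.append("forward")
        return x * 2

    def postprocess(self, y):
        self.calls.append("post")
        return y

    def __call__(self, x):
        # Calling the instance runs the extra steps around forward().
        return self.postprocess(self.forward(self.preprocess(x)))


via_call = ToyModule()
result_call = via_call(3)

via_forward = ToyModule()
result_forward = via_forward.forward(3)  # skips the pre and post steps
```

Both calls return the same value here, but only `via_call.calls` contains the pre and post steps. That is why the examples in this page use `model(**inputs)` rather than `model.forward(**inputs)`.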

Example of feature extraction: *(The following model is compiled with the neuronx compiler and can only run on INF2. Replace “neuronx” with “neuron” if you are using INF1.)*

```
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForFeatureExtraction

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/all-MiniLM-L6-v2-neuronx")
>>> model = NeuronModelForFeatureExtraction.from_pretrained("optimum/all-MiniLM-L6-v2-neuronx")

>>> inputs = tokenizer("Dear Evan Hansen is the winner of six Tony Awards.", return_tensors="pt")

>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> list(last_hidden_state.shape)
[1, 13, 384]
```
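A `last_hidden_state` holds one vector per token. One common way to reduce it to a single sentence vector is mean pooling over the unmasked tokens. The source does not prescribe this step, so treat the sketch below as an assumption about how the output might be used. It works on plain lists for one sequence, and `mean_pool` is a hypothetical helper.

```python
def mean_pool(last_hidden_state, attention_mask):
    """Average the token vectors of one sequence, skipping masked positions."""
    hidden_size = len(last_hidden_state[0])
    totals = [0.0] * hidden_size
    kept = 0
    for vector, keep in zip(last_hidden_state, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if kept == 0:
        raise ValueError("attention_mask masks every token")
    return [total / kept for total in totals]


# Toy sequence of 3 tokens with hidden size 2; the last token is padding.
toy_hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
toy_mask = [1, 1, 0]
embedding = mean_pool(toy_hidden, toy_mask)
```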

#### NeuronModelForMaskedLM

#### class optimum.neuron.NeuronModelForMaskedLM

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L191)

( model: ScriptModule config: PretrainedConfig model\_save\_dir: Union = None model\_file\_name: Optional = None preprocessors: Optional = None neuron\_config: Optional = None \*\*kwargs )

Parameters

* **config** (`transformers.PretrainedConfig`) — [PretrainedConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.PretrainedConfig) is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the `optimum.neuron.modeling.NeuronBaseModel.from_pretrained` method to load the model weights.
* **model** (`torch.jit._script.ScriptModule`) — [torch.jit.\_script.ScriptModule](https://pytorch.org/docs/stable/generated/torch.jit.ScriptModule.html) is the TorchScript graph compiled by the neuron(x) compiler.

Neuron Model with a MaskedLMOutput for masked language modeling tasks.

This model inherits from `~neuron.modeling.NeuronBaseModel`. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Masked language model on Neuron devices.

**forward**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L204)

( input\_ids: Tensor attention\_mask: Tensor token\_type\_ids: Optional = None \*\*kwargs )

Parameters

* **input\_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer). See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details. [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
* **attention\_mask** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  * 1 for tokens that are **not masked**,
  * 0 for tokens that are **masked**. [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
* **token\_type\_ids** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  * 0 for tokens that are **sentence A**,
  * 1 for tokens that are **sentence B**. [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)

The [NeuronModelForMaskedLM](https://huggingface.co/docs/optimum.neuron/main/en/package_reference/modeling#optimum.neuron.NeuronModelForMaskedLM) forward method overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of fill mask: *(The following model is compiled with the neuronx compiler and can only run on INF2. Replace “neuronx” with “neuron” if you are using INF1.)*

```
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForMaskedLM
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/legal-bert-base-uncased-neuronx")
>>> model = NeuronModelForMaskedLM.from_pretrained("optimum/legal-bert-base-uncased-neuronx")

>>> inputs = tokenizer("This [MASK] Agreement is between General Motors and John Murray.", return_tensors="pt")

>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 13, 30522]
```
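The logits have one score per vocabulary entry at every position, so a prediction for the masked word comes from the row at the mask position. The sketch below shows that lookup on plain lists with a toy vocabulary of four entries. The ids, the mask id `103`, and the helper name `top_token_at_mask` are illustrative assumptions.

```python
def top_token_at_mask(input_ids, logits, mask_token_id):
    """Return (mask position, highest-scoring vocabulary id at that position)."""
    mask_position = input_ids.index(mask_token_id)
    scores = logits[mask_position]
    best_id = max(range(len(scores)), key=scores.__getitem__)
    return mask_position, best_id


# Toy sequence of 5 tokens; 103 stands in for the mask token id.
toy_input_ids = [101, 7, 103, 9, 102]
# One row of 4 vocabulary scores per token position.
toy_logits = [
    [0.1, 0.2, 0.3, 0.0],
    [0.5, 0.1, 0.1, 0.2],
    [0.0, 2.5, 0.3, 1.0],
    [0.2, 0.2, 0.9, 0.1],
    [0.3, 0.1, 0.0, 0.4],
]
position, predicted_id = top_token_at_mask(toy_input_ids, toy_logits, mask_token_id=103)
```

With a real tokenizer, the predicted id would then be decoded back into text.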

#### NeuronModelForSequenceClassification

#### class optimum.neuron.NeuronModelForSequenceClassification

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L329)

( model: ScriptModule config: PretrainedConfig model\_save\_dir: Union = None model\_file\_name: Optional = None preprocessors: Optional = None neuron\_config: Optional = None \*\*kwargs )

Parameters

* **config** (`transformers.PretrainedConfig`) — [PretrainedConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.PretrainedConfig) is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the `optimum.neuron.modeling.NeuronBaseModel.from_pretrained` method to load the model weights.
* **model** (`torch.jit._script.ScriptModule`) — [torch.jit.\_script.ScriptModule](https://pytorch.org/docs/stable/generated/torch.jit.ScriptModule.html) is the TorchScript graph compiled by the neuron(x) compiler.

Neuron Model with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from `~neuron.modeling.NeuronBaseModel`. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Sequence Classification model on Neuron devices.

**forward**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L343)

( input\_ids: Tensor attention\_mask: Tensor token\_type\_ids: Optional = None \*\*kwargs )

Parameters

* **input\_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer). See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details. [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
* **attention\_mask** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  * 1 for tokens that are **not masked**,
  * 0 for tokens that are **masked**. [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
* **token\_type\_ids** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  * 0 for tokens that are **sentence A**,
  * 1 for tokens that are **sentence B**. [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)

The [NeuronModelForSequenceClassification](https://huggingface.co/docs/optimum.neuron/main/en/package_reference/modeling#optimum.neuron.NeuronModelForSequenceClassification) forward method overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of single-label classification: *(The following model is compiled with the neuronx compiler and can only run on INF2.)*

```
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx")
>>> model = NeuronModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english-neuronx")

>>> inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")

>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 2]
```
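The two logits are unnormalized class scores. A softmax turns them into probabilities, and the largest one gives the predicted class. The sketch below uses plain Python. The logit values and the `id2label` mapping are made up for illustration, and a real mapping would come from the model configuration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical label mapping and logits for a single input sentence.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
toy_logits = [-1.0, 3.0]

probabilities = softmax(toy_logits)
predicted_label = id2label[probabilities.index(max(probabilities))]
```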

#### NeuronModelForQuestionAnswering

#### class optimum.neuron.NeuronModelForQuestionAnswering

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L261)

( model: ScriptModule config: PretrainedConfig model\_save\_dir: Union = None model\_file\_name: Optional = None preprocessors: Optional = None neuron\_config: Optional = None \*\*kwargs )

Parameters

* **config** (`transformers.PretrainedConfig`) — [PretrainedConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.PretrainedConfig) is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the `optimum.neuron.modeling.NeuronBaseModel.from_pretrained` method to load the model weights.
* **model** (`torch.jit._script.ScriptModule`) — [torch.jit.\_script.ScriptModule](https://pytorch.org/docs/stable/generated/torch.jit.ScriptModule.html) is the TorchScript graph compiled by the neuron(x) compiler.

Neuron Model with a QuestionAnsweringModelOutput for extractive question-answering tasks like SQuAD.

This model inherits from `~neuron.modeling.NeuronBaseModel`. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Question Answering model on Neuron devices.

**forward**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L274)

( input\_ids: Tensor attention\_mask: Tensor token\_type\_ids: Optional = None \*\*kwargs )

Parameters

* **input\_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer). See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details. [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
* **attention\_mask** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  * 1 for tokens that are **not masked**,
  * 0 for tokens that are **masked**. [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
* **token\_type\_ids** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  * 0 for tokens that are **sentence A**,
  * 1 for tokens that are **sentence B**. [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)

The [NeuronModelForQuestionAnswering](https://huggingface.co/docs/optimum.neuron/main/en/package_reference/modeling#optimum.neuron.NeuronModelForQuestionAnswering) forward method overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of question answering: *(The following model is compiled with the neuronx compiler and can only run on INF2.)*

```
>>> import torch
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForQuestionAnswering

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/roberta-base-squad2-neuronx")
>>> model = NeuronModelForQuestionAnswering.from_pretrained("optimum/roberta-base-squad2-neuronx")

>>> question, text = "Are there wheelchair spaces in the theatres?", "Yes, we have reserved wheelchair spaces with a good view."
>>> inputs = tokenizer(question, text, return_tensors="pt")
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([12])

>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
```
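The start and end logits score every token as a possible beginning or end of the answer. A simple way to read them is to take the best start and then the best end at or after it. The sketch below does this greedily on toy values. Real pipelines usually search over several candidate spans, so this is a simplification, and `best_span` is a hypothetical helper.

```python
def best_span(start_logits, end_logits):
    """Pick the top start index, then the top end index at or after it."""
    start = max(range(len(start_logits)), key=start_logits.__getitem__)
    # Restrict the end search so that the span never ends before it starts.
    end = max(range(start, len(end_logits)), key=end_logits.__getitem__)
    return start, end


# Toy tokens and scores; the values are invented for illustration.
toy_tokens = ["Who", "wrote", "it", "?", "Lin", "Miranda", "did"]
toy_start_logits = [0.1, 0.0, 0.2, 0.1, 3.0, 0.5, 0.1]
toy_end_logits = [0.0, 4.0, 0.1, 0.2, 0.3, 2.5, 0.4]

start, end = best_span(toy_start_logits, toy_end_logits)
answer_tokens = toy_tokens[start : end + 1]
```

Note that the highest end score sits at index 1, before the chosen start, so the restriction on the end search is what keeps the span valid.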

#### NeuronModelForTokenClassification

#### class optimum.neuron.NeuronModelForTokenClassification

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L397)

( model: ScriptModule config: PretrainedConfig model\_save\_dir: Union = None model\_file\_name: Optional = None preprocessors: Optional = None neuron\_config: Optional = None \*\*kwargs )

Parameters

* **config** (`transformers.PretrainedConfig`) — [PretrainedConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.PretrainedConfig) is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the `optimum.neuron.modeling.NeuronBaseModel.from_pretrained` method to load the model weights.
* **model** (`torch.jit._script.ScriptModule`) — [torch.jit.\_script.ScriptModule](https://pytorch.org/docs/stable/generated/torch.jit.ScriptModule.html) is the TorchScript graph compiled by the neuron(x) compiler.

Neuron Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from `~neuron.modeling.NeuronBaseModel`. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Token Classification model on Neuron devices.

**forward**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L411)

( input\_ids: Tensor attention\_mask: Tensor token\_type\_ids: Optional = None \*\*kwargs )

Parameters

* **input\_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer). See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details. [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
* **attention\_mask** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  * 1 for tokens that are **not masked**,
  * 0 for tokens that are **masked**. [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
* **token\_type\_ids** (`Union[torch.Tensor, None]` of shape `(batch_size, sequence_length)`, defaults to `None`) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  * 0 for tokens that are **sentence A**,
  * 1 for tokens that are **sentence B**. [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)

The [NeuronModelForTokenClassification](https://huggingface.co/docs/optimum.neuron/main/en/package_reference/modeling#optimum.neuron.NeuronModelForTokenClassification) forward method overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of token classification: *(The following model is compiled with the neuronx compiler and can only run on INF2.)*

```
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForTokenClassification

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-NER-neuronx")
>>> model = NeuronModelForTokenClassification.from_pretrained("optimum/bert-base-NER-neuronx")

>>> inputs = tokenizer("Lin-Manuel Miranda is an American songwriter, actor, singer, filmmaker, and playwright.", return_tensors="pt")

>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 20, 9]
```
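The last dimension of the logits holds one score per label for each token. Taking the highest-scoring label at every position yields a tag sequence. The sketch below uses three made-up labels instead of the nine in the output above. The `id2label` mapping, the tokens, and the helper `label_tokens` are assumptions for illustration.

```python
def label_tokens(tokens, logits, id2label):
    """Pair each token with its highest-scoring label."""
    labelled = []
    for token, scores in zip(tokens, logits):
        best = max(range(len(scores)), key=scores.__getitem__)
        labelled.append((token, id2label[best]))
    return labelled


# Hypothetical label set and per-token scores.
toy_id2label = {0: "O", 1: "B-PER", 2: "I-PER"}
toy_tokens = ["Lin", "Miranda", "sings"]
toy_logits = [
    [0.1, 2.0, 0.3],
    [0.2, 0.1, 1.5],
    [3.0, 0.1, 0.2],
]
tagged = label_tokens(toy_tokens, toy_logits, toy_id2label)
```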

#### NeuronModelForMultipleChoice

#### class optimum.neuron.NeuronModelForMultipleChoice

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L478)

( model: ScriptModule config: PretrainedConfig model\_save\_dir: Union = None model\_file\_name: Optional = None preprocessors: Optional = None neuron\_config: Optional = None \*\*kwargs )

Parameters

* **config** (`transformers.PretrainedConfig`) — [PretrainedConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.PretrainedConfig) is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the `optimum.neuron.modeling.NeuronBaseModel.from_pretrained` method to load the model weights.
* **model** (`torch.jit._script.ScriptModule`) — [torch.jit.\_script.ScriptModule](https://pytorch.org/docs/stable/generated/torch.jit.ScriptModule.html) is the TorchScript graph compiled by the neuron(x) compiler.

Neuron Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

This model inherits from `~neuron.modeling.NeuronBaseModel`. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

Multiple choice model on Neuron devices.

**forward**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L492)

( input\_ids: Tensor attention\_mask: Tensor token\_type\_ids: Optional = None \*\*kwargs )

Parameters

* **input\_ids** (`torch.Tensor` of shape `(batch_size, num_choices, sequence_length)`) — Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`](https://huggingface.co/docs/transformers/autoclass_tutorial#autotokenizer). See [`PreTrainedTokenizer.encode`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.encode) and [`PreTrainedTokenizer.__call__`](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.__call__) for details. [What are input IDs?](https://huggingface.co/docs/transformers/glossary#input-ids)
* **attention\_mask** (`Union[torch.Tensor, None]` of shape `(batch_size, num_choices, sequence_length)`, defaults to `None`) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  * 1 for tokens that are **not masked**,
  * 0 for tokens that are **masked**. [What are attention masks?](https://huggingface.co/docs/transformers/glossary#attention-mask)
* **token\_type\_ids** (`Union[torch.Tensor, None]` of shape `(batch_size, num_choices, sequence_length)`, defaults to `None`) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  * 0 for tokens that are **sentence A**,
  * 1 for tokens that are **sentence B**. [What are token type IDs?](https://huggingface.co/docs/transformers/glossary#token-type-ids)

The [NeuronModelForMultipleChoice](https://huggingface.co/docs/optimum.neuron/main/en/package_reference/modeling#optimum.neuron.NeuronModelForMultipleChoice) forward method overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of multiple choice: *(The following model is compiled with the neuronx compiler and can only run on INF2.)*

```
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForMultipleChoice

>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-uncased_SWAG-neuronx")
>>> model = NeuronModelForMultipleChoice.from_pretrained("optimum/bert-base-uncased_SWAG-neuronx", export=True)

>>> num_choices = 4
>>> first_sentence = ["Members of the procession walk down the street holding small horn brass instruments."] * num_choices
>>> second_sentence = [
...     "A drum line passes by walking down the street playing their instruments.",
...     "A drum line has heard approaching them.",
...     "A drum line arrives and they're outside dancing and asleep.",
...     "A drum line turns the lead singer watches the performance."
... ]
>>> inputs = tokenizer(first_sentence, second_sentence, truncation=True, padding=True)

>>> # Unflatten the input values, expanding them to the shape [batch_size, num_choices, seq_length]
>>> for k, v in inputs.items():
...     inputs[k] = [v[i: i + num_choices] for i in range(0, len(v), num_choices)]
>>> inputs = dict(inputs.convert_to_tensors(tensor_type="pt"))
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 4]
```
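The unflatten step in the example regroups a flat list of `batch_size * num_choices` rows into chunks of `num_choices` rows. Here is the same slicing applied to small plain lists, with made-up token ids and a hypothetical helper name `unflatten`.

```python
def unflatten(rows, num_choices):
    """Group consecutive rows into chunks of `num_choices` rows each."""
    return [rows[i : i + num_choices] for i in range(0, len(rows), num_choices)]


# Four tokenized (question, choice) pairs, each 2 tokens long: shape (4, 2).
flat_input_ids = [[1, 2], [3, 4], [5, 6], [7, 8]]

# After grouping: shape (1, 4, 2), i.e. one question with four choices.
grouped = unflatten(flat_input_ids, num_choices=4)

# Eight rows with four choices per question would give a batch of two.
two_questions = unflatten(flat_input_ids + flat_input_ids, num_choices=4)
```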

#### NeuronModelForCausalLM

#### class optimum.neuron.NeuronModelForCausalLM

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L565)

( model: Module config: PretrainedConfig model\_path: Union generation\_config: Optional = None )

Parameters

* **model** (`torch.nn.Module`) — [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) is the neuron decoder graph.
* **config** (`transformers.PretrainedConfig`) — [PretrainedConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.PretrainedConfig) is the Model configuration class with all the parameters of the model.
* **model\_path** (`Path`) — The directory where the compiled artifacts for the model are stored. It can be a temporary directory if the model has never been saved locally before.
* **generation\_config** (`transformers.GenerationConfig`) — [GenerationConfig](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig) holds the configuration for the model generation task.

Neuron model with a causal language modeling head for inference on Neuron devices.

This model inherits from `~neuron.modeling.NeuronDecoderModel`. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving).

**can\_generate**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L642)

( )

Returns True to validate the check made in `GenerationMixin.generate()`.

**forward**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L592)

( input\_ids: Tensor cache\_ids: Tensor start\_ids: Tensor = None return\_dict: bool = True )

Parameters

* **input\_ids** (`torch.LongTensor`) — Indices of decoder input sequence tokens in the vocabulary of shape `(batch_size, sequence_length)`.
* **cache\_ids** (`torch.LongTensor`) — The indices at which the cached key and value for the current inputs need to be stored.
* **start\_ids** (`torch.LongTensor`) — The indices of the first tokens to be processed, deduced from the attention masks.

The [NeuronModelForCausalLM](https://huggingface.co/docs/optimum.neuron/main/en/package_reference/modeling#optimum.neuron.NeuronModelForCausalLM) forward method overrides the `__call__` special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

Example of text generation:

```
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronModelForCausalLM
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = NeuronModelForCausalLM.from_pretrained("gpt2", export=True)

>>> inputs = tokenizer("My favorite moment of the day is", return_tensors="pt")

>>> gen_tokens = model.generate(**inputs, do_sample=True, temperature=0.9, min_length=20, max_length=20)
>>> tokenizer.batch_decode(gen_tokens)
```
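The `start_ids` parameter of `forward` is described above as being deduced from the attention masks. The following is a minimal sketch of one way to do that, assuming left-padded prompts so that the first `1` in each row marks the first real token. The helper `first_token_indices` is hypothetical and is not part of the library.

```python
def first_token_indices(attention_mask):
    """Return, for each row, the index of the first non-padding position."""
    start_ids = []
    for row in attention_mask:
        if 1 not in row:
            raise ValueError("each row needs at least one attended token")
        start_ids.append(row.index(1))
    return start_ids


# Two prompts padded on the left to a common length of 5 (assumption).
attention_mask = [
    [0, 0, 1, 1, 1],  # two padding tokens, then three real tokens
    [1, 1, 1, 1, 1],  # no padding
]
start_ids = first_token_indices(attention_mask)
print(start_ids)
```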

**generate**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L646)

( input\_ids: Tensor attention\_mask: Optional = None generation\_config: Optional = None \*\*kwargs ) → `torch.Tensor`

Parameters

* **input\_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`) — The sequence used as a prompt for the generation.
* **attention\_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices.
* **generation\_config** (`~transformers.generation.GenerationConfig`, *optional*) — The generation configuration to be used as base parametrization for the generation call. `**kwargs` passed to generate matching the attributes of `generation_config` will override them. If `generation_config` is not provided, the default will be used, which has the following loading priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model configuration. Please note that unspecified parameters will inherit `GenerationConfig`’s default values, whose documentation should be checked to parameterize generation.

Returns

`torch.Tensor`

A `torch.FloatTensor`.

A streamlined generate() method overriding the transformers.GenerationMixin.generate() method.

This method uses the same logits processors/warpers and stopping criteria as the transformers library `generate()` method but restricts the generation to greedy search and sampling.

It does not support transformers `generate()` advanced options.

Please refer to [https://huggingface.co/docs/transformers/en/main\_classes/text\_generation#transformers.GenerationMixin.generate](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationMixin.generate) for details on generation configuration.
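The two decoding strategies that `generate()` is restricted to can be contrasted on a single step of logits. This is a plain-Python sketch of the general technique, not the library's implementation: `softmax`, `select_greedy` and `select_sample` are hypothetical names, and the logits are made up.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_greedy(logits):
    """Greedy search: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def select_sample(logits, temperature, rng):
    """Sampling: draw a token according to the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [1.0, 3.5, 0.2, 2.0]  # scores for a 4-token vocabulary (illustrative)
rng = random.Random(0)         # seeded so the draw is reproducible

greedy_token = select_greedy(logits)
sampled_token = select_sample(logits, temperature=0.9, rng=rng)
print(greedy_token, sampled_token)
```

Greedy search is deterministic, while sampling depends on the random generator and on `temperature`: lower values concentrate probability on the top tokens.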

**generate\_tokens**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling.py#L721)

( input\_ids: LongTensor selector: TokenSelector batch\_size: int attention\_mask: Optional = None \*\*model\_kwargs ) → `torch.LongTensor`

Parameters

* **input\_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`) — The sequence used as a prompt for the generation.
* **selector** (`TokenSelector`) — The object implementing the generation logic based on transformers processors and stopping criteria.
* **batch\_size** (`int`) — The actual input batch size. Used to avoid generating tokens for padded inputs.
* **attention\_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices.
* **model\_kwargs** — Additional model-specific kwargs that will be forwarded to the `forward` function of the model.

Returns

`torch.LongTensor`

A `torch.LongTensor` containing the generated tokens.

Generate tokens using sampling or greedy search.
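The role of `batch_size` can be sketched as follows: a model compiled for a fixed batch size receives padded rows when fewer prompts are supplied, and the outputs for those padded rows are discarded. This is an illustrative sketch under that assumption; `pad_batch` and `trim_outputs` are hypothetical helpers, not library functions.

```python
def pad_batch(rows, static_batch_size, pad_row):
    """Pad a list of input rows up to the static batch size.

    Returns the padded rows together with the actual (unpadded) batch size.
    """
    actual = len(rows)
    if actual > static_batch_size:
        raise ValueError("input batch is larger than the compiled batch size")
    padding = [list(pad_row) for _ in range(static_batch_size - actual)]
    return rows + padding, actual


def trim_outputs(outputs, actual_batch_size):
    """Keep only the outputs that correspond to real (non-padded) inputs."""
    return outputs[:actual_batch_size]


# Two real prompts, model compiled for a batch of four (assumed values).
padded_rows, actual_batch_size = pad_batch([[5, 6], [7, 8]], 4, pad_row=[0, 0])
fake_outputs = ["out0", "out1", "out2", "out3"]  # one output per padded row
kept = trim_outputs(fake_outputs, actual_batch_size)
print(len(padded_rows), kept)
```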

### Stable Diffusion

#### NeuronStableDiffusionPipelineBase

#### class optimum.neuron.modeling\_diffusion.NeuronStableDiffusionPipelineBase

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L79)

( text\_encoder: ScriptModule unet: ScriptModule vae\_decoder: Union config: Dict tokenizer: CLIPTokenizer scheduler: Union vae\_encoder: Union = None text\_encoder\_2: Union = None tokenizer\_2: Optional = None feature\_extractor: Optional = None device\_ids: Optional = None configs: Optional = None neuron\_configs: Optional = None model\_save\_dir: Union = None model\_and\_config\_save\_paths: Optional = None )

**load\_model**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L230)

( text\_encoder\_path: Union unet\_path: Union vae\_decoder\_path: Union = None vae\_encoder\_path: Union = None text\_encoder\_2\_path: Union = None device\_ids: Optional = None dynamic\_batch\_size: bool = False )

Parameters

* **text\_encoder\_path** (`Union[str, Path]`) — Path of the compiled text encoder.
* **unet\_path** (`Union[str, Path]`) — Path of the compiled U-NET.
* **vae\_decoder\_path** (`Optional[Union[str, Path]]`, defaults to `None`) — Path of the compiled VAE decoder.
* **vae\_encoder\_path** (`Optional[Union[str, Path]]`, defaults to `None`) — Path of the compiled VAE encoder. It is optional, only used for tasks taking images as input.
* **text\_encoder\_2\_path** (`Optional[Union[str, Path]]`, defaults to `None`) — Path of the compiled second frozen text encoder. SDXL only.
* **device\_ids** (`Optional[List[int]]`, defaults to `None`) — The IDs of the neuron cores on which to load the model. For stable diffusion, it is only used for loading the unet; by default, the unet is loaded onto both neuron cores of a device.
* **dynamic\_batch\_size** (`bool`, defaults to `False`) — Whether to enable dynamic batch size for the neuron compiled model. If `True`, the input batch size can be a multiple of the batch size used during compilation.

Loads Stable Diffusion TorchScript modules compiled by the neuron(x)-cc compiler. The modules are first loaded onto CPU and then moved to one or multiple [NeuronCore](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/neuron-hardware/neuroncores-arch.html).
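The shape constraint attached to `dynamic_batch_size` can be made concrete with a small check. This sketch only illustrates the "multiple of the compiled batch size" rule stated above; how the runtime actually dispatches the sub-batches is not described here, and `split_into_compiled_batches` is a hypothetical helper.

```python
def split_into_compiled_batches(inputs, compiled_batch_size, dynamic_batch_size):
    """Validate the input batch size and split it into compiled-size chunks."""
    n = len(inputs)
    if not dynamic_batch_size:
        # Static shapes: the batch must match the compiled size exactly.
        if n != compiled_batch_size:
            raise ValueError("batch size must equal the compiled batch size")
        return [inputs]
    if n == 0 or n % compiled_batch_size != 0:
        raise ValueError("batch size must be a multiple of the compiled batch size")
    return [inputs[i : i + compiled_batch_size] for i in range(0, n, compiled_batch_size)]


prompts = ["a", "b", "c", "d"]
chunks = split_into_compiled_batches(prompts, compiled_batch_size=2, dynamic_batch_size=True)
print(chunks)
```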

#### NeuronStableDiffusionPipeline

#### class optimum.neuron.NeuronStableDiffusionPipeline

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L675)

( text\_encoder: ScriptModule unet: ScriptModule vae\_decoder: Union config: Dict tokenizer: CLIPTokenizer scheduler: Union vae\_encoder: Union = None text\_encoder\_2: Union = None tokenizer\_2: Optional = None feature\_extractor: Optional = None device\_ids: Optional = None configs: Optional = None neuron\_configs: Optional = None model\_save\_dir: Union = None model\_and\_config\_save\_paths: Optional = None )

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/pipelines/diffusers/pipeline_stable_diffusion.py#L52)

( prompt: Union = None num\_inference\_steps: int = 50 guidance\_scale: float = 7.5 negative\_prompt: Union = None num\_images\_per\_prompt: int = 1 eta: float = 0.0 generator: Union = None latents: Optional = None prompt\_embeds: Optional = None negative\_prompt\_embeds: Optional = None output\_type: Optional = 'pil' return\_dict: bool = True callback: Optional = None callback\_steps: int = 1 cross\_attention\_kwargs: Optional = None guidance\_rescale: float = 0.0 ) → `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`

Parameters

* **prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
* **num\_inference\_steps** (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
* **guidance\_scale** (`float`, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
* **negative\_prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).
* **num\_images\_per\_prompt** (`int`, defaults to 1) — The number of images to generate per prompt. If it is different from the batch size used for the compilation, it will be overridden by the static batch size of neuron (except for dynamic batching).
* **eta** (`float`, defaults to 0.0) — Corresponds to parameter eta (η) from the [DDIM](https://arxiv.org/abs/2010.02502) paper. Only applies to the `diffusers.schedulers.DDIMScheduler`, and is ignored in other schedulers.
* **generator** (`Optional[Union[torch.Generator, List[torch.Generator]]]`, defaults to `None`) — A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **latents** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random `generator`.
* **prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the `prompt` input argument.
* **negative\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
* **output\_type** (`Optional[str]`, defaults to `"pil"`) — The output format of the generated image. Choose between `PIL.Image` or `np.array`.
* **return\_dict** (`bool`, defaults to `True`) — Whether or not to return a `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` instead of a plain tuple.
* **callback** (`Optional[Callable]`, defaults to `None`) — A function that is called every `callback_steps` steps during inference. The function is called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
* **cross\_attention\_kwargs** (`dict`, defaults to `None`) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined in [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
* **guidance\_rescale** (`float`, defaults to 0.0) — Guidance rescale factor from [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). Guidance rescale factor should fix overexposure when using zero terminal SNR.

Returns

`diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`

If `return_dict` is `True`, `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` is returned, otherwise a `tuple` is returned where the first element is a list with the generated images and the second element is a list of `bool`s indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content.

The call function to the pipeline for generation.

Examples:

```
>>> from optimum.neuron import NeuronStableDiffusionPipeline

>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}

>>> stable_diffusion = NeuronStableDiffusionPipeline.from_pretrained(
...     "runwayml/stable-diffusion-v1-5", export=True, **compiler_args, **input_shapes
... )
>>> stable_diffusion.save_pretrained("sd_neuron/")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = stable_diffusion(prompt).images[0]
```
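The effect of `guidance_scale` can be illustrated with the usual classifier-free guidance combination, in which the unconditional noise prediction is pushed towards the text-conditioned one. This sketch assumes the pipeline follows that standard formulation; it uses plain lists instead of tensors, and `apply_guidance` is a hypothetical name.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions.

    With guidance_scale == 1 the result equals the text-conditioned prediction;
    larger values move further in the direction of the prompt.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


noise_uncond = [0.0, 0.5]  # illustrative values, not real model outputs
noise_text = [1.0, 1.0]

no_extra_guidance = apply_guidance(noise_uncond, noise_text, guidance_scale=1.0)
default_guidance = apply_guidance(noise_uncond, noise_text, guidance_scale=7.5)
print(no_extra_guidance, default_guidance)
```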

#### NeuronStableDiffusionImg2ImgPipeline

#### class optimum.neuron.NeuronStableDiffusionImg2ImgPipeline

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L679)

( text\_encoder: ScriptModule unet: ScriptModule vae\_decoder: Union config: Dict tokenizer: CLIPTokenizer scheduler: Union vae\_encoder: Union = None text\_encoder\_2: Union = None tokenizer\_2: Optional = None feature\_extractor: Optional = None device\_ids: Optional = None configs: Optional = None neuron\_configs: Optional = None model\_save\_dir: Union = None model\_and\_config\_save\_paths: Optional = None )

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/pipelines/diffusers/pipeline_stable_diffusion_img2img.py#L83)

( prompt: Union = None image: Optional = None strength: float = 0.8 num\_inference\_steps: int = 50 guidance\_scale: float = 7.5 negative\_prompt: Union = None num\_images\_per\_prompt: int = 1 eta: float = 0.0 generator: Optional = None prompt\_embeds: Optional = None negative\_prompt\_embeds: Optional = None output\_type: str = 'pil' return\_dict: bool = True callback: Optional = None callback\_steps: int = 1 cross\_attention\_kwargs: Optional = None ) → `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`

Parameters

* **prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
* **image** (`Optional["PipelineImageInput"]`, defaults to `None`) — `Image`, numpy array or tensor representing an image batch to be used as the starting point. For both numpy array and pytorch tensor, the expected value range is between `[0, 1]`. If it’s a tensor or a list of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but latents passed directly are not encoded again.
* **strength** (`float`, defaults to 0.8) — Indicates extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point and more noise is added the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising process runs for the full number of iterations specified in `num_inference_steps`. A value of 1 essentially ignores `image`.
* **num\_inference\_steps** (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. This parameter is modulated by `strength`.
* **guidance\_scale** (`float`, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
* **negative\_prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).
* **num\_images\_per\_prompt** (`int`, defaults to 1) — The number of images to generate per prompt. If it is different from the batch size used for the compilation, it will be overridden by the static batch size of neuron (except for dynamic batching).
* **eta** (`float`, defaults to 0.0) — Corresponds to parameter eta (η) from the [DDIM](https://arxiv.org/abs/2010.02502) paper. Only applies to the `diffusers.schedulers.DDIMScheduler`, and is ignored in other schedulers.
* **generator** (`Optional[Union[torch.Generator, List[torch.Generator]]]`, defaults to `None`) — A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the `prompt` input argument.
* **negative\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
* **output\_type** (`Optional[str]`, defaults to `"pil"`) — The output format of the generated image. Choose between `PIL.Image` or `np.array`.
* **return\_dict** (`bool`, defaults to `True`) — Whether or not to return a `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` instead of a plain tuple.
* **callback** (`Optional[Callable]`, defaults to `None`) — A function that is called every `callback_steps` steps during inference. The function is called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
* **cross\_attention\_kwargs** (`dict`, defaults to `None`) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined in [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).

Returns

`diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`

If `return_dict` is `True`, `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` is returned, otherwise a `tuple` is returned where the first element is a list with the generated images and the second element is a list of `bool`s indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content.

The call function to the pipeline for generation.

Examples:

```
>>> from optimum.neuron import NeuronStableDiffusionImg2ImgPipeline
>>> from diffusers.utils import load_image

>>> url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
>>> init_image = load_image(url).convert("RGB")

>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> pipeline = NeuronStableDiffusionImg2ImgPipeline.from_pretrained(
...     "nitrosocke/Ghibli-Diffusion", export=True, **compiler_args, **input_shapes, device_ids=[0, 1]
... )
>>> pipeline.save_pretrained("sd_img2img/")

>>> prompt = "ghibli style, a fantasy landscape with snowcapped mountains, trees, lake with detailed reflection."
>>> image = pipeline(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
```
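The way `strength` modulates `num_inference_steps` can be sketched with a little arithmetic. This follows the approach used by the upstream diffusers image-to-image pipeline and assumes the Neuron pipeline behaves the same way; `effective_denoising_steps` is a hypothetical helper.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Number of denoising steps actually run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Portion of the schedule covered by the noise added to the input image.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # The first t_start scheduler steps are skipped.
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


steps_example = effective_denoising_steps(50, 0.75)  # values used in the example above
steps_full = effective_denoising_steps(50, 1.0)
print(steps_example, steps_full)
```

With `strength=0.75` and the default 50 steps, the sketch runs 37 denoising steps; with `strength=1.0`, the full 50.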

#### NeuronStableDiffusionInpaintPipeline

#### class optimum.neuron.NeuronStableDiffusionInpaintPipeline

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L685)

( text\_encoder: ScriptModule unet: ScriptModule vae\_decoder: Union config: Dict tokenizer: CLIPTokenizer scheduler: Union vae\_encoder: Union = None text\_encoder\_2: Union = None tokenizer\_2: Optional = None feature\_extractor: Optional = None device\_ids: Optional = None configs: Optional = None neuron\_configs: Optional = None model\_save\_dir: Union = None model\_and\_config\_save\_paths: Optional = None )

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/pipelines/diffusers/pipeline_stable_diffusion_inpaint.py#L46)

( prompt: Union = None image: Optional = None mask\_image: Optional = None masked\_image\_latents: Optional = None strength: float = 1.0 num\_inference\_steps: int = 50 guidance\_scale: float = 7.5 negative\_prompt: Union = None num\_images\_per\_prompt: Optional = 1 eta: float = 0.0 generator: Union = None latents: Optional = None prompt\_embeds: Optional = None negative\_prompt\_embeds: Optional = None output\_type: Optional = 'pil' return\_dict: bool = True callback: Optional = None callback\_steps: int = 1 cross\_attention\_kwargs: Optional = None clip\_skip: int = None ) → `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`

Parameters

* **prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
* **image** (`Optional["PipelineImageInput"]`, defaults to `None`) — `Image`, numpy array or tensor representing an image batch to be inpainted (which parts of the image to be masked out with `mask_image` and repainted according to `prompt`). For both numpy array and pytorch tensor, the expected value range is between `[0, 1]`. If it’s a tensor or a list of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but latents passed directly are not encoded again.
* **mask\_image** (`Optional["PipelineImageInput"]`, defaults to `None`) — `Image`, numpy array or tensor representing an image batch to mask `image`. White pixels in the mask are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a single channel (luminance) before use. If it’s a numpy array or pytorch tensor, it should contain one color channel (L) instead of 3, so the expected shape for a pytorch tensor would be `(B, 1, H, W)`, `(B, H, W)`, `(1, H, W)`, or `(H, W)`, and for a numpy array it would be `(B, H, W, 1)`, `(B, H, W)`, `(H, W, 1)`, or `(H, W)`.
* **strength** (`float`, defaults to 1.0) — Indicates extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point and more noise is added the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising process runs for the full number of iterations specified in `num_inference_steps`. A value of 1 essentially ignores `image`.
* **num\_inference\_steps** (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. This parameter is modulated by `strength`.
* **guidance\_scale** (`float`, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
* **negative\_prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).
* **num\_images\_per\_prompt** (`int`, defaults to 1) — The number of images to generate per prompt. If it is different from the batch size used for the compilation, it will be overridden by the static batch size of neuron (except for dynamic batching).
* **eta** (`float`, defaults to 0.0) — Corresponds to parameter eta (η) from the [DDIM](https://arxiv.org/abs/2010.02502) paper. Only applies to the `diffusers.schedulers.DDIMScheduler`, and is ignored in other schedulers.
* **generator** (`Optional[Union[torch.Generator, List[torch.Generator]]]`, defaults to `None`) — A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **latents** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random `generator`.
* **prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from the `prompt` input argument.
* **negative\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
* **output\_type** (`Optional[str]`, defaults to `"pil"`) — The output format of the generated image. Choose between `PIL.Image` or `np.array`.
* **return\_dict** (`bool`, defaults to `True`) — Whether or not to return a `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` instead of a plain tuple.
* **callback** (`Optional[Callable]`, defaults to `None`) — A function that is called every `callback_steps` steps during inference. The function is called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, defaults to 1) — The frequency at which the `callback` function is called. If not specified, the callback is called at every step.
* **cross\_attention\_kwargs** (`dict`, defaults to `None`) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined in [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
* **clip\_skip** (`int`, defaults to `None`) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.

Returns

`diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` or `tuple`

If `return_dict` is `True`, `diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput` is returned, otherwise a `tuple` is returned where the first element is a list with the generated images and the second element is a list of `bool`s indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content.

The call function to the pipeline for generation.

Examples:

```
>>> from optimum.neuron import NeuronStableDiffusionInpaintPipeline
>>> from diffusers.utils import load_image

>>> img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
>>> mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

>>> init_image = load_image(img_url).convert("RGB")
>>> mask_image = load_image(mask_url).convert("RGB")

>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
>>> pipeline = NeuronStableDiffusionInpaintPipeline.from_pretrained(
...     "runwayml/stable-diffusion-inpainting", export=True, **compiler_args, **input_shapes, device_ids=[0, 1]
... )
>>> pipeline.save_pretrained("sd_inpaint/")

>>> prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
>>> image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
```
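The mask convention described for `mask_image` (white is repainted, black is preserved) can be shown on a few pixels. This is a conceptual sketch only: the real pipeline works on latents rather than on individual pixels, the threshold of 128 is an assumption, and `to_luminance`, `binarize` and `blend` are hypothetical helpers. The luminance weights are the ITU-R 601-2 ones that Pillow documents for its `L` mode.

```python
def to_luminance(rgb):
    """Convert an (R, G, B) pixel to a single luminance value in 0..255."""
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) // 1000


def binarize(mask_pixels, threshold=128):
    """Map each mask pixel to 1 (repaint) or 0 (preserve)."""
    return [1 if to_luminance(p) >= threshold else 0 for p in mask_pixels]


def blend(original, generated, mask):
    """Take the generated value where the mask is 1, the original elsewhere."""
    return [g if m else o for o, g, m in zip(original, generated, mask)]


mask_pixels = [(255, 255, 255), (0, 0, 0), (255, 255, 255)]  # white, black, white
mask = binarize(mask_pixels)
result = blend(["orig0", "orig1", "orig2"], ["new0", "new1", "new2"], mask)
print(mask, result)
```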

#### NeuronStableDiffusionXLPipeline

#### class optimum.neuron.NeuronStableDiffusionXLPipeline

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L746)

( text\_encoder: ScriptModule unet: ScriptModule vae\_decoder: ScriptModule config: Dict tokenizer: CLIPTokenizer scheduler: Union vae\_encoder: Optional = None text\_encoder\_2: Optional = None tokenizer\_2: Optional = None feature\_extractor: Optional = None device\_ids: Optional = None configs: Optional = None neuron\_configs: Optional = None model\_save\_dir: Union = None model\_and\_config\_save\_paths: Optional = None add\_watermarker: Optional = None )

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/pipelines/diffusers/pipeline_stable_diffusion_xl.py#L57)

( prompt: Union = None prompt\_2: Union = None num\_inference\_steps: int = 50 denoising\_end: Optional = None guidance\_scale: float = 5.0 negative\_prompt: Union = None negative\_prompt\_2: Union = None num\_images\_per\_prompt: int = 1 eta: float = 0.0 generator: Union = None latents: Optional = None prompt\_embeds: Optional = None negative\_prompt\_embeds: Optional = None pooled\_prompt\_embeds: Optional = None negative\_pooled\_prompt\_embeds: Optional = None output\_type: Optional = 'pil' return\_dict: bool = True callback: Optional = None callback\_steps: int = 1 cross\_attention\_kwargs: Optional = None guidance\_rescale: float = 0.0 original\_size: Optional = None crops\_coords\_top\_left: Tuple = (0, 0) target\_size: Optional = None negative\_original\_size: Optional = None negative\_crops\_coords\_top\_left: Tuple = (0, 0) negative\_target\_size: Optional = None clip\_skip: Optional = None ) → `diffusers.pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput` or `tuple`

Parameters

* **prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
* **prompt\_2** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to be sent to the `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is used in both text-encoders.
* **num\_inference\_steps** (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
* **denoising\_end** (`Optional[float]`, defaults to `None`) — When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally prematurely terminated. As a result, the returned sample will still retain a substantial amount of noise as determined by the discrete timesteps selected by the scheduler. The denoising\_end parameter should ideally be utilized when this pipeline forms a part of a “Mixture of Denoisers” multi-pipeline setup, as elaborated in [**Refining the Image Output**](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#refining-the-image-output)
* **guidance\_scale** (`float`, defaults to 5.0) — Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). `guidance_scale` is defined as `w` of equation 2 of the [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages the model to generate images that are closely linked to the text `prompt`, usually at the expense of lower image quality.
* **negative\_prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
* **negative\_prompt\_2** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `negative_prompt` is used in both text encoders.
* **num\_images\_per\_prompt** (`int`, defaults to 1) — The number of images to generate per prompt. If it is different from the batch size used for the compilation, it will be overridden by the static batch size of neuron (except for dynamic batching).
* **eta** (`float`, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: <https://arxiv.org/abs/2010.02502>. Only applies to `schedulers.DDIMScheduler`, will be ignored for others.
* **generator** (`Optional[Union[torch.Generator, List[torch.Generator]]]`, defaults to `None`) — One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **latents** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`.
* **prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, text embeddings will be generated from `prompt` input argument.
* **negative\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, negative\_prompt\_embeds will be generated from `negative_prompt` input argument.
* **pooled\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, pooled text embeddings will be generated from `prompt` input argument.
* **negative\_pooled\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, pooled negative\_prompt\_embeds will be generated from `negative_prompt` input argument.
* **output\_type** (`Optional[str]`, defaults to `"pil"`) — The output format of the generated image. Choose between [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
* **return\_dict** (`bool`, defaults to `True`) — Whether or not to return a `diffusers.pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput` instead of a plain tuple.
* **callback** (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
* **cross\_attention\_kwargs** (`dict`, defaults to `None`) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under `self.processor` in [diffusers.models.attention\_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
* **guidance\_rescale** (`float`, *optional*, defaults to 0.0) — Guidance rescale factor proposed by [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). `guidance_rescale` is defined as `φ` in equation 16 of [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). Guidance rescale factor should fix overexposure when using zero terminal SNR.
* **original\_size** (`Optional[Tuple[int, int]]`, defaults to (1024, 1024)) — If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled. `original_size` defaults to `(width, height)` if not specified. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **crops\_coords\_top\_left** (`Tuple[int]`, defaults to (0, 0)) — `crops_coords_top_left` can be used to generate an image that appears to be “cropped” from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **target\_size** (`Tuple[int]`, defaults to (1024, 1024)) — For most cases, `target_size` should be set to the desired height and width of the generated image. If not specified it will default to `(width, height)`. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **negative\_original\_size** (`Tuple[int]`, defaults to (1024, 1024)) — To negatively condition the generation process based on a specific image resolution. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/boincai/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **negative\_crops\_coords\_top\_left** (`Tuple[int]`, defaults to (0, 0)) — To negatively condition the generation process based on specific crop coordinates. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/boincai/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **negative\_target\_size** (`Tuple[int]`, defaults to (1024, 1024)) — To negatively condition the generation process based on a target image resolution. It should be the same as the `target_size` for most cases. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/boincai/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **clip\_skip** (`Optional[int]`, defaults to `None`) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
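
The effect of `guidance_scale` can be illustrated with a small, framework-free sketch. It assumes the usual classifier-free guidance combination, in which the final noise prediction is the unconditional prediction plus `guidance_scale` times the difference between the text-conditioned and the unconditional prediction. The helper name `apply_guidance` and the toy values are hypothetical and not part of `optimum.neuron`; the real pipeline works on tensors rather than lists.

```python
from typing import List


def apply_guidance(
    noise_uncond: List[float], noise_text: List[float], guidance_scale: float
) -> List[float]:
    """Combine unconditional and text-conditioned predictions.

    Hypothetical helper for illustration only (assumed classifier-free
    guidance formula); the pipeline applies this to tensors.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


noise_uncond = [0.0, 1.0]  # toy unconditional prediction
noise_text = [1.0, 3.0]    # toy text-conditioned prediction

# With guidance_scale == 1 the text-conditioned prediction is returned unchanged.
print(apply_guidance(noise_uncond, noise_text, 1.0))
# Larger values move the result further along the (text - uncond) direction.
print(apply_guidance(noise_uncond, noise_text, 5.0))
```

Under this assumption, a scale of 1 adds nothing beyond the text-conditioned prediction, which is consistent with guidance being described as enabled only for `guidance_scale > 1`.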

Returns

`diffusers.pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput` or `tuple`

`diffusers.pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images.

Function invoked when calling the pipeline for generation.

Examples:

```
>>> from optimum.neuron import NeuronStableDiffusionXLPipeline

>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}

>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-xl-base-1.0", export=True, **compiler_args, **input_shapes
... )
>>> stable_diffusion_xl.save_pretrained("sd_neuron_xl/")

>>> prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
>>> image = stable_diffusion_xl(prompt).images[0]
```
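
The `callback` and `callback_steps` parameters can be used to monitor a run. Below is a minimal sketch of a callback matching the documented signature `callback(step, timestep, latents)`. The class name `StepRecorder` is hypothetical, the timestep values are placeholders, and the loop only simulates the calls a pipeline might make with `callback_steps=10`, assuming steps are counted from 0. No Neuron device is involved.

```python
from typing import Any, List, Tuple


class StepRecorder:
    """Hypothetical callback that records (step, timestep) pairs."""

    def __init__(self) -> None:
        self.history: List[Tuple[int, int]] = []

    def __call__(self, step: int, timestep: int, latents: Any) -> None:
        # In a real run `latents` would be a torch.FloatTensor; it is ignored here.
        self.history.append((step, timestep))


recorder = StepRecorder()

# Simulated 50-step run (an assumption, not the pipeline's actual loop):
# the callback fires on every 10th step.
num_inference_steps, callback_steps = 50, 10
for step in range(num_inference_steps):
    if step % callback_steps == 0:
        # Placeholder timestep values, chosen only for illustration.
        recorder(step, timestep=1000 - 20 * step, latents=None)

print([s for s, _ in recorder.history])
```

With a compiled pipeline, such an object would be passed as `stable_diffusion_xl(prompt, callback=recorder, callback_steps=10)`.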

#### NeuronStableDiffusionXLImg2ImgPipeline

#### class optimum.neuron.NeuronStableDiffusionXLImg2ImgPipeline

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L750)

( text\_encoder: ScriptModule unet: ScriptModule vae\_decoder: ScriptModule config: Dict tokenizer: CLIPTokenizer scheduler: Union vae\_encoder: Optional = None text\_encoder\_2: Optional = None tokenizer\_2: Optional = None feature\_extractor: Optional = None device\_ids: Optional = None configs: Optional = None neuron\_configs: Optional = None model\_save\_dir: Union = None model\_and\_config\_save\_paths: Optional = None add\_watermarker: Optional = None )

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/pipelines/diffusers/pipeline_stable_diffusion_xl_img2img.py#L109)

( prompt: Union = None prompt\_2: Union = None image: Optional = None strength: float = 0.3 num\_inference\_steps: int = 50 denoising\_start: Optional = None denoising\_end: Optional = None guidance\_scale: float = 5.0 negative\_prompt: Union = None negative\_prompt\_2: Union = None num\_images\_per\_prompt: Optional = 1 eta: float = 0.0 generator: Union = None latents: Optional = None prompt\_embeds: Optional = None negative\_prompt\_embeds: Optional = None pooled\_prompt\_embeds: Optional = None negative\_pooled\_prompt\_embeds: Optional = None output\_type: Optional = 'pil' return\_dict: bool = True callback: Optional = None callback\_steps: int = 1 cross\_attention\_kwargs: Optional = None guidance\_rescale: float = 0.0 original\_size: Tuple = None crops\_coords\_top\_left: Tuple = (0, 0) target\_size: Tuple = None negative\_original\_size: Optional = None negative\_crops\_coords\_top\_left: Tuple = (0, 0) negative\_target\_size: Optional = None aesthetic\_score: float = 6.0 negative\_aesthetic\_score: float = 2.5 clip\_skip: Optional = None ) → `diffusers.pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` or `tuple`

Parameters

* **prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
* **prompt\_2** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to be sent to the `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is used in both text encoders.
* **image** (`Optional["PipelineImageInput"]`, defaults to `None`) — The image(s) to modify with the pipeline.
* **strength** (`float`, defaults to 0.3) — Conceptually, indicates how much to transform the reference `image`. Must be between 0 and 1. `image` will be used as a starting point, adding more noise to it the larger the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise will be maximum and the denoising process will run for the full number of iterations specified in `num_inference_steps`. A value of 1, therefore, essentially ignores `image`. Note that when `denoising_start` is specified, the value of `strength` will be ignored.
* **num\_inference\_steps** (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
* **denoising\_start** (`Optional[float]`, defaults to `None`) — When specified, indicates the fraction (between 0.0 and 1.0) of the total denoising process to be bypassed before it is initiated. Consequently, the initial part of the denoising process is skipped and it is assumed that the passed `image` is a partly denoised image. Note that when this is specified, strength will be ignored. The `denoising_start` parameter is particularly beneficial when this pipeline is integrated into a “Mixture of Denoisers” multi-pipeline setup, as detailed in [**Refining the Image Output**](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#refining-the-image-output).
* **denoising\_end** (`Optional[float]`, defaults to `None`) — When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally prematurely terminated. As a result, the returned sample will still retain a substantial amount of noise (ca. final 20% of timesteps still needed) and should be denoised by a successor pipeline that has `denoising_start` set to 0.8 so that it only denoises the final 20% of the scheduler. The denoising\_end parameter should ideally be utilized when this pipeline forms a part of a “Mixture of Denoisers” multi-pipeline setup, as elaborated in [**Refining the Image Output**](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#refining-the-image-output).
* **guidance\_scale** (`float`, defaults to 5.0) — Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). `guidance_scale` is defined as `w` in equation 2 of the [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages the model to generate images that are closely linked to the text `prompt`, usually at the expense of lower image quality.
* **negative\_prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
* **negative\_prompt\_2** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `negative_prompt` is used in both text encoders.
* **num\_images\_per\_prompt** (`int`, defaults to 1) — The number of images to generate per prompt. If it is different from the batch size used for the compilation, it will be overridden by the static batch size of neuron (except for dynamic batching).
* **eta** (`float`, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: <https://arxiv.org/abs/2010.02502>. Only applies to `schedulers.DDIMScheduler`, will be ignored for others.
* **generator** (`Optional[Union[torch.Generator, List[torch.Generator]]]`, defaults to `None`) — One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **latents** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`.
* **prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, text embeddings will be generated from `prompt` input argument.
* **negative\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, negative\_prompt\_embeds will be generated from `negative_prompt` input argument.
* **pooled\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, pooled text embeddings will be generated from `prompt` input argument.
* **negative\_pooled\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, pooled negative\_prompt\_embeds will be generated from `negative_prompt` input argument.
* **output\_type** (`Optional[str]`, defaults to `"pil"`) — The output format of the generated image. Choose between [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
* **return\_dict** (`bool`, defaults to `True`) — Whether or not to return a `diffusers.pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` instead of a plain tuple.
* **callback** (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
* **cross\_attention\_kwargs** (`Optional[Dict[str, Any]]`, defaults to `None`) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under `self.processor` in [diffusers.models.attention\_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
* **guidance\_rescale** (`float`, defaults to 0.0) — Guidance rescale factor proposed by [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). `guidance_rescale` is defined as `φ` in equation 16 of [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). Guidance rescale factor should fix overexposure when using zero terminal SNR.
* **original\_size** (`Optional[Tuple[int, int]]`, defaults to (1024, 1024)) — If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled. `original_size` defaults to `(width, height)` if not specified. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **crops\_coords\_top\_left** (`Tuple[int]`, defaults to (0, 0)) — `crops_coords_top_left` can be used to generate an image that appears to be “cropped” from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **target\_size** (`Tuple[int]`, defaults to (1024, 1024)) — For most cases, `target_size` should be set to the desired height and width of the generated image. If not specified it will default to `(width, height)`. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **negative\_original\_size** (`Tuple[int]`, defaults to (1024, 1024)) — To negatively condition the generation process based on a specific image resolution. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/huggingface/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **negative\_crops\_coords\_top\_left** (`Tuple[int]`, defaults to (0, 0)) — To negatively condition the generation process based on specific crop coordinates. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/boincai/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **negative\_target\_size** (`Tuple[int]`, defaults to (1024, 1024)) — To negatively condition the generation process based on a target image resolution. It should be the same as the `target_size` for most cases. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/boincai/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **aesthetic\_score** (`float`, defaults to 6.0) — Used to simulate an aesthetic score of the generated image by influencing the positive text condition. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **negative\_aesthetic\_score** (`float`, defaults to 2.5) — Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). Can be used to simulate an aesthetic score of the generated image by influencing the negative text condition.
* **clip\_skip** (`Optional[int]`, defaults to `None`) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
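
As a rough illustration of how `strength` relates to the number of denoising steps that are actually run, the sketch below assumes the arithmetic commonly used by diffusers image-to-image pipelines when `denoising_start` is not set: the step count is `num_inference_steps * strength`, truncated to an integer. The helper name `effective_denoising_steps` is hypothetical, and the exact behaviour depends on the scheduler.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Hypothetical helper: steps run for a given strength (assumed arithmetic)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be between 0 and 1, got {strength}")
    # Truncate to an integer and never exceed the full schedule.
    return min(int(num_inference_steps * strength), num_inference_steps)


# Default strength of this pipeline: only part of the schedule is run.
print(effective_denoising_steps(50, 0.3))
# strength == 1 runs the full schedule, so the input image is essentially ignored.
print(effective_denoising_steps(50, 1.0))
```

Under this assumption, lowering `strength` both preserves more of the input image and shortens inference, since fewer denoising iterations are executed.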

Returns

`diffusers.pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` or `tuple`

`diffusers.pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` if `return_dict` is True, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images.

Function invoked when calling the pipeline for generation.

Examples:

```
>>> from optimum.neuron import NeuronStableDiffusionXLImg2ImgPipeline
>>> from diffusers.utils import load_image

>>> url = "https://huggingface.co/datasets/optimum/documentation-images/resolve/main/intel/openvino/sd_xl/castle_friedrich.png"
>>> init_image = load_image(url).convert("RGB")

>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> pipeline = NeuronStableDiffusionXLImg2ImgPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-xl-base-1.0", export=True, **compiler_args, **input_shapes, device_ids=[0, 1]
... )
>>> pipeline.save_pretrained("sdxl_img2img/")

>>> prompt = "a dog running, lake, moat"
>>> image = pipeline(prompt=prompt, image=init_image).images[0]
```
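
The `denoising_end` and `denoising_start` parameters are meant to be paired across two pipelines in a “Mixture of Denoisers” setup. The sketch below only illustrates the bookkeeping: a base pipeline stops at a given fraction of the schedule and a refiner resumes from the same fraction. The helper `split_denoising_steps` is hypothetical and the rounding is an assumption. Schedulers select discrete timesteps, so real step counts may differ slightly.

```python
from typing import Tuple


def split_denoising_steps(num_inference_steps: int, handoff: float) -> Tuple[int, int]:
    """Hypothetical helper: approximate (base_steps, refiner_steps).

    `handoff` plays the role of `denoising_end` for the base pipeline and of
    `denoising_start` for the refiner pipeline.
    """
    if not 0.0 < handoff < 1.0:
        raise ValueError("handoff must be strictly between 0.0 and 1.0")
    base_steps = round(num_inference_steps * handoff)
    return base_steps, num_inference_steps - base_steps


base_steps, refiner_steps = split_denoising_steps(50, 0.8)
print(base_steps, refiner_steps)
```

In such a setup the base pipeline would be called with `denoising_end=0.8` and the refiner with `denoising_start=0.8`, so that the refiner only handles the final 20% of the schedule.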

#### NeuronStableDiffusionXLInpaintPipeline

#### class optimum.neuron.NeuronStableDiffusionXLInpaintPipeline

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/modeling_diffusion.py#L756)

( text\_encoder: ScriptModule unet: ScriptModule vae\_decoder: ScriptModule config: Dict tokenizer: CLIPTokenizer scheduler: Union vae\_encoder: Optional = None text\_encoder\_2: Optional = None tokenizer\_2: Optional = None feature\_extractor: Optional = None device\_ids: Optional = None configs: Optional = None neuron\_configs: Optional = None model\_save\_dir: Union = None model\_and\_config\_save\_paths: Optional = None add\_watermarker: Optional = None )

**\_\_call\_\_**

[\<source>](https://github.com/huggingface/optimum-neuron/blob/main/optimum/neuron/pipelines/diffusers/pipeline_stable_diffusion_xl_inpaint.py#L74)

( prompt: Union = None prompt\_2: Union = None image: Optional = None mask\_image: Optional = None masked\_image\_latents: Optional = None strength: float = 0.9999 num\_inference\_steps: int = 50 denoising\_start: Optional = None denoising\_end: Optional = None guidance\_scale: float = 7.5 negative\_prompt: Union = None negative\_prompt\_2: Union = None num\_images\_per\_prompt: Optional = 1 eta: float = 0.0 generator: Union = None latents: Optional = None prompt\_embeds: Optional = None negative\_prompt\_embeds: Optional = None pooled\_prompt\_embeds: Optional = None negative\_pooled\_prompt\_embeds: Optional = None output\_type: Optional = 'pil' return\_dict: bool = True callback: Optional = None callback\_steps: int = 1 cross\_attention\_kwargs: Optional = None guidance\_rescale: float = 0.0 original\_size: Tuple = None crops\_coords\_top\_left: Tuple = (0, 0) target\_size: Tuple = None negative\_original\_size: Optional = None negative\_crops\_coords\_top\_left: Tuple = (0, 0) negative\_target\_size: Optional = None aesthetic\_score: float = 6.0 negative\_aesthetic\_score: float = 2.5 clip\_skip: Optional = None ) → `diffusers.pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` or `tuple`

Parameters

* **prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds` instead.
* **prompt\_2** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts to be sent to the `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is used in both text encoders.
* **image** (`Optional["PipelineImageInput"]`, defaults to `None`) — `Image`, or tensor representing an image batch which will be inpainted, *i.e.* parts of the image will be masked out with `mask_image` and repainted according to `prompt`.
* **mask\_image** (`Optional["PipelineImageInput"]`, defaults to `None`) — `Image`, or tensor representing an image batch, to mask `image`. White pixels in the mask will be repainted, while black pixels will be preserved. If `mask_image` is a PIL image, it will be converted to a single channel (luminance) before use. If it’s a tensor, it should contain one color channel (L) instead of 3, so the expected shape would be `(B, H, W, 1)`.
* **strength** (`float`, defaults to 0.9999) — Conceptually, indicates how much to transform the masked portion of the reference `image`. Must be between 0 and 1. `image` will be used as a starting point, adding more noise to it the larger the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, added noise will be maximum and the denoising process will run for the full number of iterations specified in `num_inference_steps`. A value of 1, therefore, essentially ignores the masked portion of the reference `image`. Note that when `denoising_start` is specified, the value of `strength` will be ignored.
* **num\_inference\_steps** (`int`, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
* **denoising\_start** (`Optional[float]`, defaults to `None`) — When specified, indicates the fraction (between 0.0 and 1.0) of the total denoising process to be bypassed before it is initiated. Consequently, the initial part of the denoising process is skipped and it is assumed that the passed `image` is a partly denoised image. Note that when this is specified, strength will be ignored. The `denoising_start` parameter is particularly beneficial when this pipeline is integrated into a “Mixture of Denoisers” multi-pipeline setup, as detailed in [**Refining the Image Output**](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#refining-the-image-output).
* **denoising\_end** (`Optional[float]`, defaults to `None`) — When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally prematurely terminated. As a result, the returned sample will still retain a substantial amount of noise (ca. final 20% of timesteps still needed) and should be denoised by a successor pipeline that has `denoising_start` set to 0.8 so that it only denoises the final 20% of the scheduler. The denoising\_end parameter should ideally be utilized when this pipeline forms a part of a “Mixture of Denoisers” multi-pipeline setup, as elaborated in [**Refining the Image Output**](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#refining-the-image-output).
* **guidance\_scale** (`float`, defaults to 7.5) — Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). `guidance_scale` is defined as `w` in equation 2 of the [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages the model to generate images that are closely linked to the text `prompt`, usually at the expense of lower image quality.
* **negative\_prompt** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
* **negative\_prompt\_2** (`Optional[Union[str, List[str]]]`, defaults to `None`) — The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `negative_prompt` is used in both text encoders.
* **prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, text embeddings will be generated from `prompt` input argument.
* **negative\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, negative\_prompt\_embeds will be generated from `negative_prompt` input argument.
* **pooled\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, pooled text embeddings will be generated from `prompt` input argument.
* **negative\_pooled\_prompt\_embeds** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not provided, pooled negative\_prompt\_embeds will be generated from `negative_prompt` input argument.
* **num\_images\_per\_prompt** (`int`, defaults to 1) — The number of images to generate per prompt.
* **eta** (`float`, defaults to 0.0) — Corresponds to parameter eta (η) in the DDIM paper: <https://arxiv.org/abs/2010.02502>. Only applies to `schedulers.DDIMScheduler`, will be ignored for others.
* **generator** (`Optional[Union[torch.Generator, List[torch.Generator]]]`, defaults to `None`) — One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
* **latents** (`Optional[torch.FloatTensor]`, defaults to `None`) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random `generator`.
* **output\_type** (`Optional[str]`, defaults to `"pil"`) — The output format of the generated image. Choose between [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
* **return\_dict** (`bool`, defaults to `True`) — Whether or not to return a `diffusers.pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` instead of a plain tuple.
* **callback** (`Optional[Callable]`, defaults to `None`) — A function that will be called every `callback_steps` steps during inference. The function will be called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
* **callback\_steps** (`int`, defaults to 1) — The frequency at which the `callback` function will be called. If not specified, the callback will be called at every step.
* **cross\_attention\_kwargs** (`Optional[Dict[str, Any]]`, defaults to `None`) — A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under `self.processor` in [diffusers.models.attention\_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
* **original\_size** (`Tuple[int]`, defaults to (1024, 1024)) — If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled. `original_size` defaults to `(height, width)` if not specified. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **crops\_coords\_top\_left** (`Tuple[int]`, defaults to (0, 0)) — `crops_coords_top_left` can be used to generate an image that appears to be “cropped” from the position `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting `crops_coords_top_left` to (0, 0). Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **target\_size** (`Tuple[int]`, defaults to (1024, 1024)) — For most cases, `target_size` should be set to the desired height and width of the generated image. If not specified it will default to `(height, width)`. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **negative\_original\_size** (`Tuple[int]`, defaults to (1024, 1024)) — To negatively condition the generation process based on a specific image resolution. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/boincai/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **negative\_crops\_coords\_top\_left** (`Tuple[int]`, defaults to (0, 0)) — To negatively condition the generation process based on specific crop coordinates. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/boincai/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **negative\_target\_size** (`Tuple[int]`, defaults to (1024, 1024)) — To negatively condition the generation process based on a target image resolution. It should be the same as `target_size` in most cases. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more information, refer to this issue thread: [https://github.com/boincai/diffusers/issues/4208](https://github.com/huggingface/diffusers/issues/4208).
* **aesthetic\_score** (`float`, defaults to 6.0) — Used to simulate an aesthetic score of the generated image by influencing the positive text condition. Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952).
* **negative\_aesthetic\_score** (`float`, defaults to 2.5) — Part of SDXL’s micro-conditioning as explained in section 2.2 of [https://boincai.com/papers/2307.01952](https://huggingface.co/papers/2307.01952). Can be used to simulate an aesthetic score of the generated image by influencing the negative text condition.
* **clip\_skip** (`Optional[int]`, defaults to `None`) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.

Returns

`diffusers.pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` or `tuple`

`diffusers.pipelines.stable_diffusion.StableDiffusionXLPipelineOutput` if `return_dict` is `True`, otherwise a `tuple`. When returning a tuple, the first element is a list with the generated images.

Function invoked when calling the pipeline for generation.

Examples:

```
>>> from optimum.neuron import NeuronStableDiffusionXLInpaintPipeline
>>> from diffusers.utils import load_image

>>> img_url = "https://boincai.com/datasets/boincai/documentation-images/resolve/main/diffusers/sdxl-text2img.png"
>>> mask_url = "https://boincai.com/datasets/boincai/documentation-images/resolve/main/diffusers/sdxl-inpaint-mask.png"

>>> init_image = load_image(img_url).convert("RGB")
>>> mask_image = load_image(mask_url).convert("RGB")

>>> compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
>>> input_shapes = {"batch_size": 1, "height": 1024, "width": 1024}
>>> pipeline = NeuronStableDiffusionXLInpaintPipeline.from_pretrained(
...     "stabilityai/stable-diffusion-xl-base-1.0", export=True, **compiler_args, **input_shapes, device_ids=[0, 1]
... )
>>> pipeline.save_pretrained("sdxl_inpaint/")

>>> prompt = "A deep sea diver floating"
>>> image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.85, guidance_scale=12.5).images[0]
```
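
The `callback` and `callback_steps` parameters described above can be illustrated without a Neuron device. The following is a minimal sketch, not the pipeline's actual implementation: `StepRecorder` and `simulate_denoising` are hypothetical names introduced here, and the loop only assumes that the callback fires whenever the step index is a multiple of `callback_steps`, using the documented signature `callback(step, timestep, latents)`.

```python
from typing import Any, Callable, List, Tuple


class StepRecorder:
    """Hypothetical callback that records (step, timestep) on each invocation."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, int]] = []

    def __call__(self, step: int, timestep: int, latents: Any) -> None:
        # In a real run `latents` would be a torch.FloatTensor; it is ignored here.
        self.calls.append((step, int(timestep)))


def simulate_denoising(
    callback: Callable[[int, int, Any], None],
    callback_steps: int,
    timesteps: List[int],
) -> None:
    """Stand-in for the denoising loop (an assumption, not the library code).

    Calls `callback` on every step whose index is a multiple of `callback_steps`.
    """
    for step, timestep in enumerate(timesteps):
        if step % callback_steps == 0:
            callback(step, timestep, None)


recorder = StepRecorder()
# Five illustrative timesteps; real schedulers produce their own values.
simulate_denoising(recorder, callback_steps=2, timesteps=[999, 749, 499, 249, 0])
print(recorder.calls)
```

With a compiled pipeline, the same object would presumably be passed as `pipeline(..., callback=recorder, callback_steps=2)`. The steps and timesteps it receives then depend on the scheduler and the number of inference steps.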
