# Transformer Temporal

## Transformer Temporal

A Transformer model for video-like data.

### TransformerTemporalModel

#### class diffusers.models.TransformerTemporalModel

[\<source>](https://github.com/huggingface/diffusers/blob/v0.21.0/src/diffusers/models/transformer_temporal.py#L39)

( num\_attention\_heads: int = 16, attention\_head\_dim: int = 88, in\_channels: typing.Optional\[int] = None, out\_channels: typing.Optional\[int] = None, num\_layers: int = 1, dropout: float = 0.0, norm\_num\_groups: int = 32, cross\_attention\_dim: typing.Optional\[int] = None, attention\_bias: bool = False, sample\_size: typing.Optional\[int] = None, activation\_fn: str = 'geglu', norm\_elementwise\_affine: bool = True, double\_self\_attention: bool = True )

Parameters

* **num\_attention\_heads** (`int`, *optional*, defaults to 16) — The number of heads to use for multi-head attention.
* **attention\_head\_dim** (`int`, *optional*, defaults to 88) — The number of channels in each head.
* **in\_channels** (`int`, *optional*) — The number of channels in the input and output (specify if the input is **continuous**).
* **num\_layers** (`int`, *optional*, defaults to 1) — The number of layers of Transformer blocks to use.
* **dropout** (`float`, *optional*, defaults to 0.0) — The dropout probability to use.
* **cross\_attention\_dim** (`int`, *optional*) — The number of `encoder_hidden_states` dimensions to use.
* **sample\_size** (`int`, *optional*) — The width of the latent images (specify if the input is **discrete**). This is fixed during training since it is used to learn a number of position embeddings.
* **activation\_fn** (`str`, *optional*, defaults to `"geglu"`) — Activation function to use in feed-forward.
* **attention\_bias** (`bool`, *optional*) — Whether the `TransformerBlock` attention should contain a bias parameter.
* **double\_self\_attention** (`bool`, *optional*) — Whether each `TransformerBlock` should contain two self-attention layers.

A Transformer model for video-like data.
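
For orientation, here is a minimal instantiation sketch. The hyperparameter values are illustrative rather than taken from any released checkpoint; note that the default `norm_num_groups=32` must evenly divide `in_channels`.

```python
from diffusers.models.transformer_temporal import TransformerTemporalModel

# Illustrative configuration for continuous (image-like) inputs.
# The inner attention dimension is num_attention_heads * attention_head_dim = 512.
model = TransformerTemporalModel(
    num_attention_heads=8,
    attention_head_dim=64,
    in_channels=320,   # must be divisible by the default norm_num_groups=32
    num_layers=1,
)
```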

**forward**

[\<source>](https://github.com/huggingface/diffusers/blob/v0.21.0/src/diffusers/models/transformer_temporal.py#L107)

( hidden\_states, encoder\_hidden\_states = None, timestep = None, class\_labels = None, num\_frames = 1, cross\_attention\_kwargs = None, return\_dict: bool = True ) → [TransformerTemporalModelOutput](https://huggingface.co/docs/diffusers/v0.21.0/en/api/models/transformer_temporal#diffusers.models.transformer_temporal.TransformerTemporalModelOutput) or `tuple`

Parameters

* **hidden\_states** (`torch.LongTensor` of shape `(batch size, num latent pixels)` if discrete, `torch.FloatTensor` of shape `(batch size, channel, height, width)` if continuous) — Input hidden\_states.
* **encoder\_hidden\_states** (`torch.LongTensor` of shape `(batch size, encoder_hidden_states dim)`, *optional*) — Conditional embeddings for the cross-attention layer. If not given, cross-attention defaults to self-attention.
* **timestep** (`torch.LongTensor`, *optional*) — Used to indicate the denoising step. Optional timestep to be applied as an embedding in `AdaLayerNorm`.
* **class\_labels** (`torch.LongTensor` of shape `(batch size, num classes)`, *optional*) — Used to indicate class labels conditioning. Optional class labels to be applied as an embedding in `AdaLayerNormZero`.
* **num\_frames** (`int`, *optional*, defaults to 1) — The number of frames to be processed per batch. Used to reshape the hidden states so that attention runs across frames.
* **cross\_attention\_kwargs** (`dict`, *optional*) — A kwargs dictionary that, if specified, is passed along to the attention processor.
* **return\_dict** (`bool`, *optional*, defaults to `True`) — Whether or not to return a [TransformerTemporalModelOutput](https://huggingface.co/docs/diffusers/v0.21.0/en/api/models/transformer_temporal#diffusers.models.transformer_temporal.TransformerTemporalModelOutput) instead of a plain tuple.

Returns

[TransformerTemporalModelOutput](https://huggingface.co/docs/diffusers/v0.21.0/en/api/models/transformer_temporal#diffusers.models.transformer_temporal.TransformerTemporalModelOutput) or `tuple`

If `return_dict` is `True`, a [TransformerTemporalModelOutput](https://huggingface.co/docs/diffusers/v0.21.0/en/api/models/transformer_temporal#diffusers.models.transformer_temporal.TransformerTemporalModelOutput) is returned; otherwise, a `tuple` is returned where the first element is the sample tensor.

The `TransformerTemporalModel` forward method.
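
A minimal sketch of the calling convention, assuming the illustrative configuration from the snippet above. Video latents enter with the frame axis folded into the batch axis, and `num_frames` tells the model how to unfold it so that attention runs across frames:

```python
import torch

from diffusers.models.transformer_temporal import TransformerTemporalModel

model = TransformerTemporalModel(
    num_attention_heads=8, attention_head_dim=64, in_channels=320
)

batch, frames, height, width = 2, 8, 16, 16
# Frames are folded into the batch dimension: (batch * frames, channels, height, width).
hidden_states = torch.randn(batch * frames, 320, height, width)

with torch.no_grad():
    output = model(hidden_states, num_frames=frames)

# The output keeps the folded (batch * frames, channels, height, width) layout.
print(output.sample.shape)  # torch.Size([16, 320, 16, 16])
```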

### TransformerTemporalModelOutput

#### class diffusers.models.transformer\_temporal.TransformerTemporalModelOutput

[\<source>](https://github.com/huggingface/diffusers/blob/v0.21.0/src/diffusers/models/transformer_temporal.py#L27)

( sample: FloatTensor )

Parameters

* **sample** (`torch.FloatTensor` of shape `(batch_size x num_frames, num_channels, height, width)`) — The hidden states output conditioned on `encoder_hidden_states` input.

The output of `TransformerTemporalModel`.
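
As a short sketch of the two return modes, reusing the hypothetical `model` and inputs from the forward example above: the default `return_dict=True` yields this dataclass, while `return_dict=False` yields a one-element tuple.

```python
import torch

with torch.no_grad():
    out = model(hidden_states, num_frames=frames)  # return_dict=True (default)
    (sample,) = model(hidden_states, num_frames=frames, return_dict=False)

print(type(out).__name__)                # TransformerTemporalModelOutput
print(out.sample.shape == sample.shape)  # True
```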

