Attention Processor
An attention processor is a class for applying different types of attention mechanisms.
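For orientation, here is a minimal sketch of how processors show up on a model: every attention layer of a UNet exposes its current processor through the attn_processors property. The checkpoint id below is only an example.

```python
from diffusers import UNet2DConditionModel

# Example checkpoint; any UNet2DConditionModel works the same way.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# attn_processors maps each attention layer name to its processor instance.
for name, processor in unet.attn_processors.items():
    print(name, type(processor).__name__)
```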
AttnProcessor
( )
Default processor for performing attention-related computations.
AttnProcessor2_0
( )
Processor for implementing scaled dot-product attention (enabled by default if you're using PyTorch 2.0).
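A minimal sketch of switching between the two default processors on a Stable Diffusion UNet (the checkpoint id is only an example). AttnProcessor2_0 relies on torch.nn.functional.scaled_dot_product_attention, so it is only usable on PyTorch 2.0 and later.

```python
import torch.nn.functional as F
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import AttnProcessor, AttnProcessor2_0

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Use the fused scaled dot-product attention kernel when PyTorch 2.0 is available,
# otherwise fall back to the plain processor.
if hasattr(F, "scaled_dot_product_attention"):
    unet.set_attn_processor(AttnProcessor2_0())
else:
    unet.set_attn_processor(AttnProcessor())
```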
LoRAAttnProcessor
( hidden_size, cross_attention_dim = None, rank = 4, network_alpha = None, **kwargs )
Parameters
hidden_size (int, optional) – The hidden size of the attention layer.
cross_attention_dim (int, optional) – The number of channels in the encoder_hidden_states.
rank (int, defaults to 4) – The dimension of the LoRA update matrices.
network_alpha (int, optional) – Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
Processor for implementing the LoRA attention mechanism.
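A sketch of the usual setup pattern, assuming a Stable Diffusion UNet (the checkpoint id is only an example): one LoRAAttnProcessor is created per attention layer, with hidden_size and cross_attention_dim read off the UNet config, and the whole dictionary is installed with set_attn_processor().

```python
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import LoRAAttnProcessor

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

lora_attn_procs = {}
for name in unet.attn_processors.keys():
    # Self-attention layers (attn1) are not conditioned on the text encoder.
    cross_attention_dim = (
        None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    )
    # Recover the hidden size of the layer from its position in the UNet.
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    else:  # down_blocks
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]

    lora_attn_procs[name] = LoRAAttnProcessor(
        hidden_size=hidden_size,
        cross_attention_dim=cross_attention_dim,
        rank=4,
    )

unet.set_attn_processor(lora_attn_procs)
```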
LoRAAttnProcessor2_0
( hidden_size, cross_attention_dim = None, rank = 4, network_alpha = None, **kwargs )
Parameters
hidden_size (int) – The hidden size of the attention layer.
cross_attention_dim (int, optional) – The number of channels in the encoder_hidden_states.
rank (int, defaults to 4) – The dimension of the LoRA update matrices.
network_alpha (int, optional) – Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
Processor for implementing the LoRA attention mechanism using PyTorch 2.0's memory-efficient scaled dot-product attention.
CustomDiffusionAttnProcessor
( train_kv = True, train_q_out = True, hidden_size = None, cross_attention_dim = None, out_bias = True, dropout = 0.0 )
Parameters
train_kv (bool, defaults to True) – Whether to newly train the key and value matrices corresponding to the text features.
train_q_out (bool, defaults to True) – Whether to newly train the query matrices corresponding to the latent image features.
hidden_size (int, optional, defaults to None) – The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) – The number of channels in the encoder_hidden_states.
out_bias (bool, defaults to True) – Whether to include the bias parameter in train_q_out.
dropout (float, optional, defaults to 0.0) – The dropout probability to use.
Processor for implementing attention for the Custom Diffusion method.
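A sketch of wiring this processor into a UNet for Custom Diffusion style fine-tuning, using the same layer-naming bookkeeping as the LoRA sketch above (the checkpoint id is only an example): cross-attention layers get a CustomDiffusionAttnProcessor with freshly trained key/value projections, while self-attention layers keep the default processor.

```python
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import (
    AttnProcessor,
    CustomDiffusionAttnProcessor,
)

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

custom_diffusion_attn_procs = {}
for name in unet.attn_processors.keys():
    cross_attention_dim = (
        None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    )
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    else:  # down_blocks
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]

    if cross_attention_dim is not None:
        # Train new key/value projections for the text features; queries stay frozen.
        custom_diffusion_attn_procs[name] = CustomDiffusionAttnProcessor(
            train_kv=True,
            train_q_out=False,
            hidden_size=hidden_size,
            cross_attention_dim=cross_attention_dim,
        )
    else:
        # Self-attention layers are left on the default processor.
        custom_diffusion_attn_procs[name] = AttnProcessor()

unet.set_attn_processor(custom_diffusion_attn_procs)
```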
AttnAddedKVProcessor
( )
Processor for performing attention-related computations with extra learnable key and value matrices for the text encoder.
AttnAddedKVProcessor2_0
( )
Processor for performing scaled dot-product attention (enabled by default if you're using PyTorch 2.0), with extra learnable key and value matrices for the text encoder.
LoRAAttnAddedKVProcessor
( hidden_size, cross_attention_dim = None, rank = 4, network_alpha = None )
Parameters
hidden_size (int, optional) – The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) – The number of channels in the encoder_hidden_states.
rank (int, defaults to 4) – The dimension of the LoRA update matrices.
Processor for implementing the LoRA attention mechanism with extra learnable key and value matrices for the text encoder.
XFormersAttnProcessor
( attention_op: typing.Optional[typing.Callable] = None )
Parameters
attention_op (Callable, optional, defaults to None) – The base to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.
Processor for implementing memory efficient attention using xFormers.
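A minimal sketch of two ways to turn on xFormers attention, assuming xformers is installed (the checkpoint id is only an example): through the pipeline-level helper, or by assigning the processor to the UNet directly. Leaving attention_op as None lets xFormers pick the best operator, as recommended above.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.models.attention_processor import XFormersAttnProcessor

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Option 1: the pipeline-level switch.
pipe.enable_xformers_memory_efficient_attention()

# Option 2: assign the processor to the UNet directly.
pipe.unet.set_attn_processor(XFormersAttnProcessor(attention_op=None))
```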
LoRAXFormersAttnProcessor
( hidden_size, cross_attention_dim, rank = 4, attention_op: typing.Optional[typing.Callable] = None, network_alpha = None, **kwargs )
Parameters
hidden_size (int, optional) – The hidden size of the attention layer.
cross_attention_dim (int, optional) – The number of channels in the encoder_hidden_states.
rank (int, defaults to 4) – The dimension of the LoRA update matrices.
attention_op (Callable, optional, defaults to None) – The base to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.
network_alpha (int, optional) – Equivalent to alpha, but its usage is specific to Kohya (A1111) style LoRAs.
Processor for implementing the LoRA attention mechanism with memory efficient attention using xFormers.
CustomDiffusionXFormersAttnProcessor
( train_kv = True, train_q_out = False, hidden_size = None, cross_attention_dim = None, out_bias = True, dropout = 0.0, attention_op: typing.Optional[typing.Callable] = None )
Parameters
train_kv (bool, defaults to True) – Whether to newly train the key and value matrices corresponding to the text features.
train_q_out (bool, defaults to False) – Whether to newly train the query matrices corresponding to the latent image features.
hidden_size (int, optional, defaults to None) – The hidden size of the attention layer.
cross_attention_dim (int, optional, defaults to None) – The number of channels in the encoder_hidden_states.
out_bias (bool, defaults to True) – Whether to include the bias parameter in train_q_out.
dropout (float, optional, defaults to 0.0) – The dropout probability to use.
attention_op (Callable, optional, defaults to None) – The base to use as the attention operator. It is recommended to set to None, and allow xFormers to choose the best operator.
Processor for implementing memory efficient attention using xFormers for the Custom Diffusion method.
SlicedAttnProcessor
( slice_size )
Parameters
slice_size (int, optional) – The number of steps to compute attention. Uses as many slices as attention_head_dim // slice_size, and attention_head_dim must be a multiple of slice_size.
Processor for implementing sliced attention.
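A minimal sketch of enabling sliced attention on a UNet (the checkpoint id is only an example); per the parameter description above, attention_head_dim must be a multiple of the chosen slice_size. Many pipelines expose the same memory/speed trade-off through enable_attention_slicing().

```python
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import SlicedAttnProcessor

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Compute attention in smaller slices to lower peak memory at some speed cost.
unet.set_attn_processor(SlicedAttnProcessor(slice_size=2))
```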
SlicedAttnAddedKVProcessor
( slice_size )
Parameters
slice_size (int, optional) – The number of steps to compute attention. Uses as many slices as attention_head_dim // slice_size, and attention_head_dim must be a multiple of slice_size.
Processor for implementing sliced attention with extra learnable key and value matrices for the text encoder.