Trajectory Transformer
Last updated
Last updated
This model is in maintenance mode only, so we won’t accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.30.0. You can do so by running the following command: pip install -U transformers==4.30.0
.
The Trajectory Transformer model was proposed in by Michael Janner, Qiyang Li, Sergey Levine.
The abstract from the paper is the following:
Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards. Viewed in this way, it is tempting to consider whether high-capacity sequence prediction models that work well in other domains, such as natural-language processing, can also provide effective solutions to the RL problem. To this end, we explore how RL can be tackled with the tools of sequence modeling, using a Transformer architecture to model distributions over trajectories and repurposing beam search as a planning algorithm. Framing RL as sequence modeling problem simplifies a range of design decisions, allowing us to dispense with many of the components common in offline RL algorithms. We demonstrate the flexibility of this approach across long-horizon dynamics prediction, imitation learning, goal-conditioned RL, and offline RL. Further, we show that this approach can be combined with existing model-free algorithms to yield a state-of-the-art planner in sparse-reward, long-horizon tasks.
Tips:
This Transformer is used for deep reinforcement learning. To use it, you need to create sequences from actions, states and rewards from all previous timesteps. This model will treat all these elements together as one big sequence (a trajectory).
This model was contributed by . The original code can be found .
( vocab_size = 100action_weight = 5reward_weight = 1value_weight = 1block_size = 249action_dim = 6observation_dim = 17transition_dim = 25n_layer = 4n_head = 4n_embd = 128embd_pdrop = 0.1attn_pdrop = 0.1resid_pdrop = 0.1learning_rate = 0.0006max_position_embeddings = 512initializer_range = 0.02layer_norm_eps = 1e-12kaiming_initializer_range = 1use_cache = Truepad_token_id = 1bos_token_id = 50256eos_token_id = 50256**kwargs )
Parameters
action_weight (int
, optional, defaults to 5) — Weight of the action in the loss function
reward_weight (int
, optional, defaults to 1) — Weight of the reward in the loss function
value_weight (int
, optional, defaults to 1) — Weight of the value in the loss function
block_size (int
, optional, defaults to 249) — Size of the blocks in the trajectory transformer.
action_dim (int
, optional, defaults to 6) — Dimension of the action space.
observation_dim (int
, optional, defaults to 17) — Dimension of the observation space.
transition_dim (int
, optional, defaults to 25) — Dimension of the transition space.
n_layer (int
, optional, defaults to 4) — Number of hidden layers in the Transformer encoder.
n_head (int
, optional, defaults to 4) — Number of attention heads for each attention layer in the Transformer encoder.
n_embd (int
, optional, defaults to 128) — Dimensionality of the embeddings and hidden states.
resid_pdrop (float
, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
embd_pdrop (int
, optional, defaults to 0.1) — The dropout ratio for the embeddings.
attn_pdrop (float
, optional, defaults to 0.1) — The dropout ratio for the attention.
hidden_act (str
or function
, optional, defaults to "gelu"
) — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu"
, "relu"
, "selu"
and "gelu_new"
are supported.
max_position_embeddings (int
, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
initializer_range (float
, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (float
, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.
kaiming_initializer_range (`float, optional, defaults to 1) — A coefficient scaling the negative slope of the kaiming initializer rectifier for EinLinear layers.
use_cache (bool
, optional, defaults to True
) — Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if config.is_decoder=True
. Example —
Copied
( config )
Parameters
the full GPT language model, with a context size of block_size
forward
( trajectories: typing.Optional[torch.LongTensor] = Nonepast_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = Nonetargets: typing.Optional[torch.FloatTensor] = Noneattention_mask: typing.Optional[torch.FloatTensor] = Noneuse_cache: typing.Optional[bool] = Noneoutput_attentions: typing.Optional[bool] = Noneoutput_hidden_states: typing.Optional[bool] = Nonereturn_dict: typing.Optional[bool] = None ) → transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput
or tuple(torch.FloatTensor)
Parameters
trajectories (torch.LongTensor
of shape (batch_size, sequence_length)
) — Batch of trajectories, where a trajectory is a sequence of states, actions and rewards.
past_key_values (Tuple[Tuple[torch.Tensor]]
of length config.n_layers
, optional) — Contains precomputed hidden-states (key and values in the attention blocks) as computed by the model (see past_key_values
output below). Can be used to speed up sequential decoding. The input_ids
which have their past given to this model should not be passed as input_ids
as they have already been computed.
targets (torch.LongTensor
of shape (batch_size, sequence_length)
, optional) — Desired targets used to compute the loss.
attention_mask (torch.FloatTensor
of shape (batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]
:
1 for tokens that are not masked,
0 for tokens that are masked.
use_cache (bool
, optional) — If set to True
, past_key_values
key value states are returned and can be used to speed up decoding (see past_key_values
).
output_attentions (bool
, optional) — Whether or not to return the attentions tensors of all attention layers. See attentions
under returned tensors for more detail.
output_hidden_states (bool
, optional) — Whether or not to return the hidden states of all layers. See hidden_states
under returned tensors for more detail.
Returns
transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput
or tuple(torch.FloatTensor)
loss (torch.FloatTensor
of shape (1,)
, optional, returned when labels
is provided) — Language modeling loss.
logits (torch.FloatTensor
of shape (batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (Tuple[Tuple[torch.Tensor]]
, optional, returned when use_cache=True
is passed or when config.use_cache=True
) — Tuple of length config.n_layers
, containing tuples of tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head)
). Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values
input) to speed up sequential decoding.
hidden_states (tuple(torch.FloatTensor)
, optional, returned when output_hidden_states=True
is passed or when config.output_hidden_states=True
) — Tuple of torch.FloatTensor
(one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size)
. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor)
, optional, returned when output_attentions=True
is passed or when config.output_attentions=True
) — Tuple of torch.FloatTensor
(one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length)
. GPT2Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.
Examples:
Copied
vocab_size (int
, optional, defaults to 100) — Vocabulary size of the TrajectoryTransformer model. Defines the number of different tokens that can be represented by the trajectories
passed when calling
This is the configuration class to store the configuration of a . It is used to instantiate an TrajectoryTransformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the TrajectoryTransformer architecture.
Configuration objects inherit from and can be used to control the model outputs. Read the documentation from for more information.
config () — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the method to load the model weights.
The bare TrajectoryTransformer Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
return_dict (bool
, optional) — Whether or not to return a instead of a plain tuple.
A transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput
or a tuple of torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the configuration () and inputs.
The forward method, overrides the __call__
special method.