AutoModelForCausalLM
class transformers.AutoModelForCausalLM
( *args, **kwargs )
This is a generic model class that will be instantiated as one of the model classes of the library (with a causal language modeling head) when created with the from_pretrained() class method or the from_config() class method.
This class cannot be instantiated directly using __init__() (throws an error).
from_config
( **kwargs )
Parameters
config (PretrainedConfig) — The model class to instantiate is selected based on the configuration class:
BartConfig configuration class: BartForCausalLM (BART model)
BertConfig configuration class: BertLMHeadModel (BERT model)
BertGenerationConfig configuration class: BertGenerationDecoder (Bert Generation model)
BigBirdConfig configuration class: BigBirdForCausalLM (BigBird model)
BigBirdPegasusConfig configuration class: BigBirdPegasusForCausalLM (BigBird-Pegasus model)
BioGptConfig configuration class: BioGptForCausalLM (BioGpt model)
BlenderbotConfig configuration class: BlenderbotForCausalLM (Blenderbot model)
BlenderbotSmallConfig configuration class: BlenderbotSmallForCausalLM (BlenderbotSmall model)
BloomConfig configuration class: BloomForCausalLM (BLOOM model)
CTRLConfig configuration class: CTRLLMHeadModel (CTRL model)
CamembertConfig configuration class: CamembertForCausalLM (CamemBERT model)
CodeGenConfig configuration class: CodeGenForCausalLM (CodeGen model)
CpmAntConfig configuration class: CpmAntForCausalLM (CPM-Ant model)
Data2VecTextConfig configuration class: Data2VecTextForCausalLM (Data2VecText model)
ElectraConfig configuration class: ElectraForCausalLM (ELECTRA model)
ErnieConfig configuration class: ErnieForCausalLM (ERNIE model)
FalconConfig configuration class: FalconForCausalLM (Falcon model)
GPT2Config configuration class: GPT2LMHeadModel (OpenAI GPT-2 model)
GPTBigCodeConfig configuration class: GPTBigCodeForCausalLM (GPTBigCode model)
GPTJConfig configuration class: GPTJForCausalLM (GPT-J model)
GPTNeoConfig configuration class: GPTNeoForCausalLM (GPT Neo model)
GPTNeoXConfig configuration class: GPTNeoXForCausalLM (GPT NeoX model)
GPTNeoXJapaneseConfig configuration class: GPTNeoXJapaneseForCausalLM (GPT NeoX Japanese model)
GitConfig configuration class: GitForCausalLM (GIT model)
LlamaConfig configuration class: LlamaForCausalLM (LLaMA model)
MBartConfig configuration class: MBartForCausalLM (mBART model)
MarianConfig configuration class: MarianForCausalLM (Marian model)
MegaConfig configuration class: MegaForCausalLM (MEGA model)
MegatronBertConfig configuration class: MegatronBertForCausalLM (Megatron-BERT model)
MistralConfig configuration class: MistralForCausalLM (Mistral model)
MptConfig configuration class: MptForCausalLM (MPT model)
MusicgenConfig configuration class: MusicgenForCausalLM (MusicGen model)
MvpConfig configuration class: MvpForCausalLM (MVP model)
OPTConfig configuration class: OPTForCausalLM (OPT model)
OpenAIGPTConfig configuration class: OpenAIGPTLMHeadModel (OpenAI GPT model)
OpenLlamaConfig configuration class: OpenLlamaForCausalLM (OpenLlama model)
PLBartConfig configuration class: PLBartForCausalLM (PLBart model)
PegasusConfig configuration class: PegasusForCausalLM (Pegasus model)
PersimmonConfig configuration class: PersimmonForCausalLM (Persimmon model)
ProphetNetConfig configuration class: ProphetNetForCausalLM (ProphetNet model)
QDQBertConfig configuration class: QDQBertLMHeadModel (QDQBert model)
ReformerConfig configuration class: ReformerModelWithLMHead (Reformer model)
RemBertConfig configuration class: RemBertForCausalLM (RemBERT model)
RoCBertConfig configuration class: RoCBertForCausalLM (RoCBert model)
RoFormerConfig configuration class: RoFormerForCausalLM (RoFormer model)
RobertaConfig configuration class: RobertaForCausalLM (RoBERTa model)
RobertaPreLayerNormConfig configuration class: RobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
RwkvConfig configuration class: RwkvForCausalLM (RWKV model)
Speech2Text2Config configuration class: Speech2Text2ForCausalLM (Speech2Text2 model)
TrOCRConfig configuration class: TrOCRForCausalLM (TrOCR model)
TransfoXLConfig configuration class: TransfoXLLMHeadModel (Transformer-XL model)
XGLMConfig configuration class: XGLMForCausalLM (XGLM model)
XLMConfig configuration class: XLMWithLMHeadModel (XLM model)
XLMProphetNetConfig configuration class: XLMProphetNetForCausalLM (XLM-ProphetNet model)
XLMRobertaConfig configuration class: XLMRobertaForCausalLM (XLM-RoBERTa model)
XLMRobertaXLConfig configuration class: XLMRobertaXLForCausalLM (XLM-RoBERTa-XL model)
XLNetConfig configuration class: XLNetLMHeadModel (XLNet model)
XmodConfig configuration class: XmodForCausalLM (X-MOD model)
Instantiates one of the model classes of the library (with a causal language modeling head) from a configuration.
Note: Loading a model from its configuration file does not load the model weights. It only affects the model’s configuration. Use from_pretrained() to load the model weights.
Examples:
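The selection rule listed under Parameters can be modelled with a small registry. The sketch below uses only the standard library and is not the library's implementation: every `Toy*` class, the `_model_mapping` name, and the choice of OSError are stand-ins invented for illustration. It shows the two documented behaviours: the model class is chosen from the configuration class, and building from a configuration loads no weights.

```python
# Toy stand-ins; none of these are the real transformers classes.
class ToyConfig:
    """Base class for the hypothetical configuration objects below."""


class ToyGPT2Config(ToyConfig):
    pass


class ToyLlamaConfig(ToyConfig):
    pass


class ToyModel:
    def __init__(self, config):
        self.config = config
        # Built from the configuration alone: no pretrained weights are loaded.
        self.weights_loaded = False


class ToyGPT2LMHeadModel(ToyModel):
    pass


class ToyLlamaForCausalLM(ToyModel):
    pass


class ToyAutoModelForCausalLM:
    # Registry from configuration class to model class (hypothetical subset).
    _model_mapping = {
        ToyGPT2Config: ToyGPT2LMHeadModel,
        ToyLlamaConfig: ToyLlamaForCausalLM,
    }

    def __init__(self, *args, **kwargs):
        # Mirrors the documented behaviour: direct instantiation is an error.
        raise OSError(
            "ToyAutoModelForCausalLM must be created with from_config(config)."
        )

    @classmethod
    def from_config(cls, config):
        # Look up the model class registered for this configuration class.
        model_class = cls._model_mapping.get(type(config))
        if model_class is None:
            raise ValueError(
                f"Unrecognized configuration class {type(config).__name__}."
            )
        return model_class(config)


model = ToyAutoModelForCausalLM.from_config(ToyLlamaConfig())
print(type(model).__name__)  # ToyLlamaForCausalLM
print(model.weights_loaded)  # False
```

Because the lookup is keyed on the configuration class, supporting another architecture in this sketch only means adding one entry to the registry.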
from_pretrained
( *model_args, **kwargs )
Parameters
pretrained_model_name_or_path (str or os.PathLike) — Can be either:
A string, the model id of a pretrained model hosted inside a model repo on huggingface.co. Valid model ids can be located at the root level, like bert-base-uncased, or namespaced under a user or organization name, like dbmdz/bert-base-german-cased.
A path to a directory containing model weights saved using save_pretrained(), e.g., ./my_model_directory/.
A path or URL to a TensorFlow index checkpoint file (e.g., ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
model_args (additional positional arguments, optional) — Will be passed along to the underlying model __init__() method.
config (PretrainedConfig, optional) — Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
The model is a model provided by the library (loaded with the model id string of a pretrained model).
The model was saved using save_pretrained() and is reloaded by supplying the save directory.
The model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
state_dict (Dict[str, torch.Tensor], optional) — A state dictionary to use instead of a state dictionary loaded from a saved weights file.
This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check if using save_pretrained() and from_pretrained() is not a simpler option.
cache_dir (str or os.PathLike, optional) — Path to a directory in which a downloaded pretrained model configuration should be cached if the standard cache should not be used.
from_tf (bool, optional, defaults to False) — Load the model weights from a TensorFlow checkpoint save file (see the docstring of the pretrained_model_name_or_path argument).
force_download (bool, optional, defaults to False) — Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
resume_download (bool, optional, defaults to False) — Whether or not to delete incompletely received files. Will attempt to resume the download if such a file exists.
proxies (Dict[str, str], optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
output_loading_info (bool, optional, defaults to False) — Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
local_files_only (bool, optional, defaults to False) — Whether or not to only look at local files (e.g., not try downloading the model).
revision (str, optional, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
trust_remote_code (bool, optional, defaults to False) — Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
code_revision (str, optional, defaults to "main") — The specific revision to use for the code on the Hub, if the code lives in a different repository than the rest of the model. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so revision can be any identifier allowed by git.
kwargs (additional keyword arguments, optional) — Can be used to update the configuration object (after it has been loaded) and initiate the model (e.g., output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded:
If a configuration is provided with config, **kwargs will be directly passed to the underlying model’s __init__ method (we assume all relevant updates to the configuration have already been done).
If a configuration is not provided, kwargs will be first passed to the configuration class initialization function (from_pretrained()). Each key of kwargs that corresponds to a configuration attribute will be used to override said attribute with the supplied kwargs value. Remaining keys that do not correspond to any configuration attribute will be passed to the underlying model’s __init__ function.
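The two kwargs cases described above can be sketched with toy classes. This is a simplified model of the documented behaviour, not the library's code: `ToyConfig`, `ToyModel`, `toy_from_pretrained`, and the `custom_flag` key are hypothetical names, and "loading" a configuration is reduced to constructing defaults.

```python
class ToyConfig:
    """Hypothetical configuration with two attributes."""

    def __init__(self, output_attentions=False, hidden_size=64):
        self.output_attentions = output_attentions
        self.hidden_size = hidden_size


class ToyModel:
    """Hypothetical model that records what its __init__ received."""

    def __init__(self, config, **model_kwargs):
        self.config = config
        self.model_kwargs = model_kwargs


def toy_from_pretrained(config=None, **kwargs):
    if config is not None:
        # A configuration was provided: kwargs go straight to the model.
        return ToyModel(config, **kwargs)

    # No configuration provided: "load" one (here, construct defaults),
    # then let matching kwargs override its attributes.
    config = ToyConfig()
    unused = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)
        else:
            unused[key] = value
    # Keys that are not configuration attributes reach the model's __init__.
    return ToyModel(config, **unused)


auto = toy_from_pretrained(output_attentions=True, custom_flag=1)
print(auto.config.output_attentions, auto.model_kwargs)
# True {'custom_flag': 1}

explicit = toy_from_pretrained(config=ToyConfig(), output_attentions=True)
print(explicit.config.output_attentions, explicit.model_kwargs)
# False {'output_attentions': True}
```

The second call shows why configuration updates should be made before passing config explicitly: in this sketch the override never reaches the configuration object.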
Instantiate one of the model classes of the library (with a causal language modeling head) from a pretrained model.
The model class to instantiate is selected based on the model_type property of the config object (either passed as an argument or loaded from pretrained_model_name_or_path if possible), or when it’s missing, by falling back to using pattern matching on pretrained_model_name_or_path:
bart — BartForCausalLM (BART model)
bert — BertLMHeadModel (BERT model)
bert-generation — BertGenerationDecoder (Bert Generation model)
big_bird — BigBirdForCausalLM (BigBird model)
bigbird_pegasus — BigBirdPegasusForCausalLM (BigBird-Pegasus model)
biogpt — BioGptForCausalLM (BioGpt model)
blenderbot — BlenderbotForCausalLM (Blenderbot model)
blenderbot-small — BlenderbotSmallForCausalLM (BlenderbotSmall model)
bloom — BloomForCausalLM (BLOOM model)
camembert — CamembertForCausalLM (CamemBERT model)
code_llama — LlamaForCausalLM (CodeLlama model)
codegen — CodeGenForCausalLM (CodeGen model)
cpmant — CpmAntForCausalLM (CPM-Ant model)
ctrl — CTRLLMHeadModel (CTRL model)
data2vec-text — Data2VecTextForCausalLM (Data2VecText model)
electra — ElectraForCausalLM (ELECTRA model)
ernie — ErnieForCausalLM (ERNIE model)
falcon — FalconForCausalLM (Falcon model)
git — GitForCausalLM (GIT model)
gpt-sw3 — GPT2LMHeadModel (GPT-Sw3 model)
gpt2 — GPT2LMHeadModel (OpenAI GPT-2 model)
gpt_bigcode — GPTBigCodeForCausalLM (GPTBigCode model)
gpt_neo — GPTNeoForCausalLM (GPT Neo model)
gpt_neox — GPTNeoXForCausalLM (GPT NeoX model)
gpt_neox_japanese — GPTNeoXJapaneseForCausalLM (GPT NeoX Japanese model)
gptj — GPTJForCausalLM (GPT-J model)
llama — LlamaForCausalLM (LLaMA model)
marian — MarianForCausalLM (Marian model)
mbart — MBartForCausalLM (mBART model)
mega — MegaForCausalLM (MEGA model)
megatron-bert — MegatronBertForCausalLM (Megatron-BERT model)
mistral — MistralForCausalLM (Mistral model)
mpt — MptForCausalLM (MPT model)
musicgen — MusicgenForCausalLM (MusicGen model)
mvp — MvpForCausalLM (MVP model)
open-llama — OpenLlamaForCausalLM (OpenLlama model)
openai-gpt — OpenAIGPTLMHeadModel (OpenAI GPT model)
opt — OPTForCausalLM (OPT model)
pegasus — PegasusForCausalLM (Pegasus model)
persimmon — PersimmonForCausalLM (Persimmon model)
plbart — PLBartForCausalLM (PLBart model)
prophetnet — ProphetNetForCausalLM (ProphetNet model)
qdqbert — QDQBertLMHeadModel (QDQBert model)
reformer — ReformerModelWithLMHead (Reformer model)
rembert — RemBertForCausalLM (RemBERT model)
roberta — RobertaForCausalLM (RoBERTa model)
roberta-prelayernorm — RobertaPreLayerNormForCausalLM (RoBERTa-PreLayerNorm model)
roc_bert — RoCBertForCausalLM (RoCBert model)
roformer — RoFormerForCausalLM (RoFormer model)
rwkv — RwkvForCausalLM (RWKV model)
speech_to_text_2 — Speech2Text2ForCausalLM (Speech2Text2 model)
transfo-xl — TransfoXLLMHeadModel (Transformer-XL model)
trocr — TrOCRForCausalLM (TrOCR model)
xglm — XGLMForCausalLM (XGLM model)
xlm — XLMWithLMHeadModel (XLM model)
xlm-prophetnet — XLMProphetNetForCausalLM (XLM-ProphetNet model)
xlm-roberta — XLMRobertaForCausalLM (XLM-RoBERTa model)
xlm-roberta-xl — XLMRobertaXLForCausalLM (XLM-RoBERTa-XL model)
xlnet — XLNetLMHeadModel (XLNet model)
xmod — XmodForCausalLM (X-MOD model)
The model is set in evaluation mode by default using model.eval() (so for instance, dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
Examples:
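The model_type lookup and its name-based fallback, described earlier in this section, can be illustrated with a plain dictionary. This is a hedged sketch rather than the library's implementation: `MODEL_FOR_CAUSAL_LM` and `select_model_class_name` are hypothetical names, the table is a small subset of the list above, the checkpoint names are illustrative strings, and the fallback is assumed to be a substring match that prefers the longest key.

```python
# Hypothetical subset of the model_type -> model class name table above.
MODEL_FOR_CAUSAL_LM = {
    "gpt2": "GPT2LMHeadModel",
    "llama": "LlamaForCausalLM",
    "xlm": "XLMWithLMHeadModel",
    "xlm-roberta": "XLMRobertaForCausalLM",
}


def select_model_class_name(name_or_path, model_type=None):
    """Return the class name for a model_type, or guess it from the name."""
    if model_type is not None:
        if model_type not in MODEL_FOR_CAUSAL_LM:
            raise ValueError(f"Unrecognized model_type: {model_type!r}")
        return MODEL_FOR_CAUSAL_LM[model_type]

    # Fallback: substring match on the name, longest keys first, so that
    # "xlm-roberta" wins over "xlm" (this ordering is an assumption).
    for key in sorted(MODEL_FOR_CAUSAL_LM, key=len, reverse=True):
        if key in str(name_or_path):
            return MODEL_FOR_CAUSAL_LM[key]
    raise ValueError(f"Could not infer a model type from {name_or_path!r}")


print(select_model_class_name("./my_model_directory/", model_type="llama"))
# LlamaForCausalLM
print(select_model_class_name("xlm-roberta-base"))
# XLMRobertaForCausalLM
```

An explicit model_type always takes precedence in this sketch; the name is only inspected when no model_type is available.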