Troubleshoot
Sometimes errors occur, but we are here to help! This guide covers some of the most common issues we've seen and how you can resolve them. However, this guide isn't meant to be a comprehensive collection of every 🤗 Transformers issue. For more help with troubleshooting your issue, try:
Ask for help on the forums. There are specific categories you can post your question to, like Beginners or 🤗 Transformers. Make sure you write a descriptive forum post with some reproducible code to maximize the likelihood that your problem is solved!
Create an Issue on the 🤗 Transformers repository if it is a bug related to the library. Include as much information describing the bug as possible to help us figure out what's wrong and how we can fix it.
Check the Migration guide if you use an older version of 🤗 Transformers, since some important changes have been introduced between versions.
For more details about troubleshooting and getting help, take a look at Chapter 8 of the BOINC AI course.
Firewalled environments
Some GPU instances on cloud and intranet setups are firewalled against external connections. When your script attempts to download model weights or datasets, the download hangs and then times out with a connection error.
In this case, try running 🤗 Transformers in offline mode to avoid the connection error.
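As a minimal sketch of offline mode, the snippet below sets the HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE environment variables before the library is imported, then loads everything from a local directory with local_files_only=True. The helper name load_offline and its model_dir argument are hypothetical, and the sketch assumes the files were already downloaded on a machine with network access.

```python
import os

# Set these before importing transformers so the library does not try the network.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_offline(model_dir):
    """Load a model and tokenizer from files that already exist locally.

    `model_dir` is a hypothetical path to a directory written earlier with
    save_pretrained (or a populated cache) on a machine with network access.
    """
    # Imported lazily so the environment variables above are already in place.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModel.from_pretrained(model_dir, local_files_only=True)
    return model, tokenizer
```

With local_files_only=True, a missing file fails immediately with a clear error instead of waiting for a network timeout.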
CUDA out of memory
Training large models with millions of parameters can be challenging without the appropriate hardware. When the GPU runs out of memory, training stops with a "CUDA out of memory" error.
Here are some potential solutions you can try to lessen memory use:
Reduce the per_device_train_batch_size value in TrainingArguments.
Try using gradient_accumulation_steps in TrainingArguments to effectively increase overall batch size.
Refer to the Performance guide for more details about memory-saving techniques.
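The two settings above work together: a smaller per-device batch lowers peak memory, and gradient accumulation makes up the difference by summing gradients over several steps before each optimizer update. The sketch below shows the arithmetic and one possible TrainingArguments configuration. The helper names, the values 4 and 8, and the output directory are illustrative assumptions, not recommendations.

```python
def effective_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
    """Number of samples that contribute to each optimizer update."""
    return per_device_batch_size * accumulation_steps * num_devices


def build_training_args(output_dir="outputs"):
    """Hypothetical helper: a small per-device batch with accumulation."""
    # Imported lazily; constructing TrainingArguments needs a training backend.
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,            # illustrative path
        per_device_train_batch_size=4,    # small enough to fit in GPU memory
        gradient_accumulation_steps=8,    # 4 * 8 = 32 samples per update
    )
```

Here effective_batch_size(4, 8) gives the same 32 samples per update as a per-device batch of 32 without accumulation, at the cost of more forward and backward passes per update.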
Unable to load a saved TensorFlow model
TensorFlow's model.save method will save the entire model - architecture, weights, training configuration - in a single file. However, when you load the model file again, you may run into an error because 🤗 Transformers may not load all the TensorFlow-related objects in the model file. To avoid issues with saving and loading TensorFlow models, we recommend you:
Save the model weights with an h5 file extension using model.save_weights and then reload the model with from_pretrained().
Save the model with TFPreTrainedModel.save_pretrained and load it again with from_pretrained().
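The second option can be sketched as a small round-trip helper. The name save_and_reload and the directory argument are hypothetical. The sketch assumes model is a TensorFlow model from a 🤗 Transformers release that still includes TensorFlow classes, and it reloads with the model's own class so that any task head is preserved.

```python
def save_and_reload(model, save_dir):
    """Round-trip a model through save_pretrained and from_pretrained.

    `save_dir` is a hypothetical directory path; save_pretrained writes the
    configuration and the weights into it.
    """
    model.save_pretrained(save_dir)
    # Reload with the same class the model was created from.
    return type(model).from_pretrained(save_dir)
```

Because the configuration is saved next to the weights, from_pretrained can rebuild the architecture without relying on objects serialized by model.save.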
ImportError
Another common error you may encounter, especially with a newly released model, is an ImportError saying that a name cannot be imported from transformers.
For these error types, make sure you have the latest version of 🤗 Transformers installed (for example, by running pip install transformers --upgrade) to access the most recent models.
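Before upgrading, it can help to confirm which version is actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper name installed_version is an assumption made for illustration.

```python
from importlib import metadata


def installed_version(package="transformers"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("transformers is not installed; try: pip install transformers")
else:
    print(f"transformers {version} is installed; upgrade with: pip install transformers --upgrade")
```

If the reported version is older than the release that introduced the model you are importing, upgrading should resolve the ImportError.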
CUDA error: device-side assert triggered
Sometimes you may run into a generic CUDA error about an error in the device code, reported as "CUDA error: device-side assert triggered".
You should try to run the code on a CPU first to get a more descriptive error message. Set the CUDA_VISIBLE_DEVICES environment variable to an empty string at the beginning of your code to switch to a CPU.
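A minimal sketch of the CPU switch follows. It assumes the lines run before PyTorch (or any other CUDA library) initializes the GPU, since the variable is read only at that point.

```python
import os

# Hide every GPU from this process so the code falls back to the CPU.
# This must run before CUDA is initialized, ideally before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

On the CPU, the same bug usually surfaces as an ordinary Python exception, such as an index out of range, that points at the failing operation.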
Another option is to get a better traceback from the GPU. Set the CUDA_LAUNCH_BLOCKING environment variable to 1 at the beginning of your code to get the traceback to point to the source of the error.
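The GPU traceback option can be sketched the same way. CUDA kernels normally launch asynchronously, so an error is often reported at a later, unrelated line; forcing synchronous launches makes the traceback stop at the call that failed. The sketch assumes it runs before CUDA is initialized.

```python
import os

# Make CUDA kernel launches synchronous so errors are raised at the failing call.
# This slows execution, so it is meant for debugging only.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```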
Incorrect output when padding tokens aren't masked
In some cases, the output hidden_state may be incorrect if the input_ids include padding tokens. To demonstrate, load a model and tokenizer. You can access a model's pad_token_id to see its value. The pad_token_id may be None for some models, but you can always manually set it.
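Checking and manually setting the value can be sketched as below. The helper name resolve_pad_token_id is hypothetical, and a SimpleNamespace stands in for model.config so the snippet runs without downloading a model. The fallback id of 0 is only an illustration; the right id depends on the model's vocabulary.

```python
from types import SimpleNamespace


def resolve_pad_token_id(config, fallback_id):
    """Return config.pad_token_id, setting it to `fallback_id` when it is None."""
    if getattr(config, "pad_token_id", None) is None:
        config.pad_token_id = fallback_id
    return config.pad_token_id


# Stand-in for model.config on a model that defines no padding token.
config = SimpleNamespace(pad_token_id=None)
pad_id = resolve_pad_token_id(config, fallback_id=0)
```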
If you pass a padded batch to the model without masking the padding tokens, the output for the second, padded sequence differs from its actual output, that is, the output you get when you pass that sequence on its own without padding.
Most of the time, you should provide an attention_mask to your model to ignore the padding tokens and avoid this silent error. With the mask, the output of the second sequence matches its actual output.
By default, the tokenizer creates an attention_mask for you based on your specific tokenizer's defaults.
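To see why unmasked padding changes the result, here is a toy, pure-Python version of single-head dot-product attention. It is a simplified illustration, not the library's implementation; the function name attend and the two-dimensional embeddings are made up for the example.

```python
import math


def attend(vectors, mask=None):
    """Dot-product attention output for the first position of a sequence."""
    if mask is None:
        mask = [1] * len(vectors)
    query = vectors[0]
    # Masked positions get a score of -inf, so softmax gives them weight 0.
    scores = [
        sum(q * k for q, k in zip(query, key)) if keep else -math.inf
        for key, keep in zip(vectors, mask)
    ]
    top = max(scores)
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return [
        sum(weight / total * vec[i] for weight, vec in zip(weights, vectors))
        for i in range(len(query))
    ]


token = [1.0, 2.0]  # toy embedding of a real token
pad = [0.0, 1.0]    # toy embedding of the padding token

alone = attend([token])                             # sequence without padding
unmasked = attend([token, pad, pad])                # padding leaks into the output
masked = attend([token, pad, pad], mask=[1, 0, 0])  # padding is ignored
```

In this toy setup, masked equals alone while unmasked does not, which mirrors the silent error described above. With a real model, the equivalent is to pass the attention_mask returned by the tokenizer together with the input_ids.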
🤗 Transformers doesn't automatically create an attention_mask to mask a padding token if it is provided because:
Some models don't have a padding token.
For some use-cases, users want a model to attend to a padding token.
ValueError: Unrecognized configuration class XYZ for this kind of AutoModel
Generally, we recommend using the AutoModel class to load pretrained instances of models. This class can automatically infer and load the correct architecture from a given checkpoint based on the configuration. If you see this ValueError when loading a model from a checkpoint, this means the Auto class couldn't find a mapping from the configuration in the given checkpoint to the kind of model you are trying to load. Most commonly, this happens when a checkpoint doesn't support a given task. For instance, you'll see this error if you try to load a GPT2 checkpoint with AutoModelForQuestionAnswering in a release that has no GPT2 model for question answering.
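The lookup that fails can be pictured as a dictionary from configuration classes to model classes. The sketch below is a simplified, hypothetical model of that idea, not the library's actual code; every class and function name in it is invented for illustration.

```python
class ToyBertConfig:
    """Hypothetical configuration class for an architecture with a QA head."""


class ToyGPT2Config:
    """Hypothetical configuration class for an architecture without a QA head."""


class ToyBertForQuestionAnswering:
    def __init__(self, config):
        self.config = config


# One mapping per task: configuration class -> model class for that task.
QUESTION_ANSWERING_MAPPING = {ToyBertConfig: ToyBertForQuestionAnswering}


def auto_model_for_question_answering(config):
    """Pick the model class registered for this configuration, if any."""
    model_class = QUESTION_ANSWERING_MAPPING.get(type(config))
    if model_class is None:
        raise ValueError(
            f"Unrecognized configuration class {type(config).__name__} "
            "for this kind of AutoModel"
        )
    return model_class(config)
```

Calling auto_model_for_question_answering(ToyGPT2Config()) raises the ValueError because nothing is registered for that configuration. The fix in practice is to pick a checkpoint whose architecture supports the task, or an Auto class that matches what the checkpoint supports.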