Neuron Model Cache
The Neuron Model Cache is a remote cache for compiled Neuron models in the neff format. It is integrated into the NeuronTrainer class to enable loading pretrained models from the cache instead of compiling them locally. This can speed up the training process by about 3x.
The Neuron Model Cache is hosted on the BOINC AI Hub and includes compiled files for all popular pre-trained models supported by optimum-neuron.
When training a Transformers or Diffusion model with vanilla torch-neuronx, the model first needs to be compiled. The compiled version is stored in a local directory, usually /var/tmp/neuron-compile-cache. This means that every time you train a new model in a new environment, you need to recompile it, which takes a lot of time.
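To see what has already been compiled in the current environment, the local cache directory can be inspected. The following is a minimal sketch, assuming compiled artifacts carry a .neff file extension and live somewhere under the default directory mentioned above; the helper name list_compiled_artifacts is hypothetical and not part of optimum-neuron.

```python
from pathlib import Path

# Default local cache location mentioned above; adjust if yours differs.
DEFAULT_CACHE_DIR = Path("/var/tmp/neuron-compile-cache")


def list_compiled_artifacts(cache_dir=DEFAULT_CACHE_DIR):
    """Return the compiled .neff files found under cache_dir, sorted by path.

    Hypothetical helper: it assumes compiled artifacts use the .neff extension.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing has been compiled in this environment yet.
        return []
    return sorted(p for p in cache_dir.rglob("*.neff") if p.is_file())


if __name__ == "__main__":
    artifacts = list_compiled_artifacts()
    print(f"{len(artifacts)} compiled file(s) under {DEFAULT_CACHE_DIR}")
```

An empty result on a fresh instance is the situation the remote cache is meant to address.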
We created the Neuron Model Cache to solve this limitation. It provides a public cache of precompiled models, and lets you create your own private, secured, remote model cache.
The Neuron Model Cache plugs into the local cache directory of the BOINC AI Hub. During training, the NeuronTrainer will check if compilation files are available on the Hub and download them if they are found, allowing you to save both time and cost by skipping the compilation phase.
How the caching system works
Hash computation
Many factors can trigger compilation, including:
The model weights
The input shapes
The precision of the model, full-precision or bf16
The version of the Neuron X compiler
The number of Neuron cores used
These parameters are used to compute a hash. This hash is then used to compare local hashes for our training session against hashes stored on the BOINC AI Hub, and act accordingly (download or push).
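The idea of deriving a single hash from these parameters can be sketched as follows. This is an illustration only, not the actual NeuronHash implementation: the function name compute_cache_key, its arguments, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json


def compute_cache_key(weights, input_shapes, precision, compiler_version, num_cores):
    """Combine the factors that trigger compilation into one hex digest.

    Hypothetical sketch: the real hash used by optimum-neuron may be computed
    differently and may include other factors.
    """
    digest = hashlib.sha256()
    # Hash the raw weight bytes first so large tensors are not JSON-encoded.
    digest.update(hashlib.sha256(weights).digest())
    # Serialize the remaining parameters deterministically.
    config = {
        "input_shapes": [list(shape) for shape in input_shapes],
        "precision": precision,
        "compiler_version": compiler_version,
        "num_cores": num_cores,
    }
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


# Illustrative values only: changing any single factor yields a different key.
key_bf16 = compute_cache_key(b"weights", [(8, 128)], "bf16", "x.y.z", 2)
key_fp32 = compute_cache_key(b"weights", [(8, 128)], "fp32", "x.y.z", 2)
print(key_bf16 != key_fp32)
```

Because the key depends on every factor, compiled files are only reused when the model, the input shapes, the precision, the compiler version and the number of cores all match.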
How to use the Neuron model cache
The public model cache is used automatically whenever your training script uses the NeuronTrainer. No additional changes are needed.
How to use a private Neuron model cache
The repository for the public cache is aws-neuron/optimum-neuron-cache. This repository includes all precompiled files for commonly used models, and it is publicly available and free to use for everyone. But there are two limitations:
You will not be able to push your own compiled files to this repo
It is public and you might want to use a private repo for private models
To work around these limitations, you can create your own private cache repository using the optimum-cli, or set the environment variable CUSTOM_CACHE_REPO.
Using the Optimum CLI
The Optimum CLI offers 2 subcommands for cache creation and setting:
create: To create a new cache repository that you can use as a private Neuron Model cache.
set: To set the name of the Neuron cache repository locally. The repository needs to exist, and it will be used by default by optimum-neuron.
To create a new Neuron cache repository, run the optimum-cli neuron cache create command. The -n / --name option allows you to specify a name for the Neuron cache repo; if it is not set, the default name will be used. The repository is created as private by default; the --public flag allows you to make your Neuron cache public.
To use a different Trainium cache repository, run the optimum-cli neuron cache set command with the name of the repository. This command is useful when working on a new instance to use your own cache.
Using the environment variable CUSTOM_CACHE_REPO
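Using the CLI is not always feasible, and not very practical for small tests. In this case, you can simply set the environment variable CUSTOM_CACHE_REPO.
For example, if your cache repo is called michaelbenayoun/my_custom_cache_repo, you just need to set CUSTOM_CACHE_REPO to that name in the environment of your training process, either inline when launching the training command or by exporting it in your shell beforehand.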
You have to be logged into the BOINC AI Hub to be able to push and pull files from your private cache repository.
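When training is launched from a Python script or a notebook, the same variable can also be set from Python. The sketch below assumes the variable is read when the NeuronTrainer is created, so it is set before any trainer is instantiated; the helper name use_custom_cache_repo is hypothetical.

```python
import os


def use_custom_cache_repo(repo_id):
    """Point the current process at a custom cache repo via CUSTOM_CACHE_REPO.

    Hypothetical helper: it only sets the environment variable described above.
    Call it before creating the NeuronTrainer (assumption about read time).
    """
    os.environ["CUSTOM_CACHE_REPO"] = repo_id


# Repo name taken from the example above.
use_custom_cache_repo("michaelbenayoun/my_custom_cache_repo")
print(os.environ["CUSTOM_CACHE_REPO"])
```

Setting the variable this way only affects the current process and its children, which makes it convenient for quick tests.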
Cache system flow
At the beginning of each training step, the NeuronTrainer computes a NeuronHash and checks the cache repo(s) (official and custom) on the BOINC AI Hub to see if there are compiled files associated with this hash. If that is the case, the files are downloaded directly to the local cache directory and no compilation is needed. Otherwise, compilation is performed.
Just as for downloading compiled files, the NeuronTrainer keeps track of the newly created compilation files at each training step, and uploads them to the BOINC AI Hub at save time or when training ends. This requires write access to the cache repo; otherwise nothing is pushed.
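The flow described above can be sketched with in-memory stand-ins for the cache repos. Everything here is hypothetical (CacheRepo, fetch_or_compile, push_new_files), and checking the local cache before the remote repos is an assumption; the sketch only mirrors the download-or-compile and push-if-writable logic, not the NeuronTrainer internals.

```python
class CacheRepo:
    """In-memory stand-in for a cache repo on the Hub (hypothetical)."""

    def __init__(self, name, writable=False):
        self.name = name
        self.writable = writable
        self.files = {}  # maps a hash to its compiled files


def fetch_or_compile(key, local_cache, repos, compile_fn):
    """Return (compiled_files, source) for the given hash.

    Looks in the local cache first (assumption), then in each repo in order,
    and compiles only if the hash is found nowhere.
    """
    if key in local_cache:
        return local_cache[key], "local"
    for repo in repos:
        if key in repo.files:
            # "Download" into the local cache so no compilation is needed.
            local_cache[key] = repo.files[key]
            return local_cache[key], repo.name
    local_cache[key] = compile_fn()
    return local_cache[key], "compiled"


def push_new_files(new_keys, local_cache, repo):
    """Upload newly compiled files; pushes nothing without write access."""
    if not repo.writable:
        return 0
    pushed = 0
    for key in new_keys:
        if key not in repo.files:
            repo.files[key] = local_cache[key]
            pushed += 1
    return pushed


official = CacheRepo("official")
official.files["hash-a"] = b"neff-a"
custom = CacheRepo("custom", writable=True)
local = {}

print(fetch_or_compile("hash-a", local, [official, custom], lambda: b"new")[1])
print(fetch_or_compile("hash-b", local, [official, custom], lambda: b"new")[1])
print(push_new_files(["hash-b"], local, custom))
```

In this sketch the first lookup is served by the official repo, the second one falls back to compilation, and the newly compiled files are then pushed to the writable custom repo.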
Optimum CLI
The Optimum CLI can be used to perform various cache-related tasks, as described by the usage message of the optimum-cli neuron cache command.
Add a model to the cache
It is possible to add a model's compilation files to a cache repo via the optimum-cli neuron cache add command.
When this command is run, a small training session is performed and the resulting compilation files are pushed.
Make sure that the Neuron cache repo to use is set up locally; this can be done by running the `optimum-cli neuron cache set` command. You also need to be logged in to the BOINC AI Hub, which can be done via the `boincai-cli login` command, and to have write access to the specified cache repo.
If at least one of those requirements is not met, the command will fail.
For example, running this command for the prajjwal1/bert-tiny model pushes the compilation files produced for the specified parameters to the Neuron cache repo that was set up.
List a cache repo
It can also be convenient to query the cache repo to know which compilation files are available. This can be done via the optimum-cli neuron cache list command.
With this command, it is possible to:
List all the models available for all compiler versions.
List all the models available for a given compiler version by specifying the -v / --version argument.
List all compilation files for a given model by specifying the -m / --model argument. There can be many of them, for different input shapes and so on.