Understanding how big of a model can fit on your machine
One difficult aspect of exploring potential models to use on your machine is knowing just how big a model will fit into memory with your current graphics card (such as loading the model onto CUDA).
To help alleviate this, 🤗 Accelerate has a CLI interface through `accelerate estimate-memory`. This tutorial will walk you through using it, what to expect, and at the end link to the interactive demo hosted on the 🤗 Hub, which will even let you post those results directly on the model repo!
Currently, we support searching for models that can be used in `timm` and `transformers`.
This API will load the model into memory on the `meta` device, so we are not actually downloading and loading the full weights of the model into memory, nor do we need to. As a result, it's perfectly fine to measure 8-billion-parameter models (or more) without having to worry about whether your CPU can handle it!
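To make the `meta`-device idea concrete, here is a minimal sketch of the underlying technique using Accelerate's `init_empty_weights` context manager; the model name `gpt2` is just an illustrative choice, and this is a simplified illustration rather than the estimator's actual implementation.

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Only the small config file is downloaded, never the weights.
config = AutoConfig.from_pretrained("gpt2")

# Inside this context, parameters are created on the `meta` device,
# so they have shapes and dtypes but allocate no real memory.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Shapes are enough to compute the parameter count (and thus memory estimates).
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```

Because nothing is materialized, the same approach scales to models far larger than your available RAM.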