Accelerate
  • ๐ŸŒGETTING STARTED
    • BOINC AI Accelerate
    • Installation
    • Quicktour
  • ๐ŸŒTUTORIALS
    • Overview
    • Migrating to BOINC AI Accelerate
    • Launching distributed code
    • Launching distributed training from Jupyter Notebooks
  • ๐ŸŒHOW-TO GUIDES
    • Start Here!
    • Example Zoo
    • How to perform inference on large models with small resources
    • Knowing how big of a model you can fit into memory
    • How to quantize a model
    • How to perform distributed inference with normal resources
    • Performing gradient accumulation
    • Accelerating training with local SGD
    • Saving and loading training states
    • Using experiment trackers
    • Debugging timeout errors
    • How to avoid CUDA Out-of-Memory
    • How to use Apple Silicon M1 GPUs
    • How to use DeepSpeed
    • How to use Fully Sharded Data Parallelism
    • How to use Megatron-LM
    • How to use BOINC AI Accelerate with SageMaker
    • How to use BOINC AI Accelerate with Intel® Extension for PyTorch for CPU
  • ๐ŸŒCONCEPTS AND FUNDAMENTALS
    • BOINC AI Accelerate's internal mechanism
    • Loading big models into memory
    • Comparing performance across distributed setups
    • Executing and deferring jobs
    • Gradient synchronization
    • TPU best practices
  • ๐ŸŒREFERENCE
    • Main Accelerator class
    • Stateful configuration classes
    • The Command Line
    • Torch wrapper classes
    • Experiment trackers
    • Distributed launchers
    • DeepSpeed utilities
    • Logging
    • Working with large models
    • Kwargs handlers
    • Utility functions and classes
    • Megatron-LM Utilities
    • Fully Sharded Data Parallelism Utilities

Knowing how big of a model you can fit into memory

Understanding how big of a model can fit on your machine

One difficult aspect of exploring potential models to use on your machine is knowing just how big a model will fit into memory on your current graphics card (for example, when loading the model onto CUDA).

To help alleviate this, 🌍 Accelerate provides a CLI command, accelerate estimate-memory. This tutorial walks you through using it and what to expect, and at the end links to the interactive demo hosted on the 🌍 Hub, which will even let you post those results directly on the model repo!
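
For example, a minimal invocation just names the checkpoint to inspect; bert-base-cased below is only an illustrative model id, and any model hosted on the Hub should work the same way:

```bash
# Estimate how much memory is needed to load this model's weights
accelerate estimate-memory bert-base-cased
```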

Currently, we support searching for models that can be used with timm and transformers.
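
When a checkpoint's metadata does not make the source library obvious, the command accepts flags to state it explicitly and to limit which dtypes are reported. The flag names below match the upstream Accelerate CLI as I know it, but double-check accelerate estimate-memory --help on your install:

```bash
# Estimate memory for a transformers model, reporting float16 and int8 footprints
accelerate estimate-memory HuggingFaceM4/idefics-80b-instruct --library_name transformers --dtypes float16 int8
```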

This API loads the model into memory on the meta device, so we never actually download or load the model's full weights into memory, nor do we need to. As a result, it's perfectly fine to measure models with 8 billion parameters (or more) without having to worry about whether your CPU can handle it!
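
To make the meta-device trick concrete, here is a minimal sketch of the underlying idea (not the estimator's actual implementation), using Accelerate's init_empty_weights context manager; bert-base-cased is again just an example model id:

```python
# A minimal sketch of the idea behind the estimator: materialize the model on
# the "meta" device (shapes and dtypes only, no real weights), then add up the
# bytes its parameters and buffers would occupy. Assumes `accelerate` and
# `transformers` are installed.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert-base-cased")  # downloads only the small config file

with init_empty_weights():
    model = AutoModel.from_config(config)  # every tensor lives on the meta device

total_bytes = sum(
    t.numel() * t.element_size()
    for t in list(model.parameters()) + list(model.buffers())
)
print(f"~{total_bytes / 1024**2:.1f} MiB needed to load the weights")
```

Because every tensor lives on the meta device, this runs in seconds and uses almost no RAM, no matter how large the model is.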
