Accelerate
  • ๐ŸŒGETTING STARTED
    • BOINC AI Accelerate
    • Installation
    • Quicktour
  • ๐ŸŒTUTORIALS
    • Overview
    • Migrating to BOINC AI Accelerate
    • Launching distributed code
    • Launching distributed training from Jupyter Notebooks
  • ๐ŸŒHOW-TO GUIDES
    • Start Here!
    • Example Zoo
    • How to perform inference on large models with small resources
    • Knowing how big of a model you can fit into memory
    • How to quantize a model
    • How to perform distributed inference with normal resources
    • Performing gradient accumulation
    • Accelerating training with local SGD
    • Saving and loading training states
    • Using experiment trackers
    • Debugging timeout errors
    • How to avoid CUDA Out-of-Memory
    • How to use Apple Silicon M1 GPUs
    • How to use DeepSpeed
    • How to use Fully Sharded Data Parallelism
    • How to use Megatron-LM
    • How to use BOINC AI Accelerate with SageMaker
    • How to use BOINC AI Accelerate with Intel® Extension for PyTorch for CPU
  • ๐ŸŒCONCEPTS AND FUNDAMENTALS
    • BOINC AI Accelerate's internal mechanism
    • Loading big models into memory
    • Comparing performance across distributed setups
    • Executing and deferring jobs
    • Gradient synchronization
    • TPU best practices
  • ๐ŸŒREFERENCE
    • Main Accelerator class
    • Stateful configuration classes
    • The Command Line
    • Torch wrapper classes
    • Experiment trackers
    • Distributed launchers
    • DeepSpeed utilities
    • Logging
    • Working with large models
    • Kwargs handlers
    • Utility functions and classes
    • Megatron-LM Utilities
    • Fully Sharded Data Parallelism Utilities

Knowing how big of a model you can fit into memory

Understanding how big of a model can fit on your machine

One difficult aspect of exploring potential models to use on your machine is knowing just how big a model will fit into memory on your current graphics card (for example, when loading the model onto CUDA).

To help alleviate this, 🌍 Accelerate provides a CLI command, accelerate estimate-memory. This tutorial walks you through using it and what to expect, and at the end links to the interactive demo hosted on the 🌍 Hub, which will even let you post those results directly on the model repo!
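
For example, a minimal invocation just names the checkpoint to inspect; bert-base-cased below is only an illustrative model id, and any model hosted on the Hub should work the same way:

```bash
# Estimate how much memory is needed to load this model's weights
accelerate estimate-memory bert-base-cased
```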

Currently, we support searching for models that can be used with timm and transformers.
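
When a checkpoint's metadata does not make the source library obvious, the command accepts flags to state it explicitly and to limit which dtypes are reported. The flag names below match the upstream Accelerate CLI as I know it, but double-check accelerate estimate-memory --help on your install:

```bash
# Estimate memory for a transformers model, reporting float16 and int8 footprints
accelerate estimate-memory HuggingFaceM4/idefics-80b-instruct --library_name transformers --dtypes float16 int8
```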

This API loads the model into memory on the meta device, so we never actually download or load the model's full weights into memory, nor do we need to. As a result, it's perfectly fine to measure models with 8 billion parameters (or more) without having to worry about whether your CPU can handle it!
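
To make the meta-device trick concrete, here is a minimal sketch of the underlying idea (not the estimator's actual implementation), using Accelerate's init_empty_weights context manager; bert-base-cased is again just an example model id:

```python
# A minimal sketch of the idea behind the estimator: materialize the model on
# the "meta" device (shapes and dtypes only, no real weights), then add up the
# bytes its parameters and buffers would occupy. Assumes `accelerate` and
# `transformers` are installed.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert-base-cased")  # downloads only the small config file

with init_empty_weights():
    model = AutoModel.from_config(config)  # every tensor lives on the meta device

total_bytes = sum(
    t.numel() * t.element_size()
    for t in list(model.parameters()) + list(model.buffers())
)
print(f"~{total_bytes / 1024**2:.1f} MiB needed to load the weights")
```

Because every tensor lives on the meta device, this runs in seconds and uses almost no RAM, no matter how large the model is.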
