How to use Apple Silicon M1 GPUs

Accelerated PyTorch Training on Mac

With the PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training. This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac. Apple's Metal Performance Shaders (MPS) backend for PyTorch enables this and can be used via the new "mps" device. This maps computational graphs and primitives onto the MPS Graph framework and the tuned kernels provided by MPS. For more information, please refer to the official documents Introducing Accelerated PyTorch Training on Mac and MPS BACKEND.
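
As a quick illustration, the snippet below (a minimal sketch, assuming a PyTorch >= 1.12 install with MPS support) runs a small computation on the "mps" device:

import torch

# Select the Metal Performance Shaders (MPS) device exposed by PyTorch v1.12+.
device = torch.device("mps")

# Tensors and modules move to the Apple silicon GPU like to any other device.
x = torch.randn(8, 16, device=device)
layer = torch.nn.Linear(16, 4).to(device)

# The forward pass runs on the MPS backend.
y = layer(x)
print(y.device)  # mps:0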

Benefits of Training and Inference using Apple Silicon Chips

  1. Enables users to train larger networks or batch sizes locally

  2. Reduces data retrieval latency and provides the GPU with direct access to the full memory store thanks to the unified memory architecture, thereby improving end-to-end performance.

  3. Reduces costs associated with cloud-based development or the need for additional local GPUs.

Pre-requisites: To install torch with MPS support, please follow this Medium article: GPU-Acceleration Comes to PyTorch on M1 Macs.
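
After installation, you can verify that your PyTorch build ships the MPS backend and that the current machine can actually use it (a minimal check using the public torch.backends.mps API):

import torch

# is_built(): this PyTorch build was compiled with MPS support.
print(torch.backends.mps.is_built())
# is_available(): the OS version and hardware can actually use the backend.
print(torch.backends.mps.is_available())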

How it works out of the box

MPS support is enabled by default on macOS machines with MPS-enabled Apple Silicon GPUs. To disable it, pass the --cpu flag to the accelerate launch command or answer the corresponding question in the accelerate config questionnaire.

You can directly run the following script to test it out on MPS-enabled Apple Silicon machines:

accelerate launch /examples/cv_example.py --data_dir images
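
To try it with your own code instead, the device-agnostic pattern below is all that is needed (a minimal sketch with a hypothetical toy model and optimizer, not the cv_example.py script); Accelerate selects the mps device automatically when it is available:

import torch
from accelerate import Accelerator

accelerator = Accelerator()
print(accelerator.device)  # "mps" on an MPS-enabled Apple Silicon Mac

# Hypothetical toy model and optimizer, just to show the pattern.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# prepare() moves the model (and any dataloaders) to the selected device,
# so the training step itself contains no mps-specific code.
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(8, 16, device=accelerator.device)
loss = model(x).sum()
accelerator.backward(loss)
optimizer.step()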

A few caveats to be aware of

  1. We strongly recommend installing PyTorch >= 1.13 (the nightly version at the time of writing) on your macOS machine. It has major fixes related to model correctness and performance improvements for transformer-based models. Please refer to https://github.com/pytorch/pytorch/issues/82707 for more details.

  2. The distributed setups gloo and nccl do not work with the mps device. This means that currently only a single GPU of the mps device type can be used. A defensive check covering both of these caveats is sketched below.
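
If you want your script to fail fast when either caveat is violated, a guard like the following can help (a minimal sketch; reading WORLD_SIZE is one common way to detect a multi-process launch, not an Accelerate requirement):

import os
import torch

# Caveat 1: major correctness and performance fixes for transformer models
# landed in PyTorch >= 1.13, so refuse to run on older versions.
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 13), "Please install PyTorch >= 1.13 for reliable MPS training."

# Caveat 2: the gloo and nccl backends do not work with mps, so make sure
# the script was launched as a single process on a single device.
assert int(os.environ.get("WORLD_SIZE", "1")) == 1, "Distributed launches are not supported on mps."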

Finally, please remember that 🌍 Accelerate only integrates the MPS backend; therefore, if you have any problems or questions regarding MPS backend usage, please file an issue with the PyTorch GitHub.
