
What are Habana’s first-generation Gaudi, Gaudi2 and HPUs?

Gaudi and Gaudi2 are the first- and second-generation AI hardware accelerators designed by Habana Labs. A single server contains 8 devices called Habana Processing Units (HPUs) with 96GB of memory each on Gaudi2 and 32GB on first-gen Gaudi. Check out here for more information about the underlying hardware architecture.

The Habana SDK is called SynapseAI and is common to both first-gen Gaudi and Gaudi2. As a consequence, 🌍 Optimum Habana is fully compatible with both generations of accelerators.

Execution modes

Two execution modes are supported on HPUs for PyTorch, which is the main deep learning framework the 🌍 Transformers and 🌍 Diffusers libraries rely on:

  • Eager mode execution, where the framework executes one operation at a time as defined in standard PyTorch eager mode.

  • Lazy mode execution, where operations are internally accumulated in a graph. The execution of the accumulated graph is triggered lazily, only when a tensor value is actually needed by the user or explicitly requested in the script. The SynapseAI graph compiler optimizes how the accumulated operations are executed (e.g. operator fusion, data layout management, parallelization, pipelining and memory management, graph-level optimizations).

See here how to use these execution modes in 🌍 Optimum Habana.
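
To make the distinction concrete, here is a minimal lazy-mode sketch. It is an illustration under assumptions rather than code from this guide: it presumes the habana_frameworks package from the SynapseAI stack is installed and that lazy mode is enabled (commonly controlled through the PT_HPU_LAZY_MODE environment variable). When training with 🌍 Optimum Habana, the GaudiTrainer inserts these synchronization points for you.

```python
import torch
import habana_frameworks.torch.core as htcore  # ships with the SynapseAI stack (assumed installed)

device = torch.device("hpu")

# In lazy mode, these operations are only accumulated in a graph, not executed yet.
x = torch.randn(1024, 1024, device=device)
y = torch.matmul(x, x)
z = torch.relu(y)

# mark_step() hands the accumulated graph to the SynapseAI graph compiler, which
# optimizes it (operator fusion, memory management, ...) and then executes it.
htcore.mark_step()

# Fetching a value on the host side also triggers execution of pending operations.
print(z.sum().item())
```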

Distributed training

First-gen Gaudi and Gaudi2 are well-equipped for distributed training:

  • Scale up to 8 devices on one server. See here how to perform distributed training on a single node.

  • Scale out to 1000s of devices on several servers. See here how to do multi-node training; a minimal training-script sketch follows this list.
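
To give an idea of what a script that scales this way looks like, below is a hedged sketch of a GaudiTrainer setup. The model, toy dataset and Gaudi configuration name are placeholders chosen for illustration, and argument names may differ slightly across 🌍 Optimum Habana versions.

```python
from datasets import Dataset
from optimum.habana import GaudiTrainer, GaudiTrainingArguments
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tiny toy dataset so the sketch is self-contained; replace with a real one.
texts = ["HPUs are AI accelerators", "Lazy mode accumulates operations in a graph"]
train_dataset = Dataset.from_dict(dict(tokenizer(texts, padding=True))).add_column("labels", [0, 1])

training_args = GaudiTrainingArguments(
    output_dir="./output",
    use_habana=True,                               # run on HPUs
    use_lazy_mode=True,                            # lazy execution mode (see above)
    gaudi_config_name="Habana/bert-base-uncased",  # example Gaudi configuration from the Hub
    per_device_train_batch_size=2,
)

trainer = GaudiTrainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```

The script itself stays the same when scaling up or out: running it on the 8 HPUs of a server, or on several servers, is handled by the multi-device launcher described in the links above (for instance the gaudi_spawn.py utility shipped with 🌍 Optimum Habana).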

Inference

HPUs can also be used to perform inference:

  • Through HPU graphs, which are well-suited for latency-sensitive applications. Check out here how to use them; a minimal sketch follows this list.

  • In lazy mode, which can be used the same way as for training.
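
As an illustration of the first option, here is a hedged sketch of wrapping a module in an HPU graph. The wrap_in_hpu_graph helper comes from the SynapseAI PyTorch integration (its exact location may vary across releases), and the model is an arbitrary placeholder rather than an example from this guide.

```python
import torch
import habana_frameworks.torch as ht  # SynapseAI PyTorch integration (assumed installed)

device = torch.device("hpu")
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).to(device).eval()

# The first forward pass is recorded into an HPU graph; later calls with the
# same input shape replay that graph, which cuts host-side overhead and latency.
model = ht.hpu.wrap_in_hpu_graph(model)

with torch.no_grad():
    x = torch.randn(1, 128, device=device)
    out = model(x)
print(out.cpu())
```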
