Efficient training on CPU
This guide focuses on training large models efficiently on CPU.
Mixed precision with IPEX
IPEX is optimized for CPUs with AVX-512 or above, and is functionally compatible with CPUs that support only AVX2. It is therefore expected to improve performance on Intel CPU generations with AVX-512 or above, while CPUs with only AVX2 (e.g., AMD CPUs or older Intel CPUs) might see better performance under IPEX, but this is not guaranteed. IPEX provides performance optimizations for CPU training with both Float32 and BFloat16. The following sections focus on BFloat16.
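To see which of these instruction sets a given machine reports, the CPU flags can be inspected. The snippet below is a minimal sketch that assumes a Linux system exposing `/proc/cpuinfo`; the helper names `parse_cpu_flags` and `supported_features` are hypothetical and are not part of IPEX or PyTorch.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for the instruction sets discussed in this guide.
FEATURES = {
    "AVX2": "avx2",
    "AVX-512": "avx512f",
    "AVX-512 BF16": "avx512_bf16",
    "AMX BF16": "amx_bf16",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 systems)


def supported_features(cpuinfo_text):
    """Map each feature name to True/False depending on the reported flags."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURES.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in supported_features(cpuinfo.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only covers Linux.")
```

A machine that reports AVX2 but not AVX-512 falls into the "functionally works, speedup not guaranteed" category described above.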
The low-precision data type BFloat16 is natively supported on 3rd Generation Xeon® Scalable Processors (aka Cooper Lake) with the AVX512 instruction set, and will be supported on the next generation of Intel® Xeon® Scalable Processors with the Intel® Advanced Matrix Extensions (Intel® AMX) instruction set, which further boosts performance. Auto Mixed Precision for the CPU backend has been enabled since PyTorch 1.10. In addition, Auto Mixed Precision with BFloat16 for CPU and BFloat16 optimization of operators are broadly enabled in Intel® Extension for PyTorch and partially upstreamed to the PyTorch master branch. Users can get better performance and a better user experience with IPEX Auto Mixed Precision.
See the IPEX documentation for more detailed information on Auto Mixed Precision.
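To give a feel for what "low precision" means here, the sketch below emulates BFloat16 rounding in pure Python. It relies on the fact that BFloat16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of a Float32 value. The function `to_bfloat16` is a hypothetical helper for illustration only; it handles finite values and is not how PyTorch or IPEX perform the conversion internally.

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 float32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16(x):
    """Round a finite float to the nearest BFloat16 value (ties to even).

    The result is returned as a Python float whose float32 bit pattern
    has its low 16 bits cleared. NaN and infinity are not handled.
    """
    bits = float32_bits(x)
    # Round-to-nearest-even on the 16 low bits that BFloat16 drops.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1.0078125, 1.00390625):
        print(value, "->", to_bfloat16(value))
```

With only 7 explicit mantissa bits, values near 1.0 are spaced 2^-7 apart, which is why mixed precision keeps numerically sensitive operations in Float32 while running the rest in BFloat16.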
IPEX installation:
IPEX releases follow PyTorch releases. To install via pip, pick the IPEX version that matches your PyTorch version:
| PyTorch version | IPEX version |
|---|---|
| 1.13 | 1.13.0+cpu |
| 1.12 | 1.12.300+cpu |
| 1.11 | 1.11.200+cpu |
| 1.10 | 1.10.100+cpu |
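The version pairing in the table can be expressed as a small lookup. This is a sketch only: `ipex_requirement` is a hypothetical helper, the mapping is copied from the table above, and the `+cpu` builds may require an additional wheel index that is not shown here (see the IPEX installation instructions).

```python
# PyTorch major.minor version -> matching IPEX version, copied from the table.
IPEX_FOR_TORCH = {
    "1.13": "1.13.0+cpu",
    "1.12": "1.12.300+cpu",
    "1.11": "1.11.200+cpu",
    "1.10": "1.10.100+cpu",
}


def ipex_requirement(torch_version):
    """Return a pip requirement string for the IPEX release matching torch_version.

    torch_version may include a patch number and a local suffix,
    for example "1.13.1+cpu".
    """
    release = torch_version.split("+")[0]
    major_minor = ".".join(release.split(".")[:2])
    if major_minor not in IPEX_FOR_TORCH:
        raise ValueError(f"No IPEX version listed for PyTorch {torch_version}")
    return f"intel_extension_for_pytorch=={IPEX_FOR_TORCH[major_minor]}"


if __name__ == "__main__":
    print(ipex_requirement("1.13.1+cpu"))
```

In a real environment, the installed PyTorch version can be read from `torch.__version__` and passed to the helper.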
See the IPEX installation guide for other installation approaches.
Usage in Trainer
To enable auto mixed precision with IPEX in Trainer, users should add use_ipex, bf16 and no_cuda to the training command arguments.
Take the Transformers question-answering example as a use case.
Training with IPEX using BF16 auto mixed precision on CPU:
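As a sketch of what such a command could look like, the snippet below assembles the arguments in Python and prints the resulting shell command. It assumes the `run_qa.py` script from the Transformers question-answering examples; the model, dataset and output directory are illustrative placeholders, and `build_qa_command` is a hypothetical helper. Only `--use_ipex`, `--bf16` and `--no_cuda` are the flags this section is about.

```python
import shlex


def build_qa_command(script="run_qa.py", output_dir="/tmp/debug_squad/"):
    """Build an argument list for CPU training with IPEX BF16 auto mixed precision."""
    return [
        "python", script,
        # Illustrative model and dataset choices, not prescribed by this guide.
        "--model_name_or_path", "bert-base-uncased",
        "--dataset_name", "squad",
        "--do_train",
        "--do_eval",
        "--output_dir", output_dir,
        # The three arguments that enable IPEX BF16 auto mixed precision on CPU.
        "--use_ipex",
        "--bf16",
        "--no_cuda",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command.
    print(shlex.join(build_qa_command()))
```

The list form can also be passed directly to `subprocess.run` if the training job is launched from Python.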
Practice example
Blog: Accelerating PyTorch Transformers with Intel Sapphire Rapids