Transformers
  • ๐ŸŒGET STARTED
  • ๐ŸŒTUTORIALS
  • ๐ŸŒTASK GUIDES
  • ๐ŸŒDEVELOPER GUIDES
  • ๐ŸŒPERFORMANCE AND SCALABILITY
    • Overview
    • ๐ŸŒEFFICIENT TRAINING TECHNIQUES
    • ๐ŸŒOPTIMIZING INFERENCE
      • Inference on CPU
      • Inference on one GPU
      • Inference on many GPUs
      • Inference on Specialized Hardware
    • Instantiating a big model
    • Troubleshooting
    • XLA Integration for TensorFlow Models
    • Optimize inference using `torch.compile()`
  • ๐ŸŒCONTRIBUTE
  • ๐ŸŒCONCEPTUAL GUIDES
  • ๐ŸŒAPI
  • ๐ŸŒINTERNAL HELPERS

๐ŸŒOPTIMIZING INFERENCE

  • Inference on CPU
  • Inference on one GPU
  • Inference on many GPUs
  • Inference on Specialized Hardware