Optimum
  • OVERVIEW
    • Optimum
    • Installation
    • Quick tour
    • Notebooks
    • CONCEPTUAL GUIDES
      • Quantization
  • HABANA
    • BOINC AI Optimum Habana
    • Installation
    • Quickstart
    • TUTORIALS
      • Overview
      • Single-HPU Training
      • Distributed Training
      • Run Inference
      • Stable Diffusion
      • LDM3D
    • HOW-TO GUIDES
      • Overview
      • Pretraining Transformers
      • Accelerating Training
      • Accelerating Inference
      • How to use DeepSpeed
      • Multi-node Training
    • CONCEPTUAL GUIDES
      • What are Habana's Gaudi and HPUs?
    • REFERENCE
      • Gaudi Trainer
      • Gaudi Configuration
      • Gaudi Stable Diffusion Pipeline
      • Distributed Runner
  • INTEL
    • BOINC AI Optimum Intel
    • Installation
    • NEURAL COMPRESSOR
      • Optimization
      • Distributed Training
      • Reference
    • OPENVINO
      • Models for inference
      • Optimization
      • Reference
  • AWS TRAINIUM/INFERENTIA
    • BOINC AI Optimum Neuron
  • FURIOSA
    • BOINC AI Optimum Furiosa
    • Installation
    • HOW-TO GUIDES
      • Overview
      • Modeling
      • Quantization
    • REFERENCE
      • Models
      • Configuration
      • Quantization
  • ONNX RUNTIME
    • Overview
    • Quick tour
    • HOW-TO GUIDES
      • Inference pipelines
      • Models for inference
      • How to apply graph optimization
      • How to apply dynamic and static quantization
      • How to accelerate training
      • Accelerated inference on NVIDIA GPUs
    • CONCEPTUAL GUIDES
      • ONNX and ONNX Runtime
    • REFERENCE
      • ONNX Runtime Models
      • Configuration
      • Optimization
      • Quantization
      • Trainer
  • EXPORTERS
    • Overview
    • The TasksManager
    • ONNX
      • Overview
      • HOW-TO GUIDES
        • Export a model to ONNX
        • Add support for exporting an architecture to ONNX
      • REFERENCE
        • ONNX configurations
        • Export functions
    • TFLITE
      • Overview
      • HOW-TO GUIDES
        • Export a model to TFLite
        • Add support for exporting an architecture to TFLite
      • REFERENCE
        • TFLite configurations
        • Export functions
  • TORCH FX
    • Overview
    • HOW-TO GUIDES
      • Optimization
    • CONCEPTUAL GUIDES
      • Symbolic tracer
    • REFERENCE
      • Optimization
  • BETTERTRANSFORMER
    • Overview
    • TUTORIALS
      • Convert Transformers models to use BetterTransformer
      • How to add support for new architectures?
  • LLM QUANTIZATION
    • GPTQ quantization
  • UTILITIES
    • Dummy input generators
    • Normalized configurations

LLM QUANTIZATION

GPTQ quantization