Optimum
  • 🌍OVERVIEW
    • Optimum
    • Installation
    • Quick tour
    • Notebooks
    • 🌍CONCEPTUAL GUIDES
      • Quantization
  • 🌍HABANA
    • BOINC AI Optimum Habana
    • Installation
    • Quickstart
    • 🌍TUTORIALS
      • Overview
      • Single-HPU Training
      • Distributed Training
      • Run Inference
      • Stable Diffusion
      • LDM3D
    • 🌍HOW-TO GUIDES
      • Overview
      • Pretraining Transformers
      • Accelerating Training
      • Accelerating Inference
      • How to use DeepSpeed
      • Multi-node Training
    • 🌍CONCEPTUAL GUIDES
      • What are Habana's Gaudi and HPUs?
    • 🌍REFERENCE
      • Gaudi Trainer
      • Gaudi Configuration
      • Gaudi Stable Diffusion Pipeline
      • Distributed Runner
  • 🌍INTEL
    • BOINC AI Optimum Intel
    • Installation
    • 🌍NEURAL COMPRESSOR
      • Optimization
      • Distributed Training
      • Reference
    • 🌍OPENVINO
      • Models for inference
      • Optimization
      • Reference
  • 🌍AWS TRAINIUM/INFERENTIA
    • BOINC AI Optimum Neuron
  • 🌍FURIOSA
    • BOINC AI Optimum Furiosa
    • Installation
    • 🌍HOW-TO GUIDES
      • Overview
      • Modeling
      • Quantization
    • 🌍REFERENCE
      • Models
      • Configuration
      • Quantization
  • 🌍ONNX RUNTIME
    • Overview
    • Quick tour
    • 🌍HOW-TO GUIDES
      • Inference pipelines
      • Models for inference
      • How to apply graph optimization
      • How to apply dynamic and static quantization
      • How to accelerate training
      • Accelerated inference on NVIDIA GPUs
    • 🌍CONCEPTUAL GUIDES
      • ONNX And ONNX Runtime
    • 🌍REFERENCE
      • ONNX Runtime Models
      • Configuration
      • Optimization
      • Quantization
      • Trainer
  • 🌍EXPORTERS
    • Overview
    • The TasksManager
    • 🌍ONNX
      • Overview
      • 🌍HOW-TO GUIDES
        • Export a model to ONNX
        • Add support for exporting an architecture to ONNX
      • 🌍REFERENCE
        • ONNX configurations
        • Export functions
    • 🌍TFLITE
      • Overview
      • 🌍HOW-TO GUIDES
        • Export a model to TFLite
        • Add support for exporting an architecture to TFLite
      • 🌍REFERENCE
        • TFLite configurations
        • Export functions
  • 🌍TORCH FX
    • Overview
    • 🌍HOW-TO GUIDES
      • Optimization
    • 🌍CONCEPTUAL GUIDES
      • Symbolic tracer
    • 🌍REFERENCE
      • Optimization
  • 🌍BETTERTRANSFORMER
    • Overview
    • 🌍TUTORIALS
      • Convert Transformers models to use BetterTransformer
      • How to add support for new architectures?
  • 🌍LLM QUANTIZATION
    • GPTQ quantization
  • 🌍UTILITIES
    • Dummy input generators
    • Normalized configurations
Powered by GitBook
On this page
  • Run Inference
  • With GaudiTrainer
  • In our Examples
  1. HABANA
  2. TUTORIALS

Run Inference

PreviousDistributed TrainingNextStable Diffusion

Last updated 1 year ago

Run Inference

This section shows how to run inference-only workloads on Gaudi. For more advanced information about how to speed up inference, check out .

With GaudiTrainer

You can find below a template to perform inference with a GaudiTrainer instance where we want to compute the accuracy over the given dataset:

Copied

import evaluate

metric = evaluate.load("accuracy")

# You can define your custom compute_metrics function. It takes an `EvalPrediction` object (a namedtuple with a
# predictions and label_ids field) and has to return a dictionary string to float.
def my_compute_metrics(p):
    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)

# Trainer initialization
trainer = GaudiTrainer(
        model=my_model,
        gaudi_config=my_gaudi_config,
        args=my_args,
        train_dataset=None,
        eval_dataset=eval_dataset,
        compute_metrics=my_compute_metrics,
        tokenizer=my_tokenizer,
        data_collator=my_data_collator,
    )

# Run inference
metrics = trainer.evaluate()

In our Examples

Copied

python path_to_the_example_script \
  --model_name_or_path my_model_name \
  --gaudi_config_name my_gaudi_config_name \
  --dataset_name my_dataset_name \
  --do_eval \
  --per_device_eval_batch_size my_batch_size \
  --output_dir path_to_my_output_dir \
  --use_habana \
  --use_lazy_mode \
  --use_hpu_graphs_for_inference

The variable my_args should contain some inference-specific arguments, you can take a look to see the arguments that can be interesting to set for inference.

All contain instructions for running inference with a given model on a given dataset. The reasoning is the same for every example: run the example script with --do_eval and --per_device_eval_batch_size and without --do_train. A simple template is the following:

🌍
🌍
this guide
here
our examples