> For the complete documentation index, see [llms.txt](https://boinc-ai.gitbook.io/optimum/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://boinc-ai.gitbook.io/optimum/habana/how-to-guides/accelerating-inference.md).

# Accelerating Inference

## Accelerating Inference

Gaudi offers several possibilities to make inference faster.

### Lazy Mode

Two execution modes are proposed:

* *Lazy mode*, where operations are accumulated in a graph whose execution is triggered in a lazy manner. This allows the graph compiler to optimize the device execution for these operations.
* *Eager mode*, where one operation at a time is executed.

In lazy mode, the graph compiler generates optimized binary code that implements the given model topology on Gaudi. It performs operator fusion, data layout management, parallelization, pipelining and memory management, as well as graph-level optimizations.

To execute inference in lazy mode, you must provide the following arguments:

Copied

```
args = GaudiTrainingArguments(
    # same arguments as in Transformers,
    use_habana=True,
    use_lazy_mode=True,
)
```

In lazy mode, the last batch may trigger an extra compilation because it could be smaller than previous batches. To avoid this, you can discard the last batch with `dataloader_drop_last=True`.

### HPU Graphs

Gaudi provides a way to run fast inference with HPU Graphs. It consists in capturing a series of operations (i.e. graphs) in an HPU stream and then replaying them in an optimized way (more information [here](https://docs.habana.ai/en/latest/PyTorch/Inference_on_Gaudi/Inference_using_HPU_Graphs/Inference_using_HPU_Graphs.html)). Thus, you can apply this to the `forward` method of your model to run it efficiently at inference.

HPU Graphs are integrated into the [`GaudiTrainer`](https://huggingface.co/docs/optimum/habana/package_reference/trainer) and the [`GaudiStableDiffusionPipeline`](https://huggingface.co/docs/optimum/habana/package_reference/stable_diffusion_pipeline) so that one can use them very easily:

* `GaudiTrainer` needs the training argument `use_hpu_graphs_for_inference` to be set to `True` as follows:

Copied

```
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# define the training arguments
training_args = GaudiTrainingArguments(
    use_habana=True,
    use_lazy_mode=True,
    use_hpu_graphs_for_inference=True,
    gaudi_config_name=gaudi_config_name,
    ...
)

# Initialize our Trainer
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
    ... # other arguments
)
```

* `GaudiStableDiffusionPipeline` needs its argument `use_hpu_graphs` to be set to `True` such as:

Copied

```
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "runwayml/stable-diffusion-v1-5"

scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

outputs = generator(
    ["An image of a squirrel in Picasso style"],
    num_images_per_prompt=16,
    batch_size=4,
)
```

With HPU Graphs and in lazy mode, the *first couple of iterations* may be slower due to graph compilations.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://boinc-ai.gitbook.io/optimum/habana/how-to-guides/accelerating-inference.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
