AWS Trainium & Inferentia

Neuron Exporter


Inferentia Exporter

You can export a PyTorch model to Neuron with 🌍 Optimum to run inference on AWS Inferentia 1 and Inferentia 2.

Export functions

There is an export function for each generation of the Inferentia accelerator: export_neuron for INF1 and export_neuronx for INF2. In practice, you can simply call export, which selects the proper export function according to the environment.
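The dispatch idea can be pictured as follows. This is a simplified sketch, not the library's actual implementation: the placeholder bodies and the package-detection logic are illustrative only.

```python
import importlib.util


def export_neuron(model_name):
    # Placeholder for the INF1 (first-generation Inferentia) export path,
    # which relies on the torch-neuron package.
    return f"compiled {model_name} with torch-neuron (INF1)"


def export_neuronx(model_name):
    # Placeholder for the INF2 (second-generation Inferentia) export path,
    # which relies on the torch-neuronx package.
    return f"compiled {model_name} with torch-neuronx (INF2)"


def export(model_name):
    # Pick the export function matching the installed Neuron SDK:
    # torch_neuronx ships on INF2/Trainium images, torch_neuron on INF1.
    if importlib.util.find_spec("torch_neuronx") is not None:
        return export_neuronx(model_name)
    return export_neuron(model_name)
```

A caller only ever invokes `export(...)`; which backend runs depends on the environment the code executes in.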

In addition, you can check whether the exported model is valid via validate_model_outputs, which compares the compiled model’s output on Neuron devices to the PyTorch model’s output on CPU.
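Conceptually, the validation boils down to an element-wise comparison within a tolerance. The helper below is a minimal pure-Python sketch of that idea, not the signature of validate_model_outputs itself:

```python
def outputs_match(reference, compiled, atol=1e-3):
    """Check that each compiled output matches the CPU reference output
    within an absolute tolerance (here, outputs are flat lists of floats)."""
    if len(reference) != len(compiled):
        return False
    return all(abs(r - c) <= atol for r, c in zip(reference, compiled))


# Example: logits that agree within the tolerance pass the check.
cpu_logits = [0.1234, -1.5678, 2.0001]
neuron_logits = [0.1236, -1.5679, 2.0000]
print(outputs_match(cpu_logits, neuron_logits))  # True
```

Small numerical differences are expected after compilation (e.g. from reduced-precision casts), which is why the comparison is tolerance-based rather than exact.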

Configuration classes for Neuron exports

Exporting a PyTorch model to a Neuron compiled model involves specifying:

  1. The input names.

  2. The output names.

  3. The dummy inputs used to trace the model. This is needed by the Neuron Compiler to record the computational graph and convert it to a TorchScript module.

  4. The compilation arguments used to control the trade-off between hardware efficiency (latency, throughput) and accuracy.
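Concretely, those four pieces might look like this for a BERT-style text-classification model. The names, shapes, and values below are illustrative, not the exact defaults Optimum uses:

```python
# Hypothetical summary of what a Neuron export configuration specifies.
neuron_export_spec = {
    # 1. Input names the traced model expects.
    "input_names": ["input_ids", "attention_mask", "token_type_ids"],
    # 2. Output names of the compiled graph.
    "output_names": ["logits"],
    # 3. Dummy input shapes used to trace the model and record the
    #    computational graph (Neuron compiles for static shapes).
    "dummy_shapes": {"batch_size": 1, "sequence_length": 128},
    # 4. Compiler arguments trading accuracy for hardware efficiency,
    #    e.g. casting matrix multiplications to bfloat16.
    "compiler_args": {"auto_cast": "matmul", "auto_cast_type": "bf16"},
}
```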

Depending on the choice of model and task, we represent the data above with configuration classes. Each configuration class is associated with a specific model architecture, and follows the naming convention ArchitectureNameNeuronConfig. For instance, the configuration which specifies the Neuron export of BERT models is BertNeuronConfig.

Since many architectures share similar properties for their Neuron configuration, 🌍 Optimum adopts a 3-level class hierarchy:

  1. Abstract and generic base classes. These handle all the fundamental features, while being agnostic to the modality (text, image, audio, etc).

  2. Middle-end classes. These are aware of the modality, but multiple can exist for the same modality depending on the inputs they support. They specify which input generators should be used for the dummy inputs, but remain model-agnostic.

  3. Model-specific classes like the BertNeuronConfig mentioned above. These are the ones actually used to export models.
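The three levels can be sketched like this. Only BertNeuronConfig is a class name taken from the text above; the base and middle-end class names and their contents are illustrative:

```python
class NeuronConfig:
    """Level 1: abstract, modality-agnostic base class (illustrative)."""

    def inputs(self):
        raise NotImplementedError


class TextEncoderNeuronConfig(NeuronConfig):
    """Level 2: text-modality middle-end, still model-agnostic.
    Decides which dummy input generators apply to text encoders."""

    def inputs(self):
        return ["input_ids", "attention_mask"]


class BertNeuronConfig(TextEncoderNeuronConfig):
    """Level 3: model-specific class actually used to export BERT models."""

    def inputs(self):
        # BERT additionally takes segment (token type) ids.
        return super().inputs() + ["token_type_ids"]


print(BertNeuronConfig().inputs())  # ['input_ids', 'attention_mask', 'token_type_ids']
```

Sharing the first two levels is what lets a new architecture be supported by writing only a thin level-3 class.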

Supported architectures

| Architecture | Task |
| --- | --- |
| ALBERT | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| BERT | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| CamemBERT | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| ConvBERT | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| DeBERTa (INF2 only) | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| DeBERTa-v2 (INF2 only) | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| DistilBERT | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| ELECTRA | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| FlauBERT | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| GPT2 | text-generation |
| MobileBERT | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| MPNet | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| RoBERTa | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| RoFormer | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| XLM | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| XLM-RoBERTa | feature-extraction, fill-mask, multiple-choice, question-answering, text-classification, token-classification |
| Stable Diffusion | text-to-image, image-to-image, inpaint |
| Stable Diffusion XL Base | text-to-image, image-to-image, inpaint |
| Stable Diffusion XL Refiner | image-to-image, inpaint |

More architectures coming soon, stay tuned! πŸš€

More details on checking the supported tasks are available here.
