Pretraining Transformers

Pretraining Transformers with Optimum Habana

Pretraining a Transformers model such as BERT is as easy as fine-tuning it. The only difference is that the model must be instantiated from a configuration with .from_config, which initializes the weights randomly, rather than from a pretrained checkpoint with .from_pretrained. For instance, with the GPT2 configuration:

from transformers import AutoConfig, AutoModelForXXX

config = AutoConfig.from_pretrained("gpt2")
model = AutoModelForXXX.from_config(config)

Here XXX stands for the task to perform, for example CausalLM, which gives AutoModelForCausalLM for GPT2.
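As a concrete illustration of the template above, here is a minimal sketch that fills in the task with CausalLM. It assumes transformers and PyTorch are installed. The configuration sizes are deliberately tiny, hypothetical values chosen so the model builds quickly and offline; they are not the real GPT2 dimensions.

```python
from transformers import AutoModelForCausalLM, GPT2Config

# Hypothetical, deliberately tiny sizes so the model builds in a moment.
# For the real GPT2 architecture, use AutoConfig.from_pretrained("gpt2") instead.
config = GPT2Config(vocab_size=1000, n_positions=128, n_embd=64, n_layer=2, n_head=2)

# from_config builds the architecture with randomly initialized weights;
# no pretrained checkpoint is loaded.
model = AutoModelForCausalLM.from_config(config)

print(type(model).__name__)
print(f"{model.num_parameters():,} parameters")
```

The resulting model is untrained, so it is the right starting point for pretraining but will produce meaningless outputs until it has been trained.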

The following is a working example where BERT is pretrained for masked language modeling:

from datasets import load_dataset
from optimum.habana import GaudiTrainer, GaudiTrainingArguments
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer, DataCollatorForLanguageModeling

# Load the training set (this one has already been preprocessed)
training_set = load_dataset("philschmid/processed_bert_dataset", split="train")
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("philschmid/bert-base-uncased-2022-habana")

# Instantiate an untrained model
config = AutoConfig.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_config(config)

# Resize the embedding matrix to match the tokenizer's vocabulary size
model.resize_token_embeddings(len(tokenizer))

# The data collator will take care of randomly masking the tokens
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer)

training_args = GaudiTrainingArguments(
    output_dir="/tmp/bert-base-uncased-mlm",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",
)

# Initialize our Trainer
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=training_set,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()
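The data collator in the example above masks tokens at random. To make that step less opaque, here is a pure-Python sketch of BERT-style masking: each token is selected with probability 0.15, and a selected token is replaced by the mask token 80% of the time, by a random token 10% of the time, and left unchanged otherwise. This is an illustration of the idea, not the library's implementation. The names mask_tokens, MASK_ID and VOCAB_SIZE are hypothetical, and the sketch does not exempt special tokens from masking.

```python
import random

MASK_ID = 103        # hypothetical id of the [MASK] token
VOCAB_SIZE = 30522   # hypothetical vocabulary size
IGNORE_INDEX = -100  # label value that the loss skips


def mask_tokens(input_ids, mlm_probability=0.15, seed=None):
    """Return (masked_ids, labels) for one sequence of token ids."""
    rng = random.Random(seed)
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, token_id in enumerate(input_ids):
        if rng.random() >= mlm_probability:
            continue  # token not selected: no prediction target here
        labels[i] = token_id  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_ID  # 80%: replace with the mask token
        elif roll < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token
    return masked, labels


ids = list(range(1000, 1020))
masked_ids, labels = mask_tokens(ids, seed=0)
print(masked_ids)
print(labels)
```

Only the selected positions carry a label, so the loss is computed on roughly 15% of the tokens in each sequence.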
You can see another example of pretraining in this blog post.