Graphormer

Overview

The Graphormer model was proposed in Do Transformers Really Perform Bad for Graph Representation? (https://arxiv.org/abs/2106.05234) by Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen and Tie-Yan Liu. It is a Graph Transformer model, modified to allow computations on graphs instead of text sequences by generating embeddings and features of interest during preprocessing and collation, then using a modified attention mechanism.

The abstract from the paper is the following:

The Transformer architecture has become a dominant choice in many domains, such as natural language processing and computer vision. Yet, it has not achieved competitive performance on popular leaderboards of graph-level prediction compared to mainstream GNN variants. Therefore, it remains a mystery how Transformers could perform well for graph representation learning. In this paper, we solve this mystery by presenting Graphormer, which is built upon the standard Transformer architecture, and could attain excellent results on a broad range of graph representation learning tasks, especially on the recent OGB Large-Scale Challenge. Our key insight to utilizing Transformer in the graph is the necessity of effectively encoding the structural information of a graph into the model. To this end, we propose several simple yet effective structural encoding methods to help Graphormer better model graph-structured data. Besides, we mathematically characterize the expressive power of Graphormer and exhibit that with our ways of encoding the structural information of graphs, many popular GNN variants could be covered as the special cases of Graphormer.

Tips:

This model will not work well on large graphs (more than 100 nodes/edges) because memory usage explodes. You can reduce the batch size, increase your RAM, or decrease the UNREACHABLE_NODE_DISTANCE parameter in algos_graphormer.pyx, but it will be hard to go above 700 nodes/edges.

This model does not use a tokenizer; instead, it uses a special collator during training.

This model was contributed by clefourrier. The original code can be found here (https://github.com/microsoft/Graphormer).
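
Since the collator (rather than a tokenizer) prepares the model inputs, a minimal preprocessing sketch may help. It assumes the preprocess_item helper and GraphormerDataCollator are importable from transformers.models.graphormer.collating_graphormer, and that the OGB/ogbg-molhiv dataset is available on the Hub; check the module path in your installed version.

```python
from datasets import load_dataset
from transformers.models.graphormer.collating_graphormer import (
    GraphormerDataCollator,
    preprocess_item,
)

# Load a graph dataset stored in the OGB format (edge_index, node/edge features, labels).
dataset = load_dataset("OGB/ogbg-molhiv")

# Precompute the Graphormer-specific features (spatial positions, in/out degrees,
# attention bias, ...) once per graph, during preprocessing.
dataset_processed = dataset.map(preprocess_item, batched=False)

# The collator pads and batches the precomputed graph features; it plays the role
# a tokenizer-based collator plays for text models (pass it to a Trainer).
collator = GraphormerDataCollator()
```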

GraphormerConfig

class transformers.GraphormerConfig

( num_classes: int = 1, num_atoms: int = 4608, num_edges: int = 1536, num_in_degree: int = 512, num_out_degree: int = 512, num_spatial: int = 512, num_edge_dis: int = 128, multi_hop_max_dist: int = 5, spatial_pos_max: int = 1024, edge_type: str = 'multi_hop', max_nodes: int = 512, share_input_output_embed: bool = False, num_hidden_layers: int = 12, embedding_dim: int = 768, ffn_embedding_dim: int = 768, num_attention_heads: int = 32, dropout: float = 0.1, attention_dropout: float = 0.1, activation_dropout: float = 0.1, layerdrop: float = 0.0, encoder_normalize_before: bool = False, pre_layernorm: bool = False, apply_graphormer_init: bool = False, activation_fn: str = 'gelu', embed_scale: float = None, freeze_embeddings: bool = False, num_trans_layers_to_freeze: int = 0, traceable: bool = False, q_noise: float = 0.0, qn_block_size: int = 8, kdim: int = None, vdim: int = None, bias: bool = True, self_attention: bool = True, pad_token_id = 0, bos_token_id = 1, eos_token_id = 2, **kwargs )

Parameters

  • num_classes (int, optional, defaults to 1) — Number of target classes or labels, set to n for binary classification of n tasks.

  • num_atoms (int, optional, defaults to 512*9) — Number of node types in the graphs.

  • num_edges (int, optional, defaults to 512*3) — Number of edge types in the graph.

  • num_in_degree (int, optional, defaults to 512) — Number of in-degree types in the input graphs.

  • num_out_degree (int, optional, defaults to 512) — Number of out-degree types in the input graphs.

  • num_edge_dis (int, optional, defaults to 128) — Number of edge distances in the input graphs.

  • multi_hop_max_dist (int, optional, defaults to 5) — Maximum distance of multi-hop edges between two nodes.

  • spatial_pos_max (int, optional, defaults to 1024) — Maximum distance between nodes in the graph attention bias matrices, used during preprocessing and collation.

  • edge_type (str, optional, defaults to 'multi_hop') — Type of edge relation chosen.

  • max_nodes (int, optional, defaults to 512) — Maximum number of nodes which can be parsed for the input graphs.

  • share_input_output_embed (bool, optional, defaults to False) — Shares the embedding layer between encoder and decoder - careful, True is not implemented.

  • num_hidden_layers (int, optional, defaults to 12) — Number of hidden layers.

  • embedding_dim (int, optional, defaults to 768) — Dimension of the embedding layer in the encoder.

  • ffn_embedding_dim (int, optional, defaults to 768) — Dimension of the "intermediate" (often named feed-forward) layer in the encoder.

  • num_attention_heads (int, optional, defaults to 32) — Number of attention heads in the encoder.

  • self_attention (bool, optional, defaults to True) — Whether the model is self-attentive (False is not implemented).

  • activation_fn (str or function, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.

  • dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

  • attention_dropout (float, optional, defaults to 0.1) — The dropout probability for the attention weights.

  • activation_dropout (float, optional, defaults to 0.1) — The dropout probability after the activation in the feed-forward layer.

  • layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the encoder. See the LayerDrop paper (https://arxiv.org/abs/1909.11556) for more details.

  • bias (bool, optional, defaults to True) — Uses bias in the attention module - unsupported at the moment.

  • embed_scale (float, optional, defaults to None) — Scaling factor for the node embeddings.

  • num_trans_layers_to_freeze (int, optional, defaults to 0) — Number of transformer layers to freeze.

  • encoder_normalize_before (bool, optional, defaults to False) — Apply the layer norm before each encoder block.

  • pre_layernorm (bool, optional, defaults to False) — Apply layernorm before self-attention and the feed-forward network. Without this, post layernorm will be used.

  • apply_graphormer_init (bool, optional, defaults to False) — Apply a custom Graphormer initialization to the model before training.

  • freeze_embeddings (bool, optional, defaults to False) — Whether to freeze the embedding layer or train it along with the model.

  • q_noise (float, optional, defaults to 0.0) — Amount of quantization noise (see "Training with Quantization Noise for Extreme Model Compression"). (For more detail, see fairseq's documentation on quant_noise.)

  • qn_block_size (int, optional, defaults to 8) — Size of the blocks for subsequent quantization with iPQ (see q_noise).

  • kdim (int, optional, defaults to None) — Dimension of the key in the attention, if different from the other values.

  • vdim (int, optional, defaults to None) — Dimension of the value in the attention, if different from the other values.

  • use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/values attentions (not used by all models).

  • traceable (bool, optional, defaults to False) — Changes the return value of the encoder's inner_state to stacked tensors.

This is the configuration class to store the configuration of a GraphormerModel. It is used to instantiate a Graphormer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the Graphormer graphormer-base-pcqm4mv1 architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:
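
A minimal instantiation sketch in the usual style of the configuration examples in these docs; it assumes only the classes documented on this page:

```python
>>> from transformers import GraphormerConfig, GraphormerModel

>>> # Initializing a default (graphormer-base-pcqm4mv1 style) configuration
>>> configuration = GraphormerConfig()

>>> # Initializing a model from that configuration (weights are randomly initialized)
>>> model = GraphormerModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```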

GraphormerModel

class transformers.GraphormerModel

( config: GraphormerConfig )

The Graphormer model is a graph-encoder model.

It goes from a graph to its representation. If you want to use the model for a downstream classification task, use GraphormerForGraphClassification instead. For any other downstream task, feel free to add a new class, or combine this model with a downstream model of your choice, following the example in GraphormerForGraphClassification.

forward

( input_nodes: LongTensor, input_edges: LongTensor, attn_bias: Tensor, in_degree: LongTensor, out_degree: LongTensor, spatial_pos: LongTensor, attn_edge_type: LongTensor, perturb: typing.Optional[torch.FloatTensor] = None, masked_tokens: None = None, return_dict: typing.Optional[bool] = None, **unused )
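
A hedged forward-pass sketch showing how a collated batch (built as in the Overview) maps onto the signature above; the dataset name and helper imports are the same assumptions as in that sketch:

```python
import torch
from datasets import load_dataset
from transformers import GraphormerConfig, GraphormerModel
from transformers.models.graphormer.collating_graphormer import (
    GraphormerDataCollator,
    preprocess_item,
)

# Preprocess a small graph dataset as in the Overview sketch.
dataset = load_dataset("OGB/ogbg-molhiv")
dataset = dataset.map(preprocess_item, batched=False)

# Collate two preprocessed graphs into one padded batch.
collator = GraphormerDataCollator()
batch = collator([dataset["train"][i] for i in range(2)])
batch.pop("labels", None)  # the bare encoder does not use labels

model = GraphormerModel(GraphormerConfig())
model.eval()
with torch.no_grad():
    outputs = model(**batch)  # per-graph representations
```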

GraphormerForGraphClassification

class transformers.GraphormerForGraphClassification

( config: GraphormerConfig )

This model can be used for graph-level classification or regression tasks.

It can be trained on

  • regression (by setting config.num_classes to 1); there should be one float-type label per graph

  • single-task classification (by setting config.num_classes to the number of classes); there should be one integer label per graph (see the training sketch below)

  • binary multi-task classification (by setting config.num_classes to the number of labels); there should be a list of integer labels for each graph.
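
As an illustration of the single-task case, here is a training sketch with the Trainer API, reusing the preprocessing/collation flow shown in the Overview; the dataset name and hyperparameters are placeholders rather than recommendations:

```python
from datasets import load_dataset
from transformers import (
    GraphormerConfig,
    GraphormerForGraphClassification,
    Trainer,
    TrainingArguments,
)
from transformers.models.graphormer.collating_graphormer import (
    GraphormerDataCollator,
    preprocess_item,
)

# Binary single-task classification: one integer label (0 or 1) per graph.
config = GraphormerConfig(num_classes=2)
model = GraphormerForGraphClassification(config)

# Precompute Graphormer features per graph (see the Overview sketch).
dataset = load_dataset("OGB/ogbg-molhiv")
dataset = dataset.map(preprocess_item, batched=False)

training_args = TrainingArguments(
    output_dir="graphormer-molhiv",
    per_device_train_batch_size=8,  # keep small; memory grows quickly with graph size
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    data_collator=GraphormerDataCollator(),  # replaces the usual tokenizer-based collator
)
trainer.train()
```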

forward

( input_nodes: LongTensor, input_edges: LongTensor, attn_bias: Tensor, in_degree: LongTensor, out_degree: LongTensor, spatial_pos: LongTensor, attn_edge_type: LongTensor, labels: typing.Optional[torch.LongTensor] = None, return_dict: typing.Optional[bool] = None, **unused )

