Transformers

🌍MULTIMODAL MODELS

• ALIGN
• AltCLIP
• BLIP
• BLIP-2
• BridgeTower
• BROS
• Chinese-CLIP
• CLIP
• CLIPSeg
• Data2Vec
• DePlot
• Donut
• FLAVA
• GIT
• GroupViT
• IDEFICS
• InstructBLIP
• LayoutLM
• LayoutLMV2
• LayoutLMV3
• LayoutXLM
• LiLT
• LXMERT
• MatCha
• MGP-STR
• Nougat
• OneFormer
• OWL-ViT
• Perceiver
• Pix2Struct
• Segment Anything
• Speech Encoder Decoder Models
• TAPAS
• TrOCR
• TVLT
• ViLT
• Vision Encoder Decoder Models
• Vision Text Dual Encoder
• VisualBERT
• X-CLIP