Transformers

🌍MULTIMODAL MODELS

  • ALIGN
  • AltCLIP
  • BLIP
  • BLIP-2
  • BridgeTower
  • BROS
  • Chinese-CLIP
  • CLIP
  • CLIPSeg
  • Data2Vec
  • DePlot
  • Donut
  • FLAVA
  • GIT
  • GroupViT
  • IDEFICS
  • InstructBLIP
  • LayoutLM
  • LayoutLMV2
  • LayoutLMV3
  • LayoutXLM
  • LiLT
  • LXMERT
  • MatCha
  • MGP-STR
  • Nougat
  • OneFormer
  • OWL-ViT
  • Perceiver
  • Pix2Struct
  • Segment Anything
  • Speech Encoder Decoder Models
  • TAPAS
  • TrOCR
  • TVLT
  • ViLT
  • Vision Encoder Decoder Models
  • Vision Text Dual Encoder
  • VisualBERT
  • X-CLIP