🌍 MULTIMODAL MODELS
ALIGN, AltCLIP, BLIP, BLIP-2, BridgeTower, BROS, Chinese-CLIP, CLIP, CLIPSeg, Data2Vec, DePlot, Donut, FLAVA, GIT, GroupViT, IDEFICS, InstructBLIP, LayoutLM, LayoutLMV2, LayoutLMV3, LayoutXLM, LiLT, LXMERT, MatCha, MGP-STR, Nougat, OneFormer, OWL-ViT, Perceiver, Pix2Struct, Segment Anything, Speech Encoder Decoder Models, TAPAS, TrOCR, TVLT, ViLT, Vision Encoder Decoder Models, Vision Text Dual Encoder, VisualBERT, X-CLIP