Summary: This is a non-exhaustive list of internal and external pretrained models. Internal pretrained models are stored on Zenodo.
Note: RLMs are typically task-agnostic models used at the Preprocessing stage (often incorrectly called the "pretraining" phase). They are usually trained with Self-Supervised Learning, but Supervised, (narrow-sense) Unsupervised, and Reinforcement Learning can also be used. The end goal is to extract dense representations / features / embeddings. RLMs are typically trained on an external dataset (i.e. the source dataset) that is larger than the one used for the Downstream Task (i.e. the target dataset).
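The usage pattern described above looks roughly like the following minimal sketch. The Hugging Face `transformers` library and the `bert-base-uncased` checkpoint are assumptions used purely for illustration; the other encoders in this list follow the same pattern.

```python
# Minimal sketch: load a pretrained, task-agnostic encoder (RLM) and extract
# dense embeddings that a downstream model can consume later.
# NOTE: the library and checkpoint name are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

inputs = tokenizer("A sample sentence from the target dataset.", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# Mean-pool the token representations into a single dense embedding.
embedding = outputs.last_hidden_state.mean(dim=1)  # shape: (1, hidden_size)
```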
ViT
- Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Code: https://github.com/google-research/vision_transformer
- Pretrained Model: https://github.com/google-research/vision_transformer
- Released: 2020
- Learning: Supervised
- Architecture: Transformer
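The official ViT repository ships JAX/Flax checkpoints; as a rough sketch (the Hugging Face port `google/vit-base-patch16-224-in21k` used here is an assumption for illustration), an image embedding can be extracted like this:

```python
# Sketch: extract an image embedding with a pretrained ViT encoder.
# NOTE: the checkpoint name and the placeholder image are illustrative assumptions.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
model.eval()

image = Image.new("RGB", (224, 224))  # placeholder image; replace with a real photo
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

cls_embedding = outputs.last_hidden_state[:, 0]  # [CLS] token as the image embedding
```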
BEiT
- Paper: BEiT: BERT Pre-Training of Image Transformers
- Code: https://github.com/microsoft/unilm/tree/master/beit
- Pretrained Model: https://github.com/microsoft/unilm/tree/master/beit
- Released: 2021
- Learning: Self-Supervised
- Architecture: Transformer
DeiT 3
- Paper: DeiT III: Revenge of the ViT
- Code: https://github.com/facebookresearch/deit
- Pretrained Model: https://github.com/facebookresearch/deit
- Released: 2022
- Learning: Supervised
- Architecture: Transformer
BERT
- Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Code: https://github.com/google-research/bert
- Pretrained Model: https://github.com/google-research/bert
- Released: 2018
- Learning: Self-Supervised
- Architecture: Transformer
GPT-J (GPT-3 open-source alternative)
- Paper: Language Models are Few-Shot Learners (the GPT-3 paper)
- Code: https://github.com/kingoflolz/mesh-transformer-jax
- Pretrained Model: https://github.com/kingoflolz/mesh-transformer-jax
- Released: 2020
- Learning: Self-Supervised
- Architecture: Transformer
HuBERT
- Paper: HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
- Code: https://github.com/facebookresearch/fairseq/tree/main/examples/hubert
- Pretrained Model: https://github.com/facebookresearch/fairseq/tree/main/examples/hubert
- Released: 2021
- Learning: Self-Supervised
- Architecture: Transformer
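The fairseq repository linked above hosts the original HuBERT checkpoints; a minimal sketch of frame-level feature extraction, assuming the Hugging Face port `facebook/hubert-base-ls960`:

```python
# Sketch: extract frame-level speech features with a pretrained HuBERT encoder.
# NOTE: the checkpoint name and the random waveform are illustrative assumptions;
# real audio should be mono and resampled to 16 kHz before being passed in.
import torch
from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
model.eval()

waveform = torch.randn(1, 16000)  # placeholder: 1 second of 16 kHz mono audio
with torch.no_grad():
    outputs = model(input_values=waveform)

features = outputs.last_hidden_state  # shape: (1, num_frames, hidden_size)
```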
ViLBERT
- Paper: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
- Code: https://github.com/facebookresearch/vilbert-multi-task
- Pretrained Model:
- Released: 2019
- Learning: Self-Supervised
- Architecture: Transformer
- Modalities: images, text
MMFT-BERT
- Paper: MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
- Code: https://github.com/aurooj/MMFT-BERT
- Pretrained Model:
- Released: 2020
- Learning: Self-Supervised
- Architecture: Transformer
- Modalities: images, text
ViLT
- Paper: ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
- Code: https://github.com/dandelin/vilt
- Pretrained Model: https://github.com/dandelin/vilt
- Released: 2021
- Learning: Self-Supervised
- Architecture: Transformer
- Modalities: images, text
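A minimal sketch of extracting a joint image-text representation with ViLT, assuming the Hugging Face port `dandelin/vilt-b32-mlm` (checkpoint name and placeholder inputs are illustrative):

```python
# Sketch: extract a fused vision-language embedding with a pretrained ViLT encoder.
# NOTE: the checkpoint name, placeholder image, and text prompt are illustrative assumptions.
import torch
from PIL import Image
from transformers import ViltModel, ViltProcessor

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")
model.eval()

image = Image.new("RGB", (384, 384))  # placeholder image; replace with a real photo
text = "a photo of a cat"
inputs = processor(image, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

joint_embedding = outputs.pooler_output  # fused image-text representation
```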
AV-HuBERT
- Paper: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
- Code: https://github.com/facebookresearch/av_hubert
- Pretrained Model: https://github.com/facebookresearch/av_hubert
- Released: 2022
- Learning: Self-Supervised
- Architecture: Transformer
- Modalities: images, audio
Note: DTMs are typically task-specific models used for the Downstream Task. They are usually trained with Supervised Learning, but Self-Supervised, (narrow-sense) Unsupervised, and Reinforcement Learning can also be used. The end goal depends on the specific task. The DTMs listed below are only the ones compatible with dense-representation RLMs (e.g. Transformer-based encoders).
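As a minimal sketch of such a DTM (all names, dimensions, and the number of classes are illustrative assumptions), a linear classification head trained on frozen RLM embeddings looks like this:

```python
# Sketch: a task-specific head (DTM) on top of frozen, pre-extracted RLM embeddings.
# NOTE: dimensions, class count, and dummy data are illustrative assumptions.
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Maps a frozen RLM embedding to class logits for a downstream task."""
    def __init__(self, embedding_dim: int = 768, num_classes: int = 10):
        super().__init__()
        self.classifier = nn.Linear(embedding_dim, num_classes)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.classifier(embeddings)

# Usage with embeddings extracted by any RLM above (e.g. the sketch in the first note).
probe = LinearProbe()
dummy_embeddings = torch.randn(4, 768)      # batch of 4 pre-extracted embeddings
dummy_labels = torch.randint(0, 10, (4,))   # placeholder target labels
loss = nn.CrossEntropyLoss()(probe(dummy_embeddings), dummy_labels)
loss.backward()
```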