This repository provides a brief summary of algorithms from our review paper A Survey of Self-Supervised Learning from Multiple Perspectives: Algorithms, Theory, Applications and Future Trends.
In recent years, SSL has achieved major research breakthroughs in computer vision (CV). This work therefore focuses on SSL research from the CV community, especially classic and influential results. The objectives of this review are to explain what SSL is, describe its categories and subcategories, clarify how it differs from and relates to other machine learning paradigms, and outline its theoretical underpinnings. We present an up-to-date and comprehensive review of the frontiers of visual SSL, which we divide into three parts: context-based, contrastive, and generative SSL, in the hope of clarifying the trends for researchers.
See our paper for more details.
Context-based methods:
- (Rotation): Unsupervised Representation Learning by Predicting Image Rotations. [paper] [code]
- (Jigsaw): Scaling and Benchmarking Self-Supervised Visual Representation Learning. [paper] [code]
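To make the pretext task concrete, here is a minimal PyTorch sketch of RotNet-style rotation prediction: each image is rotated by 0/90/180/270 degrees and a classifier predicts which rotation was applied. The toy backbone and random input batch are our own stand-ins, not code from the papers above.

```python
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Rotate each image by 0/90/180/270 degrees; the rotation index is the label."""
    rotated, labels = [], []
    for k in range(4):  # k * 90 degrees
        rotated.append(torch.rot90(images, k, dims=[2, 3]))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

# Hypothetical backbone: any feature extractor ending in a 4-way classifier.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4),
)

images = torch.randn(8, 3, 32, 32)   # stand-in for a real data-loader batch
x, y = make_rotation_batch(images)
loss = nn.functional.cross_entropy(backbone(x), y)  # self-supervised objective
loss.backward()
```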
-
Contrastive learning (CL) methods based on negative examples:
- (MoCo v1): Momentum Contrast for Unsupervised Visual Representation Learning. [paper] [code]
- (MoCo v2): Improved Baselines with Momentum Contrastive Learning. [paper] [code]
- (MoCo v3): An Empirical Study of Training Self-Supervised Vision Transformers. [paper] [code]
- (SimCLR v1): A Simple Framework for Contrastive Learning of Visual Representations. [paper] [code]
- (SimCLR v2): Big Self-Supervised Models are Strong Semi-Supervised Learners. [paper] [code]
- ...
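The methods above differ mainly in where negatives come from (a momentum queue in MoCo, the rest of the batch in SimCLR), but all optimize an InfoNCE-style objective. A minimal NT-Xent-flavored sketch; batch size, embedding dimension, and temperature are illustrative:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """NT-Xent-style loss: for each embedding, its other augmented view is the
    positive and the remaining 2N - 2 embeddings act as negatives."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, d), unit-length rows
    sim = z @ z.t() / temperature                 # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))             # exclude self-similarity
    n = z1.size(0)
    # row i of z1 pairs with row i of z2, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1 = torch.randn(4, 128)  # projector outputs for view 1 (hypothetical)
z2 = torch.randn(4, 128)  # projector outputs for view 2
loss = info_nce(z1, z2)
```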
CL methods based on self-distillation:
- ...
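Self-distillation methods (e.g., BYOL and SimSiam; see the paper for the full list) drop explicit negatives and instead match an online prediction to a stop-gradient target. A toy sketch with hypothetical linear encoder and predictor modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(32, 16)     # placeholder for a real backbone + projector
predictor = nn.Linear(16, 16)   # small prediction head on the online branch

def self_distillation_loss(x1, x2):
    """SimSiam-style symmetric loss: predict one view's detached embedding
    from the other; the stop-gradient is what prevents collapse."""
    z1, z2 = encoder(x1), encoder(x2)
    p1, p2 = predictor(z1), predictor(z2)
    def neg_cos(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=1).mean()
    return 0.5 * (neg_cos(p1, z2) + neg_cos(p2, z1))

x1, x2 = torch.randn(8, 32), torch.randn(8, 32)  # two augmented views
loss = self_distillation_loss(x1, x2)
```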
CL methods based on feature decorrelation:
- ...
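Feature-decorrelation methods (e.g., Barlow Twins; see the paper for the full list) avoid collapse by driving the cross-correlation matrix of the two views' embeddings toward the identity. A rough sketch; the off-diagonal weight `lam` is illustrative:

```python
import torch

def decorrelation_loss(z1, z2, lam=0.005):
    """Barlow Twins-style objective: the cross-correlation of the standardized
    embeddings of two views should equal the identity matrix."""
    n = z1.size(0)
    z1 = (z1 - z1.mean(0)) / z1.std(0)   # standardize each feature dimension
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.t() @ z2) / n                # (d, d) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                # invariance
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # redundancy
    return on_diag + lam * off_diag

z1, z2 = torch.randn(16, 8), torch.randn(16, 8)  # two views' embeddings
loss = decorrelation_loss(z1, z2)
```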
Others:
- methods that combine CL and masked image modeling (MIM)
- ...
Masked image modeling (MIM) methods:
- (BEiT): BEiT: BERT Pre-Training of Image Transformers. [paper] [code]
- (MAE): Masked Autoencoders Are Scalable Vision Learners. [paper] [code]
- (iBOT): iBOT: Image BERT Pre-Training with Online Tokenizer. [paper] [code]
- (CAE): Context Autoencoder for Self-Supervised Representation Learning. [paper] [code]
- (SimMIM): SimMIM: A Simple Framework for Masked Image Modeling. [paper] [code]
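What the MIM methods above share is masking a large fraction of patch tokens and reconstructing the missing content. A minimal MAE-flavored random-masking sketch; the shapes and 75% mask ratio follow common practice, and the encoder/decoder are deliberately omitted:

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """Keep a random subset of patch tokens per image; return the kept tokens
    and a binary mask marking which positions must be reconstructed."""
    n, l, d = patches.shape
    num_keep = int(l * (1 - mask_ratio))
    noise = torch.rand(n, l)                       # random score per token
    ids_keep = noise.argsort(dim=1)[:, :num_keep]  # random subset per image
    kept = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))
    mask = torch.ones(n, l)
    mask.scatter_(1, ids_keep, 0.0)                # 1 = masked (to predict)
    return kept, mask

patches = torch.randn(2, 196, 768)  # e.g., 14x14 patch tokens per image
kept, mask = random_masking(patches)
# An encoder sees only `kept`; a light decoder reconstructs all patches, and the
# loss (e.g., per-pixel MSE) is computed on the masked positions only.
```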
Natural language processing (NLP)
- (Skip-Gram): Distributed Representations of Words and Phrases and their Compositionality. [paper] [code]
- (BERT): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. [paper] [code]
- (GPT): Improving Language Understanding by Generative Pre-Training. [paper]
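For context on where these objectives originated, Skip-Gram with negative sampling is itself a contrastive-style task over (center, context) word pairs. A self-contained toy sketch; vocabulary size, dimensions, and the number of negatives are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim = 1000, 64                 # illustrative sizes
in_emb = nn.Embedding(vocab, dim)     # embeddings for center words
out_emb = nn.Embedding(vocab, dim)    # embeddings for context words

def skipgram_neg_sampling(center, context, negatives):
    """Score true (center, context) pairs high and sampled words low."""
    v = in_emb(center)                           # (B, d)
    pos = (v * out_emb(context)).sum(-1)         # (B,) positive-pair scores
    neg = torch.bmm(out_emb(negatives), v.unsqueeze(-1)).squeeze(-1)  # (B, K)
    return -(F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean())

center = torch.randint(vocab, (8,))
context = torch.randint(vocab, (8,))
negatives = torch.randint(vocab, (8, 5))  # 5 random negatives per pair
loss = skipgram_neg_sampling(center, context, negatives)
```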
Sequential models for image processing and computer vision
- (CPC): Representation Learning with Contrastive Predictive Coding. [paper]
- (Image GPT): Generative Pretraining From Pixels. [paper] [code]
- (MIL-NCE): End-to-End Learning of Visual Representations From Uncurated Instructional Videos. [paper] [code]
- Unsupervised Learning of Visual Representations using Videos. [paper]
- Unsupervised Learning of Video Representations using LSTMs. [paper] [code]
Temporal order of frames:
- Shuffle and Learn: Unsupervised Learning using Temporal Order Verification. [paper]
- Self-Supervised Video Representation Learning With Odd-One-Out Networks. [paper]
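Order-verification pretext tasks reduce to classifying whether a tuple of frames is temporally ordered or shuffled. A toy sketch in that spirit, with precomputed per-frame features standing in for a real video encoder:

```python
import torch
import torch.nn as nn

frame_encoder = nn.Linear(512, 128)   # hypothetical per-frame feature projector
classifier = nn.Linear(3 * 128, 2)    # correct temporal order vs. shuffled

def order_logits(frames):
    """frames: (B, 3, 512) feature tuples, some of them in shuffled order."""
    feats = frame_encoder(frames)      # (B, 3, 128)
    return classifier(feats.flatten(1))

frames = torch.randn(4, 3, 512)
labels = torch.tensor([1, 0, 1, 0])   # toy labels: 1 = correct order
loss = nn.functional.cross_entropy(order_logits(frames), labels)
```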
Video playback direction:
- Learning and Using the Arrow of Time. [paper]
Video playback speed:
- (SpeedNet): SpeedNet: Learning the Speediness in Videos. [paper]
- (DynamoNet): DynamoNet: Dynamic Action and Motion Network. [paper]
Others:
- (CoCLR): Self-supervised Co-training for Video Representation Learning. [paper] [code]
- Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization. [paper]
- Time-Contrastive Networks: Self-Supervised Learning from Video. [paper]
- Learning Correspondence from the Cycle-Consistency of Time. [paper]
- (VCP): Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning. [paper]
- Joint-task Self-supervised Learning for Temporal Correspondence. [paper] [code]
Applications:
- Medical field: Preservational Learning Improves Self-supervised Medical Image Models by Reconstructing Diverse Contexts. [paper] [code]
- Medical image segmentation: Contrastive Learning of Global and Local Features for Medical Image Segmentation with Limited Annotations. [paper] [code]
- 3D medical image analysis: Rubik’s Cube+: A Self-supervised Feature Learning Framework for 3D Medical Image Analysis. [paper]
If you have any suggestions or find our work helpful, feel free to contact us:
Email: {guijie,tchen}@seu.edu.cn