
# CUB Large Scale Deep Learning Models, Fall 2024

The "Large Scale Deep Learning Models" course focuses on the methodologies and techniques used to train large models on extensive datasets across various data domains, including images, text, and audio. The course provides in-depth coverage of self-supervised learning approaches, which have become crucial for leveraging vast amounts of unlabeled data. Topics include data preprocessing and augmentation for different modalities, architectural considerations for scaling deep learning models, and strategies for distributed and parallel training.

**Instructors:** Alexander Shabalin, Ildus Sadrtdinov, Dmitry Kropotov

**Classes:** Mondays, on campus in classroom EH-4, time slots 14:15-15:30 and 15:45-17:00

**Telegram chat** for questions and discussion: link

**Practical assignments:** all assignments are released and graded in the corresponding Teams space. If you don't have access to the Teams space, please write directly to one of the instructors or in the course chat.

## Course assessment criteria

- Assessment Component 1: written examination, duration 60 min, weight 50%
- Assessment Component 2: programming assignments, weight 50%
- Completion: to pass this module, each assessment component must be passed with at least 45%.

## Lectures

| Date | Number | Topic | Materials |
| --- | --- | --- | --- |
| 09.09.24 | 01 | Introduction to the course. Large models, large datasets and self-supervised learning. What to do with a pretrained model? Linear probing, fine-tuning, in-distribution (ID) and out-of-distribution (OOD) performance. CLIP model, zero-shot and WiSE-FT (robust weight ensembling; see the interpolation sketch after the table). | Fine-tuning distorts features, Comparing pre-training algorithms, CLIP, WiSE-FT, Do ImageNet Classifiers Generalize to ImageNet? |
| 23.09.24 | 02 | Classical pretext tasks for images: inpainting, colorization, jigsaw puzzles | Exemplar, Context Prediction, Inpainting, Jigsaw Puzzles, Colorization, Rotations, Damaged Jigsaw Puzzles, Task Ensemble |
| 30.09.24 | 03 | Modern architectures for images: ViT, DeiT, MLP Mixer, Swin, ConvNeXt, Neighborhood Attention Transformer (NAT). Efficient training & inference: Automatic Mixed Precision (AMP), data-parallel and model-parallel training | Big Transfer, ViT, DeiT, MLP Mixer, Swin, ConvNeXt, NAT |
| 07.10.24 | 04 | Contrastive learning for images. Mutual information, SimCLR, MoCo, BYOL, SimSiam, DeepCluster, SwAV. Deriving the contrastive loss | SimCLR, MoCo, BYOL, SimSiam, DeepCluster, SwAV |
| 14.10.24 | 05 | Self-supervised learning for ViT. Masked image modeling: DINO, BEiT, MAE, MaskFeat. Different approaches for improving contrastive learning | DINO, BEiT, MAE, MaskFeat, Dense CL, Supervised CL, DiLo, LooC |
| 21.10.24 | 06 | Mode connectivity and linear mode connectivity (LMC). Ensembling: Deep Ensemble (DE), SSE, FGE, cSGLD, KFAC-Laplace, SWAG, SPRO, StarSSE. Model averaging: SWA, model soups. Weight averaging in optimization: Lookahead, Lookaround, WATT | LMC, LMC in transfer learning, DE, DE and loss landscape, DE and distribution shifts, SSE, FGE, cSGLD, KFAC-Laplace, SWAG, SPRO, DE Equivalent, StarSSE, SWA, model soups, Lookahead, Lookaround, WATT |
| 28.10.24 | 08 | Modern architectures for texts. Recap of transformers, modern architectures, and transformer training tricks | Flash Attention, FA blogpost, KV-caching, Multi-Query Attention, Relative Position Encodings, RoPE, ALiBi, GLU, Mixture of Experts, Pre-normalization, RMSNorm |
| 04.11.24 | 07 | Pruning, quantization, distillation | Pruning, Quantization 1, Quantization 2, Distillation, DistilBERT |
| 11.11.24 | 09 | Parameter-efficient fine-tuning. GPT zero-shot, prompt tuning, adapters, LoRA, BitFit | GPT-3, Prompt Tuning, P-Tuning, Adapters, LoRA, BitFit |
| 18.11.24 | 10 | Retrieval-Augmented Generation | |
| 25.11.24 | 11 | Text diffusion models | |
| 02.12.24 | 12 | Introduction to audio processing. Text-to-speech (TTS): WaveNet, Tacotron 2, WaveGlow, HiFi-GAN. Automatic Speech Recognition (ASR): CTC loss, Jasper, Whisper. Self-supervised learning for audio: CPC, Wav2Vec 2.0, HuBERT, multi-format contrastive learning, BYOL-A, CLAP | WaveNet, Tacotron 2, WaveGlow, HiFi-GAN, CTC Loss, Jasper, Whisper, CPC, Wav2Vec 2.0, HuBERT, Multi-format CL, BYOL-A, CLAP |
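Lecture 01 and home assignment 01 center on robust fine-tuning of CLIP with WiSE-FT, which ensembles the zero-shot and fine-tuned models in weight space. Below is a minimal sketch of that interpolation in PyTorch; the `interpolate_state_dicts` helper and the toy `nn.Linear` encoders are illustrative assumptions, not part of the course materials.

```python
import torch
from torch import nn


def interpolate_state_dicts(zero_shot, fine_tuned, alpha):
    """WiSE-FT-style ensemble: (1 - alpha) * zero-shot + alpha * fine-tuned, per tensor."""
    return {
        name: (1 - alpha) * zero_shot[name] + alpha * fine_tuned[name]
        for name in zero_shot
    }


# Toy stand-ins for the zero-shot and fine-tuned CLIP image encoders.
zero_shot_model = nn.Linear(512, 10)
fine_tuned_model = nn.Linear(512, 10)

# Build the weight-space ensemble for a chosen mixing coefficient alpha.
wise_ft_model = nn.Linear(512, 10)
wise_ft_model.load_state_dict(
    interpolate_state_dicts(
        zero_shot_model.state_dict(), fine_tuned_model.state_dict(), alpha=0.5
    )
)

with torch.no_grad():
    logits = wise_ft_model(torch.randn(2, 512))  # the ensemble is used like any regular module
```

Sweeping `alpha` between 0 and 1 trades off in-distribution accuracy (favoured by the fine-tuned weights) against out-of-distribution robustness (favoured by the zero-shot weights), which is the effect discussed in the WiSE-FT paper.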

## Home assignments

| Number | Release date | Deadline | Topic |
| --- | --- | --- | --- |
| 01 | 15.09.24 | 01.10.24, 23:59 | Robust fine-tuning of CLIP |
| 02 | 01.10.24 | 18.10.24, 23:59 | Classical pretext tasks |
| 03 | 18.10.24 | 06.11.24, 23:59 | Contrastive learning |
