Fixes for Unit 2 - ConvNext #306

Merged 2 commits on Jul 17, 2024
chapters/en/unit2/cnns/convnext.mdx: 9 changes (4 additions, 5 deletions)
@@ -12,9 +12,9 @@ The key improvements are:
 - Training techniques
 - Macro design
 - ResNeXt-ify
-- Inverted bottleneck
-- Large kernel sizes
-- Micro design
+- Inverted Bottleneck
+- Large Kernel Sizes
+- Micro Design
 
 We will go through each of the key improvements.
 These designs are not novel in themselves. However, you can learn how researchers adapt and modify designs systematically to improve existing models.
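To make the inverted bottleneck and large kernel size changes concrete, here is a minimal PyTorch sketch of a ConvNeXt-style block (layer scale and stochastic depth from the reference implementation are omitted, and the width `dim=96` is illustrative):

```python
import torch
import torch.nn as nn


class ConvNeXtBlock(nn.Module):
    """Sketch of a ConvNeXt-style block combining two of the listed improvements:
    a large-kernel (7x7) depthwise convolution and an inverted bottleneck
    that expands the channel dimension 4x before projecting back."""

    def __init__(self, dim: int = 96):
        super().__init__()
        # Large kernel size: 7x7 depthwise convolution (one filter per channel)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)           # micro design: LayerNorm instead of BatchNorm
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # inverted bottleneck: expand 4x
        self.act = nn.GELU()                    # micro design: GELU instead of ReLU
        self.pwconv2 = nn.Linear(4 * dim, dim)  # project back to the block width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C) for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)  # back to (N, C, H, W)
        return residual + x


block = ConvNeXtBlock(dim=96)
out = block(torch.randn(1, 96, 56, 56))  # shape is preserved: (1, 96, 56, 56)
```

Note how the depthwise 7x7 convolution mixes spatial information cheaply, while the two pointwise layers form the inverted bottleneck, widening to 4x before projecting back down.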
@@ -28,10 +28,9 @@ The researchers first discerned that, while architectural design choices are crucial
 Inspired by DeiT and Swin Transformers, ConvNext closely adapts their training techniques. Some of the notable changes are:
 - Epochs: Extending training from the original 90 epochs to 300 epochs.
 - Optimizer: Using the AdamW optimizer instead of Adam; the two differ in how they handle weight decay (AdamW decouples it from the gradient update).
-- Regularization: Using Stochastic Depth and Label Smoothing as regularization techniques.
 - Mixup (generates a weighted combination of random image pairs), Cutmix (cuts part of an image and replaces it with a patch from another image), RandAugment (applies a series of random augmentations such as rotation, translation, and shear), and Random Erasing (randomly selects a rectangular region in an image and erases its pixels with random values) to increase training data.
-Modifying these training procedures has improved ResNet-50's accuracy from 76.1% to 78.8%.
+- Regularization: Using Stochastic Depth and Label Smoothing as regularization techniques.
+
+Modifying these training procedures has improved ResNet-50's accuracy from 76.1% to 78.8%.
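A minimal PyTorch sketch of how parts of this recipe fit together is below. The tiny stand-in model and the hyperparameter values are illustrative, Mixup and Cutmix use the torchvision `transforms.v2` implementations, and RandAugment, Random Erasing, and Stochastic Depth are omitted for brevity:

```python
import torch
import torch.nn as nn
from torchvision.transforms import v2

# Tiny stand-in classifier; the actual experiments use ResNet-50/ConvNeXt.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))

# AdamW decouples weight decay from the adaptive gradient update
# (Adam instead folds the L2 penalty into the gradients).
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-3, weight_decay=0.05)

# Label Smoothing is built into PyTorch's cross-entropy loss.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Mixup and Cutmix act on whole batches; RandomChoice applies one of them per batch.
mix = v2.RandomChoice([v2.MixUp(num_classes=1000), v2.CutMix(num_classes=1000)])

images = torch.randn(8, 3, 224, 224)   # dummy batch of 8 images
labels = torch.randint(0, 1000, (8,))
images, labels = mix(images, labels)   # labels become soft, mixed targets

loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

The practical difference from Adam is that AdamW applies weight decay directly to the parameters rather than adding it to the gradients, which keeps the decay strength independent of the adaptive learning rate.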

