AllenWrong/Deep-Model-Play

Deep-Model-Play

Reproductions of basic modern deep learning models.

Model

  • Attention
  • Multi-Head Attention
  • GPT-2
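
As a rough illustration of the first item, here is a minimal sketch of scaled dot-product attention in plain Python. It is not the code from this repository: the function names `softmax` and `attention` and the list-of-lists layout are assumptions made for the example, and masking, batching, and multiple heads are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (hypothetical layout for this sketch).
    """
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return out
```

Multi-head attention applies the same operation several times in parallel on different learned projections of the inputs and concatenates the results.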

Optimizer

The following optimizers are custom implementations based on my own understanding. Issues and questions are welcome!

  • SGD
  • Momentum SGD
  • Nesterov SGD
  • Adam
  • Nadam
  • AdamW (may still contain bugs)

Weight decay is not supported yet. It will be added soon!
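
For orientation, the update rules behind two of the optimizers listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not this repository's implementation: the function names `sgd_momentum_step` and `adam_step`, the flat lists of floats, and the default hyperparameters are all hypothetical. Like the optimizers here, it applies no weight decay.

```python
import math


def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One in-place momentum SGD step (hypothetical helper)."""
    for i, g in enumerate(grads):
        # Accumulate a decaying sum of past gradients.
        velocity[i] = momentum * velocity[i] + g
        params[i] -= lr * velocity[i]


def adam_step(params, grads, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam step; t is the 1-based step count."""
    for i, g in enumerate(grads):
        # Exponential moving averages of the gradient and its square.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


# Usage: minimize f(x) = x^2, whose gradient is 2x.
x, m, v = [5.0], [0.0], [0.0]
for t in range(1, 4):
    adam_step(x, [2 * x[0]], m, v, t, lr=0.1)
```

On the first Adam step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.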
