All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- Compositional Attention [#41]
- Bugfix for Favor, single feature map [#183]
- Much faster fused dropout [#164]
- Fused dropout repeatability [#173]
- Embedding weight tying option [#172]
- Dropout setting not properly passed in many attentions [#123]
- Fix self-attention optimization not being triggered, broken residual path [#119]
- Improve speed by not using contiguous Tensors when not needed [#119]
- Attention mask wrapper [#113]
- ViT comparison benchmark [#117]
- Homogenizing the masks, additive or bool [#79][#85][#86]
- Fix causality flag not being respected [#103]
- Enabling FusedLayerNorm by default in the factory if Triton is available
- Fixing Favor with fp16
- Fixing Favor trainability
- Fused dropout/bias/activation layer [#58]
- Fused layernorm used by default in the factory [#92]
- Nystrom causal attention [#75]
- More robust blocksparse [#24]
- Rotary embeddings [#32]
- More flexible layernorm [#50]