PyTorch implementation of "Memorizing Transformers" by Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, and Christian Szegedy [https://arxiv.org/abs/2203.08913]

The paper augments a transformer with an external memory of previously computed (key, value) pairs and retrieves the most relevant entries with k-nearest-neighbor (kNN) search. This README calls the resulting model a Memory-Augmented Transformer (MAT) and its retrieval component a Memory Attention Module (MAM). Because each query attends only to its top-k memory entries, the model can store and recall information from well beyond its local context window during decoding.
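The retrieval step described above can be illustrated with a minimal single-head sketch. It is not this repository's code: the function names `knn_memory_attention` and `local_causal_attention`, the tensor sizes, and the fixed `gate` value are all hypothetical, and it uses exact top-k search where a real system would likely use an approximate kNN index. The sketch only shows the idea of attending over the k closest memory entries and blending that result with ordinary local attention.

```python
import math
import torch


def knn_memory_attention(q, mem_k, mem_v, top_k=2):
    """Attend over only the top_k memory entries most similar to each query.

    q:     (n, d) queries from the current context
    mem_k: (m, d) keys stored in memory
    mem_v: (m, d) values stored in memory
    """
    d = q.shape[-1]
    scores = q @ mem_k.T / math.sqrt(d)            # (n, m) similarity scores
    top_scores, idx = scores.topk(top_k, dim=-1)   # exact kNN (assumption)
    weights = torch.softmax(top_scores, dim=-1)    # normalize over neighbors
    retrieved = mem_v[idx]                         # (n, top_k, d)
    return (weights.unsqueeze(-1) * retrieved).sum(dim=1)


def local_causal_attention(q, k, v):
    """Standard causal self-attention over the current context."""
    n, d = q.shape
    scores = q @ k.T / math.sqrt(d)
    future = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


torch.manual_seed(0)
n, m, d = 4, 16, 8                                 # hypothetical sizes
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
mem_k, mem_v = torch.randn(m, d), torch.randn(m, d)

# Hypothetical fixed gate; in a trained model this would be a learned parameter.
gate = torch.sigmoid(torch.tensor(0.0))
mem_out = knn_memory_attention(q, mem_k, mem_v, top_k=2)
local_out = local_causal_attention(q, k, v)
combined = gate * mem_out + (1 - gate) * local_out
print(combined.shape)
```

With `top_k` equal to the memory size, the retrieval reduces to ordinary softmax attention over the whole memory; a smaller `top_k` keeps the per-query cost independent of how large the memory grows, apart from the search itself.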
The model was trained for 160 epochs, over which the loss dropped from 5.03 to 2.39.