diff --git a/README.md b/README.md
index 58b67f80..21552f14 100644
--- a/README.md
+++ b/README.md
@@ -1731,6 +1731,7 @@
 #### [Manifold Diffusion Fields](summaries/2305.15586.md)
 #### [QLoRA: Efficient Finetuning of Quantized LLMs](summaries/2305.14314.md)
 #### [Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization](summaries/2305.14152.md)
+#### [GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints](summaries/2305.13245.md)
 #### [RWKV: Reinventing RNNs for the Transformer Era](summaries/2305.13048.md)
 #### [Accurate Knowledge Distillation with n-best Reranking](summaries/2305.12057.md)
 #### [LLM-Pruner: On the Structural Pruning of Large Language Models](summaries/2305.11627.md)
diff --git a/summaries/2305.13245.md b/summaries/2305.13245.md
new file mode 100644
index 00000000..e93fd2b2
--- /dev/null
+++ b/summaries/2305.13245.md
@@ -0,0 +1,5 @@
+# GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
+## TL;DR
+## Summary
+- [https://arxiv.org/pdf/2305.13245.pdf](https://arxiv.org/pdf/2305.13245.pdf)
+