YaRN

This repo contains the code and data for the YaRN context window extension method.

Paper

Paper (ICLR 2024): YaRN: Efficient Context Window Extension of Large Language Models
Old Preprint (arXiv)

Models

LLaMA

We publish variants of Llama 2 fine-tuned with YaRN at 32K, 64K and 128K context window lengths. They are available under the Llama 2 license on 🤗 Hugging Face.

Size	Context	Link
7B	64K	NousResearch/Yarn-Llama-2-7b-64k
7B	128K	NousResearch/Yarn-Llama-2-7b-128k
13B	64K	NousResearch/Yarn-Llama-2-13b-64k
13B	128K	NousResearch/Yarn-Llama-2-13b-128k
70B	32K	NousResearch/Yarn-Llama-2-70b-32k

In addition, we also publish 8K context window versions of Llama 2 7B fine-tuned with NTK-aware and YaRN (Table 1 in the conference paper).
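
For orientation, the ratio between each target window and the base model's window can be worked out directly. The sketch below assumes Llama 2's pretrained context length of 4096 tokens and that "K" means 1024 tokens; the `scale_factor` helper is hypothetical and is not part of this repo.

```python
# Assumption: Llama 2 was pretrained with a 4096-token context window.
LLAMA2_BASE_CONTEXT = 4096


def scale_factor(target_k: int, base_context: int = LLAMA2_BASE_CONTEXT) -> float:
    """Return how many times longer the target window is than the base window.

    `target_k` is the target length in units of 1024 tokens (e.g. 64 for "64K").
    This helper is illustrative only; it is not defined in the repository.
    """
    if target_k <= 0 or base_context <= 0:
        raise ValueError("context lengths must be positive")
    return (target_k * 1024) / base_context


if __name__ == "__main__":
    # Context lengths listed for the Llama 2 variants above.
    for k in (32, 64, 128):
        print(f"{k}K -> {scale_factor(k):g}x the base window")
```

Under these assumptions the 32K, 64K and 128K variants correspond to windows 8, 16 and 32 times longer than the base model's.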

Mistral

With the release of v2 of our paper, we are also publishing 64K and 128K variants of Mistral 7B v0.1.

Size	Context	Link
7B	64K	NousResearch/Yarn-Mistral-7b-64k
7B	128K	NousResearch/Yarn-Mistral-7b-128k

SOLAR

The SOLAR 10.7B v1.0 model uses depth up-scaling to add layers to Mistral 7B v0.1, which may improve long-context performance on a per-parameter basis. We publish 32K and 64K variants.

Size	Context	Link
10.7B	32K	NousResearch/Yarn-Solar-10b-32k
10.7B	64K	NousResearch/Yarn-Solar-10b-64k
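
As a rough illustration of how the checkpoints listed in the tables above might be selected and loaded, here is a minimal sketch using the 🤗 Transformers `Auto*` classes. The `YARN_MODELS` table, `repo_id` and `load` are hypothetical names introduced for this example, and passing `trust_remote_code=True` is an assumption (it is needed when a checkpoint ships its own modeling code, which may depend on your Transformers version).

```python
# Repository IDs copied from the tables above, keyed by (family, size, context).
YARN_MODELS = {
    ("llama-2", "7b", "64k"): "NousResearch/Yarn-Llama-2-7b-64k",
    ("llama-2", "7b", "128k"): "NousResearch/Yarn-Llama-2-7b-128k",
    ("llama-2", "13b", "64k"): "NousResearch/Yarn-Llama-2-13b-64k",
    ("llama-2", "13b", "128k"): "NousResearch/Yarn-Llama-2-13b-128k",
    ("llama-2", "70b", "32k"): "NousResearch/Yarn-Llama-2-70b-32k",
    ("mistral", "7b", "64k"): "NousResearch/Yarn-Mistral-7b-64k",
    ("mistral", "7b", "128k"): "NousResearch/Yarn-Mistral-7b-128k",
    ("solar", "10.7b", "32k"): "NousResearch/Yarn-Solar-10b-32k",
    ("solar", "10.7b", "64k"): "NousResearch/Yarn-Solar-10b-64k",
}


def repo_id(family: str, size: str, context: str) -> str:
    """Look up a Hugging Face repository ID (hypothetical helper, not in this repo)."""
    key = (family.lower(), size.lower(), context.lower())
    if key not in YARN_MODELS:
        raise ValueError(f"no published variant for {key}")
    return YARN_MODELS[key]


def load(model_id: str):
    """Download and instantiate a tokenizer and model (requires `transformers`)."""
    # Imported lazily so the lookup above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint may ship custom modeling code.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model


if __name__ == "__main__":
    # Only resolves the name; call load(...) yourself to download the weights.
    print(repo_id("mistral", "7b", "128k"))
```

Calling `load(repo_id("llama-2", "7b", "64k"))` would download the full weights, so expect substantial disk and memory use, particularly at long context lengths.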

Reproduction

We strongly believe in open science, and thus publish all code and data to reproduce the results in our paper. To reproduce, clone the repository and perform a local installation.

git clone https://github.com/jquesnelle/yarn
cd yarn
pip install -e .

Training

To train the models, run accelerate config and enable DeepSpeed acceleration. deepspeed/zero3.json was the configuration file used for training.

# ./train.sh

The tokenized training data is available on 🤗 Hugging Face and was derived from the pg19 dataset. For the Mistral models, a mix of the pretrain and fine-tune splits of Long-Data-Collections was used; the tokenized dataset is also available on 🤗 Hugging Face.

Evaluation

To reproduce the evaluations, install lm-evaluation-harness with pip install git+https://github.com/EleutherAI/lm-evaluation-harness and then run the two provided scripts.

# ./eval.sh
# ./eval-harness.sh

Citation

@inproceedings{
      peng2024yarn,
      title={Ya{RN}: Efficient Context Window Extension of Large Language Models},
      author={Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole},
      booktitle={The Twelfth International Conference on Learning Representations},
      year={2024},
      url={https://openreview.net/forum?id=wHBfxhZu1u}
}
