
cream-logo

An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding

arXiv Conference

Updates

  • (2024.09.26) Our paper has been accepted by NeurIPS 2024 🔥🔥.
  • (2024.06.11) Paper released on arXiv.

🚀 Overview

We propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices.

Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., 4K for Llama 2), yet it can extend LLMs to a much longer target context length (e.g., 256K). A sketch of the underlying interpolation idea follows.
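For reference, here is a minimal sketch of standard linear position interpolation, presumably what the `linear` argument in the scripts below selects; the function name and scaling rule are illustrative only, not the exact CREAM indexing scheme, which additionally manipulates indices to balance continuity and relativity:

```python
import torch

def interpolate_position_ids(seq_len: int, train_window: int) -> torch.Tensor:
    """Linearly rescale position indices of a long sequence so they all
    fall inside the pre-trained context window (illustrative sketch)."""
    scale = train_window / max(seq_len, train_window)
    # e.g., seq_len=16384, train_window=4096 -> indices in [0, 4096)
    return torch.arange(seq_len, dtype=torch.float32) * scale

pos = interpolate_position_ids(16384, 4096)
print(pos[:4], pos[-1])  # 0.00, 0.25, 0.50, 0.75, ..., 4095.75
```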

To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the “Lost-in-the-Middle” problem faced by long-context LLMs.
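As a concrete illustration of this middle-focused sampling, the sketch below draws relative positions from a truncated Gaussian centered on the middle of the context; the mean and standard deviation are assumed values for illustration, not the paper's exact parameterization:

```python
import numpy as np
from scipy.stats import truncnorm

def sample_middle_positions(n: int, mu: float = 0.5, sigma: float = 0.2,
                            seed: int = 0) -> np.ndarray:
    """Sample relative positions in [0, 1] concentrated around the middle
    (mu = 0.5) via a truncated Gaussian (illustrative parameters)."""
    a, b = (0.0 - mu) / sigma, (1.0 - mu) / sigma  # truncation bounds in z-space
    rng = np.random.default_rng(seed)
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, size=n, random_state=rng)

# Fine-tuning segments drawn this way oversample the middle of the context.
print(sample_middle_positions(5))
```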

Experimental results show that CREAM successfully extends LLMs to the target length for both the Base and Chat versions of Llama2-7B with “Never Miss A Beat”.

⚙️ Installation

```bash
# clone project
git clone git@github.com:wutong4012/CREAM.git
cd CREAM

# create conda environment
conda create -n cream python=3.9
conda activate cream

# install requirements
pip install -r requirements.txt
conda install -c nvidia cuda-nvcc
pip install flash_attn-2.5.7+cu122torch2.2cxx11abiFALSE-cp39-cp39-linux_x86_64.whl

# replace lm-evaluation-harness
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
# then replace the lm_eval folder
```

💡 How to run

You can download all of the fine-tuning and evaluation data from: pile_4k_train, pile_val, ShareGPT_4k_train, ShareGPT_val, gov_report, proof-pile, book3, pg19_long, LongChat-Lines, Needle in a Haystack, and LongBench.

Attention: you must modify the "root" path in every file in the scripts folder.

Train model

```bash
bash scripts/run_CREAM.sh 8 linear llama2 5946 CREAM

bash scripts/run_CREAM_chat.sh 8 linear llama2_chat 5946 CREAM
```
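The positional arguments are consumed directly by the shell scripts; they appear to correspond to the number of GPUs, the interpolation type (`linear`), the base model, a port or seed, and the method name, though this reading is an assumption and should be confirmed against scripts/run_CREAM.sh.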

Evaluate model

```bash
bash scripts/eval_longchat_lines.sh 8 linear llama2 CREAM 1000

bash scripts/eval_lost_in_the_middle.sh 8 linear llama2 CREAM 1000

bash scripts/eval_needle.sh 8 linear llama2_chat CREAM 100

bash scripts/eval_longbench.sh 8 linear llama2_chat CREAM 100

bash scripts/eval_ppl.sh 8 linear llama2 CREAM 1000

bash scripts/eval_long_ppl.sh 64 linear llama2 CREAM 1000

bash scripts/eval_benchmark.sh 8 linear llama2 CREAM 1000
```

⚽ Evaluation Results

LongChat-Lines

Lost in the Middle

Needle in a Haystack

LongBench

Acknowledgement

Data / Code:

📜 Citation

Please cite our paper if you use CREAM in your work:

```bibtex
@inproceedings{wu2024cream,
    title     = {An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding},
    author    = {Wu, Tong and Zhao, Yanpeng and Zheng, Zilong},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
    volume    = {37},
    year      = {2024}
}
```
