
Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Spotlight


joonkeekim/Instructive-Decoding


Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions


This official repository contains the implementation for the research paper "Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions". We provide a tutorial in our Colab Notebook.

🚀 Accepted to ICLR 2024 Spotlight [Link]
🎉 Accepted to Instruction Workshop @ NeurIPS 2023 [Link]

Taehyeon Kim*, Joonkee Kim*, Gihun Lee*, Se-Young Yun
*: Equal Contribution

💡 Introduction

🎤 TL;DR: The paper presents "Instructive Decoding" (ID), a decoding-time method that improves instruction following in instruction-tuned language models. ID contrasts the next-token predictions conditioned on the original instruction with those conditioned on a "noisy" (deliberately manipulated) instruction, steering the response toward what the original instruction asks for. Across multiple models and tasks, ID consistently improves performance, especially generalization to unseen tasks, without any extra training or parameter updates.
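The following minimal Python sketch shows a single ID decoding step over a toy three-token vocabulary. It assumes the two sets of next-token logits are combined additively as z + eps * z_noisy, which matches the negative eps recommended under Key Arguments Explained, and it picks the next token greedily. The function names are hypothetical and are not the repository's API.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def instructive_logits(orig: List[float], noisy: List[float], eps: float = -0.3) -> List[float]:
    """Combine next-token logits from the original and the noisy instruction.

    Assumed form (a sketch, not the repository's code): z + eps * z_noisy.
    With a negative eps, tokens the noisy instruction also favors are pushed down.
    """
    if len(orig) != len(noisy):
        raise ValueError("logit vectors must cover the same vocabulary")
    return [z + eps * zn for z, zn in zip(orig, noisy)]


def greedy_step(orig: List[float], noisy: List[float], eps: float = -0.3) -> int:
    """Return the id of the most probable next token after the adjustment.

    Softmax does not change the argmax; it is kept to show the distribution
    one would sample from instead of decoding greedily.
    """
    probs = softmax(instructive_logits(orig, noisy, eps))
    return max(range(len(probs)), key=probs.__getitem__)


orig_logits = [2.0, 1.9, 0.0]   # conditioned on the original instruction
noisy_logits = [3.0, 0.5, 0.0]  # conditioned on a noisy instruction
print(greedy_step(orig_logits, noisy_logits, eps=0.0))   # standard decoding
print(greedy_step(orig_logits, noisy_logits, eps=-0.3))  # instructive decoding
```

With eps = 0.0 the noisy logits drop out and the step reduces to standard greedy decoding, which selects token 0. With eps = -0.3, token 0 (which the noisy instruction strongly favors) is penalized to 1.1 while token 1 only drops to 1.75, so token 1 is selected.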

🤔 Getting Started

💻 Environmental setup

1. Create a Conda Environment:

Use Conda to create a new environment specifically for this project. This helps keep dependencies organized and avoids conflicts with other projects. Run the following commands in your terminal:

conda create -n id python=3.9
conda activate id

2. Install Required Packages:

Next, install all the necessary packages. We've listed all the required dependencies in requirements.txt. To install them, simply execute:

pip install -r requirements.txt

📑 Data preparation

1. Create the Directories:

Set up the directory structure for downloading and storing the datasets. Run these commands in your terminal:

mkdir -p data/downloads

2. SuperNatural Instruction Dataset:

Clone the SuperNatural Instruction dataset (Link) and organize it into the correct directory:

git clone https://github.com/allenai/natural-instructions.git data/downloads/natural-instructions
mkdir -p data/supni
mv data/downloads/natural-instructions/tasks data/downloads/natural-instructions/splits data/supni/
rm -rf data/downloads/natural-instructions

3. MMLU Dataset:

Download and extract the MMLU dataset:

wget -O data/downloads/mmlu_data.tar https://people.eecs.berkeley.edu/~hendrycks/data.tar
mkdir -p data/mmlu
tar -xvf data/downloads/mmlu_data.tar -C data/mmlu
rm -rf data/downloads/mmlu_data.tar

Then, you will have a directory structure as follows 👇🏻👇🏻:

Instructive-Decoding
├── data
│   ├── supni
│   │   ├── splits
│   │   └── tasks
│   ├── mmlu
│   │   ├── test
│   │   └── ...
├── scripts
│   ├── run_sni.sh
│   ├── run_mmlu.sh
│   └── ...
├── src
│   ├── run_eval.py
│   ├── base_generator.py
│   └── ...
├── requirements.txt
└── ...
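After these steps, a quick Python check can confirm the layout and peek at the raw files. This is a minimal sketch, not code from the repository: the helper names are hypothetical, the expected directories simply mirror the tree above, and the readers assume the upstream file formats (SuperNatural-Instructions task JSON files with `Definition` and `Instances` keys and split files listing one task name per line; headerless MMLU CSVs holding a question, four choices, and an answer letter).

```python
import csv
import json
from pathlib import Path
from typing import Dict, List

# Directories expected under the repository root (mirrors the tree above).
EXPECTED_DIRS = ["data/supni/splits", "data/supni/tasks", "data/mmlu/test"]


def missing_dirs(root: str = ".") -> List[str]:
    """Return the expected data directories that do not exist under root."""
    return [d for d in EXPECTED_DIRS if not (Path(root) / d).is_dir()]


def read_split(split_file: str) -> List[str]:
    """Read task names, one per line, skipping blank lines (assumed split format)."""
    with open(split_file, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def load_supni_task(task_file: str) -> Dict[str, object]:
    """Read one SuperNatural-Instructions task file.

    Assumes "Definition" is a list of strings and "Instances" is a list of
    records with "input" and "output" (a list of reference answers).
    """
    with open(task_file, encoding="utf-8") as f:
        task = json.load(f)
    return {
        "definition": " ".join(task["Definition"]),
        "instances": [(inst["input"], inst["output"]) for inst in task["Instances"]],
    }


def load_mmlu_csv(csv_file: str) -> List[Dict[str, object]]:
    """Read an MMLU CSV (no header row): question, choices A-D, answer letter."""
    rows = []
    with open(csv_file, newline="", encoding="utf-8") as f:
        for question, a, b, c, d, answer in csv.reader(f):
            rows.append({"question": question, "choices": [a, b, c, d], "answer": answer})
    return rows
```

Run from the repository root, `missing_dirs()` should return an empty list once both datasets are in place; any paths it returns point to a download or extraction step worth repeating.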

📝 How to Use

💻 Prepare the Pretrained Weights

We utilized various models in our paper. You can load these models directly from the Hugging Face Hub or use specific weights as required. Here are the relevant links and information:

💻 Run Experiments

To customize and experiment with your own noisy instructions, modify the instructions in the inst_aware_batchify function within xxx_generator.py.
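The actual noisy instructions are defined in `inst_aware_batchify`. The Python sketch below only illustrates the general idea of pairing every prompt with a noisy counterpart so that both can be scored in a single batch. The two variants (an empty instruction, and a truncated, word-shuffled instruction), the prompt template, and all function names are hypothetical choices for illustration; they are not the repository's `neg_type` options.

```python
import random
from typing import Callable, List

# A noisy-instruction generator takes the original instruction and an RNG.
NoisyFn = Callable[[str, random.Random], str]


def null_instruction(instruction: str, rng: random.Random) -> str:
    """Hypothetical variant: replace the instruction with an empty string."""
    return ""


def trunc_shuffle(instruction: str, rng: random.Random, keep: float = 0.5) -> str:
    """Hypothetical variant: keep the first `keep` fraction of words, then shuffle them."""
    words = instruction.split()
    kept = words[: max(1, int(len(words) * keep))]
    rng.shuffle(kept)
    return " ".join(kept)


def build_prompt(instruction: str, task_input: str) -> str:
    """Hypothetical prompt template; the repository's template may differ."""
    return f"Definition: {instruction}\n\nInput: {task_input}\nOutput:"


def paired_batch(instructions: List[str], inputs: List[str],
                 noisy_fn: NoisyFn, seed: int = 0) -> List[str]:
    """Originals first, then noisy counterparts: row i and row i + n form a pair."""
    rng = random.Random(seed)  # seeded so noisy variants are reproducible
    originals = [build_prompt(ins, x) for ins, x in zip(instructions, inputs)]
    noisy = [build_prompt(noisy_fn(ins, rng), x) for ins, x in zip(instructions, inputs)]
    return originals + noisy


batch = paired_batch(["Translate the sentence to French."], ["Hello"], null_instruction)
```

Placing the noisy rows in the second half of the batch is one possible layout: a single forward pass yields both sets of logits, which can then be split in half and combined row by row.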

To reproduce our results, execute the following scripts in your terminal:

bash scripts/run_sni.sh
bash scripts/run_mmlu.sh

💻 Key Arguments Explained

  • noisy: This argument determines the decoding method to be used.

    • If this is set, the script employs Instructive Decoding, which involves the use of both the original and noisy instructions.
    • If this is not set, it executes Standard Decoding, using only the original instruction without any noisy variants.
  • neg_type: This specifies the type of noisy instruction to be used.

    • It allows you to choose from a range of predefined noisy instruction variants, each designed to test different aspects of the model's instruction-following capabilities.
  • eps: This is a crucial hyperparameter for Instructive Decoding. We recommend using -0.3.

    • It represents the balance factor between predictions that are guided by the original instruction and those influenced by the noisy instructions.
    • A larger absolute value of eps gives the noisy-instruction predictions more influence on the final output, while a value closer to 0 leans more towards the original instruction.
  • is_decoder: This argument defines the architecture of the model in use.

    • If this is set, it indicates that the model is a decoder-only transformer model.
    • If this is not set, it indicates that the model uses an encoder-decoder architecture.
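As a hedged illustration of how these arguments fit together, here is a Python `argparse` sketch. The flag spellings, types, and defaults are assumptions inferred from the descriptions above (with eps defaulting to the recommended -0.3), and `build_parser`/`describe` are hypothetical helpers; check `src/run_eval.py` and the scripts in `scripts/` for the real interface.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the arguments described above."""
    p = argparse.ArgumentParser(description="Standard vs. Instructive Decoding (sketch)")
    p.add_argument("--noisy", action="store_true",
                   help="use Instructive Decoding (original + noisy instruction)")
    p.add_argument("--neg_type", type=str, default=None,
                   help="which noisy-instruction variant to use")
    p.add_argument("--eps", type=float, default=-0.3,
                   help="weight on the noisy-instruction predictions")
    p.add_argument("--is_decoder", action="store_true",
                   help="decoder-only model; otherwise encoder-decoder")
    return p


def describe(argv: Optional[List[str]] = None) -> str:
    """Summarize which decoding mode and architecture the flags select."""
    args = build_parser().parse_args(argv)
    arch = "decoder-only" if args.is_decoder else "encoder-decoder"
    if not args.noisy:
        return f"standard decoding, {arch}"
    return f"instructive decoding (neg_type={args.neg_type}, eps={args.eps}), {arch}"


print(describe(["--noisy", "--eps", "-0.3", "--is_decoder"]))
```

Note that `--eps -0.3` parses as a value rather than as a flag because `argparse` accepts negative numbers as arguments when no registered option looks like a negative number.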

😁 Misc.

Feel free to cite us.

@article{instructivedecoding,
  title={Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions},
  author={Kim, Taehyeon and Kim, Joonkee and Lee, Gihun and Yun, Se-Young},
  journal={arXiv preprint arXiv:2311.00233},
  year={2023}
}
