SELF-RAG: Learning to Retrieve, Generate and Critique through Self-reflection #778
This repository includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through Self-reflection (ICLR 2024, Oral, top 1%) by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
Website | 7B Model | 13B Model | Paper | Training data | Twitter summary | Updates
Self-RAG (right side of the figure) is a new framework for training an arbitrary LM to learn to retrieve, generate, and critique, enhancing the factuality and quality of its generations without hurting the versatility of the LLM.
Unlike the widely adopted Retrieval-Augmented Generation (RAG; left side of the figure) approach, Self-RAG retrieves on demand (e.g., it can retrieve multiple times or skip retrieval entirely) given diverse queries, and criticizes its own generations along multiple fine-grained aspects by predicting reflection tokens as an integral part of generation.
We conduct a segment-wise beam search to select the output that maximizes utility for diverse preferences.
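As a rough illustration of that selection step, the sketch below scores a candidate segment as its LM log-probability plus weighted, normalized critique scores derived from the reflection tokens, in the spirit of the paper's linear combination; the group names, weights, and numbers are illustrative, not the repository's API.

```python
# Hedged sketch: segment-level utility = LM log-probability plus a weighted
# sum of normalized critique scores from the reflection-token groups.
def segment_score(lm_logprob, critique_scores, weights):
    # critique_scores[g]: probability mass of the most desirable reflection
    # token in group g (e.g., [Relevant], [Fully supported]), normalized
    # over all reflection tokens in that group.
    return lm_logprob + sum(w * critique_scores[g] for g, w in weights.items())

# Illustrative weights emphasizing relevance and support over utility.
weights = {"ISREL": 1.0, "ISSUP": 1.0, "ISUSE": 0.5}
score = segment_score(-2.3, {"ISREL": 0.9, "ISSUP": 0.8, "ISUSE": 0.7}, weights)
```

At decoding time, the beam keeps the highest-scoring segment continuations; changing the weights trades off, for example, citation support against perceived usefulness.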
If you find our code, data, models, or the paper useful, please cite the paper:
Updates
Content
Installation
Install the required Python libraries by running the command below. Please use the latest version of vllm, as older versions may not let you set skip_special_tokens via SamplingParams (support was added by this PR). You can also create a conda environment by running the command below.
Quick start
You can download Self-RAG from the HuggingFace Hub. For inference, we recommend using vllm, as it significantly speeds up inference.
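A minimal quick-start sketch with vllm is below. It assumes the 7B checkpoint is published on the HuggingFace Hub as selfrag/selfrag_llama2_7b and that prompts follow the repository's Alpaca-style format; treat both as assumptions if your copy differs.

```python
from vllm import LLM, SamplingParams

# Assumed Hub id for the 7B checkpoint; swap in the 13B id if preferred.
model = LLM("selfrag/selfrag_llama2_7b", dtype="half")
sampling_params = SamplingParams(
    temperature=0.0, top_p=1.0, max_tokens=100,
    skip_special_tokens=False,  # keep reflection tokens such as [Retrieval] visible
)

def format_prompt(input, paragraph=None):
    # Alpaca-style instruction format; optional evidence is wrapped in
    # <paragraph> markup so the model treats it as retrieved context.
    prompt = "### Instruction:\n{0}\n\n### Response:\n".format(input)
    if paragraph is not None:
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt

queries = [
    "Leave odd one out: twitter, instagram, whatsapp.",
    "Can you tell me the difference between llamas and alpacas?",
]
preds = model.generate([format_prompt(q) for q in queries], sampling_params)
for pred in preds:
    print(pred.outputs[0].text)
```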
Running these two queries, Self-RAG starts generating a response without retrieval for the first one, since it does not require retrieval. For the second, Self-RAG outputs [Retrieve] tokens, as that question requires more fine-grained factual grounding.

For queries that require factual grounding, you can insert a paragraph yourself. Self-RAG can retrieve and insert paragraphs at any point during generation, and recognizes them as long as they are surrounded by the context markup special tokens <paragraph> and </paragraph>. Self-RAG then finds the relevant inserted document and generates an answer that is fully supported by the evidence.
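Continuing the sketch above, evidence can be supplied inline; the paragraph text here is invented for illustration.

```python
# Supply evidence inline via <paragraph> markup (invented example text).
query = "Can you tell me the difference between llamas and alpacas?"
paragraph = ("The alpaca (Lama pacos) is a species of South American camelid "
             "mammal. It is similar to, and often confused with, the llama.")
preds = model.generate([format_prompt(query, paragraph=paragraph)], sampling_params)
# The output should interleave reflection tokens such as [Relevant] and
# [Fully supported] with the answer text.
print(preds[0].outputs[0].text)
```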
Run your evaluation using the online retrieval model
You can also run retrieval on demand and use it with Self-RAG. Because running retrieval over the full English Wikipedia requires a large amount of RAM and multiple GPUs, we created a subset of Wikipedia that includes only the introductory paragraphs of Wikipedia articles, for demo purposes.
First, please download the corpus and embeddings (9 GB in total).
If the script does not work, you can download the data from Google Drive or the HF dataset.
Then, you can run the script under retrieval_lm. We tested the script on one RTX 6000 with 24 GB of GPU memory and 100 GB of RAM (but it should be runnable with much less RAM).

The retriever properly retrieves the necessary documents, and the model generates fully grounded output.
Note that this demo uses a smaller corpus and runs Self-RAG with the full inference algorithm. For a full evaluation, you either need to set up a retriever or download our retrieved results. Please follow the instructions in the Inference section.
Retriever Setup
By default, we use Contriever as our retrieval component.
Download data
Download preprocessed passage data used in DPR.
Then, download the generated passage embeddings. We use Contriever-MSMARCO.
Run retriever
You can run passage retrieval by running the command below.
Your input file should be either a json or jsonl file. Each instance must contain either a question or an instruction field, which will be used as the query during retrieval.
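For illustration, here is a minimal sketch of writing a valid jsonl input; the file name and contents are hypothetical.

```python
import json

# One JSON object per line; each instance carries a "question" or
# "instruction" field that is used as the retrieval query.
examples = [
    {"question": "What is the capital of France?"},
    {"instruction": "Explain how photosynthesis works."},
]
with open("retrieval_input.jsonl", "w") as f:  # hypothetical file name
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```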
Generate embeddings for your own data
You can generate embeddings for your own data by running the following command. (The script is adapted from the Contriever repository.) Note that generating embeddings from a large-scale corpus (>10M docs) can take time, and we recommend running it on multiple GPUs.
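For intuition, here is a minimal sketch of how Contriever-style mean-pooled passage embeddings are computed with plain transformers; this is not the repository's script itself, and the passages are invented.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever-msmarco")
model = AutoModel.from_pretrained("facebook/contriever-msmarco")

def mean_pooling(token_embeddings, mask):
    # Zero out padded positions, then average over the sequence length.
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.0)
    return token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]

passages = ["Paris is the capital of France.",
            "Llamas are South American camelids."]  # invented examples
inputs = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = mean_pooling(outputs.last_hidden_state, inputs["attention_mask"])
```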
Training
Self-RAG trains two models, Critic and Generator, both of which expand token vocabularies with reflection tokens and are trained with the standard next token prediction objective.
Alternatively, you can download our training data consisting of 150K instances here.
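As a sketch of the vocabulary-expansion step, the snippet below adds reflection tokens as special tokens and resizes the embedding matrix; the token strings follow the paper's notation, are not exhaustive, and the repository's training scripts handle this step for you.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Reflection tokens (illustrative subset, following the paper's notation).
reflection_tokens = [
    "[Retrieval]", "[No Retrieval]",               # retrieve on demand
    "[Relevant]", "[Irrelevant]",                  # passage relevance
    "[Fully supported]", "[Partially supported]",  # evidence support
    "[Utility:1]", "[Utility:2]", "[Utility:3]", "[Utility:4]", "[Utility:5]",
]
tokenizer.add_special_tokens({"additional_special_tokens": reflection_tokens})
model.resize_token_embeddings(len(tokenizer))
# Training then proceeds with the standard next-token-prediction objective.
```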
Collect reflection tokens
We collect training data from GPT-4. The scripts to call GPT-4 for each special token type are available at data_creation/critic.
Alternatively, you can download our training data here.
Critic training
Once you create or download the training data, run the command below to fine-tune Llama2-7B as the Critic.
Generator Data Creation
The code to create Generator training data is under generator_data_creation. See the instructions at README.md.
Alternatively, you can download our training data from the HuggingFace dataset or here.
Generator Training
For generator training, we use DeepSpeed to make training more efficient. You can run training by running the script below, after setting the training data path.
For 13B model training, use training_13b. We use 8 A100 GPUs with 40 GB of memory for 7B model training and 4 A100 GPUs with 80 GB of memory for 13B training. A 7B model should fit on 1-2 A100s, although training can be slow.

Inference
For the task evaluation conducted in the paper, please download the data here.
Each file already comes with retrieved documents, so if you don't want to run a retriever as part of inference, you can simply load the retrieved docs from the contexts field.

Below, we describe Self-RAG and the baselines.
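For example, a minimal sketch of loading those pre-retrieved documents, assuming a jsonl evaluation file (the file name is hypothetical):

```python
import json

# Each line is a JSON instance whose pre-retrieved passages live under
# the "contexts" key.
with open("eval_data.jsonl") as f:  # hypothetical file name
    examples = [json.loads(line) for line in f]

docs_for_first_example = examples[0]["contexts"]
```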
Short-form (PubHealth, ARC-Challenge, TriviaQA, PopQA)
As we typically retrieve only once for short-form generation tasks, we provide an easy-to-run evaluation script that leverages documents retrieved offline by Contriever. See the individual commands below.
Question Answering