Although large language models (LLMs) are impressive at solving various tasks, they can quickly become outdated after deployment. Keeping them up to date is a pressing concern. How can we refresh LLMs so they align with the ever-changing world knowledge without expensive retraining from scratch?
Once trained, an LLM is static and can quickly become outdated. For example, ChatGPT has a knowledge
cutoff date of September 2021; without web browsing, it is unaware of anything that has happened since then.
- [2023-10] Our survey paper is now available on arXiv: How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances.
- [2023-10] Our survey paper: "How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances" has been accepted by EMNLP 2023! We will release the camera-ready version soon.
- [2023-10] We created this repository to maintain a paper list on refreshing LLMs without retraining.
- News
- Table of Contents
- Papers
- Resources
- Citation
- Acknowledgement & Contribution
To refresh LLMs so they align with the ever-changing world knowledge without retraining, we roughly categorize existing methods into Implicit and Explicit approaches. Implicit approaches seek to directly alter the knowledge stored in LLMs (e.g., their parameters or weights), while Explicit approaches more often incorporate external resources to override the internal knowledge (e.g., augmenting the model with a search engine).
Please see our paper for more details.
Taxonomy of methods to align LLMs with the ever-changing world knowledge.
A high-level comparison of different approaches.
Knowledge editing (KE) is an emerging and promising research area that aims to alter the parameters encoding specific pieces of knowledge in pre-trained models, so that the model makes new predictions on those revised instances while keeping other, irrelevant knowledge unchanged. We categorize existing methods into meta-learning, hypernetwork, and locate-and-edit-based methods.
| Year | Venue | Paper | Link |
|---|---|---|---|
| 2023 | arXiv | RECKONING: Reasoning through Dynamic Knowledge Encoding | |
| 2020 | ICLR | Editable Neural Networks | |
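As a rough, illustrative sketch of the knowledge-editing idea (not a reproduction of any method above), the snippet below fine-tunes a single MLP block of GPT-2 on one revised fact while the rest of the model stays frozen, then probes an unrelated prompt to check that other knowledge is (ideally) untouched. The model, example fact, layer index, and hyperparameters are placeholder assumptions.

```python
# Minimal, illustrative knowledge-editing sketch (not ROME/MEND/etc.):
# update only one MLP block of GPT-2 so it absorbs a revised fact,
# leaving the rest of the network frozen. All names/values are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

edit_text = "The latest ChatGPT knowledge cutoff is 2023."   # revised fact (toy example)
unrelated = "The capital of France is"                       # locality probe

# Freeze everything except a single mid-layer MLP (a crude "locate-and-edit").
for p in model.parameters():
    p.requires_grad = False
target = model.transformer.h[8].mlp
for p in target.parameters():
    p.requires_grad = True

opt = torch.optim.Adam(target.parameters(), lr=1e-4)
batch = tok(edit_text, return_tensors="pt")

model.train()
for _ in range(20):                      # a few gradient steps on the single fact
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

# Check that an unrelated prompt still behaves (qualitatively) as before.
model.eval()
with torch.no_grad():
    ids = tok(unrelated, return_tensors="pt").input_ids
    print(tok.decode(model.generate(ids, max_new_tokens=5)[0]))
```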
Continual learning (CL) aims to enable a model to learn from a continuous data stream over time while reducing catastrophic forgetting of previously acquired knowledge. With CL, a deployed LLM can potentially adapt to the changing world without costly retraining from scratch. The papers below employ CL to align language models with the current world knowledge, covering Continual Pre-training and Continual Knowledge Editing.
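Below is a minimal sketch of continual pre-training with experience replay, one simple way to reduce catastrophic forgetting (not the method of any specific paper listed here). The toy corpora, batch size, and replay ratio are assumptions for illustration.

```python
# Minimal continual pre-training sketch with experience replay:
# each step trains on new-corpus text plus a small sample replayed
# from earlier data to mitigate forgetting. Corpora here are toy strings.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

old_corpus = ["Text similar to the original training data."] * 8      # replay buffer (assumption)
new_corpus = ["Text describing events after the cutoff date."] * 32   # fresh data (assumption)

def lm_step(texts):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100        # ignore padding in the loss
    loss = model(**batch, labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()

model.train()
for step in range(0, len(new_corpus), 4):
    new_batch = new_corpus[step:step + 4]
    replay = random.sample(old_corpus, k=2)            # mix a little old data into every batch
    print(f"step {step}: loss={lm_step(new_batch + replay):.3f}")
```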
Pairing a static LLM with a growing non-parametric memory enables it to capture information beyond its memorized knowledge during inference. The external memory can store a recent corpus or feedback that contains new information to guide the model generation.
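As an assumption-level illustration (not a specific system from the papers below), the sketch pairs a frozen LLM with a growing memory: new documents can be embedded and appended at any time, and the nearest entries are retrieved at inference to be placed in the prompt. The embedding model is an arbitrary choice.

```python
# Minimal non-parametric memory sketch: documents can be appended after the
# LLM is deployed; retrieval is cosine similarity over their embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

class ExternalMemory:
    def __init__(self, embed_model="all-MiniLM-L6-v2"):
        self.encoder = SentenceTransformer(embed_model)
        self.texts, self.vectors = [], []

    def add(self, docs):                       # the memory can keep growing over time
        self.texts.extend(docs)
        self.vectors.extend(self.encoder.encode(docs, normalize_embeddings=True))

    def retrieve(self, query, k=2):            # fetch the most relevant entries
        q = self.encoder.encode([query], normalize_embeddings=True)[0]
        scores = np.array(self.vectors) @ q
        return [self.texts[i] for i in np.argsort(-scores)[:k]]

memory = ExternalMemory()
memory.add(["A recent news article about event X.",
            "An updated documentation page for library Y."])
print(memory.retrieve("What changed in library Y?"))
# The retrieved snippets would then be placed in the frozen LLM's prompt.
```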
Leveraging an off-the-shelf retriever and the in-context learning ability of LLMs, this line of work designs better retrieval strategies to incorporate world knowledge into a fixed LLM through prompting; these strategies can be divided into single-stage and multi-stage.
Single-Stage (left) typically retrieves once, while Multi-Stage (right) involves multiple retrievals or revisions to solve complex questions.
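To make the distinction concrete, here is a hedged sketch of both patterns; `llm` and `retrieve` are hypothetical callables standing in for a real model and retriever, not the API of any framework listed here.

```python
# Illustrative single-stage vs. multi-stage retrieval (assumed helpers `llm`, `retrieve`).
def single_stage(question, llm, retrieve):
    context = retrieve(question)                       # retrieve once, then prompt
    return llm(f"Context: {context}\nQuestion: {question}\nAnswer:")

def multi_stage(question, llm, retrieve, max_rounds=3):
    context, query = [], question
    for _ in range(max_rounds):
        context += retrieve(query)                     # retrieve again with the current query
        prompt = (f"Context: {context}\nQuestion: {question}\n"
                  "Reply 'ANSWER: <answer>' if the context suffices, "
                  "otherwise 'SEARCH: <follow-up query>'.")
        reply = llm(prompt)
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        query = reply.removeprefix("SEARCH:").strip()  # revise the query for the next round
    return llm(f"Context: {context}\nQuestion: {question}\nAnswer:")
```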
A recent trend uses the whole web as the knowledge source and equips LLMs with the Internet to support real-time information seeking.
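A hedged sketch of the search-then-read pattern this line of work builds on; `web_search` and `llm` are hypothetical stand-ins for a real search API and LLM call, not actual library functions.

```python
# Search-then-read sketch. `web_search` and `llm` are hypothetical stand-ins;
# only the overall flow is shown.
def answer_with_internet(question, llm, web_search, k=3):
    results = web_search(question, top_k=k)            # e.g. [{"title": ..., "snippet": ...}, ...]
    snippets = "\n".join(f"- {r['title']}: {r['snippet']}" for r in results)
    prompt = (f"Using the following up-to-date web results, answer the question.\n"
              f"{snippets}\nQuestion: {question}\nAnswer:")
    return llm(prompt)
```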
- Augmented Language Models: a Survey, 2023
- The Life Cycle of Knowledge in Big Language Models: A Survey, 2023
- Interactive Natural Language Processing, 2023
- Editing Large Language Models: Problems, Methods, and Opportunities, 2023
- Tool Learning with Foundation Models, 2023
- Unifying Large Language Models and Knowledge Graphs: A Roadmap, 2023
- A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning, 2023
- Large Language Models for Information Retrieval: A Survey, 2023
- A Review on Language Models as Knowledge Bases, 2022
- A Survey of Knowledge-enhanced Text Generation, 2022
- A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models, 2022
- A Survey on Knowledge-Enhanced Pre-trained Language Models, 2022
- Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering, 2021
- Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey, 2021
- LangChain: a framework for developing applications powered by language models.
- ChatGPT plugins: tools designed specifically for language models with safety as a core principle; they help ChatGPT access up-to-date information, run computations, or use third-party services.
- EasyEdit: an Easy-to-use Knowledge Editing Framework for LLMs.
- FastEdit: injecting fresh and customized knowledge into large language models efficiently using one single command.
- PyContinual: an Easy and Extendible Framework for Continual Learning.
- Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
If you find our research helpful, please cite our paper.
@article{zhang2023large,
title={How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances},
author={Zhang, Zihan and Fang, Meng and Chen, Ling and Namazi-Rad, Mohammad-Reza and Wang, Jun},
journal={arXiv preprint arXiv:2310.07343},
year={2023}
}
This field is evolving very fast, and we may have missed important works, so please don't hesitate to share your work. Pull requests are always welcome if you spot anything wrong (e.g., broken links or typos) or want to share new papers! We thank all contributors for their valuable efforts.