This curriculum traces the development of artificial intelligence and large language models (LLMs) from the early days of AI research to the emergence of GPT-4 and beyond. It highlights pivotal papers and breakthroughs that have shaped the field, focusing on developments relevant to LLMs.
I am also tracking my progress through this curriculum to keep myself accountable; my current position is marked with "(I am currently here)".
-
1943 – McCulloch and Pitts Model
- A Logical Calculus of the Ideas Immanent in Nervous Activity
- Authors: Warren McCulloch and Walter Pitts
- Introduced the concept of an artificial neuron, laying the foundation for later neural network models.
- [YouTube video explaining the paper in simple terms]
-
1950 – Turing Test
- Computing Machinery and Intelligence
- Author: Alan Turing
- Introduced the concept of the Turing Test, an early philosophical framework for evaluating machine intelligence.
-
1956 – Dartmouth Conference (Birth of AI)
- A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence
- Contributors: John McCarthy, Marvin Minsky, Claude Shannon, and others
- Formalized AI as a field, proposing the study of "machines that can simulate human intelligence."
- (I am currently here).
-
1958 – Perceptron by Frank Rosenblatt
- The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
- Author: Frank Rosenblatt
- Introduced the Perceptron, one of the first trainable neural networks.
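- A minimal sketch of the perceptron learning rule (my illustration, not the paper's notation; the AND-gate data and learning rate are toy assumptions):
```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
y = np.array([0, 0, 0, 1])                      # AND-gate labels
w, b = np.zeros(2), 0.0                         # weights and bias

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0       # step activation
        error = target - pred
        w += 0.1 * error * xi                   # update only on mistakes
        b += 0.1 * error

print(w, b)  # converges because AND is linearly separable
```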
-
1949 – Hebbian Learning (Influence on Neural Networks)
- The Organization of Behavior
- Author: Donald Hebb
- Proposed Hebbian learning, the principle often summarized as "cells that fire together wire together."
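- A minimal sketch of Hebb's postulate as a weight update (the learning rate, activity pattern, and linear neuron are illustrative assumptions, not Hebb's own formulation):
```python
import numpy as np

eta = 0.01                       # learning rate
x = np.array([1.0, 0.0, 1.0])    # pre-synaptic activity pattern
w = np.full(3, 0.1)              # small initial synaptic weights

for _ in range(100):
    y = w @ x                    # post-synaptic activity (linear neuron)
    w += eta * y * x             # Hebbian update: dw = eta * x * y

# Weights grow only where pre- and post-synaptic activity co-occur;
# growth is unbounded without a later normalization such as Oja's rule.
print(w)
```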
-
1969 – Minsky and Papert Critique of Perceptrons
- Perceptrons
- Authors: Marvin Minsky and Seymour Papert
- Proved formal limitations of single-layer perceptrons (notably their inability to compute XOR), contributing to a decline in neural network research.
-
1974 – Backpropagation Algorithm (Paul Werbos)
- Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences
- Author: Paul Werbos
- Introduced the backpropagation algorithm in his Ph.D. thesis.
-
1980 – Neocognitron (Precursor to CNNs)
- Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position
- Author: Kunihiko Fukushima
- Developed a hierarchical, shift-invariant network that prefigured convolutional neural networks (CNNs) and inspired later deep learning architectures.
-
1986 – Backpropagation Popularized
- Learning Representations by Back-propagating Errors
- Authors: David E. Rumelhart, Geoffrey Hinton, Ronald J. Williams
- Popularized backpropagation, making it practical for training multilayer neural networks.
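- A minimal sketch of backpropagation on a tiny two-layer network learning XOR, the kind of problem a single perceptron cannot solve (layer sizes, learning rate, and iteration count are illustrative assumptions):
```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(10000):
    h = sigmoid(X @ W1 + b1)                  # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)       # chain rule, output layer
    d_h = (d_out @ W2.T) * h * (1 - h)        # chain rule, hidden layer
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2))  # should approach [0, 1, 1, 0]
```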
-
1989 – Hidden Markov Models (HMMs)
- A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Author: Lawrence Rabiner
- Provided a foundational understanding of HMMs, critical for early speech recognition and sequence modeling.
-
1990s – Emergence of Statistical NLP
- The shift from rule-based systems to statistical approaches in NLP, utilizing n-gram models and probabilistic methods for tasks like part-of-speech tagging and machine translation.
-
1993 – IBM Model 1 for Statistical Machine Translation
- The Mathematics of Statistical Machine Translation: Parameter Estimation
- Authors: Peter F. Brown et al.
- Laid the foundation for modern translation systems by modeling word alignment between languages.
-
1993 – Class-Based n-gram Models
- Class-Based n-gram Models of Natural Language
- Authors: Peter F. Brown et al.
- Introduced class-based n-gram models, an early statistical approach to language modeling.
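- A minimal sketch of the plain bigram baseline that class-based models refine: estimate P(word | previous word) from counts, with add-one smoothing (the toy corpus is an assumption):
```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)

def p(word, prev):
    """P(word | prev) with Laplace (add-one) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

print(p("cat", "the"))  # frequent bigram: relatively high probability
print(p("mat", "cat"))  # unseen bigram: small but nonzero after smoothing
```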
-
1997 – Long Short-Term Memory (LSTM)
- Long Short-Term Memory
- Authors: Sepp Hochreiter and Jürgen Schmidhuber
- Introduced the LSTM architecture, addressing the vanishing gradient problem in RNNs.
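- A minimal sketch of one LSTM cell step, showing the gating that protects the cell state (random toy weights and dimensions are assumptions; no particular library's conventions are followed):
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One time step; W, U, b pack the input, forget, cell, and output gates."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c + i * g    # additive cell path: this is what eases vanishing gradients
    h = o * np.tanh(c)
    return h, c

d, hdim = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hdim, d))
U = rng.normal(size=(4 * hdim, hdim))
b = np.zeros(4 * hdim)
h, c = np.zeros(hdim), np.zeros(hdim)
for x in rng.normal(size=(5, d)):   # run a short random sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```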
-
1998 – LeNet and Convolutional Neural Networks (CNNs)
- Gradient-Based Learning Applied to Document Recognition
- Authors: Yann LeCun et al.
- Developed LeNet, one of the first successful CNN architectures, used for handwritten digit recognition.
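- A minimal sketch of the 2D convolution at the core of LeNet-style CNNs: a small kernel slides across the image, sharing weights at every position (the image and edge-detecting kernel are toy assumptions):
```python
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

image = np.zeros((6, 6)); image[:, 3:] = 1.0   # image with a vertical edge
kernel = np.array([[1.0, -1.0]])               # horizontal-gradient filter
print(conv2d(image, kernel))                   # strong response along the edge
```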
-
2003 – Neural Probabilistic Language Model
- A Neural Probabilistic Language Model
- Authors: Yoshua Bengio et al.
- Introduced a feed-forward neural language model that learns word embeddings jointly with the next-word prediction task.
-
2012 – AlexNet and the Deep Learning Boom
- ImageNet Classification with Deep Convolutional Neural Networks
- Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
- Marked the success of deep learning in image recognition, reigniting interest in neural networks.
-
2013 – Word2Vec (Efficient Word Representations)
- Efficient Estimation of Word Representations in Vector Space
- Authors: Tomas Mikolov et al.
- Introduced Word2Vec, learning continuous vector representations of words.
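- A minimal sketch of one skip-gram-with-negative-sampling update, the efficient objective behind Word2Vec (vocabulary size, dimensions, word indices, and learning rate are toy assumptions):
```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, lr = 10, 8, 0.05
W_in = rng.normal(scale=0.1, size=(vocab, dim))    # word ("input") vectors
W_out = rng.normal(scale=0.1, size=(vocab, dim))   # context ("output") vectors
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def sgns_update(center, context, negatives):
    """Pull the true context vector toward the center word, push negatives away."""
    v = W_in[center].copy()
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        g = sigmoid(u @ v) - label       # gradient of the logistic loss
        grad_v += g * u                  # accumulate before updating u
        W_out[word] = u - lr * g * v
    W_in[center] -= lr * grad_v

sgns_update(center=3, context=7, negatives=[1, 4, 9])
```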
-
2014 – Sequence to Sequence (Seq2Seq) Models
- Sequence to Sequence Learning with Neural Networks
- Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le
- Introduced the Seq2Seq encoder-decoder architecture using LSTMs, achieving strong results on machine translation.
-
2014 – Gated Recurrent Units (GRUs)
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Authors: Kyunghyun Cho et al.
- Introduced GRUs as a simpler alternative to LSTMs for sequence modeling.
-
2014 – Adam Optimizer
- Adam: A Method for Stochastic Optimization
- Authors: Diederik P. Kingma, Jimmy Ba
- Presented the Adam optimizer, widely used in training deep neural networks.
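- A minimal sketch of the Adam update rule with the paper's default hyperparameters (the quadratic objective being minimized is an illustrative assumption):
```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta = np.array([5.0])                     # minimize f(x) = x^2, grad = 2x
m = v = np.zeros_like(theta)
for t in range(1, 6001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # approaches the minimum at 0
```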
-
2015 – Attention Mechanism in Neural Networks
- Neural Machine Translation by Jointly Learning to Align and Translate
- Authors: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
- Introduced the attention mechanism, greatly improving machine translation.
-
2017 – Transformer Model (Self-Attention)
- Attention is All You Need
- Authors: Ashish Vaswani et al.
- Introduced the Transformer architecture, replacing recurrence entirely with self-attention.
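- A minimal sketch of single-head scaled dot-product self-attention, the core operation of the Transformer (toy dimensions, random weights, no masking or multi-head split):
```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # scaled pairwise similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # softmax over key positions
    return weights @ V                             # attention-weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))            # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (4, 8): one output per token
```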
-
2018 – ULMFiT (Universal Language Model Fine-tuning)
- Universal Language Model Fine-tuning for Text Classification
- Authors: Jeremy Howard, Sebastian Ruder
- Demonstrated the effectiveness of pre-training a language model and fine-tuning it for specific tasks.
-
2018 – ELMo (Embeddings from Language Models)
- Deep Contextualized Word Representations
- Authors: Matthew Peters et al.
- Provided contextualized word embeddings by modeling words in the context of entire sentences.
-
2018 – GPT (Generative Pretrained Transformer)
- Improving Language Understanding by Generative Pre-Training
- Authors: Alec Radford et al.
- Introduced the first GPT model, using unsupervised pre-training followed by supervised fine-tuning.
-
2018 – BERT (Bidirectional Transformers)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Authors: Jacob Devlin et al.
- Introduced BERT, pre-trained with masked language modeling and next-sentence prediction, enabling deep bidirectional context.
-
2019 – Transformer-XL (Handling Longer Contexts)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
- Authors: Zihang Dai et al.
- Extended the Transformer to capture long-term dependencies via recurrence mechanisms.
-
2019 – XLNet (Permutation-Based Pre-training)
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
- Authors: Zhilin Yang et al.
- Proposed a permutation-based autoregressive pretraining objective, addressing the pretrain-finetune mismatch of BERT's masked tokens.
-
2019 – RoBERTa (Robustly Optimized BERT)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- Authors: Yinhan Liu et al.
- Showed BERT's performance could be improved by training longer on more data.
-
2019 – T5 (Text-to-Text Transfer Transformer)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Authors: Colin Raffel et al.
- Unified NLP tasks under a text-to-text format, demonstrating effective transfer learning.
-
2019 – GPT-2 (OpenAI’s Transformer-based Model)
- Language Models are Unsupervised Multitask Learners
- Authors: Alec Radford et al.
- Scaled the GPT architecture to 1.5 billion parameters, demonstrating surprisingly strong zero-shot performance across tasks.
-
2020 – GPT-3 (Few-Shot Learning at Scale)
- Language Models are Few-Shot Learners
- Authors: Tom B. Brown et al.
- Introduced GPT-3, a 175-billion-parameter model demonstrating impressive few-shot learning.
-
2020 – ELECTRA (Efficient Pre-training)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- Authors: Kevin Clark et al.
- Presented a more sample-efficient pre-training method improving upon BERT.
-
2020 – Reformer (Efficient Transformers)
- Reformer: The Efficient Transformer
- Authors: Nikita Kitaev et al.
- Introduced techniques to reduce memory footprint and computational cost in Transformers.
-
2020 – Scaling Laws for Neural Language Models
- Scaling Laws for Neural Language Models
- Authors: Jared Kaplan et al.
- Showed that language-model loss falls as a smooth power law in model size, dataset size, and training compute.
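- A minimal sketch of the parameter-count power law the paper reports, L(N) = (N_c / N)^alpha_N; the constants below are the paper's approximate fitted values:
```python
def loss_from_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Predicted test loss (nats/token) as a power law in non-embedding params."""
    return (n_c / n_params) ** alpha_n

for n in [1e8, 1e9, 1e10]:
    print(f"N = {n:.0e}: predicted loss ~ {loss_from_params(n):.2f}")
```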
-
2021 – Switch Transformer (Sparse Mixture-of-Experts)
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Authors: William Fedus et al.
- Proposed a Mixture-of-Experts model allowing scaling to trillions of parameters efficiently.
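- A minimal sketch of Switch-style top-1 routing: a learned router sends each token to a single expert, so only a fraction of the weights run per token (toy shapes; each expert is simplified to one matrix):
```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts = 6, 8, 4
tokens = rng.normal(size=(n_tokens, d_model))
router = rng.normal(size=(d_model, n_experts))            # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # one "FFN" per expert

logits = tokens @ router
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
choice = probs.argmax(-1)                   # top-1: one expert per token

out = np.empty_like(tokens)
for i, e in enumerate(choice):
    # scaling by the router probability keeps the routing decision differentiable
    out[i] = probs[i, e] * (tokens[i] @ experts[e])
print(choice)
```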
-
2021 – Megatron-Turing NLG 530B
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B
- Authors: Shaden Smith, Mostofa Patwary, et al.
- Detailed training one of the largest dense LLMs, contributing insights into large-scale training.
-
2021 – Codex and Code Generation
- Evaluating Large Language Models Trained on Code
- Authors: Mark Chen et al.
- Introduced Codex, an LLM fine-tuned on source code, enabling applications like GitHub Copilot.
-
2022 – Chain-of-Thought Prompting
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Authors: Jason Wei et al.
- Showed that prompting LLMs to produce intermediate reasoning steps improves performance on complex tasks.
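- A minimal sketch of a chain-of-thought prompt in the style of the paper: one worked exemplar nudges the model to emit intermediate steps before its answer (query_llm is a hypothetical stand-in for any completion API):
```python
prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: The cafeteria had 23 apples. They used 20 and bought 6 more.
How many apples do they have?
A:"""

# answer = query_llm(prompt)  # expected to reason step by step, then answer 9
```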
-
2022 – Chinchilla Scaling Laws
- Training Compute-Optimal Large Language Models
- Authors: Jordan Hoffmann et al.
- Showed that for a fixed compute budget, model size and training tokens should scale in roughly equal proportion, implying many earlier LLMs were substantially undertrained.
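- A minimal sketch of the resulting rule of thumb, roughly 20 training tokens per parameter, using the standard approximation that training compute C ≈ 6·N·D FLOPs (the budget below roughly matches Chinchilla's):
```python
def compute_optimal(compute_flops, tokens_per_param=20):
    """Split a compute budget into params N and tokens D with D = 20N, C = 6*N*D."""
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

n, d = compute_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")  # ~7e10 params, ~1.4e12 tokens
```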
-
2022 – PaLM (Pathways Language Model)
- PaLM: Scaling Language Modeling with Pathways
- Authors: Aakanksha Chowdhery et al.
- Introduced a 540-billion-parameter model demonstrating strong performance on reasoning and code tasks.
-
2022 – GLaM (Mixture-of-Experts)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- Authors: Nan Du et al.
- Presented a generalist language model using MoE to achieve strong performance with reduced computation.
-
2022 – BLOOM (Open-Access Multilingual Model)
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- Authors: BigScience Workshop
- Introduced an open-access LLM covering 46 natural languages and 13 programming languages, promoting transparency and inclusivity.
-
2022 – Emergent Abilities of Large Language Models
- Emergent Abilities of Large Language Models
- Authors: Jason Wei et al.
- Explored abilities that appear abruptly with scale in models such as GPT-3 and PaLM, rather than improving smoothly.
-
2022 – Instruction Tuning and RLHF (Human Feedback)
- Training Language Models to Follow Instructions with Human Feedback
- Authors: Long Ouyang et al.
- Deep Reinforcement Learning from Human Preferences
- Authors: Paul Christiano et al. (2017)
- Together, these works detail how models are fine-tuned with reinforcement learning from human feedback (RLHF).
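- A minimal sketch of the pairwise loss used to train the reward model in this pipeline: preferred responses should score higher (Bradley-Terry form; the scalar rewards here are toy stand-ins for a model scoring prompt-response pairs):
```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the ranking is correct."""
    return -np.log(1 / (1 + np.exp(-(r_chosen - r_rejected))))

print(preference_loss(2.0, -1.0))  # correct ranking: low loss
print(preference_loss(-1.0, 2.0))  # inverted ranking: high loss
```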
-
2023 – GPT-4 (Multimodal Capabilities)
- GPT-4 Technical Report
- Authors: OpenAI
- Described GPT-4, a large-scale, multimodal model capable of processing both text and images.
-
2023 – Sparks of AGI in GPT-4 (Microsoft Research)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
- Authors: Sébastien Bubeck et al. (Microsoft Research)
- Explored potential AGI-like behaviors in GPT-4.
-
2023 – Toolformer: Language Models Using Tools
- Toolformer: Language Models Can Teach Themselves to Use Tools
- Authors: Timo Schick et al.
- Presented a method where LLMs decide when and how to use external tools to improve performance.
-
2022 – ChatGPT and Instruction Following
- Organization: OpenAI
- Demonstrated the effectiveness of fine-tuning LLMs with RLHF to follow instructions and engage in natural dialogues.
-
2023 – Self-Consistency in Chain-of-Thought
- Self-Consistency Improves Chain-of-Thought Reasoning in Language Models
- Authors: Xuezhi Wang et al.
- Improved reasoning by sampling multiple reasoning paths and choosing the most consistent answer.
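- A minimal sketch of self-consistency: sample several reasoning paths and return the majority final answer (sample_llm is a hypothetical stochastic LLM call, mocked here with canned completions):
```python
from collections import Counter

def extract_final_answer(completion):
    return completion.rsplit("The answer is", 1)[-1].strip(" .")

def self_consistent_answer(prompt, sample_llm, n_samples=5):
    answers = [extract_final_answer(sample_llm(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

canned = iter(["... The answer is 11.", "... The answer is 12.",
               "... The answer is 11.", "... The answer is 11.",
               "... The answer is 9."])
print(self_consistent_answer("Q: ...", lambda p: next(canned)))  # majority: 11
```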
-
2016 – Concrete Problems in AI Safety
- Concrete Problems in AI Safety
- Authors: Dario Amodei et al.
- Outlined key challenges in ensuring AI systems operate safely and align with human values.
-
2018 – Gender Shades (Bias in AI Systems)
- Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
- Authors: Joy Buolamwini, Timnit Gebru
- Highlighted biases in AI systems, emphasizing the need for fairness and ethics.
-
2021 – Ethical and Social Implications of AI
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
- Authors: Emily M. Bender et al.
- Discussed risks associated with large language models, including environmental impact and bias.
-
2022 – AI Alignment and Interpretability
- Ongoing research into understanding and interpreting the decision-making processes of LLMs, aiming to align AI outputs with human values.
-
2024 – Frugal Transformer: Efficient Training at Scale
- Fast-and-Frugal Text-Graph Transformers are Effective Link Predictors
- Authors: AC et al.
- Showed that link prediction over knowledge graphs benefits from incorporating textual descriptions of entities and relations.
-
2024 – AI on the Edge
- Mobile Edge Intelligence for Large Language Models
- Authors: Kaibin Huang et al.
- Surveyed advances in deploying LLMs on edge devices, opening new possibilities for privacy-focused AI applications.
-
2024 – Federated GPT
- Towards Building the Federated GPT: Federated Instruction Tuning
- Authors: Jianyi Zhang et al.
- Extended federated learning to instruction tuning of GPT-style models, enabling privacy-preserving training across distributed data.
-
2023 – Generative Agents and Interactive AI Systems
- Generative Agents: Interactive Simulacra of Human Behavior
- Authors: Joon Sung Park et al.
- Introduced generative agents that plan, remember, and interact autonomously in a simulated environment, exhibiting believable social behavior.
-
2023 – Memory-Augmented Models
- MemGPT: Towards LLMs as Operating Systems
- Authors: Charles Packer et al.
- Explored integrating memory mechanisms into LLMs for better handling of long-term dependencies and long-running context.
-
2023 – OpenAI Function Calling and Plugins
- Function Calling and Other API Updates
- Organization: OpenAI
- Introduced structured data output and plugin systems for LLMs to interact with external tools and APIs.
-
2023 – Sparse Expert Models
- Research into sparse models like Mixture-of-Experts that scale efficiently by activating relevant parts of the network.
-
2021 – Scaling Instruction Tuning (FLAN)
- Finetuned Language Models Are Zero-Shot Learners
- Authors: Jason Wei et al.
- Demonstrated that instruction tuning on a mixture of tasks improves zero-shot performance.
-
2023 and Beyond – Emerging Research Directions
- Advances in Multimodal Learning: Integration of text, image, audio, and video data in unified models, expanding LLM capabilities.
- Multimodal Models and Unified AI Systems: Development of models like OpenAI's DALL·E and CLIP, integrating multiple modalities.
- Tool-Using AI and Autonomous Interaction: Enabling models to interact with external tools autonomously, enhancing practical capabilities.
- Memory-Augmented Models and Retrieval-Augmented Generation (RAG): Combining LLMs with dynamic access to external knowledge bases, allowing real-time information retrieval (a minimal sketch follows this list).
- Self-Supervised Learning and Unsupervised Learning Improvements: Making self-supervised learning more efficient from unstructured data sources.
- Continuous and Lifelong Learning: AI systems that continuously learn from new data without retraining from scratch, preventing catastrophic forgetting.
- AI Safety, Alignment, and Ethics: Ensuring AI aligns with human values, with research into RLHF and reducing harmful behaviors.
- Federated Learning and Decentralized AI: Training AI models across distributed datasets without centralizing data, preserving privacy.
- Sparsity and Efficient AI Models: Techniques like Sparse Transformers and MoE for computational efficiency, enabling scaling to trillions of parameters.
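- A minimal sketch of the RAG pattern referenced above: retrieve the most relevant documents for a query, then condition the LLM on them (embed and query_llm are hypothetical stand-ins; the toy "embedding" is just a seeded random vector):
```python
import numpy as np

def embed(text):
    """Toy deterministic vector per string; real systems use a trained encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=32)
    return v / np.linalg.norm(v)

docs = ["LSTMs mitigate vanishing gradients.",
        "Transformers rely on self-attention.",
        "Adam adapts per-parameter learning rates."]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query, k=2):
    sims = doc_vecs @ embed(query)            # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(-sims)[:k]]

query = "How do Transformers process sequences?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# answer = query_llm(prompt)  # the model grounds its answer in retrieved context
```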