This curriculum traces the development of artificial intelligence and large language models (LLMs) from the early days of AI research to the emergence of GPT-4 and beyond. It highlights pivotal papers and breakthroughs that have shaped the field, focusing on developments relevant to LLMs.
I am also tracking my progress through this curriculum to keep myself accountable; my current position is marked with "(I am currently here)".
-
1943 – McCulloch and Pitts Model
- A Logical Calculus of the Ideas Immanent in Nervous Activity
- Authors: Warren McCulloch and Walter Pitts
- Introduced the concept of an artificial neuron, laying the foundation for later neural network models.
- [YouTube video explaining the paper in simple terms]
-
1950 – Turing Test
- Computing Machinery and Intelligence
- Author: Alan Turing
- Introduced the concept of the Turing Test, an early philosophical framework for evaluating machine intelligence.
-
1956 – Dartmouth Conference (Birth of AI)
- A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence
- Contributors: John McCarthy, Marvin Minsky, Claude Shannon, and others
- Formalized AI as a field, proposing the study of "machines that can simulate human intelligence."
- (I am currently here).
-
1958 – Perceptron by Frank Rosenblatt
- The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
- Author: Frank Rosenblatt
- Introduced the Perceptron, one of the first trainable neural networks.
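- A minimal sketch of the perceptron learning rule (my illustration, not the paper's notation; the AND-gate data and learning rate are toy assumptions):
```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
y = np.array([0, 0, 0, 1])                      # AND-gate labels
w, b = np.zeros(2), 0.0                         # weights and bias

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0       # step activation
        error = target - pred
        w += 0.1 * error * xi                   # update only on mistakes
        b += 0.1 * error

print(w, b)  # converges because AND is linearly separable
```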
-
1949 – Hebbian Learning (Influence on Neural Networks)
- The Organization of Behavior
- Author: Donald Hebb
- Proposed Hebbian learning, the principle often summarized as "cells that fire together wire together."
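- A minimal sketch of Hebb's postulate as a weight update (the learning rate, activity pattern, and linear neuron are illustrative assumptions, not Hebb's own formulation):
```python
import numpy as np

eta = 0.01                       # learning rate
x = np.array([1.0, 0.0, 1.0])    # pre-synaptic activity pattern
w = np.full(3, 0.1)              # small initial synaptic weights

for _ in range(100):
    y = w @ x                    # post-synaptic activity (linear neuron)
    w += eta * y * x             # Hebbian update: dw = eta * x * y

# Weights grow only where pre- and post-synaptic activity co-occur;
# growth is unbounded without a later normalization such as Oja's rule.
print(w)
```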
-
1969 – Minsky and Papert Critique of Perceptrons
- Perceptrons
- Authors: Marvin Minsky and Seymour Papert
- Proved formal limitations of single-layer perceptrons (notably their inability to compute XOR), contributing to a decline in neural network research.
-
1974 – Backpropagation Algorithm (Paul Werbos)
- Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences
- Author: Paul Werbos
- Introduced the backpropagation algorithm in his Ph.D. thesis.
-
1980 – Neocognitron (Precursor to CNNs)
- Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position
- Author: Kunihiko Fukushima
- Developed a hierarchical, shift-invariant network that prefigured convolutional neural networks (CNNs) and inspired later deep learning architectures.
-
1986 – Backpropagation Popularized
- Learning Representations by Back-propagating Errors
- Authors: David E. Rumelhart, Geoffrey Hinton, Ronald J. Williams
- Popularized backpropagation, making it practical for training multilayer neural networks.
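- A minimal sketch of backpropagation on a tiny two-layer network learning XOR, the kind of problem a single perceptron cannot solve (layer sizes, learning rate, and iteration count are illustrative assumptions):
```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(10000):
    h = sigmoid(X @ W1 + b1)                  # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)       # chain rule, output layer
    d_h = (d_out @ W2.T) * h * (1 - h)        # chain rule, hidden layer
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2))  # should approach [0, 1, 1, 0]
```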
-
1989 – Hidden Markov Models (HMMs)
- A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Author: Lawrence Rabiner
- Provided a foundational understanding of HMMs, critical for early speech recognition and sequence modeling.
-
1990s – Emergence of Statistical NLP
- The shift from rule-based systems to statistical approaches in NLP, utilizing n-gram models and probabilistic methods for tasks like part-of-speech tagging and machine translation.
-
1993 – IBM Model 1 for Statistical Machine Translation
- The Mathematics of Statistical Machine Translation: Parameter Estimation
- Authors: Peter F. Brown et al.
- Laid the foundation for modern translation systems by modeling word alignment between languages.
-
1993 – Class-Based n-gram Models
- Class-Based n-gram Models of Natural Language
- Authors: Peter F. Brown et al.
- Introduced class-based n-gram models, an early statistical approach to language modeling.
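- A minimal sketch of the plain bigram baseline that class-based models refine: estimate P(word | previous word) from counts, with add-one smoothing (the toy corpus is an assumption):
```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)

def p(word, prev):
    """P(word | prev) with Laplace (add-one) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

print(p("cat", "the"))  # frequent bigram: relatively high probability
print(p("mat", "cat"))  # unseen bigram: small but nonzero after smoothing
```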
-
1997 – Long Short-Term Memory (LSTM)
- Long Short-Term Memory
- Authors: Sepp Hochreiter and Jürgen Schmidhuber
- Introduced the LSTM architecture, addressing the vanishing gradient problem in RNNs.
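- A minimal sketch of one LSTM cell step, showing the gating that protects the cell state (random toy weights and dimensions are assumptions; no particular library's conventions are followed):
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One time step; W, U, b pack the input, forget, cell, and output gates."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c + i * g    # additive cell path: this is what eases vanishing gradients
    h = o * np.tanh(c)
    return h, c

d, hdim = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hdim, d))
U = rng.normal(size=(4 * hdim, hdim))
b = np.zeros(4 * hdim)
h, c = np.zeros(hdim), np.zeros(hdim)
for x in rng.normal(size=(5, d)):   # run a short random sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```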
-
1998 – LeNet and Convolutional Neural Networks (CNNs)
- Gradient-Based Learning Applied to Document Recognition
- Authors: Yann LeCun et al.
- Developed LeNet, one of the first successful CNN architectures, used for handwritten digit recognition.
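- A minimal sketch of the 2D convolution at the core of LeNet-style CNNs: a small kernel slides across the image, sharing weights at every position (the image and edge-detecting kernel are toy assumptions):
```python
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

image = np.zeros((6, 6)); image[:, 3:] = 1.0   # image with a vertical edge
kernel = np.array([[1.0, -1.0]])               # horizontal-gradient filter
print(conv2d(image, kernel))                   # strong response along the edge
```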
-
2003 – Neural Probabilistic Language Model
- A Neural Probabilistic Language Model
- Authors: Yoshua Bengio et al.
- Introduced a feed-forward neural language model that learns word embeddings jointly with the next-word prediction task.
-
2012 – AlexNet and the Deep Learning Boom
- ImageNet Classification with Deep Convolutional Neural Networks
- Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
- Marked the success of deep learning in image recognition, reigniting interest in neural networks.
-
2013 – Word2Vec (Efficient Word Representations)
- Efficient Estimation of Word Representations in Vector Space
- Authors: Tomas Mikolov et al.
- Introduced Word2Vec, learning continuous vector representations of words.
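- A minimal sketch of one skip-gram-with-negative-sampling update, the efficient objective behind Word2Vec (vocabulary size, dimensions, word indices, and learning rate are toy assumptions):
```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, lr = 10, 8, 0.05
W_in = rng.normal(scale=0.1, size=(vocab, dim))    # word ("input") vectors
W_out = rng.normal(scale=0.1, size=(vocab, dim))   # context ("output") vectors
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def sgns_update(center, context, negatives):
    """Pull the true context vector toward the center word, push negatives away."""
    v = W_in[center].copy()
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        g = sigmoid(u @ v) - label       # gradient of the logistic loss
        grad_v += g * u                  # accumulate before updating u
        W_out[word] = u - lr * g * v
    W_in[center] -= lr * grad_v

sgns_update(center=3, context=7, negatives=[1, 4, 9])
```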
-
2014 – Sequence to Sequence (Seq2Seq) Models
- Sequence to Sequence Learning with Neural Networks
- Authors: Ilya Sutskever, Oriol Vinyals, Quoc V. Le
- Introduced the Seq2Seq encoder-decoder architecture using LSTMs, achieving strong results on machine translation.
-
2014 – Gated Recurrent Units (GRUs)
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Authors: Kyunghyun Cho et al.
- Introduced GRUs as a simpler alternative to LSTMs for sequence modeling.
-
2014 – Adam Optimizer
- Adam: A Method for Stochastic Optimization
- Authors: Diederik P. Kingma, Jimmy Ba
- Presented the Adam optimizer, widely used in training deep neural networks.
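- A minimal sketch of the Adam update rule with the paper's default hyperparameters (the quadratic objective being minimized is an illustrative assumption):
```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

theta = np.array([5.0])                     # minimize f(x) = x^2, grad = 2x
m = v = np.zeros_like(theta)
for t in range(1, 6001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # approaches the minimum at 0
```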
-
2015 – Attention Mechanism in Neural Networks
- Neural Machine Translation by Jointly Learning to Align and Translate
- Authors: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
- Introduced the attention mechanism, greatly improving machine translation.
-
2017 – Transformer Model (Self-Attention)
- Attention is All You Need
- Authors: Ashish Vaswani et al.
- Introduced the Transformer architecture, replacing recurrence entirely with self-attention.
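- A minimal sketch of single-head scaled dot-product self-attention, the core operation of the Transformer (toy dimensions, random weights, no masking or multi-head split):
```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # scaled pairwise similarities
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # softmax over key positions
    return weights @ V                             # attention-weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))            # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (4, 8): one output per token
```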
-
2018 – ULMFiT (Universal Language Model Fine-tuning)
- Universal Language Model Fine-tuning for Text Classification
- Authors: Jeremy Howard, Sebastian Ruder
- Demonstrated the effectiveness of pre-training a language model and fine-tuning it for specific tasks.
-
2018 – ELMo (Embeddings from Language Models)
- Deep Contextualized Word Representations
- Authors: Matthew Peters et al.
- Provided contextualized word embeddings by modeling words in the context of entire sentences.
-
2018 – GPT (Generative Pretrained Transformer)
- Improving Language Understanding by Generative Pre-Training
- Authors: Alec Radford et al.
- Introduced the first GPT model, using unsupervised pre-training followed by supervised fine-tuning.
-
2018 – BERT (Bidirectional Transformers)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Authors: Jacob Devlin et al.
- Introduced BERT, pre-trained with masked language modeling and next-sentence prediction, enabling deep bidirectional context.
-
2019 – Transformer-XL (Handling Longer Contexts)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
- Authors: Zihang Dai et al.
- Extended the Transformer to capture long-term dependencies via recurrence mechanisms.
-
2019 – XLNet (Permutation-Based Pre-training)
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
- Authors: Zhilin Yang et al.
- Proposed a permutation-based autoregressive pretraining objective, addressing the pretrain-finetune mismatch of BERT's masked tokens.
-
2019 – RoBERTa (Robustly Optimized BERT)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- Authors: Yinhan Liu et al.
- Showed BERT's performance could be improved by training longer on more data.
-
2019 – T5 (Text-to-Text Transfer Transformer)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Authors: Colin Raffel et al.
- Unified NLP tasks under a text-to-text format, demonstrating effective transfer learning.
-
2019 – GPT-2 (OpenAI’s Transformer-based Model)
- Language Models are Unsupervised Multitask Learners
- Authors: Alec Radford et al.
- Scaled the GPT architecture to 1.5 billion parameters, demonstrating surprisingly strong zero-shot performance across tasks.
-
2020 – GPT-3 (Few-Shot Learning at Scale)
- Language Models are Few-Shot Learners
- Authors: Tom B. Brown et al.
- Introduced GPT-3, a 175-billion-parameter model demonstrating impressive few-shot learning.
-
2020 – ELECTRA (Efficient Pre-training)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- Authors: Kevin Clark et al.
- Presented a more sample-efficient pre-training method improving upon BERT.
-
2020 – Reformer (Efficient Transformers)
- Reformer: The Efficient Transformer
- Authors: Nikita Kitaev et al.
- Introduced techniques to reduce memory footprint and computational cost in Transformers.
-
2020 – Scaling Laws for Neural Language Models
- Scaling Laws for Neural Language Models
- Authors: Jared Kaplan et al.
- Showed that language-model loss falls as a smooth power law in model size, dataset size, and training compute.
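- A minimal sketch of the parameter-count power law the paper reports, L(N) = (N_c / N)^alpha_N; the constants below are the paper's approximate fitted values:
```python
def loss_from_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Predicted test loss (nats/token) as a power law in non-embedding params."""
    return (n_c / n_params) ** alpha_n

for n in [1e8, 1e9, 1e10]:
    print(f"N = {n:.0e}: predicted loss ~ {loss_from_params(n):.2f}")
```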
-
2021 – Switch Transformer (Sparse Mixture-of-Experts)
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- Authors: William Fedus et al.
- Proposed a Mixture-of-Experts model allowing scaling to trillions of parameters efficiently.
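- A minimal sketch of Switch-style top-1 routing: a learned router sends each token to a single expert, so only a fraction of the weights run per token (toy shapes; each expert is simplified to one matrix):
```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts = 6, 8, 4
tokens = rng.normal(size=(n_tokens, d_model))
router = rng.normal(size=(d_model, n_experts))            # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # one "FFN" per expert

logits = tokens @ router
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
choice = probs.argmax(-1)                   # top-1: one expert per token

out = np.empty_like(tokens)
for i, e in enumerate(choice):
    # scaling by the router probability keeps the routing decision differentiable
    out[i] = probs[i, e] * (tokens[i] @ experts[e])
print(choice)
```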
-
2021 – Megatron-Turing NLG 530B
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B
- Authors: Shaden Smith, Mostofa Patwary, et al.
- Detailed training one of the largest dense LLMs, contributing insights into large-scale training.
-
2021 – Codex and Code Generation
- Evaluating Large Language Models Trained on Code
- Authors: Mark Chen et al.
- Introduced Codex, an LLM fine-tuned on source code, enabling applications like GitHub Copilot.
-
2022 – Chain-of-Thought Prompting
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Authors: Jason Wei et al.
- Showed that prompting LLMs to produce intermediate reasoning steps improves performance on complex tasks.
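- A minimal sketch of a chain-of-thought prompt in the style of the paper: one worked exemplar nudges the model to emit intermediate steps before its answer (query_llm is a hypothetical stand-in for any completion API):
```python
prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: The cafeteria had 23 apples. They used 20 and bought 6 more.
How many apples do they have?
A:"""

# answer = query_llm(prompt)  # expected to reason step by step, then answer 9
```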
-
2022 – Chinchilla Scaling Laws
- Training Compute-Optimal Large Language Models
- Authors: Jordan Hoffmann et al.
- Showed that for a fixed compute budget, model size and training tokens should scale in roughly equal proportion, implying many earlier LLMs were substantially undertrained.
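- A minimal sketch of the resulting rule of thumb, roughly 20 training tokens per parameter, using the standard approximation that training compute C ≈ 6·N·D FLOPs (the budget below roughly matches Chinchilla's):
```python
def compute_optimal(compute_flops, tokens_per_param=20):
    """Split a compute budget into params N and tokens D with D = 20N, C = 6*N*D."""
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

n, d = compute_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")  # ~7e10 params, ~1.4e12 tokens
```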
-
2022 – PaLM (Pathways Language Model)
- PaLM: Scaling Language Modeling with Pathways
- Authors: Aakanksha Chowdhery et al.
- Introduced a 540-billion-parameter model demonstrating strong performance on reasoning and code tasks.
-
2022 – GLaM (Mixture-of-Experts)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
- Authors: Nan Du et al.
- Presented a generalist language model using MoE to achieve strong performance with reduced computation.
-
2022 – BLOOM (Open-Access Multilingual Model)
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- Authors: BigScience Workshop
- Introduced an open-access LLM covering 46 natural languages and 13 programming languages, promoting transparency and inclusivity.
-
2022 – Emergent Abilities of Large Language Models
- Emergent Abilities of Large Language Models
- Authors: Jason Wei et al.
- Explored abilities that appear abruptly with scale in models such as GPT-3 and PaLM, rather than improving smoothly.
-
2022 – Instruction Tuning and RLHF (Human Feedback)
- Training Language Models to Follow Instructions with Human Feedback
- Authors: Long Ouyang et al.
- Deep Reinforcement Learning from Human Preferences
- Authors: Paul Christiano et al. (2017)
- Together, these works detail how models are fine-tuned with reinforcement learning from human feedback (RLHF).
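- A minimal sketch of the pairwise loss used to train the reward model in this pipeline: preferred responses should score higher (Bradley-Terry form; the scalar rewards here are toy stand-ins for a model scoring prompt-response pairs):
```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the ranking is correct."""
    return -np.log(1 / (1 + np.exp(-(r_chosen - r_rejected))))

print(preference_loss(2.0, -1.0))  # correct ranking: low loss
print(preference_loss(-1.0, 2.0))  # inverted ranking: high loss
```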
-
2023 – GPT-4 (Multimodal Capabilities)
- GPT-4 Technical Report
- Authors: OpenAI
- Described GPT-4, a large-scale, multimodal model capable of processing both text and images.
-
2023 – Sparks of AGI in GPT-4 (Microsoft Research)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
- Authors: Sébastien Bubeck et al. (Microsoft Research)
- Explored potential AGI-like behaviors in GPT-4.
-
2023 – Toolformer: Language Models Using Tools
- Toolformer: Language Models Can Teach Themselves to Use Tools
- Authors: Timo Schick et al.
- Presented a method where LLMs decide when and how to use external tools to improve performance.
-
2022 – ChatGPT and Instruction Following
- Organization: OpenAI
- Demonstrated the effectiveness of fine-tuning LLMs with RLHF to follow instructions and engage in natural dialogues.
-
2023 – Self-Consistency in Chain-of-Thought
- Self-Consistency Improves Chain-of-Thought Reasoning in Language Models
- Authors: Xuezhi Wang et al.
- Improved reasoning by sampling multiple reasoning paths and choosing the most consistent answer.
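- A minimal sketch of self-consistency: sample several reasoning paths and return the majority final answer (sample_llm is a hypothetical stochastic LLM call, mocked here with canned completions):
```python
from collections import Counter

def extract_final_answer(completion):
    return completion.rsplit("The answer is", 1)[-1].strip(" .")

def self_consistent_answer(prompt, sample_llm, n_samples=5):
    answers = [extract_final_answer(sample_llm(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

canned = iter(["... The answer is 11.", "... The answer is 12.",
               "... The answer is 11.", "... The answer is 11.",
               "... The answer is 9."])
print(self_consistent_answer("Q: ...", lambda p: next(canned)))  # majority: 11
```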
-
2016 – Concrete Problems in AI Safety
- Concrete Problems in AI Safety
- Authors: Dario Amodei et al.
- Outlined key challenges in ensuring AI systems operate safely and align with human values.
-
2018 – Gender Shades (Bias in AI Systems)
- Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
- Authors: Joy Buolamwini, Timnit Gebru
- Highlighted biases in AI systems, emphasizing the need for fairness and ethics.
-
2021 – Ethical and Social Implications of AI
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
- Authors: Emily M. Bender et al.
- Discussed risks associated with large language models, including environmental impact and bias.
-
2022 – AI Alignment and Interpretability
- Ongoing research into understanding and interpreting the decision-making processes of LLMs, aiming to align AI outputs with human values.
-
2024 – Frugal Transformer: Efficient Training at Scale
- Fast-and-Frugal Text-Graph Transformers are Effective Link Predictors
- Authors: AC et al.
- Showed that link prediction over knowledge graphs benefits from incorporating textual descriptions of entities and relations.
-
2024 – AI on the Edge
- Mobile Edge Intelligence for Large Language Models
- Authors: Kaibin Huang et al.
- Surveyed advances in deploying LLMs on edge devices, opening new possibilities for privacy-focused AI applications.
-
2024 – Federated GPT
- Towards Building the Federated GPT: Federated Instruction Tuning
- Authors: Jianyi Zhang et al.
- Extended federated learning to instruction tuning of GPT-style models, enabling privacy-preserving training across distributed data.
-
2023 – Generative Agents and Interactive AI Systems
- Generative Agents: Interactive Simulacra of Human Behavior
- Authors: Joon Sung Park et al.
- Introduced generative agents that plan, remember, and interact autonomously in a simulated environment, exhibiting believable social behavior.
-
2023 – Memory-Augmented Models
- MemGPT: Towards LLMs as Operating Systems
- Authors: Charles Packer et al.
- Explored integrating memory mechanisms into LLMs for better handling of long-term dependencies and long-running context.
-
2023 – OpenAI Function Calling and Plugins
- Function Calling and Other API Updates
- Organization: OpenAI
- Introduced structured data output and plugin systems for LLMs to interact with external tools and APIs.
-
2023 – Sparse Expert Models
- Research into sparse models like Mixture-of-Experts that scale efficiently by activating relevant parts of the network.
-
2021 – Scaling Instruction Tuning (FLAN)
- Finetuned Language Models Are Zero-Shot Learners
- Authors: Jason Wei et al.
- Demonstrated that instruction tuning on a mixture of tasks improves zero-shot performance.
-
2023 and Beyond – Emerging Research Directions
- Advances in Multimodal Learning: Integration of text, image, audio, and video data in unified models, expanding LLM capabilities.
- Multimodal Models and Unified AI Systems: Development of models like OpenAI's DALL·E and CLIP, integrating multiple modalities.
- Tool-Using AI and Autonomous Interaction: Enabling models to interact with external tools autonomously, enhancing practical capabilities.
- Memory-Augmented Models and Retrieval-Augmented Generation (RAG): Combining LLMs with dynamic access to external knowledge bases, allowing real-time information retrieval (a minimal sketch follows this list).
- Self-Supervised Learning and Unsupervised Learning Improvements: Making self-supervised learning more efficient from unstructured data sources.
- Continuous and Lifelong Learning: AI systems that continuously learn from new data without retraining from scratch, preventing catastrophic forgetting.
- AI Safety, Alignment, and Ethics: Ensuring AI aligns with human values, with research into RLHF and reducing harmful behaviors.
- Federated Learning and Decentralized AI: Training AI models across distributed datasets without centralizing data, preserving privacy.
- Sparsity and Efficient AI Models: Techniques like Sparse Transformers and MoE for computational efficiency, enabling scaling to trillions of parameters.
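- A minimal sketch of the RAG pattern referenced above: retrieve the most relevant documents for a query, then condition the LLM on them (embed and query_llm are hypothetical stand-ins; the toy "embedding" is just a seeded random vector):
```python
import numpy as np

def embed(text):
    """Toy deterministic vector per string; real systems use a trained encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=32)
    return v / np.linalg.norm(v)

docs = ["LSTMs mitigate vanishing gradients.",
        "Transformers rely on self-attention.",
        "Adam adapts per-parameter learning rates."]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query, k=2):
    sims = doc_vecs @ embed(query)            # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(-sims)[:k]]

query = "How do Transformers process sequences?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# answer = query_llm(prompt)  # the model grounds its answer in retrieved context
```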