The AI Papers (Breakthroughs Leading to GPT-4 and Beyond)

This curriculum traces the development of artificial intelligence and large language models (LLMs) from the early days of AI research to the emergence of GPT-4 and beyond. It highlights pivotal papers and breakthroughs that have shaped the field, focusing on developments relevant to LLMs.

I am also tracking my progress through this curriculum to keep myself accountable; my current position is marked with the tag "(I am currently here)".

1. Early AI and Symbolic Systems (1940s-1980s)

  1. 1943 – McCulloch and Pitts Model

  2. 1950 – Turing Test

  3. 1956 – Dartmouth Conference (Birth of AI)

  4. 1958 – Perceptron by Frank Rosenblatt

  5. 1949 – Hebbian Learning (Influence on Neural Networks)

    • The Organization of Behavior
    • Author: Donald Hebb
    • Proposed Hebbian learning, often summarized as "cells that fire together wire together" (a minimal sketch of the update rule follows this list).
  6. 1969 – Minsky and Papert Critique of Perceptrons

    • Perceptrons
    • Authors: Marvin Minsky and Seymour Papert
    • Highlighted the limitations of the perceptron, leading to a decline in interest in neural networks.
  7. 1974 – Backpropagation Algorithm (Paul Werbos)

  8. 1980 – Neocognitron (Precursor to CNNs)

  9. 1986 – Backpropagation Popularized

  10. 1989 – Hidden Markov Models (HMMs)
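
A minimal sketch of the Hebbian update rule from item 5 above: the connection between two units is strengthened in proportion to the product of their activities. The learning rate, unit counts, and random activations below are illustrative assumptions, not details from Hebb's book.

```python
import numpy as np

def hebbian_update(w, x, y, lr=0.1):
    """One Hebbian step: strengthen w[i, j] when output y[i] and input x[j] are co-active."""
    return w + lr * np.outer(y, x)

# Toy example: 2 output units, 3 input units, random binary co-activations.
rng = np.random.default_rng(0)
w = np.zeros((2, 3))
for _ in range(100):
    x = rng.integers(0, 2, size=3).astype(float)  # presynaptic activity
    y = rng.integers(0, 2, size=2).astype(float)  # postsynaptic activity
    w = hebbian_update(w, x, y)
print(w)  # weights accumulate where units "fired together"
```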

2. Shift to Statistical NLP and Early Machine Learning (1990s-2000s)

  1. 1990s – Emergence of Statistical NLP

    • NLP shifted from rule-based systems to statistical approaches, using n-gram models and probabilistic methods for tasks such as part-of-speech tagging and machine translation (a toy bigram model follows this list).
  2. 1993 – IBM Model 1 for Statistical Machine Translation

  3. 1993 – Class-Based n-gram Models

  4. 1997 – Long Short-Term Memory (LSTM)

    • Long Short-Term Memory
    • Authors: Sepp Hochreiter and Jürgen Schmidhuber
    • Introduced the LSTM architecture, addressing the vanishing gradient problem in RNNs (a minimal cell-update sketch follows this list).
  5. 1998 – LeNet and Convolutional Neural Networks (CNNs)

  6. 2003 – Neural Probabilistic Language Model
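
A rough illustration of the statistical turn described in item 1: estimating bigram probabilities by maximum likelihood from counts. The tiny corpus and the absence of smoothing are simplifying assumptions; systems of the era used large corpora and smoothed estimates.

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count unigrams and bigrams for maximum-likelihood estimation.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w2, w1):
    """P(w2 | w1) = count(w1 w2) / count(w1); no smoothing in this toy version."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("cat", "the"))  # 0.25: "the" is followed by "cat" once out of four occurrences
```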
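
Item 4's LSTM tames vanishing gradients by carrying information in an additively updated cell state controlled by gates. Below is a minimal single-step LSTM cell in NumPy, following the standard gate equations; the weight shapes, initialization, and toy sequence are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step: gates decide what to forget, what to write, and what to expose.

    W has shape (4 * hidden, input + hidden); b has shape (4 * hidden,).
    """
    z = W @ np.concatenate([x, h_prev]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    c = f * c_prev + i * np.tanh(g)               # additive cell-state update
    h = o * np.tanh(c)                            # exposed hidden state
    return h, c

hidden, inp = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * hidden, inp + hidden))
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for x in rng.normal(size=(5, inp)):               # a toy sequence of 5 inputs
    h, c = lstm_step(x, h, c, W, b)
print(h)
```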

3. Deep Learning Breakthroughs and Seq2Seq Models (2010s)

  1. 2012 – AlexNet and the Deep Learning Boom

  2. 2013 – Word2Vec (Efficient Word Representations)

  3. 2014 – Sequence to Sequence (Seq2Seq) Models

  4. 2014 – Gated Recurrent Units (GRUs)

  5. 2014 – Adam Optimizer

  6. 2015 – Attention Mechanism in Neural Networks

  7. 2018 – ELMo (Embeddings from Language Models)

  8. 2018 – ULMFiT (Universal Language Model Fine-tuning)

4. Transformer Revolution and Modern NLP (2017-Present)

  1. 2017 – Transformer Model (Self-Attention)

    • Attention is All You Need
    • Authors: Ashish Vaswani et al.
    • Introduced the Transformer model, replacing recurrence with self-attention (a minimal attention sketch follows this list).
  2. 2018 – GPT (Generative Pretrained Transformer)

  3. 2018 – BERT (Bidirectional Transformers)

  4. 2019 – Transformer-XL (Handling Longer Contexts)

  5. 2019 – XLNet (Permutation-Based Pre-training)

  6. 2019 – RoBERTa (Robustly Optimized BERT)

  7. 2019 – T5 (Text-to-Text Transfer Transformer)

  8. 2019 – GPT-2 (OpenAI’s Transformer-based Model)
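
A minimal sketch of the scaled dot-product self-attention behind the Transformer in item 1. For clarity it uses a single head and no learned projections or masking (simplifying assumptions): each position attends to every position via a softmax over scaled dot products.

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention over a sequence X of shape (seq_len, d).

    Queries, keys, and values are the inputs themselves here
    (no learned projection matrices, no masking, no multi-head split).
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                        # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ X                                    # weighted sum of values

X = np.random.default_rng(0).normal(size=(5, 8))          # 5 tokens, 8-dim embeddings
print(self_attention(X).shape)                             # (5, 8)
```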

5. Scaling Laws, Emergent Abilities, and GPT-4 (2020-Present)

  1. 2020 – GPT-3 (Few-Shot Learning at Scale)

  2. 2020 – Electra (Efficient Pre-training)

  3. 2020 – Reformer (Efficient Transformers)

  4. 2020 – Scaling Laws for Neural Language Models

  5. 2021 – Switch Transformer (Sparse Mixture-of-Experts)

  6. 2021 – Megatron-Turing NLG 530B

  7. 2021 – Codex and Code Generation

  8. 2022 – Chain-of-Thought Prompting

  9. 2022 – Chinchilla Scaling Laws

  10. 2022 – PaLM (Pathways Language Model)

  11. 2022 – GLaM (Mixture-of-Experts)

  12. 2022 – BLOOM (Open-Access Multilingual Model)

  13. 2022 – Emergent Abilities of Large Language Models

  14. 2022 – Instruction Tuning and RLHF (Human Feedback)

  15. 2023 – GPT-4 (Multimodal Capabilities)

    • GPT-4 Technical Report
    • Authors: OpenAI
    • Described GPT-4, a large-scale, multimodal model capable of processing both text and images.
  16. 2023 – Sparks of AGI in GPT-4 (Microsoft Research)

  17. 2023 – Toolformer: Language Models Using Tools

  18. 2023 – ChatGPT and Instruction Following

    • Organization: OpenAI
    • Demonstrated the effectiveness of fine-tuning LLMs with RLHF to follow instructions and engage in natural dialogues.
  19. 2023 – Self-Consistency in Chain-of-Thought

6. Ethics, Alignment, and Safety in AI

  1. 2016 – Concrete Problems in AI Safety

    • Concrete Problems in AI Safety
    • Authors: Dario Amodei et al.
    • Outlined key challenges in ensuring AI systems operate safely and align with human values.
  2. 2018 – Gender Shades (Bias in AI Systems)

  3. 2020 – Ethical and Social Implications of AI

  4. 2022 – AI Alignment and Interpretability

    • Ongoing research into understanding and interpreting the decision-making processes of LLMs, aiming to align AI outputs with human values.

7. Emerging and Future Directions (2023 and Beyond)

  1. 2024 – Frugal Transformer: Efficient Training at Scale

  2. 2024 – AI on the Edge

  3. 2024 – Federated GPT

  4. 2023 – Generative Agents and Interactive AI Systems

  5. 2023 – Memory-Augmented Models

  6. 2023 – OpenAI Function Calling and Plugins

  7. 2023 – Sparse Expert Models

    • Research into sparse models such as Mixture-of-Experts that scale efficiently by activating only the relevant parts of the network for each input (a toy routing sketch follows this list).
  8. 2023 – Scaling Instruction Tuning

  9. 2023 – Advances in Multimodal Learning

    • Integration of text, image, audio, and video data in unified models, expanding LLM capabilities.
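
A toy sketch of the top-1 routing idea behind the sparse expert models in item 7: a learned gate scores the experts, and only the highest-scoring expert runs for each token, so compute grows more slowly than parameter count. The expert count, linear experts, and routing rule here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 8, 4

# Each "expert" is just a small linear layer in this toy version.
experts = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_experts)]
gate = rng.normal(scale=0.1, size=(d, n_experts))  # router weights

def moe_forward(x):
    """Route token x to its top-1 expert; the remaining experts do no work."""
    scores = x @ gate
    k = int(np.argmax(scores))                      # pick the best-scoring expert
    return experts[k] @ x, k

tokens = rng.normal(size=(6, d))
for t in tokens:
    y, k = moe_forward(t)
    print(f"token routed to expert {k}")
```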

Additional Emerging Areas:

  • Multimodal Models and Unified AI Systems: Development of models like OpenAI's DALL·E and CLIP, integrating multiple modalities.
  • Tool-Using AI and Autonomous Interaction: Enabling models to interact with external tools autonomously, enhancing practical capabilities.
  • Memory-Augmented Models and Retrieval-Augmented Generation (RAG): Combining LLMs with dynamic access to external knowledge bases, allowing real-time information retrieval (a minimal retrieval sketch follows this list).
  • Self-Supervised and Unsupervised Learning Improvements: Making self-supervised learning from unstructured data sources more efficient.
  • Continuous and Lifelong Learning: AI systems that continuously learn from new data without retraining from scratch, preventing catastrophic forgetting.
  • AI Safety, Alignment, and Ethics: Ensuring AI aligns with human values, with research into RLHF and reducing harmful behaviors.
  • Federated Learning and Decentralized AI: Training AI models across distributed datasets without centralizing data, preserving privacy.
  • Sparsity and Efficient AI Models: Techniques like Sparse Transformers and MoE for computational efficiency, enabling scaling to trillions of parameters.
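
A minimal sketch of the retrieval-augmented generation pattern listed above: embed documents, select those most similar to a query, and prepend them to the prompt before calling a language model. The bag-of-words embedding and the final print standing in for an LLM call are stand-ins chosen for illustration, not any specific system's API.

```python
import numpy as np

docs = [
    "LSTMs address the vanishing gradient problem in RNNs.",
    "The Transformer replaces recurrence with self-attention.",
    "RLHF fine-tunes language models with human feedback.",
]

vocab = sorted({w for d in docs for w in d.lower().split()})

def embed(text):
    """Toy bag-of-words embedding; real systems use learned dense encoders."""
    v = np.array([text.lower().split().count(w) for w in vocab], dtype=float)
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, k=1):
    """Return the k documents most similar to the query by cosine similarity."""
    sims = [float(embed(d) @ embed(query)) for d in docs]
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query = "What did the Transformer replace?"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # this prompt would then be passed to an LLM (not included here)
```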
