
Master the essential steps of pretraining large language models (LLMs). Learn to create high-quality datasets, configure model architectures, execute training runs, and assess model performance for efficient and effective LLM pretraining.


ksm26/Pretraining-LLMs


Welcome to the "Pretraining LLMs" course! 🧑‍🏫 The course dives into the essential steps of pretraining large language models (LLMs).

📘 Course Summary

In this course, you’ll explore pretraining, the foundational step in training LLMs, which involves teaching an LLM to predict the next token using vast text datasets.
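
A minimal sketch of that next-token objective, using the Hugging Face `transformers` library: GPT-2 here is only an illustrative stand-in (the course itself works with Llama-family models), and passing the input tokens as labels returns the average next-token cross-entropy loss.

```python
# Rough sketch of the next-token-prediction objective (illustrative, not course code);
# GPT-2 stands in for the Llama-family models used in the course.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Pretraining teaches a language model to predict the next token."
inputs = tokenizer(text, return_tensors="pt")

# Passing the input ids as labels makes the library shift them internally
# and return the average next-token cross-entropy loss.
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"next-token loss: {outputs.loss.item():.3f}")
```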

🧠 You'll learn the essential steps to pretrain an LLM, understand the associated costs, and discover cost-effective methods by leveraging smaller, existing open-source models.

Detailed Learning Outcomes:

  1. 🧠 Pretraining Basics: Understand the scenarios where pretraining is the optimal choice for model performance. Compare text generation across different versions of the same model to grasp the performance differences between base, fine-tuned, and specialized pre-trained models.
  2. 🗃️ Creating High-Quality Datasets: Learn how to create and clean a high-quality training dataset using web text and existing datasets, and how to package this data for use with the Hugging Face library.
  3. 🔧 Model Configuration: Explore ways to configure and initialize a model for training, including modifying Meta’s Llama models and initializing weights either randomly or from other models (a sketch covering this step and the next follows the list).
  4. 🚀 Executing Training Runs: Learn how to configure and execute a training run to train your own model effectively.
  5. 📊 Performance Assessment: Assess your trained model’s performance and explore common evaluation strategies for LLMs, including benchmark tasks used to compare different models’ performance.
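
The following is a hedged, minimal sketch of steps 3 and 4 above, with a small public dataset standing in for the cleaned corpus from step 2. The configuration values, the `upstage/TinySolar-248m-4k` checkpoint, and the `wikitext` dataset are illustrative assumptions, not the course’s exact recipe.

```python
# Minimal sketch: configure a small Llama-style model, initialize it randomly
# or from an existing checkpoint, then run a short training job with the
# Hugging Face Trainer. All names and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    LlamaConfig,
    LlamaForCausalLM,
    Trainer,
    TrainingArguments,
)

# Option A: random initialization from a scratch configuration.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=12,
    num_attention_heads=8,
    max_position_embeddings=2048,
)
model = LlamaForCausalLM(config)

# Option B: continue pretraining from an existing small open-source model instead
# (checkpoint name is an assumption; swap in whatever you want to start from).
# model = AutoModelForCausalLM.from_pretrained("upstage/TinySolar-248m-4k")

tokenizer = AutoTokenizer.from_pretrained("upstage/TinySolar-248m-4k")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tokenize a small public text dataset (a stand-in for your own cleaned corpus).
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 1)

# mlm=False gives the causal (next-token) objective: labels are the shifted inputs.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="pretrain-demo",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    logging_steps=50,
    report_to="none",
)
Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```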

🔑 Key Points

  • 🧩 Pretraining Process: Gain in-depth knowledge of the steps to pretrain an LLM, from data preparation to model configuration and performance assessment.
  • 🏗️ Model Architecture Configuration: Explore various options for configuring your model’s architecture, including modifying Meta’s Llama models and innovative pretraining techniques like Depth Upscaling, which can reduce training costs by up to 70% (a rough sketch of the idea follows this list).
  • 🛠️ Practical Implementation: Learn how to pretrain a model from scratch and continue the pretraining process on your own data using existing pre-trained models.
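
As a rough sketch of the depth-upscaling idea (my reading of the technique, not the course’s exact recipe): build a deeper model by stacking decoder layers copied from a smaller pretrained model, then continue pretraining from that initialization. The source checkpoint and the 75% layer-overlap ratio below are assumptions.

```python
# Rough depth-upscaling sketch: stack overlapping copies of a smaller model's
# decoder layers to initialize a deeper model, then continue pretraining.
import copy
from transformers import AutoModelForCausalLM

# Small pretrained Llama-style source model (illustrative checkpoint).
source = AutoModelForCausalLM.from_pretrained("upstage/TinySolar-248m-4k")
src_layers = source.model.layers            # decoder layers of the source model
n_src = len(src_layers)
keep = int(round(n_src * 0.75))             # layers taken from each end (assumed ratio)

# Target: same width and vocabulary, but 2 * keep layers deep.
config = copy.deepcopy(source.config)
config.num_hidden_layers = 2 * keep
upscaled = AutoModelForCausalLM.from_config(config)

# Reuse embeddings, final norm, and LM head from the source model.
upscaled.model.embed_tokens.load_state_dict(source.model.embed_tokens.state_dict())
upscaled.model.norm.load_state_dict(source.model.norm.state_dict())
upscaled.lm_head.load_state_dict(source.lm_head.state_dict())

# Stack the first `keep` layers on top of the last `keep` layers of the source.
stacked = list(src_layers[:keep]) + list(src_layers[n_src - keep:])
for target_layer, source_layer in zip(upscaled.model.layers, stacked):
    target_layer.load_state_dict(source_layer.state_dict())

# Save the upscaled model as the starting point for continued pretraining.
upscaled.save_pretrained("depth-upscaled-init")
```

The stacked copy is not meant to be useful on its own; the point is that continued pretraining from this initialization is far cheaper than training a model of the same depth from scratch.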

👩‍🏫 About the Instructors

  • 👨‍🏫 Sung Kim: CEO of Upstage, bringing extensive expertise in LLM pretraining and optimization.
  • 👩‍🔬 Lucy Park: Chief Scientific Officer of Upstage, with a deep background in scientific research and LLM development.

🔗 To enroll in the course or for further information, visit 📚 deeplearning.ai.