- Azure Cognitive Search resource for indexing and retrieving relevant information
- Azure OpenAI service for Generative AI Models and Embedding Models
- Add the required credentials for the above resources to the `.env` file (a sketch of this file follows this list)
- Install the required libraries listed in the `requirements.txt` file via `pip install -r requirements.txt`
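
Below is a minimal sketch of what the `.env` file might contain. The variable names are assumptions for illustration only; check the notebooks for the exact keys they expect.

```text
# Hypothetical variable names -- confirm against the notebooks before use
AZURE_OPENAI_ENDPOINT=https://<your-openai-resource>.openai.azure.com/
AZURE_OPENAI_API_KEY=<your-azure-openai-key>
AZURE_SEARCH_ENDPOINT=https://<your-search-resource>.search.windows.net
AZURE_SEARCH_KEY=<your-search-admin-key>
```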
When working with large language models, it is important to understand how to ground them with the right data. You will also look at how to deal with token limits when you have a lot of data, and you will experiment with embeddings. This challenge introduces the fundamental concepts of grounding, chunking, and embedding before you see them in play in Challenge 4. Below are brief introductions to each concept.
Grounding is a technique used when you want the model to return reliable answers to a given question. Chunking is the process of breaking a large document into smaller pieces, which helps limit the amount of information passed to the model at once. An embedding is an information-dense representation of the semantic meaning of a piece of text.
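
As a rough illustration of chunking, here is a minimal sketch that splits text into fixed-size, overlapping chunks by token count. It assumes the `tiktoken` package and the `cl100k_base` encoding; the notebooks may use a different approach.

```python
import tiktoken

def chunk_text(text: str, max_tokens: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens tokens each."""
    encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by many recent OpenAI models
    tokens = encoding.encode(text)
    chunks = []
    step = max_tokens - overlap  # overlap keeps context across chunk boundaries
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append(encoding.decode(window))
        if start + max_tokens >= len(tokens):
            break
    return chunks

# Example: a long document becomes several smaller pieces that each fit a token budget
# chunks = chunk_text(open("large_document.txt").read())
```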
Questions you should be able to answer by the end of the challenge:
- Why is grounding important, and how can you ground an LLM?
- What is a token limit?
- How can you deal with token limits? What are some techniques for chunking?
- What do embeddings help accomplish?
You will run the following three Jupyter notebooks for this challenge. You can find them in the `/Notebooks` folder of the `Resources.zip` file.
- CH-03-A-Grounding.ipynb
- CH-03-B-Chunking.ipynb
- CH-03-C-Embeddings.ipynb
To complete this challenge successfully, you should be able to:
- Verify that you are able to ground a model through the system message (a minimal sketch of this and of creating embeddings follows this list)
- Demonstrate various chunking techniques
- Demonstrate how to create embeddings
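
For reference, here is a minimal sketch of grounding a chat model through the system message and creating an embedding, using the `AzureOpenAI` client from the `openai` Python package (v1+). The endpoint, key, API version, and deployment names are placeholders and assumptions; the notebooks walk through the actual setup.

```python
import os
from openai import AzureOpenAI  # assumes openai>=1.0; the notebooks may pin a different version

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # hypothetical .env variable names
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",                            # placeholder API version
)

# Grounding: the system message supplies the facts the model should rely on
# and instructs it not to answer from outside that context.
grounding_context = "Contoso's return policy allows returns within 30 days with a receipt."
response = client.chat.completions.create(
    model="gpt-35-turbo",  # placeholder chat deployment name
    messages=[
        {"role": "system",
         "content": ("Answer only using the following context. "
                     "If the answer is not in the context, say you don't know.\n\n"
                     + grounding_context)},
        {"role": "user", "content": "What is Contoso's return policy?"},
    ],
)
print(response.choices[0].message.content)

# Embeddings: a dense vector representing the semantic meaning of the text.
embedding = client.embeddings.create(
    model="text-embedding-ada-002",  # placeholder embedding deployment name
    input="What is Contoso's return policy?",
)
print(len(embedding.data[0].embedding))  # dimensionality of the vector
```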