In the previous module, we used OpenAI via OpenAI API. It's a very convenient way to use an LLM, but you have to pay for the usage, and you don't have control over the model you get to use.
In this module, we'll look at using open-source LLMs instead.
- Open-Source LLMs
- Replacing the LLM box in the RAG flow
Video
import os
os.environ['HF_HOME'] = '/run/cache/'
Model: google/flan-t5-xl
Links:
- https://huggingface.co/google/flan-t5-xl
- https://huggingface.co/docs/transformers/en/model_doc/flan-t5
Explanation of Parameters:
max_length: Set this to a higher value if you want longer responses. For example, max_length=300. num_beams: Increasing this can lead to more thorough exploration of possible sequences. Typical values are between 5 and 10. do_sample: Set this to True to use sampling methods. This can produce more diverse responses. temperature: Lowering this value makes the model more confident and deterministic, while higher values increase diversity. Typical values range from 0.7 to 1.5. top_k and top_p: These parameters control nucleus sampling. top_k limits the sampling pool to the top k tokens, while top_p uses cumulative probability to cut off the sampling pool. Adjust these based on the desired level of randomness.
microsoft/Phi-3-mini-128k-instruct
Links:
mistralai/Mistral-7B-v0.1
Links:
- https://huggingface.co/docs/transformers/en/llm_tutorial
- https://huggingface.co/settings/tokens
- https://huggingface.co/mistralai/Mistral-7B-v0.1
LLM360/Amber
- ``
Where to find them:
- Leaderboards
- ChatGPT
Video
The easiest way to run an LLM without a GPU is using Ollama
For Linux:
curl -fsSL https://ollama.com/install.sh | sh
ollama start
ollama serve phi3
Connecting to it with OpenAI API:
from openai import OpenAI
client = OpenAI(
base_url='http://localhost:11434/v1/',
api_key='ollama',
)
Docker
docker run -it \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
Pulling the model
docker exec -it bash
ollama pull phi3
Video
- Creating a Docker-Compose file
- Re-running the module 1 notebook
- Putting it in Streamlit