Add HF embeddings, add custom tuned prompts, add GPU accel #45

Merged · 74 commits · May 15, 2023

Commits (74)
b40a8ef
update requirements.txt
hippalectryon-0 May 12, 2023
3aa1d16
Merge branch 'su77ungr:main' into main
hippalectryon-0 May 12, 2023
599362d
Merge branch 'su77ungr:main' into main
hippalectryon-0 May 14, 2023
1861358
remove state_of_the_union.txt
hippalectryon-0 May 14, 2023
a4b5724
Merge remote-tracking branch 'origin/main' into main-fork
hippalectryon-0 May 14, 2023
66d0677
add poetry config
hippalectryon-0 May 14, 2023
42b9454
update streamlit version
hippalectryon-0 May 14, 2023
c4b2a33
update Dockerfile
hippalectryon-0 May 14, 2023
f5cdbcb
update Dockerfile
hippalectryon-0 May 14, 2023
3621789
fix Dockerfile
hippalectryon-0 May 14, 2023
5eae6c2
update Dockerfile
hippalectryon-0 May 14, 2023
d932557
update README.md
hippalectryon-0 May 14, 2023
a35f7c0
update README.md
hippalectryon-0 May 14, 2023
1cd7df1
update convert.py & pyproject.toml
hippalectryon-0 May 14, 2023
b14b601
Merge remote-tracking branch 'fork/main' into main-fork
hippalectryon-0 May 14, 2023
82f6af6
add tokenizer model
hippalectryon-0 May 14, 2023
e371e23
update README & lint
hippalectryon-0 May 14, 2023
6ffcf25
Merge remote-tracking branch 'fork/main' into main-fork
hippalectryon-0 May 14, 2023
83b8454
add pre-commit
hippalectryon-0 May 14, 2023
8a9ba1f
run pre-commit
hippalectryon-0 May 14, 2023
e3a0b6a
merge
hippalectryon-0 May 14, 2023
01c27f2
fix README.md
hippalectryon-0 May 14, 2023
44f0e18
fix (?) convert.py
hippalectryon-0 May 14, 2023
6c0a46d
fix (?) convert.py
hippalectryon-0 May 14, 2023
1b3e653
fix package versions
hippalectryon-0 May 14, 2023
568ece2
clean for merge
hippalectryon-0 May 14, 2023
821c28e
Merge branch 'main' into main-mr
hippalectryon-0 May 14, 2023
4c132e5
fix README.md
hippalectryon-0 May 14, 2023
f070eae
update README.md for new convert
hippalectryon-0 May 14, 2023
0cdfd80
redirect to main repo
su77ungr May 14, 2023
f41263d
fix ingest.py
hippalectryon-0 May 14, 2023
09b4c1a
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 14, 2023
5b23c15
pre-commit formatting
hippalectryon-0 May 14, 2023
b4d5c1b
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 14, 2023
e306509
rollback README.md
hippalectryon-0 May 14, 2023
1b4cc3a
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 14, 2023
249d2f6
fix Dockerfile and README.md for streamlit
hippalectryon-0 May 14, 2023
488e59d
Merge branch 'main' into main-fork
hippalectryon-0 May 14, 2023
925b779
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 14, 2023
6231aa9
fix README.md
hippalectryon-0 May 14, 2023
d3ff124
cleaner document handling in ingest.py
hippalectryon-0 May 14, 2023
595a75f
add support for ppt, docx
hippalectryon-0 May 14, 2023
0901716
add sample documents
hippalectryon-0 May 14, 2023
520c211
load env variables in centralized file
hippalectryon-0 May 14, 2023
e403170
Merge branch 'main' into main-fork
hippalectryon-0 May 14, 2023
9fd396e
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 14, 2023
c7c367a
remove CI on merge
hippalectryon-0 May 14, 2023
f61e571
check for empty query
hippalectryon-0 May 14, 2023
5d7c81c
Merge remote-tracking branch 'origin/main'
hippalectryon-0 May 14, 2023
26a8ef6
print embedding progress
hippalectryon-0 May 14, 2023
1f37b66
Merge branch 'main' into main-fork
hippalectryon-0 May 14, 2023
273b818
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 15, 2023
3c0f7da
fix model_stop
hippalectryon-0 May 15, 2023
37216f8
fix model_stop
hippalectryon-0 May 15, 2023
94b4ed4
Merge branch 'main' into main-mr
hippalectryon-0 May 15, 2023
b25d551
Merge branch 'main' into main-fork
hippalectryon-0 May 15, 2023
40c4f8a
several minor improvements to startLLM.py
hippalectryon-0 May 15, 2023
7775490
pre-commit formatting
hippalectryon-0 May 15, 2023
2bd0591
Add support for HuggingFace embeddings
hippalectryon-0 May 15, 2023
8bf70da
- add custom prompt templates tailored for vic7b-5, and better than t…
hippalectryon-0 May 15, 2023
1cb53d4
Merge branch 'main' into main-fork
hippalectryon-0 May 15, 2023
9dbfdad
update example.env
hippalectryon-0 May 15, 2023
d1a8f34
Merge remote-tracking branch 'origin/main'
hippalectryon-0 May 15, 2023
ce14ee9
update prompts
hippalectryon-0 May 15, 2023
e351a51
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 15, 2023
432f307
fix typo
hippalectryon-0 May 15, 2023
46b2d47
fix typo
hippalectryon-0 May 15, 2023
1f3e5c2
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 15, 2023
1ba7d27
update example.env
hippalectryon-0 May 15, 2023
ae56b96
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 15, 2023
28b021f
re-add strip
hippalectryon-0 May 15, 2023
d8b89f2
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 15, 2023
078281f
Add N_GPU_LAYERS to .env
hippalectryon-0 May 15, 2023
3af2ae5
Merge branch 'main-fork' into main-mr
hippalectryon-0 May 15, 2023

Files changed
4 changes: 3 additions & 1 deletion Dockerfile
@@ -7,6 +7,8 @@ WORKDIR CASALIOY
RUN pip3 install poetry
RUN python3 -m poetry config virtualenvs.create false
RUN python3 -m poetry install
RUN python3 -m pip install --force streamlit # Temp fix, see pyproject.toml
RUN python3 -m pip install --force streamlit sentence_transformers # Temp fix, see pyproject.toml
RUN python3 -m pip uninstall -y llama-cpp-python
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 python3 -m pip install llama-cpp-python # GPU support
RUN pre-commit install
COPY example.env .env
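
The CUBLAS reinstall above is what enables offloading transformer layers to the GPU. A minimal sketch of how that could be exercised from Python, assuming the build succeeded; the model path and layer count are illustrative values, not settings fixed by this PR:

```python
# Quick check that the CUBLAS build of llama-cpp-python offloads layers.
# Assumes llama-cpp-python was installed with CMAKE_ARGS="-DLLAMA_CUBLAS=on"
# as in the Dockerfile above; model path and n_gpu_layers are examples only.
from llama_cpp import Llama

llm = Llama(
    model_path="models/ggml-vic7b-q5_1.bin",
    n_ctx=1024,
    n_gpu_layers=20,  # layers to offload to the GPU; 0 keeps everything on CPU
)
out = llm("HUMAN: Say hi.\n\nASSISTANT:", max_tokens=16)
print(out["choices"][0]["text"])
```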
14 changes: 10 additions & 4 deletions README.md
@@ -45,7 +45,7 @@ for older docker without GUI use `casalioy:latest` might deprecate soon
```
cd models
wget https://huggingface.co/Pi3141/alpaca-native-7B-ggml/resolve/397e872bf4c83f4c642317a5bf65ce84a105786e/ggml-model-q4_0.bin &&
wget https://huggingface.co/datasets/dnato/ggjt-v1-vic7b-uncensored-q4_0.bin/resolve/main/ggjt-v1-vic7b-uncensored-q4_0.bin
wget https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/resolve/main/ggml-vic7b-q5_1.bin
cd ../
```

@@ -59,15 +59,21 @@ cd ../
python -m pip install poetry
python -m poetry config virtualenvs.in-project true
python -m poetry install
python -m pip install --force streamlit # Temporary bandaid fix, waiting for streamlit >=1.23
. .venv/bin/activate
python -m pip install --force streamlit sentence_transformers # Temporary bandaid fix, waiting for streamlit >=1.23
pre-commit install
```

If you want GPU support for llama-cpp:
```shell
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force llama-cpp-python
```

> Download the 2 models and place them in a folder called `./models`:

- LLM: default
is [ggjt-v1-vic7b-uncensored-q4_0](https://huggingface.co/datasets/dnato/ggjt-v1-vic7b-uncensored-q4_0.bin/resolve/main/ggjt-v1-vic7b-uncensored-q4_0.bin)
is [ggml-vic7b-q5_1](https://huggingface.co/eachadea/ggml-vicuna-7b-1.1/resolve/main/ggml-vic7b-q5_1.bin)
- Embedding: default
to [ggml-model-q4_0](https://huggingface.co/Pi3141/alpaca-native-7B-ggml/resolve/397e872bf4c83f4c642317a5bf65ce84a105786e/ggml-model-q4_0.bin).

@@ -102,7 +108,7 @@ This should look like this
│ └── shor.pdfstate_of_the_union.txt
│ └── state_of_the_union.txt
├── models
│ ├── ggjt-v1-vic7b-uncensored-q4_0.bin
│ ├── ggml-vic7b-q5_1.bin
│ └── ggml-model-q4_0.bin
└── .env, convert.py, Dockerfile
```
8 changes: 5 additions & 3 deletions example.env
Owner: Where did you find stop?

Contributor Author (@hippalectryon-0), May 15, 2023: I edited the prompt to tell the model to say it.

@@ -1,6 +1,7 @@
# Generic
MODEL_N_CTX=1024
LLAMA_EMBEDDINGS_MODEL=models/ggml-model-q4_0.bin
TEXT_EMBEDDINGS_MODEL=all-MiniLM-L6-v2
TEXT_EMBEDDINGS_MODEL_TYPE=HF # LlamaCpp or HF
USE_MLOCK=true

# Ingestion
@@ -11,6 +12,7 @@ INGEST_CHUNK_OVERLAP=50

# Generation
MODEL_TYPE=LlamaCpp # GPT4All or LlamaCpp
MODEL_PATH=models/ggjt-v1-vic7b-uncensored-q4_0.bin
MODEL_PATH=models/ggml-vic7b-q5_1.bin
MODEL_TEMP=0.8
MODEL_STOP=###,\n
MODEL_STOP=[STOP]
CHAIN_TYPE=stuff
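
The MODEL_STOP change works together with the new prompt templates: the prompts ask the model to end every answer with "[STOP]", and that same string is passed to the backend as a stop sequence so generation halts cleanly. A hedged sketch of the mechanism with llama-cpp-python; the prompt and model path are examples, not code from this PR:

```python
# Sketch: the tuned prompt asks the model to emit "[STOP]", and the same token
# is handed to the sampler as a stop sequence, so the completion ends there.
# Prompt and model path are illustrative only.
from llama_cpp import Llama

llm = Llama(model_path="models/ggml-vic7b-q5_1.bin", n_ctx=1024)
out = llm(
    'HUMAN: Answer the question, then write "[STOP]".\n\nQuestion: What is 2+2?\n\nASSISTANT:',
    max_tokens=32,
    stop=["[STOP]"],  # MODEL_STOP from .env, split on "," by load_env.py
)
print(out["choices"][0]["text"])
```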
16 changes: 8 additions & 8 deletions ingest.py
Owner: Nice, did not know this even exists. Gonna ping me docs for later dev.

@@ -4,6 +4,7 @@
import sys
from hashlib import md5
from pathlib import Path
from typing import Callable

from langchain.docstore.document import Document
from langchain.document_loaders import (
@@ -15,11 +16,10 @@
UnstructuredHTMLLoader,
UnstructuredPowerPointLoader,
)
from langchain.embeddings import LlamaCppEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

from load_env import chunk_overlap, chunk_size, documents_directory, llama_embeddings_model, model_n_ctx, persist_directory, use_mlock
from load_env import chunk_overlap, chunk_size, documents_directory, get_embedding_model, persist_directory

file_loaders = { # extension -> loader
"txt": lambda path: TextLoader(path, encoding="utf8"),
@@ -41,13 +41,13 @@ def load_one_doc(filepath: Path) -> list[Document]:
return file_loaders[filepath.suffix[1:]](str(filepath)).load()


def embed_documents_with_progress(embedding_model: LlamaCppEmbeddings, texts: list[str]) -> list[list[float]]:
def embed_documents_with_progress(embedding_function: Callable, texts: list[str]) -> list[list[float]]:
"""wrapper around embed_documents that prints progress"""
embeddings = []
N_chunks = len(texts)
for i, text in enumerate(texts):
print(f"embedding chunk {i+1}/{N_chunks}")
embeddings.append(embedding_model.client.embed(text))
print(f"embedding chunk {i + 1}/{N_chunks}")
embeddings.append(embedding_function(text))

return [list(map(float, e)) for e in embeddings]

@@ -76,12 +76,12 @@ def main(sources_directory: str, cleandb: str) -> None:

# Generate embeddings
print("Generating embeddings...")
embedding_model = LlamaCppEmbeddings(model_path=llama_embeddings_model, n_ctx=model_n_ctx, use_mlock=use_mlock)
embeddings = embed_documents_with_progress(embedding_model, texts)
embedding_model, encode_fun = get_embedding_model()
embeddings = embed_documents_with_progress(encode_fun, texts)

# Store embeddings
print("Storing embeddings...")
client = QdrantClient(path=db_dir) # using Qdrant.from_documents recreates the db each time
client = QdrantClient(path=db_dir, prefer_grpc=True) # using Qdrant.from_documents recreates the db each time
try:
collection = client.get_collection("test")
except ValueError: # doesn't exist
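
With this refactor the ingestion loop only ever sees a plain callable, so it no longer matters whether the embeddings come from HuggingFace or LlamaCpp. A short usage sketch of the new API; the sample texts are placeholders:

```python
# Sketch of the new calling convention: get_embedding_model() returns the model
# together with its encoding function, and embed_documents_with_progress only
# needs the latter. The sample texts are placeholders.
from ingest import embed_documents_with_progress
from load_env import get_embedding_model

texts = ["Qdrant stores the vectors.", "Vicuna answers the questions."]

embedding_model, encode_fun = get_embedding_model()  # HF or LlamaCpp, per .env
vectors = embed_documents_with_progress(encode_fun, texts)
print(len(vectors), len(vectors[0]))  # number of chunks, embedding dimension
```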
68 changes: 65 additions & 3 deletions load_env.py
@@ -1,12 +1,16 @@
"""load env variables"""
import os
from typing import Callable

from dotenv import load_dotenv
from langchain.embeddings import HuggingFaceEmbeddings, LlamaCppEmbeddings
from langchain.prompts import PromptTemplate

load_dotenv()

# generic
llama_embeddings_model = os.environ.get("LLAMA_EMBEDDINGS_MODEL")
text_embeddings_model = os.environ.get("TEXT_EMBEDDINGS_MODEL")
text_embeddings_model_type = os.environ.get("TEXT_EMBEDDINGS_MODEL_TYPE")
model_n_ctx = int(os.environ.get("MODEL_N_CTX"))
use_mlock = os.environ.get("USE_MLOCK").lower() == "true"

@@ -19,5 +23,63 @@
# generate
model_type = os.environ.get("MODEL_TYPE")
model_path = os.environ.get("MODEL_PATH")
model_temp = float(os.environ.get("MODEL_TEMP"))
model_stop = os.environ.get("MODEL_STOP").split(",")
model_temp = float(os.environ.get("MODEL_TEMP", "0.8"))
model_stop = os.environ.get("MODEL_STOP", "")
model_stop = model_stop.split(",") if model_stop else []
chain_type = os.environ.get("CHAIN_TYPE", "refine")
n_gpu_layers = int(os.environ.get("N_GPU_LAYERS", 0))


def get_embedding_model() -> tuple[HuggingFaceEmbeddings, Callable] | tuple[LlamaCppEmbeddings, Callable]:
"""get the text embedding model
:returns: tuple[the model, its encoding function]"""
match text_embeddings_model_type:
case "HF":
model = HuggingFaceEmbeddings(model_name=text_embeddings_model)
return model, model.client.encode
case "LlamaCpp":
model = LlamaCppEmbeddings(model_path=text_embeddings_model, n_ctx=model_n_ctx)
return model, model.client.embed
case _:
raise ValueError(f"Unknown embedding type {text_embeddings_model_type}")


def get_prompt_template_kwargs() -> dict[str, PromptTemplate]:
"""get an improved prompt template"""
match chain_type:
case "stuff":
question_prompt = """HUMAN: Answer the question using ONLY the given context. If you are unsure of the answer, respond with "Unknown[STOP]". Conclude your response with "[STOP]" to indicate the completion of the answer.

Context: {context}

Question: {question}

ASSISTANT:"""
return {"prompt": PromptTemplate(template=question_prompt, input_variables=["context", "question"])}
case "refine":
question_prompt = """HUMAN: Answer the question using ONLY the given context.
Indicate the end of your answer with "[STOP]" and refrain from adding any additional information beyond that which is provided in the context.

Question: {question}

Context: {context_str}

ASSISTANT:"""
refine_prompt = """HUMAN: Refine the original answer to the question using the new context.
Use ONLY the information from the context and your previous answer.
If the context is not helpful, use the original answer.
Indicate the end of your answer with "[STOP]" and avoid adding any extraneous information.

Original question: {question}

Existing answer: {existing_answer}

New context: {context_str}

ASSISTANT:"""
return {
"question_prompt": PromptTemplate(template=question_prompt, input_variables=["context_str", "question"]),
"refine_prompt": PromptTemplate(template=refine_prompt, input_variables=["context_str", "existing_answer", "question"]),
}
case _:
return {}
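
One way these template kwargs could be consumed downstream, as a hedged sketch assuming a LangChain RetrievalQA chain; the llm and retriever arguments are caller-supplied placeholders, not code from this PR:

```python
# Hedged sketch: RetrievalQA takes per-chain prompt overrides through
# chain_type_kwargs, which is where get_prompt_template_kwargs() can slot in.
# The llm and retriever below are placeholders provided by the caller.
from langchain.chains import RetrievalQA

from load_env import chain_type, get_prompt_template_kwargs


def build_qa_chain(llm, retriever) -> RetrievalQA:
    """Wire the tuned prompts into a RetrievalQA chain."""
    return RetrievalQA.from_chain_type(
        llm=llm,                  # e.g. a LlamaCpp instance
        chain_type=chain_type,    # "stuff" or "refine", from CHAIN_TYPE
        retriever=retriever,      # e.g. a Qdrant vector-store retriever
        chain_type_kwargs=get_prompt_template_kwargs(),
    )
```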