- Enabled `chat_history` to be passed to the `gen_response` function (see the sketch after this list)
- Added FastAPI support
- Fixed bug with memory limit
- Fixed problem with context prompt for non-RAG models
- Added `condense_plus_context` option for chat mode in `gen_response`, which is better at answering follow-up questions with RAG (see the sketch after this list)
- Added ability to tweak `context_prompt` in `gen_response`
- Fixed issue with roles in `pg_dump` and `pg_restore`
- Bug fix in the `pg_dump` and `pg_restore` functions
- Bug fix in `pgdump`
- Added Dockerfiles and instructions
- Added functions to dump and restore vector databases (see the sketch after this list)
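
A minimal sketch of how the chat-related additions above (`chat_history`, the `condense_plus_context` chat mode, and a custom `context_prompt`) might be combined in a single `gen_response` call. The `model` object, the message format, and every keyword not named in this changelog are assumptions for illustration, not the package's documented API.

```python
# Hypothetical sketch -- `model` stands in for an already-instantiated model
# object from this package; its constructor is not shown in the changelog.
# Only gen_response, chat_history, condense_plus_context, and context_prompt
# are taken from the changelog; the message format is an assumption.
chat_history = [
    {"role": "user", "content": "What does the Q3 report say about revenue?"},
    {"role": "assistant", "content": "Revenue grew 8% quarter over quarter."},
]

response = model.gen_response(
    "How does that compare with Q2?",      # follow-up question
    chat_history=chat_history,             # prior turns carried into the answer
    chat_mode="condense_plus_context",     # condense + retrieve, better for follow-ups
    context_prompt=(
        "Answer using only the retrieved context below:\n{context_str}"
    ),
)
print(response)
```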
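The changelog does not show the signatures of the package's `pg_dump`/`pg_restore` helpers, so the sketch below only illustrates the general approach: shelling out to the standard PostgreSQL CLI tools to dump and restore a Postgres-backed vector database. The function names, defaults, and flag choices here are illustrative, not the package's API.

```python
# Illustrative only: a common way to dump/restore a Postgres-backed vector
# database by calling the standard PostgreSQL CLI tools. Not the package's
# actual pg_dump/pg_restore implementation.
import subprocess

def dump_vector_db(db_name: str, out_path: str, user: str = "postgres",
                   host: str = "localhost", port: int = 5432) -> None:
    """Write a custom-format dump of the database to out_path."""
    subprocess.run(
        ["pg_dump", "-h", host, "-p", str(port), "-U", user,
         "-F", "c", "-f", out_path, db_name],
        check=True,
    )

def restore_vector_db(db_name: str, dump_path: str, user: str = "postgres",
                      host: str = "localhost", port: int = 5432) -> None:
    """Restore a custom-format dump into an existing database."""
    subprocess.run(
        ["pg_restore", "-h", host, "-p", str(port), "-U", user,
         "-d", db_name, "--no-owner", dump_path],  # --no-owner sidesteps role mismatches
        check=True,
    )
```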
`memory_limit` in chat mode would previously overload the context window and stop working; this is now fixed. `context_window` is now set only at LLM instantiation and has been removed from the other places in `gen_response` where it did not work.
- split the LLM from the vector DB/chat engine model, meaning multiple separate model objects can now share the same LLM. Temperature, context window, max new tokens, system prompt, etc. can also all be changed at inference time via the `model.gen_response()` function (see the sketch after this list).
- added automatic handling of CSV data files by converting them to chunked markdown tables (see the sketch after this list)
- added `streaming` option to `.gen_response`
- added support for the `'mps'` device
- separated dropping database and dropping table
- persistent vector databases
- ability to have persistent sessions with a chat engine
- ability to change `chunk_overlap` and `paragraph_separator` parameters in `SentenceSplitter` (see the sketch after this list)
- ability to close vector database connection
- two additional methods for converting CSVs to LLM-readable files
- ability to convert tabular CSVs to LLM-readable text files
- initial release
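
A sketch of the LLM/model split and the inference-time overrides described above. Only `gen_response`, the `streaming` option, the listed override parameters, and the idea of several model objects sharing one LLM come from this changelog; the `LocalLLM`/`RAGModel` names, constructor arguments, and the streaming iteration interface are hypothetical placeholders.

```python
# Hypothetical sketch: class names and constructor arguments are placeholders,
# not this package's real API. The ideas shown -- one LLM shared by several
# model objects, per-call overrides, and streaming -- are from the changelog.
llm = LocalLLM(model_path="models/llama-3-8b-instruct.gguf")   # hypothetical

docs_model = RAGModel(llm=llm, table_name="docs")               # two models,
tickets_model = RAGModel(llm=llm, table_name="tickets")         # one shared LLM

# Sampling and prompt parameters overridden at inference time:
answer = docs_model.gen_response(
    "Summarize the deployment guide.",
    temperature=0.2,
    max_new_tokens=256,
    system_prompt="Answer concisely.",
)

# Streaming: iterate over tokens as they are generated (assumed interface).
for token in tickets_model.gen_response("List open incidents.", streaming=True):
    print(token, end="", flush=True)
```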
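The CSV handling above turns tabular files into text an LLM can retrieve over. The package's own conversion is not reproduced here; the sketch below shows the general technique of rendering a CSV as markdown tables split into row chunks, using pandas (whose `to_markdown` needs the `tabulate` package).

```python
# General-technique sketch (not the package's exact implementation): read a
# CSV, render it as markdown, and split it into row chunks small enough to
# embed and retrieve. Requires pandas and tabulate.
import pandas as pd

def csv_to_markdown_chunks(csv_path: str, rows_per_chunk: int = 50) -> list[str]:
    df = pd.read_csv(csv_path)
    chunks = []
    for start in range(0, len(df), rows_per_chunk):
        piece = df.iloc[start:start + rows_per_chunk]
        # Each chunk is a self-contained markdown table with its own header row.
        chunks.append(piece.to_markdown(index=False))
    return chunks

# Example: write each chunk to its own file for ingestion.
for i, chunk in enumerate(csv_to_markdown_chunks("data/sales.csv")):
    with open(f"data/sales_chunk_{i}.md", "w") as f:
        f.write(chunk)
```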
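`chunk_overlap` and `paragraph_separator` are real parameters of LlamaIndex's `SentenceSplitter`; how this package forwards them is not shown in the changelog, so the sketch below configures the splitter directly.

```python
# Configuring LlamaIndex's SentenceSplitter directly. The parameter names are
# genuine SentenceSplitter arguments; how the package passes them through is
# not documented in this changelog.
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(
    chunk_size=512,              # tokens per chunk
    chunk_overlap=64,            # overlap between consecutive chunks
    paragraph_separator="\n\n",  # string treated as a paragraph boundary
)

docs = [Document(text="First paragraph.\n\nSecond paragraph with more detail.")]
nodes = splitter.get_nodes_from_documents(docs)
print(len(nodes))
```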