We provide conversation demos, including one driven by a multi-modal agent, built with the Chainlit framework. For more information, please visit the official Chainlit website.
For a simple chat experience, we load an LLM, such as `meta-llama/Meta-Llama-3-8B-Instruct`, by specifying a configuration like so:
```yaml
task: text-generation
model: "meta-llama/Meta-Llama-3-8B-Instruct"
do_sample: false
max_new_tokens: 300
```
Then, run the following:
```sh
CONFIG=config/regular_chat.yaml chainlit run fastrag/ui/chainlit_no_rag.py
```
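Under the hood, a configuration like the one above roughly corresponds to instantiating a Haystack `HuggingFaceLocalGenerator`. The following is a minimal sketch assuming the Haystack 2.x API; the exact wiring inside fastRAG may differ, and the prompt string is only a placeholder:

```python
# Minimal sketch (assumes Haystack 2.x); fastRAG's internal wiring may differ.
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    generation_kwargs={"max_new_tokens": 300, "do_sample": False},
)
generator.warm_up()                            # load the model weights
result = generator.run("Hello, who are you?")  # placeholder prompt
print(result["replies"][0])
```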
For a chat using a RAG pipeline, specify the tools you wish to use in the following format:
```yaml
chat_model:
  generator_kwargs:
    model: microsoft/Phi-3-mini-128k-instruct
    task: "text-generation"
    generation_kwargs:
      max_new_tokens: 300
      do_sample: false
    huggingface_pipeline_kwargs:
      torch_dtype: torch.bfloat16
      max_new_tokens: 300
      do_sample: false
      trust_remote_code: true
  generator_class: haystack.components.generators.hugging_face_local.HuggingFaceLocalGenerator
tools:
  - type: doc
    query_handler:
      type: "haystack_yaml"
      params:
        pipeline_yaml_path: "config/empty_doc_only_retrieval_pipeline.yaml"
    index_handler:
      type: "haystack_yaml"
      params:
        pipeline_yaml_path: "config/empty_index_pipeline.yaml"
    params:
      name: "docRetriever"
      description: 'useful for when you need to retrieve text to answer questions. Use the following format: {{ "input": [your tool input here ] }}.'
```
Then, run the application using the command:
```sh
CONFIG=config/rag_pipeline_chat.yaml chainlit run fastrag/ui/chainlit_pipeline.py
```
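Each `pipeline_yaml_path` points to a standard Haystack pipeline definition. As a rough illustration of how such a file can be loaded and queried on its own (assuming the Haystack 2.x API; the component name `Retriever` and the query string are placeholders, not taken from the fastRAG config files):

```python
# Hedged sketch: load a pipeline YAML and run a query against it directly.
# "Retriever" is an illustrative component name, not necessarily the one used
# in config/empty_doc_only_retrieval_pipeline.yaml.
from haystack import Pipeline

with open("config/empty_doc_only_retrieval_pipeline.yaml") as f:
    retrieval_pipeline = Pipeline.load(f)

result = retrieval_pipeline.run({"Retriever": {"query": "placeholder question"}})
print(result)
```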
In this demo, we use the [xtuner/llava-llama-3-8b-v1_1-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) model as a conversational agent that can decide which retriever to use to respond to the user's query.
To achieve this, we use dynamic reasoning with ReAct prompts, resulting in multiple logical turns.
To explore all the steps of building the agent system, you can check out our Example Notebook.
For more information on how to use ReAct, feel free to visit Haystack's original tutorial, on which our demo is based.
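For intuition, a ReAct-style agent loop can be sketched as follows. This is illustrative only, not fastRAG's actual agent code: `llm` and `tools` are assumed to be a text-generation callable and a name-to-tool mapping (such as the `docRetriever` tool defined above), and the output parsing is simplified.

```python
import re

def react_loop(llm, tools, query, max_turns=5):
    """Illustrative ReAct loop: alternate model reasoning and tool calls."""
    transcript = f"Question: {query}\n"
    for _ in range(max_turns):
        step = llm(transcript)  # expected to emit "Thought: ... Action: ... Action Input: ..."
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        action = re.search(r"Action:\s*(\S+)", step)
        action_input = re.search(r"Action Input:\s*(.+)", step)
        if not action or not action_input:
            break  # the model did not request a tool; stop the loop
        observation = tools[action.group(1)](action_input.group(1).strip())
        transcript += f"\nObservation: {observation}\n"
    return transcript
```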
To run the demo, simply run:
```sh
CONFIG=config/visual_chat_agent.yaml chainlit run fastrag/ui/chainlit_multi_modal_agent.py
```
The chat demos format the conversation for the underlying model using prompt templates. A basic template looks like this:

```
The following is a conversation between a human and an AI. Do not generate the user response to your output.
{memory}
Human: {query}
AI:
```
For models that use the Llama 2 chat format, the same conversation is wrapped with `[INST]` and `<<SYS>>` tags:

```
<s>[INST] <<SYS>>
The following is a conversation between a human and an AI. Do not generate the user response to your output.
<</SYS>>
{memory}{query} [/INST]
```
Notice that here, the user messages will be:
```
<s>[INST] {USER_QUERY} [/INST]
```
And the model messages will be:
```
{ASSISTANT_RESPONSE} </s>
```
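As an illustration of how past turns could be concatenated into the `{memory}` placeholder under this format (a hedged sketch; fastRAG's actual memory handling may differ, and the example turns are placeholders):

```python
def build_memory(turns):
    """Concatenate past (user, assistant) turns in the [INST] chat format."""
    return "".join(
        f"<s>[INST] {user} [/INST] {assistant} </s>"
        for user, assistant in turns
    )

# Placeholder conversation history, then the new query appended in the same format.
memory = build_memory([("What does fastRAG do?", "It builds efficient RAG pipelines.")])
prompt = f"{memory}<s>[INST] How do I run the demo? [/INST]"
```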
For models that expect a `### System:` / `### User:` / `### Assistant:` layout, the template is:

```
### System:
The following is a conversation between a human and an AI. Do not generate the user response to your output.
{memory}
### User: {query}
### Assistant:
```
For the v1.5 LLaVA models, we define a specific template, as shown in this post regarding LLaVA models.
```
{memory}
USER: {query}
ASSISTANT:
```
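As a quick illustration of filling this template (placeholder strings only; the actual `{memory}` content is built by the demo from the chat history at runtime):

```python
# Placeholder values only; the real demo assembles {memory} from previous turns.
template = "{memory}\nUSER: {query}\nASSISTANT:"
memory = "USER: What is shown in this image? ASSISTANT: A diagram of a retrieval pipeline.</s>"
prompt = template.format(memory=memory, query="Which retriever should answer my question?")
print(prompt)
```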