Failed to create any VectorStore - Error building Component Astra DB / Chroma DB #3795

Closed
oschan77 opened this issue Sep 13, 2024 · 13 comments

@oschan77

Bug Description

I am trying to create a VectorStore for RAG, but I cannot create any vector store. I have tried both Astra DB and Chroma DB, and neither works for me.

I installed Langflow on my machine in a conda environment. I made sure the terminal had sudo permission before running python -m pip install langflow -U to install Langflow.

Reproduction

I am using the GUI, so I will provide screenshots instead of code snippets.

Astra DB:
image

This is the error message for Astra DB:

Error building Component Astra DB: 

Error initializing AstraDBVectorStore: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x74b3266a0a60>: Failed to establish a new connection: [Errno 111] Connection refused'))

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connection.py", line 400, in request
    self.endheaders()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connection.py", line 238, in connect
    self.sock = self._new_conn()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x74b3266a0a60>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x74b3266a0a60>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 221, in build_vector_store
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langchain_astradb/vectorstores.py", line 435, in __init__
    embedding_dimension_m = self._get_embedding_dimension()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langchain_astradb/vectorstores.py", line 471, in _get_embedding_dimension
    self._get_safe_embedding().embed_query(
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 465, in embed_query
    return self.embed_documents([text])[0]
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 446, in embed_documents
    response = requests.post(
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x74b3266a0a60>: Failed to establish a new connection: [Errno 111] Connection refused'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/graph/vertex/base.py", line 694, in _build_results
    result = await initialize.loading.get_instance_results(
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/interface/initialize/loading.py", line 64, in get_instance_results
    return await build_component(params=custom_params, custom_component=custom_component)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/interface/initialize/loading.py", line 151, in build_component
    build_results, artifacts = await custom_component.build_results()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/custom/custom_component/component.py", line 617, in build_results
    return await self._build_with_tracing()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/custom/custom_component/component.py", line 605, in _build_with_tracing
    _results, _artifacts = await self._build_results()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/custom/custom_component/component.py", line 640, in _build_results
    result = method()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/base/vectorstores/model.py", line 132, in build_base_retriever
    vector_store = self.build_vector_store()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/base/vectorstores/model.py", line 25, in check_cached
    result = f(self, *args, **kwargs)
  File "<string>", line 223, in build_vector_store
ValueError: Error initializing AstraDBVectorStore: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x74b3266a0a60>: Failed to establish a new connection: [Errno 111] Connection refused'))

Chroma DB:
image

This is the error message for Chroma DB:

Error building Component Chroma DB: 

HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x74b3266c27d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connection.py", line 400, in request
    self.endheaders()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connection.py", line 238, in connect
    self.sock = self._new_conn()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x74b3266c27d0>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x74b3266c27d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/graph/vertex/base.py", line 694, in _build_results
    result = await initialize.loading.get_instance_results(
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/interface/initialize/loading.py", line 64, in get_instance_results
    return await build_component(params=custom_params, custom_component=custom_component)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/interface/initialize/loading.py", line 151, in build_component
    build_results, artifacts = await custom_component.build_results()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/custom/custom_component/component.py", line 617, in build_results
    return await self._build_with_tracing()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/custom/custom_component/component.py", line 605, in _build_with_tracing
    _results, _artifacts = await self._build_results()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/custom/custom_component/component.py", line 640, in _build_results
    result = method()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/base/vectorstores/model.py", line 132, in build_base_retriever
    vector_store = self.build_vector_store()
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langflow/base/vectorstores/model.py", line 25, in check_cached
    result = f(self, *args, **kwargs)
  File "<string>", line 139, in build_vector_store
  File "<string>", line 170, in _add_documents_to_vector_store
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langchain_core/vectorstores/base.py", line 282, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langchain_chroma/vectorstores.py", line 508, in add_texts
    embeddings = self._embedding_function.embed_documents(texts)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 446, in embed_documents
    response = requests.post(
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/ubuntu/anaconda3/envs/langflow-env/lib/python3.10/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x74b3266c27d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
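
Both tracebacks above end in a requests.post call made from langchain_community/embeddings/huggingface.py, so the refused connection to localhost:8080 is raised while the embeddings are being computed. A minimal sketch for checking whether anything is listening on that address, using only the Python standard library; the helper name port_is_open is hypothetical and not part of Langflow:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # localhost:8080 is the address taken from the error messages above.
    print("localhost:8080 reachable:", port_is_open("localhost", 8080))
```

If this prints False, no service is accepting connections on that port, which matches the "[Errno 111] Connection refused" in the errors.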

Expected behavior

I would expect the vectorstores to be successfully created for both Astra DB and Chroma DB.

Who can help?

No response

Operating System

Ubuntu 22.04

Langflow Version

1.0.18

Python Version

3.10

Screenshot

No response

Flow File

No response

@oschan77 oschan77 added the bug Something isn't working label Sep 13, 2024

dosubot bot commented Sep 13, 2024

Hey @oschan77! I'm here to help you with any bugs, questions, or contributions you have. Let's work together to solve this issue.

The connection errors you're encountering with both Astra DB and Chroma DB in Langflow on Ubuntu 22.04 with Python 3.10 could be due to several reasons. Here are some potential issues and their solutions:

Astra DB

  1. Missing Dependencies: Ensure that the required packages langchain-astradb and astrapy are installed:

    pip install langchain-astradb astrapy
  2. Invalid Setup Mode: Ensure the setup mode is correctly set in your configuration. It must be one of the valid options ("Sync", "Async", "Off").

  3. Authentication Issues: Verify that the username, password, and api_endpoint provided are correct and valid.

  4. Network Issues: Ensure your machine has network access to the Astra DB API endpoint. You can test this by trying to reach the endpoint using curl or ping.

  5. Configuration Errors: Ensure all required inputs such as collection_name, username, password, and api_endpoint are correctly provided and valid.

Chroma DB

For Chroma DB, ensure the following configuration parameters are correctly set:

  1. collection_name (String)
  2. persist_directory (String)
  3. search_query (Multiline String)
  4. ingest_data (Data, is_list=True)
  5. embedding (Handle, input_types=["Embeddings"])
  6. chroma_server_cors_allow_origins (String, advanced)
  7. chroma_server_host (String, advanced)
  8. chroma_server_http_port (Integer, advanced)
  9. chroma_server_grpc_port (Integer, advanced)
  10. chroma_server_ssl_enabled (Boolean, advanced)
  11. allow_duplicates (Boolean, advanced, info="If false, will not add documents that are already in the Vector Store.")
  12. search_type (Dropdown, options=["Similarity", "MMR"], advanced, value="Similarity")
  13. number_of_results (Integer, advanced, info="Number of results to return.", value=10)
  14. limit (Integer, advanced, info="Limit the number of records to compare when Allow Duplicates is False.") [1][2].

Environment Variables

For Astra DB, ensure the following environment variables are set:

  • ASTRA_DB_APPLICATION_TOKEN
  • ASTRA_DB_API_ENDPOINT

For Chroma DB, there are no specific environment variables mentioned, but you can manage and configure environment variables in Langflow by setting LANGFLOW_STORE_ENVIRONMENT_VARIABLES to true in your .env file and listing the variables in LANGFLOW_VARIABLES_TO_GET_FROM_ENVIRONMENT [3][4].
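
As a rough sketch of the environment-variable check described above, the following uses only the standard library. The two variable names come from this comment; the helper missing_vars is hypothetical and not a Langflow function.

```python
import os

# Variable names as listed in the comment above.
REQUIRED_ASTRA_VARS = ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT")


def missing_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(REQUIRED_ASTRA_VARS)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Astra DB variables are set.")
```

Run it in the same shell or environment that starts Langflow, since variables set elsewhere would not be visible to the Langflow process.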

Debugging Information

Langflow provides specific logging and debugging information when connection errors occur. For example, it logs the number of documents being added to the vector store and raises a ValueError with the error message if an error occurs [5][6].

Ensure all the required fields are correctly set, the necessary packages are installed, and network connectivity and credentials are valid. If the issue persists, check the logs for specific error messages to help diagnose the problem.


@edwinjosechittilappilly
Collaborator

Thank you for your message. We were experiencing some issues with PyTorch dependencies when working with Hugging Face local embeddings. To resolve this, we have implemented support using Hugging Face’s text embeddings inference. This approach is straightforward to integrate into production environments. For optimal performance, please refer to this guide on running Hugging Face Embeddings locally on CPU.

This update reflects an enhancement in how we interact with Hugging Face embeddings, offering a more efficient and streamlined approach compared to prior implementations.

Additionally, we addressed a similar issue in PR #3758 and Issue #3595.

@edwinjosechittilappilly
Collaborator

Hi @oschan77 ,

In addition to the Hugging Face support I mentioned earlier, I would also recommend exploring the AstraVectorize component, which supports NVIDIA embeddings as well as several other providers. You can find more details here:

The AstraVectorize component provides a flexible and scalable solution for embedding generation, leveraging the power of NVIDIA embeddings and other integrations. This can be a great alternative for your Langflow deployments, especially if you are looking for enhanced performance and support for various embedding providers.

Feel free to explore these resources and see if they fit your needs. If you have any more questions or need further assistance, just let me know!

@oschan77
Author

Hi @edwinjosechittilappilly ,

Thank you for your responses. I'd like to confirm whether my Langflow design is functional and can be used in the future. It seems the main issue stems from the current lack of support for Huggingface Embeddings in Langflow.

For my specific use case, Huggingface models and embeddings aren't essential. Given this context, is there a straightforward method to implement RAG using Langflow? Please note that I don't have access to OpenAI API keys.

As an alternative, I'm considering using Ollama models and embeddings. Are these well-supported in the current version of Langflow? I appreciate your guidance on these matters.

@edwinjosechittilappilly
Collaborator

edwinjosechittilappilly commented Sep 16, 2024

Hi @oschan77,
Yes, we support the Ollama model and embeddings components.
Also, feel free to explore AstraVectorize for embeddings if you are using Astra DB as the vector database.
Hope this helps.

@oschan77
Author

Hi @edwinjosechittilappilly
Should I use AstraVectorize for embeddings if I use AstraDB as the Vector Database, even though I'm using the Ollama model? Or do you mean that both options are feasible? Thanks.

@edwinjosechittilappilly
Collaborator

Both are feasible.

@edwinjosechittilappilly
Collaborator

Closing this issue for now. Feel free to reopen it if you face any further issues.

@umais2005

umais2005 commented Oct 9, 2024

Hi @edwinjosechittilappilly,
I am facing this issue and still can't find a clear solution. I am using Hugging Face embeddings, and the chunking works fine, but the problem occurs when the vector store is created. It gives this error:
Error building Component Chroma DB:
HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002C6A7FF6A80>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

@kavinkumarbaskar1

Hi @edwinjosechittilappilly I am facing this issue and still cant find a clear solution to this. I am using huggingface embeddings, and the chunking is fine, but the problem is when the vector_store is created. It gives this error: Error building Component Chroma DB: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002C6A7FF6A80>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

Facing same issue

@edwinjosechittilappilly
Collaborator

@umais2005 and @kavinkumarbaskar1, if you are using local models, please make sure that you are using the updated Hugging Face embeddings component and that Hugging Face Text Embeddings Inference (https://huggingface.co/docs/text-embeddings-inference/en/local_cpu) is running locally.

You can also replace the localhost URL with the serverless API from Hugging Face and use it with your HF token.

If you encounter any further issues, please reopen this issue.
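
A minimal sketch of what a request to a locally running Text Embeddings Inference server could look like, using only the standard library. It assumes the server exposes a POST /embed route that accepts a JSON body with an "inputs" field, as described in the Text Embeddings Inference documentation, and that it listens on the localhost:8080 address from the error messages. The helper names are hypothetical, and the URL format of the hosted serverless API is different and not shown here.

```python
import json
import urllib.request


def build_embed_request(base_url, texts, token=None):
    """Build (but do not send) a POST request for <base_url>/embed."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Only needed for endpoints that require authentication.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"inputs": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embed", data=body, headers=headers, method="POST"
    )


def embed(base_url, texts, token=None, timeout=10):
    """Send the request and return the parsed JSON response."""
    request = build_embed_request(base_url, texts, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling embed("http://localhost:8080", ["hello"]) while no server is running raises urllib.error.URLError with a connection-refused reason, which is the same condition reported in the errors in this thread.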

@umais2005

Hey, I fixed this issue for Chroma DB. You actually have to run chromadb on localhost with port 8080. Here are the commands to run in the terminal:
pip install chromadb
chroma run --host localhost --port 8080 --path ./my_chroma_data

@kavinkumarbaskar1

Hey, I fixed this issue for chromaDB. You actually have to run chromadb on localhost with port 8080. Here are the commands to run in terminal
pip install chromadb
chroma run --host localhost --port 8080 --path ./my_chroma_data

@umais2005 Yeah, it is working with chromadb, but AstraDB is not working. Anyway, thanks for the reply it really helped ☺️.
