Many open source projects provide OpenAI API-compatible versions of the `completions` and `chat/completions` endpoints, but do not support the `embeddings` endpoint. The goal of this project is to create an OpenAI API-compatible version of the `embeddings` endpoint, serving open source sentence-transformers models and other models supported by LangChain's `HuggingFaceEmbeddings`, `HuggingFaceInstructEmbeddings` and `HuggingFaceBgeEmbeddings` classes.
Below is a compilation of open source models that have been tested via the `embeddings` endpoint:
- BAAI/bge-large-en
- intfloat/e5-large-v2
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/all-mpnet-base-v2
- universal-sentence-encoder-large/5 (please refer to the `universal_sentence_encoder` branch for more details)
The models listed above have been tested and verified. All sentence-transformers models are expected to work seamlessly with the endpoint.
It may not be immediately apparent that using the `BAAI/bge-*` and `intfloat/e5-*` series of models with the `embeddings` endpoint can yield different embeddings for the same `input` value, depending on how it is sent to the endpoint. Consider the following examples:
Example 1:

```json
{
  "input": "The food was delicious and the waiter..."
}
```

Example 2:

```json
{
  "input": ["The food was delicious and the waiter..."]
}
```
This discrepancy arises because the `BAAI/bge-*` and `intfloat/e5-*` series of models require specific prefix text to be added to the `input` value before creating embeddings in order to achieve optimal performance. In the first example, where the `input` is of type `str`, it is assumed that the embeddings will be used for queries. Conversely, in the second example, where the `input` is of type `List[str]`, it is assumed that you will store the embeddings in a vector database. Adhering to these guidelines is essential to ensure the intended functionality and optimal performance of the models.
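The convention can be sketched as follows. This is an illustrative sketch, not the project's actual implementation; the `"query: "` and `"passage: "` prefixes are the ones recommended for the `intfloat/e5-*` model family (the `BAAI/bge-*` models use their own instruction prefix for queries):

```python
from typing import List, Union


def add_e5_prefix(input_value: Union[str, List[str]]) -> Union[str, List[str]]:
    """Illustrative sketch of the e5-style prefixing convention.

    A bare string is assumed to be a search query; a list of strings is
    assumed to be passages destined for a vector database.
    """
    if isinstance(input_value, str):
        return "query: " + input_value
    return ["passage: " + text for text in input_value]


# A str input is treated as a query...
print(add_e5_prefix("The food was delicious and the waiter..."))
# ...while a List[str] input is treated as passages.
print(add_e5_prefix(["The food was delicious and the waiter..."]))
```

Because the prefixes differ, the resulting embeddings for the same text differ as well.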
Try out open-text-embeddings in your browser:
To run the embeddings endpoint locally as a standalone FastAPI server, follow these steps:
- Install the dependencies by executing the following command:

  ```shell
  pip install --no-cache-dir open-text-embeddings[server]
  ```
- Download the desired model, for example `intfloat/e5-large-v2`:

  ```shell
  ./download.sh intfloat/e5-large-v2
  ```
- Run the server with the desired model (normalized embeddings are enabled by default):

  ```shell
  MODEL=intfloat/e5-large-v2 python -m open.text.embeddings.server
  ```

  Set `NORMALIZE_EMBEDDINGS` to `0` or `False` if the model doesn't support normalized embeddings, for example:

  ```shell
  MODEL=intfloat/e5-large-v2 NORMALIZE_EMBEDDINGS=0 python -m open.text.embeddings.server
  ```

  If a GPU is detected in the runtime environment, the server will automatically run in `cuda` mode. However, you can set the `DEVICE` environment variable to choose between `cpu` and `cuda`, for example:

  ```shell
  MODEL=intfloat/e5-large-v2 DEVICE=cpu python -m open.text.embeddings.server
  ```

  This setup allows you to switch seamlessly between CPU and GPU modes, giving you control over the server's performance based on your specific requirements.

  You can enable verbose logging by setting `VERBOSE` to `1`, for example:

  ```shell
  MODEL=intfloat/e5-large-v2 VERBOSE=1 python -m open.text.embeddings.server
  ```
- Once the server has started, you will see output like the following in your console:

  ```
  INFO:     Started server process [19705]
  INFO:     Waiting for application startup.
  INFO:     Application startup complete.
  INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
  ```
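Once the server is running, you can call it the same way you would call the OpenAI embeddings endpoint. The sketch below builds a request payload and shows how vectors are extracted from an OpenAI-style response; the response dict here is a hand-written placeholder (real `intfloat/e5-large-v2` vectors are much longer), and the `/v1/embeddings` path and port 8000 are assumptions based on the startup log above:

```python
import json

# Request body for the OpenAI-compatible embeddings endpoint
# (POST http://localhost:8000/v1/embeddings -- path assumed).
payload = {
    "model": "intfloat/e5-large-v2",
    "input": ["The food was delicious and the waiter..."],
}
print(json.dumps(payload))

# Placeholder response in the OpenAI embeddings format, standing in for
# what an actual HTTP call would return.
response = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]}],
    "model": "intfloat/e5-large-v2",
}

# Each item in "data" carries one embedding, in input order.
vectors = [item["embedding"] for item in response["data"]]
print(len(vectors))
```

Any OpenAI-compatible client should work the same way, pointed at the local base URL instead of api.openai.com.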
To deploy the embeddings endpoint to a cloud platform using GitHub Actions, fork the repo first, then follow these steps:
- Add your AWS credentials (`AWS_KEY` and `AWS_SECRET`) to the repository secrets. You can do this by navigating to https://github.com/your-username/open-text-embeddings/settings/secrets/actions.
- Manually trigger the `Deploy Dev` or `Remove Dev` GitHub Actions workflow to deploy or remove the AWS Lambda function.
- Add your Modal credentials (`MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET` and `MODAL_USERNAME`) to the repository secrets. You can do this by navigating to https://github.com/your-username/open-text-embeddings/settings/secrets/actions.
- Manually trigger the `Deploy Modal` GitHub Actions workflow to deploy the Modal web endpoints.
To test the `embeddings` endpoint, the repository includes an `embeddings.ipynb` notebook with a LangChain-compatible `OpenAIEmbeddings` class.
To get started:
- Install the dependencies by executing the following command:

  ```shell
  pip install --no-cache-dir open-text-embeddings openai
  ```

- Execute the cells in the notebook to test the `embeddings` endpoint.
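A quick sanity check on embeddings returned by the endpoint is to compare cosine similarities: texts with related meanings should score higher against each other than against unrelated texts. A minimal, self-contained sketch (the vectors below are placeholders, not real model output):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for endpoint output.
query_vec = [0.1, 0.3, 0.5]
related_vec = [0.1, 0.29, 0.52]
unrelated_vec = [0.9, -0.2, 0.1]

# A related text should rank above an unrelated one.
print(cosine_similarity(query_vec, related_vec)
      > cosine_similarity(query_vec, unrelated_vec))
```

Note that when the server returns normalized embeddings (its default), a plain dot product produces the same ranking as cosine similarity.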
Contributions are welcome! Please check out the issues on the repository, and feel free to open a pull request. For more information, please see the contributing guidelines.
Thank you very much for the following contributions:
- Vokturz contributed #2: support for CPU/GPU choice and initialization before starting the app.
- jayxuz contributed #5: improved OpenAI API compatibility, better support for previous versions of Python (start from v3.7), better defaults and bug fixes.
This project is licensed under the terms of the MIT license.
If you utilize this repository, please consider citing it with:
```bibtex
@misc{open-text-embeddings,
  author       = {Lim Chee Kin},
  title        = {open-text-embeddings: Open Source Text Embedding Models with OpenAI API-Compatible Endpoint},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/limcheekin/open-text-embeddings}},
}
```