
Allow using locally-running OpenAI API-compatible service #277

Closed
igor-elbert opened this issue May 20, 2024 · 18 comments

Comments

@igor-elbert

Is your feature request related to a problem? Please describe.
We have Ollama and Jan.ai running locally and want to use them instead of OpenAI for data privacy reasons.

Describe the solution you'd like
Please allow adding a URL to the configuration. If the service conforms to the OpenAI API, the rest of the code should work.

Describe alternatives you've considered
IP forwarding, but it's clunky.

Additional context
Other similar services (e.g. CodeGPT) allow custom URLs for the LLM services.
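
For illustration, here is a minimal sketch of what such a configuration option would enable: pointing the standard openai Python client at a locally running, OpenAI API-compatible endpoint. The base URL and model name below are assumptions for a default local Ollama install.

```python
# Sketch: point the standard OpenAI client at a locally running,
# OpenAI API-compatible service. Base URL and model name below are
# illustrative for a default local Ollama install.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # ignored by Ollama, but the client requires a value
)

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```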

@cyyeh
Member

cyyeh commented May 20, 2024

Thanks for raising this issue and suggesting a great solution for supporting other LLM services! We'll definitely take a look and think about it.

@cyyeh
Member

cyyeh commented May 20, 2024

@igor-elbert There is another issue concerning how we support embedding models other than OpenAI's. As of now, I suppose we can't directly use Ollama's embedding models, since they don't conform to OpenAI's API. Am I correct?

Reference: https://ollama.com/blog/embedding-models. In the "Coming soon" section, OpenAI API Compatibility is one of the items.
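
For reference, the blog post above documents only Ollama's native embeddings endpoint, which looks roughly like the sketch below (the model name follows that post; this is the non-OpenAI-compatible API):

```python
# Sketch: Ollama's native embeddings API (not OpenAI-compatible),
# per the blog post linked above. Model name follows that post.
import requests

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "mxbai-embed-large", "prompt": "Represent this sentence"},
)
# Native response shape: {"embedding": [...]}; note it differs from
# OpenAI's {"data": [{"embedding": [...]}]} envelope.
print(len(resp.json()["embedding"]))
```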

@cyyeh
Member

cyyeh commented May 20, 2024

I have created a branch for this issue: https://github.com/Canner/WrenAI/tree/feature/ai-service/changing-providers

However, I think one issue we need to tackle first is letting community members use their preferred embedding models more easily. As of now, we only use OpenAI's embedding models. There are three things community members would likely want to change on their own: generators, vector databases, and embedding models. One caveat in our current design is that the generator and the embedding model must come from the same LLM provider, such as OpenAI.

What are your thoughts about it?

@ccollie

ccollie commented May 20, 2024

Apropos of this, support for Defog's SQLCoder would be nice.

@igor-elbert
Author

igor-elbert commented May 20, 2024 via email

@ccollie

ccollie commented May 20, 2024

I think Ollama does support it: OpenAI compatibility

Yup. Found it here

@cyyeh
Member

cyyeh commented May 21, 2024

@igor-elbert @ccollie I've tested Ollama's text generation model and embedding model support for OpenAI API compatibility. The result is that the embedding model can't be used via the OpenAI API. Please check out the attached gist URL for reproduction. Please correct me if I am wrong, thanks.

https://gist.github.com/cyyeh/b1042006b4ca067f2a75abd97e3749fb
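
For readers who don't open the gist, the result can be reproduced along these lines (a sketch of the same kind of test, not the gist verbatim; model names are illustrative):

```python
# Sketch: at the time of this thread, Ollama's OpenAI-compatible
# endpoint served chat completions but not embeddings.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Chat completions work through the compatibility layer.
chat = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "ping"}],
)
print(chat.choices[0].message.content)

# Embeddings did not: the /v1/embeddings route was not implemented,
# so this call raised an API error instead of returning vectors.
try:
    client.embeddings.create(model="mxbai-embed-large", input="ping")
except Exception as exc:
    print(f"embeddings via the OpenAI API failed: {exc}")
```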

@ccollie

ccollie commented May 21, 2024

Unfortunately I don't (yet) have an Ollama setup. However, I did find this, which led me to believe this is possible.

@ccollie

ccollie commented May 21, 2024

Sorry, I misread the gist (re embeddings), but as far as the Ollama docs go, they currently only support the chat completions API.

@cyyeh
Member

cyyeh commented May 21, 2024

@igor-elbert @ccollie

Hi, we just refined how you can add your preferred LLM and Document Store. You only need to define the LLM and Document Store and their environment variables! For details, please check out the guide here: https://docs.getwren.ai/installation/custom_llm

For adding Ollama, I've created a branch and made a minimal implementation; feel free to check it out: https://github.com/Canner/WrenAI/tree/feature/ai-service/add-ollama

ONE CAVEAT: after you define your own LLM, you may find the AI pipelines break. That's because your LLM may not be suitable for the prompts, so at the moment you need to do prompt engineering yourself. In the future, we'll come up with ways for you to easily extend and customize your prompts, or you're welcome to share your thoughts here. As of now, I suppose the prompts and the respective LLM should match to get the best performance.
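
For a rough illustration of the "define your LLM via environment variables" flow described in the guide above, a hypothetical configuration could look like the sketch below. The variable names (LLM_PROVIDER, LLM_BASE_URL, etc.) are illustrative, not the actual keys Wren AI uses; see the linked guide for the real interface.

```python
# Hypothetical sketch of the "provider + environment variables"
# pattern the guide describes; the variable names are illustrative,
# not Wren AI's actual configuration keys.
import os

from openai import OpenAI

LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai")  # e.g. "openai", "ollama"
LLM_BASE_URL = os.getenv("LLM_BASE_URL", "https://api.openai.com/v1")
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-3.5-turbo")

def make_client() -> OpenAI:
    # Any OpenAI API-compatible service can then be selected purely
    # through environment variables, with no code changes.
    return OpenAI(base_url=LLM_BASE_URL, api_key=os.getenv("LLM_API_KEY", "ollama"))
```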

@cyyeh
Member

cyyeh commented May 21, 2024

If there are no more issues, I'll close this issue then. Thank you :)

@qdrddr
Contributor

qdrddr commented May 22, 2024

You might also be interested in these models, which you can run locally to generate SQL:

https://ollama.com/library/sqlcoder
https://ollama.com/library/codeqwen
https://ollama.com/library/starcoder2

@cyyeh
Member

cyyeh commented May 22, 2024

We'll merge the add-ollama branch into the main branch after we make sure it won't break our current AI pipelines. We will investigate some ways to solve the issue, for example https://github.com/noamgat/lm-format-enforcer
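
For context, lm-format-enforcer constrains token-by-token generation so the output must conform to a schema. Its transformers integration looks roughly like the sketch below, adapted from that project's README; the model and schema here are illustrative.

```python
# Sketch of lm-format-enforcer's transformers integration, adapted
# from that project's README: generation is constrained so the output
# stays valid against a JSON schema. Model and schema are illustrative.
from pydantic import BaseModel
from transformers import pipeline
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

class SqlAnswer(BaseModel):
    sql: str

hf_pipeline = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
parser = JsonSchemaParser(SqlAnswer.model_json_schema())
prefix_fn = build_transformers_prefix_allowed_tokens_fn(hf_pipeline.tokenizer, parser)

output = hf_pipeline(
    'Answer in JSON with a single "sql" field: how many users are there?',
    prefix_allowed_tokens_fn=prefix_fn,
)
print(output[0]["generated_text"])
```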

@qdrddr
Contributor

qdrddr commented May 31, 2024

The duckdb-nsql model might also be useful for this project, since WrenAI already has DuckDB.

@qdrddr
Contributor

qdrddr commented May 31, 2024

We'll merge the add-ollama branch into the main branch after we make sure it won't break our current AI pipelines. We will investigate some ways to solve the issue, for example https://github.com/noamgat/lm-format-enforcer

To enforce some format you might need a model that supports function calling, such as Mistral 7B v0.3. Please note this model might not be particularly strong at SQL generation. @cyyeh
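
For illustration, OpenAI-style function calling against a local endpoint would look roughly like the sketch below. It assumes the local server exposes the OpenAI tools parameter and that the model (e.g. Mistral 7B v0.3) supports it; the tool definition is hypothetical.

```python
# Sketch: OpenAI-style tool/function calling against a local,
# OpenAI API-compatible endpoint. Assumes the server and model both
# support the `tools` parameter; the tool definition is hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Execute a SQL query against the warehouse",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "How many users signed up last week?"}],
    tools=tools,
)
# If the model chose to call the tool, the structured arguments are here:
print(response.choices[0].message.tool_calls)
```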

cyyeh added the module/ai-service (ai-service related) label on Jun 1, 2024
@qdrddr
Contributor

qdrddr commented Jun 6, 2024

FYI, the two most popular inference engines are:
Ollama (partly compatible with the OpenAI APIs; it mostly uses its own APIs)
and LocalAI (tends to be almost fully compatible with the OpenAI APIs).

I would suggest using the LiteLLM framework, which can bridge different LLM providers and make it easier to maintain them and add new ones.
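
For reference, LiteLLM exposes a single completion() call across providers, so switching backends is mostly a matter of changing the model string. A sketch, with illustrative model names:

```python
# Sketch: LiteLLM routes a single completion() interface to many
# providers; swapping backends only changes the model string.
# Model names are illustrative.
from litellm import completion

messages = [{"role": "user", "content": "Generate SQL to count users."}]

# OpenAI backend (reads OPENAI_API_KEY from the environment).
openai_resp = completion(model="gpt-3.5-turbo", messages=messages)

# Local Ollama backend: same call shape, different model string.
ollama_resp = completion(
    model="ollama/llama3",
    messages=messages,
    api_base="http://localhost:11434",
)
print(ollama_resp.choices[0].message.content)
```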

@cyyeh
Member

cyyeh commented Jun 11, 2024

All, Ollama has been integrated in this branch, and you can also use any OpenAI API-compatible LLM: chore/ai-service/update-env
We'll merge this branch into the main branch in the near future and update the documentation.
For now, I'll delete the original ollama branch.
Thank you all for your patience.

related pr: #376

@cyyeh
Member

cyyeh commented Jun 28, 2024

All, we now support using Ollama and OpenAI API-compatible LLMs with the latest release: https://github.com/Canner/WrenAI/releases/tag/0.6.0

Setup instructions for running Wren AI with your custom LLM: https://docs.getwren.ai/installation/custom_llm#running-wren-ai-with-your-custom-llm-or-document-store

Currently, there is one obvious limitation for custom LLMs: you need to use the same provider (such as OpenAI or Ollama) for both the LLM and the embedding model. We'll fix that and release a new version soon. Stay tuned 🙂

I'll close this issue as completed now.

cyyeh closed this as completed on Jun 28, 2024