-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for SQL Server Vector Search #722
Conversation
I think the amount of SQL strings in the code is getting to a very risky state, hard to review for SQL injections (something we keep getting security warnings about). Would it be hard to refactor out all the SQL manipulation in a dependency, having one class for the old SQL and one for the new SQL with vector support? |
@dluc I have extracted all the SQL manipulations in external dependencies (see https://github.com/microsoft/kernel-memory/blob/0d81477a626e55c3f8dbc38461cfa884749f6a6e/extensions/SQLServer/SQLServer/QueryProviders/ISqlServerQueryProvider.cs). Here it is how it is used:
Currently, there are some code duplication in the two implementations, but for the moment I would like to know if it is the correct approach. Then, we could optimize the code. |
yup, exactly what I was hoping for, thank you. Let me know when the PR is ready |
I have changed the |
What db do you use? For pgvector https://github.com/pgvector/pgvector-dotnet there is a client. So it looks like it’s possible to use EntityFrameworkCore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review in progress...
…rva/kernel-memory into sqlserver-vectorsearch
Motivation and Context (Why the change? What's the scenario?)
Azure SQL Database now provides Vector Support, currently in EAP (https://devblogs.microsoft.com/azure-sql/announcing-eap-native-vector-support-in-azure-sql-database). This PR extends SQL Server Memory DB with a flag that allows to use this new feature.
High level description (Approach, Design)
Just pass
true
to the newuseVectorSearch
argument:With this value, the Memory will use the new Vector Support. In particular, the table
KMEmbeddings_index
will no longer be necessary, because vectors are now stored in binary format and vector distance can be performed using directly the embedding column in theKMMemories
table.In other words, SqlServerMemory will act differently based on the value of the
useVectorSearch
flag. I have chosen this approach, rather than creating a brand new MemoryDb implementation, because that are a lot of code in common. Of course, we can decide to split the implementation.At this moment, vector dimension is limited to 1998, so we can use models like
text-embedding-ada-002
ortext-embedding-3-small
. In case oftext-embedding-3-large
, we need to specify the value of vector dimension when configuring Kernel Memory.Important
Remember that, at this time, Vector Support is available only on Azure SQL Database. On the other hand, the current SQL Server Memory DB requires a COLUMNSTORE INDEX that, on Azure, is available only on vCore databases and Standard databases in S3 and above pricing tiers (https://azure.microsoft.com/en-us/blog/columnstore-support-in-standard-tier-azure-sql-databases).