Add support for SQL Server Vector Search #722

Merged on Sep 20, 2024 (21 commits)

marcominerva
Contributor

Motivation and Context (Why the change? What's the scenario?)

Azure SQL Database now provides native Vector Support, currently in EAP (https://devblogs.microsoft.com/azure-sql/announcing-eap-native-vector-support-in-azure-sql-database). This PR extends SQL Server Memory DB with a flag that enables this new feature.

High level description (Approach, Design)

Just pass true to the new useVectorSearch argument:

builder.Services.AddKernelMemory(options =>
{
    options
      //..
      .WithSqlServerMemoryDb("connection_string", useVectorSearch: true)
      //...
      ;
});

With this value, the Memory will use the new Vector Support. In particular, the table KMEmbeddings_index is no longer necessary, because vectors are now stored in binary format and vector distance can be computed directly on the embedding column in the KMMemories table.

In other words, SqlServerMemory behaves differently based on the value of the useVectorSearch flag. I have chosen this approach, rather than creating a brand new MemoryDb implementation, because there is a lot of code in common. Of course, we can decide to split the implementation.

At this moment, vector dimension is limited to 1998, so we can use models like text-embedding-ada-002 or text-embedding-3-small, which produce 1536-dimensional vectors. text-embedding-3-large defaults to 3072 dimensions, so with that model we need to explicitly specify a vector dimension within the limit when configuring Kernel Memory.
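As a rough illustration of the dimension limit described above, the following sketch rejects embedding sizes that exceed it. The class and method names are hypothetical and are not part of Kernel Memory or of this PR; the value 1998 is simply the limit quoted here for the EAP and may change.

```csharp
using System;

// Hypothetical helper, not part of Kernel Memory or this PR.
public static class VectorDimensionGuard
{
    // Limit quoted in this PR description for the EAP; assumed, may change.
    public const int MaxVectorDimensions = 1998;

    // Returns the dimension unchanged when it fits, otherwise throws.
    public static int EnsureSupported(int dimensions)
    {
        if (dimensions <= 0 || dimensions > MaxVectorDimensions)
        {
            throw new ArgumentOutOfRangeException(
                nameof(dimensions),
                dimensions,
                "Vector dimension must be between 1 and " + MaxVectorDimensions + ".");
        }

        return dimensions;
    }
}
```

With this check, a 1536-dimensional model passes, while the 3072-dimensional default of text-embedding-3-large is rejected until a smaller dimension is configured.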

Important

Remember that, at this time, Vector Support is available only on Azure SQL Database. On the other hand, the current SQL Server Memory DB requires a COLUMNSTORE INDEX that, on Azure, is available only on vCore databases and Standard databases in S3 and above pricing tiers (https://azure.microsoft.com/en-us/blog/columnstore-support-in-standard-tier-azure-sql-databases).

@marcominerva marcominerva requested a review from dluc as a code owner July 30, 2024 14:18
@dluc
Collaborator

dluc commented Jul 30, 2024

I think the amount of SQL strings in the code is getting to a very risky state, hard to review for SQL injections (something we keep getting security warnings about). Would it be hard to refactor out all the SQL manipulation in a dependency, having one class for the old SQL and one for the new SQL with vector support?

@marcominerva
Contributor Author

@dluc I have extracted all the SQL manipulation into external dependencies (see https://github.com/microsoft/kernel-memory/blob/0d81477a626e55c3f8dbc38461cfa884749f6a6e/extensions/SQLServer/SQLServer/QueryProviders/ISqlServerQueryProvider.cs).

Here is how it is used:

this._queryProvider = this._config.UseVectorSearch
    ? new VectorQueryProvider(this._config)
    : new DefaultQueryProvider(this._config);

Currently, there is some code duplication between the two implementations, but for now I would like to know whether this is the correct approach. Then we can optimize the code.

@dluc
Collaborator

dluc commented Jul 31, 2024

yup, exactly what I was hoping for, thank you. Let me know when the PR is ready

@marcominerva
Contributor Author

I have changed the ISqlServerQueryProvider interface to an abstract base class so that it can contain the common logic. The PR is now ready for review ;)
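For readers who have not opened the diff, the shape of that design can be sketched roughly as follows. This is only an illustration under assumptions: every type and member name here is hypothetical (the actual classes in the PR differ), no real SQL is generated, and the table and column names are just the ones mentioned in the PR description.

```csharp
using System;

// Hypothetical configuration type; names mirror the PR description only loosely.
public sealed class ConfigSketch
{
    public bool UseVectorSearch { get; set; }
    public string MemoryTableName { get; set; } = "KMMemories";
    public string EmbeddingsTableName { get; set; } = "KMEmbeddings_index";
}

// Abstract base class: shared logic lives here, differences go in subclasses.
public abstract class QueryProviderSketch
{
    protected QueryProviderSketch(ConfigSketch config)
    {
        this.Config = config ?? throw new ArgumentNullException(nameof(config));
    }

    protected ConfigSketch Config { get; }

    // Common logic: bracket-quote an identifier, doubling any closing
    // bracket so the name cannot break out of the quotes.
    protected static string Quote(string identifier)
    {
        return "[" + identifier.Replace("]", "]]") + "]";
    }

    public string MemoryTable => Quote(this.Config.MemoryTableName);

    // Each subclass reports where similarity search reads vectors from.
    public abstract string VectorSource { get; }

    // Picks the implementation from the flag, as in the one-liner shown earlier.
    public static QueryProviderSketch Create(ConfigSketch config)
    {
        if (config.UseVectorSearch)
        {
            return new VectorQueryProviderSketch(config);
        }

        return new DefaultQueryProviderSketch(config);
    }
}

// Without vector support: vectors come from the separate embeddings table.
public sealed class DefaultQueryProviderSketch : QueryProviderSketch
{
    public DefaultQueryProviderSketch(ConfigSketch config) : base(config) { }

    public override string VectorSource => Quote(this.Config.EmbeddingsTableName);
}

// With vector support: vectors come from the embedding column of the memory table.
public sealed class VectorQueryProviderSketch : QueryProviderSketch
{
    public VectorQueryProviderSketch(ConfigSketch config) : base(config) { }

    public override string VectorSource => this.MemoryTable + ".[embedding]";
}
```

Keeping the quoting helper in the base class means both implementations share one code path for identifiers, which is the kind of common logic that makes the SQL easier to audit in one place.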

@marcominerva marcominerva requested a review from dluc August 1, 2024 08:12
@KSemenenko
Contributor

What db do you use? For pgvector https://github.com/pgvector/pgvector-dotnet there is a client.

So it looks like it’s possible to use EntityFrameworkCore

Collaborator

@dluc dluc left a comment

review in progress...

extensions/SQLServer/SQLServer/DependencyInjection.cs (outdated, resolved)
@dluc dluc merged commit ccfb815 into microsoft:main Sep 20, 2024
6 checks passed
@marcominerva marcominerva deleted the sqlserver-vectorsearch branch October 1, 2024 12:45
3 participants