-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for knn_vector property type #524
Conversation
Signed-off-by: Malte Hedderich <github@hedderich.pro>
Signed-off-by: Malte Hedderich <github@hedderich.pro>
…etting Signed-off-by: Malte Hedderich <github@hedderich.pro>
Signed-off-by: Malte Hedderich <github@hedderich.pro>
Signed-off-by: Malte Hedderich <github@hedderich.pro>
Signed-off-by: Malte Hedderich <github@hedderich.pro>
@@ -124,6 +124,7 @@ public enum FieldType implements JsonEnum { | |||
|
|||
DenseVector("dense_vector"), | |||
|
|||
KnnVector("knn_vector"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is not OpenSearch core field type, this is provided by the plugin (https://github.com/opensearch-project/k-NN), afaik we do not (and probably should not) make it a part of the standard Java client but part of the k-NN plugin APIs, @VachaShah @dblock what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. This effectively creates a chicken/egg problem, which is why plugins are "awesome" :)
@maltehedderich Is there a mechanism that lets us add a k-nn high level REST client JAR to the k-nn plugin with this code? or maybe it's a waste of time and it makes more sense to add support for k-nn to the new transport that doesn't have a dependency on core? See #326 and #262
On the other hand, in other clients we are adding plugin code into the client. We just namespace it clearly, so is there a way to namespace all k-nn code here under something.plugins.k-nn
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick replies.
While namespaces probably work well for most plugins, this is difficult in the case of the KNN plugin due to its tight integration with core components.
The naming for both cluster and index-specific settings makes it difficult to identify them as plugin settings. As a user, if KNN settings remain under index settings, I would expect to be able to modify and read them through the Index Settings functionality in the Java client.
Moreover, I'm not aware of other plugins introducing custom field types. I think many users who use the OpenSearch KNN functionalities also use other field types (those who want a pure vector database have many alternatives). So it would be important to be able to define the mappings together with your other mappings.
I believe that in order to solve this cleanly we would either have to incorporate the functionality of the KNN plugin into the OpenSearch core or fundamentally consider how to deal with extensions to field types and index settings.
Personally, I think integrating vector search capabilities into the OpenSearch core is the best solution. However, if this is not in line with your product vision I am also willing to spend time on alternative solutions, we should just clarify in advance where we really want to go.
As for the namespacing solution, it's achievable but would necessitate several class extensions and potentially affect usability. Regarding incorporating this into the new Transport Client, I'm unsure how this would look like and would need more information about your vision for that integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While namespaces probably work well for most plugins, this is difficult in the case of the KNN plugin due to its tight integration with core components.
Most if not all plugins have tight integration with core components
Moreover, I'm not aware of other plugins introducing custom field types.
There are many, even within OpenSearch core plugins, fe mapper-murmur3
introduces murmur3
custom field type.
I believe that in order to solve this cleanly we would either have to incorporate the functionality of the KNN plugin into the OpenSearch core or fundamentally consider how to deal with extensions to field types and index settings.
I think more broadly we need an extensible mechanism to support plugin specific APIs in the Java client, we have talked about that in the past in scope of different issues (#377, #262) but have not conclude anything on the subject.
As for the namespacing solution, it's achievable but would necessitate several class extensions and potentially affect usability. Regarding incorporating this into the new Transport Client, I'm unsure how this would look like and would need more information about your vision for that integration.
We need to get back to the extensibility part of OpenSearch Java client. The major drawback I see is that by adding k-NN
plugin to the core client we put all other plugins in unfair and disadvantageous position, I think we need to work towards long term goal instead of taking the simple path of baking everything in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many, even within OpenSearch core plugins, fe mapper-murmur3 introduces murmur3 custom field type.
How is "core plugin" defined? I am only aware of the distinction between "Bundled Plugins" and "Additional Plugins" (according to installing plugins). Unlike murmur3, k-NN belongs to the Bundled Plugins and should therefore be present in the vast majority of OpenSearch deployments.
I think more broadly we need an extensible mechanism to support plugin specific APIs in the Java client, we have talked about that in the past in scope of different issues (#377, #262) but have not conclude anything on the subject.
The discussions sound to me less like extensibility and more like the creation of a workaround through a kind of low level client. This could then of course be used as a basis for plugin specific libraries, but would itself not make the features of the Java client (typing, builders etc.) usable for e.g. users of the k-NN plugin.
According to my understanding of extensibility, there would need to be a standardized way to extend the components of the Java client with plugin specific types and functionalities. So that mappings, index settings and co can all be defined together via the builder without the need for custom HTTP requests.
We need to get back to the extensibility part of OpenSearch Java client. The major drawback I see is that by adding k-NN plugin to the core client we put all other plugins in unfair and disadvantageous position, I think we need to work towards long term goal instead of taking the simple path of baking everything in.
The distinction between "bundled plugins" and "additional plugins" already puts all "additional plugins" at a disadvantage. In addition, I think it is questionable how much of the "plugin character" is left if they are shipped with all but the minimal distributions.
As you said, there are already several ideas and discussions about this topic, not only here in the Java Client repo. In my opinion it is important that we come to a holistic decision how to deal with "bundled" and "additional" plugins within the clients. How do you think we can reach such a decision most effectively?
The rough concepts I have seen for this so far are:
- Integration into core client libraries with appropriate namespacing
- Generic low level clients provided by the core client libraries
- Development of plugin-specific clients which may build up on the core clients
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @maltehedderich
How is "core plugin" defined? I am only aware of the distinction between "Bundled Plugins" and "Additional Plugins" (according to installing plugins).
Sorry, I should have been more clear, your distinction is very correct, by core plugins I meant the plugins included in OpenSearch repository and bundled with it.
The distinction between "bundled plugins" and "additional plugins" already puts all "additional plugins" at a disadvantage.
There should be no distinction, I just provided an example of field type added in the plugin (since you mentioned you were not aware of), no emphasize on core - it just happened to be a plugin we bundle with OpenSearch.
The rough concepts I have seen for this so far are:
- Integration into core client libraries with appropriate namespacing
- Generic low level clients provided by the core client libraries
- Development of plugin-specific clients which may build up on the core clients
The solution (as I see) it is the combination of plugin specific APIs that could be plugged into OpenSearch Java client. The very rough idea of how it could look like (it is not formalized anywhere):
public class OpenSearchClient extends ApiClient<OpenSearchTransport, OpenSearchClient> {
public <PluginClient> PluginClient plugin(Class<PluginClient> plugin) {
....
}
}
class OpenSearchKnnPluginClient {
....
}
But it does not include plugin specific mapping types, index settings, query builders, etc. , I think we could work it through by taking the k-NN
plugin as the example and formalize the approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my current perspective, I think we definitely want something like that, but (as you already said) it only solves the problem with plugin specific APIs. However, in the case of the k-NN plugin, the standard APIs are used almost exclusively (except for training custom models?).
Nevertheless, I can understand your concerns about including plugin-specific code in the core library and will be happy to participate in possible solutions.
Let's go back to our original problem @maltehedderich and @reta. I think we want support for I'm with ya'll on moving k-nn to core, but that's beyond the point. I would say we don't want to do it mostly because working in core is a doozy. |
This pull request I think is the shortest path to support |
Shall we just merge it and open issues for the rest, or do you think it's too much debt? |
Taking all pros and cons, I would incline to merge it - it won't make things much worth then they are now, but kicking off the work to design the mechanism to plugin/extension APIs would be great. |
...ent/src/test/java/org/opensearch/client/opensearch/_types/mapping/KnnVectorPropertyTest.java
Show resolved
Hide resolved
...-client/src/main/java/org/opensearch/client/opensearch/_types/mapping/KnnVectorProperty.java
Outdated
Show resolved
Hide resolved
java-client/src/main/java/org/opensearch/client/opensearch/_types/mapping/KnnVectorMethod.java
Outdated
Show resolved
Hide resolved
java-client/src/main/java/org/opensearch/client/opensearch/_types/mapping/FieldType.java
Outdated
Show resolved
Hide resolved
I forgot the --signoff flag when I committed the removal of the license header, but all my commits are signed with my SSH key. Is that enough or how should we solve that? All code is written by myself, I just used one of the other Property Classes as template and kept the header because it was present in all other files. |
As far as DCO checks are not failing, we should be good, thanks @maltehedderich ! |
@reta But also, where's the DCO check?! |
@dblock It is no there! And also gone for OpenSearch, opensearch-project/OpenSearch#8016 |
…nse headers Signed-off-by: Malte Hedderich <github@hedderich.pro>
It's back! |
Signed-off-by: Malte Hedderich <github@hedderich.pro>
I removed the dense_vector property and force pushed the signing to my previous commit. It's getting late here in Germany so I'm out, will fix it tomorrow if anything is still wrong. |
Signed-off-by: Malte Hedderich <github@hedderich.pro>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. We will need updates to USER_GUIDE and such.
As a followup, let's talk about namespacing this with the word "plugin" better?
Regarding the USER_GUIDES, I would prepare some small examples, but probably won't get to it until the next few days. What is still unclear to me is the distinction to the Java Client Docs, do they pursue the same goal? Should we extend those as well?
Yes. In general, we wanted to take another look at how we handle plugin-specific APIs and should also discuss there how we can separate plugin-specific extensions of the standard APIs better. |
…settings and mappings Signed-off-by: Malte Hedderich <github@hedderich.pro>
…ipting_score with painless extension Signed-off-by: Malte Hedderich <github@hedderich.pro>
I have now added some examples for k-NN search to the USERGUIDE. Examples for the approximative k-NN search are still absent, because for this a KnnQuery class has to be added within our query_dsl first. I'm pretty busy the next two weeks, so I'll only be able to add this in early/mid July. But since exact k-NN search is already fully functional with the modifications in this PR, I would suggest to merge it and add the approximate k-NN search later. I also changed the parameter order for the Apache HttpHost in the USERGUIDE, as it didn't seem to match the referenced |
Signed-off-by: Malte Hedderich <github@hedderich.pro>
Signed-off-by: Malte Hedderich <github@hedderich.pro>
Signed-off-by: Malte Hedderich <github@hedderich.pro>
@@ -84,7 +89,7 @@ There are multiple low level transports which `OpenSearchClient` could be config | |||
import org.apache.hc.core5.http.HttpHost; | |||
|
|||
final HttpHost[] hosts = new HttpHost[] { | |||
new HttpHost("localhost", "http", 9200) | |||
new HttpHost("http", "localhost", 9200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this!
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-524-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7d22a7dab4278e3f6adb8b4d2125f40a624395c0
# Push it to GitHub
git push --set-upstream origin backport/backport-524-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x Then, create a pull request where the |
I think we do want to backport this to 2.x, do we? @maltehedderich would need to be done manually as the auto-backport failed. Regarding organizing the code for plugins, all yours and sincerely appreciate your contributions! |
Yes, that should be possible with reasonable effort. Shall we just create an issue for it and if it's not already done in the next few weeks, I'll take care of it in July? Additionally, Mend seems to be unhappy with a dependency in the main branch, does dependabot take care of that automatically or do we have to adjust it manually? Edit: Is the automatic backport only failing because of the merge conflict in the USERGUIDE 🤔 If that's the case, I think I will be able to fix that quickly tonight. |
I see you did this. Generally that's the way to go because otherwise we forget 😅
We let dependabot do its thing. If you see something it doesn't do, then yes.
Right on. |
Description
Adds support for the knn_vector property type and it's corresponding index settings.
Issues Resolved
#437
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.