Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Add rescoreContext for neuralsparse query's two-phase speed up. #695

Closed
wants to merge 37 commits into from

Conversation

conggguan
Copy link
Contributor

@conggguan conggguan commented Apr 17, 2024

Description

This change implement for #646

  • Enhance the speed of neuralsparse query by two-phase.
  • Now support top-level, boolean and boost compound query; for other compound query will degrade into origin logic and speed.

Feature support query

  • NeuralSparseQuery
  • NeuralSparseQuery nested in BoostQuery
  • NeuralSparseQuery nested in BooleanQuery

Feature implementation optimize plan

Intro

Now implement two-phase feature by adding rescoreContext in the begin of HybridQueryPhaseSearcher's. It is harmless but little hack, so there is a long term plan to replace this implementation.

RFC in opensearch core

To add a correct rescoreContext to searchContext, we need both searchContext and whole compound query, but now if we can only get these together in the HybridQueryPhaseSearcher. In QueryBuilder, the searchContext is private, and whole query are not visible.
What we need to implement this feature is the whole query and searchContext, so if OpenSearch Core can open a API which can help me get it before the query performed, we can move the rescoreContext function to a more general interface of core.

RoadMap of this optimizing

  1. RFC in OpenSearch Core.
  2. A Incremental update in NeuralSearch plugin.

Issues Resolved

Resolve #646

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: conggguan <congguan@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@conggguan conggguan force-pushed the rescore-context branch 2 times, most recently from 7783b41 to 8b12f42 Compare April 24, 2024 08:39
Signed-off-by: conggguan <congguan@amazon.com>
Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My big concern is high number of integ tests with model deployment, the rest is less critical

if (query2weight.isEmpty()) {
return;
} else if (query2weight.size() == 1) {
Map.Entry<Query, Float> entry = query2weight.entrySet().stream().findFirst().get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entrySet().iterator().next() should be more optimal. I just has fixed a performance bottleneck in hybrid query caused by Stream.findFirst()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also you don't need else, can start from if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete the extra else,
replaced the findFirst.

…eryToken, add some comments, replace some String with String.format.

Signed-off-by: conggguan <congguan@amazon.com>
Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets ensure that all BWC tests are successful before this change can be merged.

/**
* Use this setting to manage if a neuralSparseQuery build a two-phase query of not.
*/
public static final Setting<Boolean> NEURAL_SPARSE_TWO_PHASE_DEFAULT_ENABLED = Setting.boolSetting(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these settings are at node scope? This will make the settings available for all the indices in the cluster. The setting should at the index level as for different indices user may choose to set or unset a particular value for a setting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for setting it as a node-scope setting is that we believe the feature's settings are relatively stable. In our initial investigations, we observed great performance and accuracy with the default settings. We also think that significant differences in two-phase parameters might cause variations in accuracy across different indexes. Therefore, we made this a node-scope setting.
Additionally, for customization in a single query, we provide a one-time two-phase parameter that allows users to temporarily override this setting in an individual query."

{
  "query": {
    "neural_sparse":{
      "VECTOR_FIELD":
      {
        "model_id": "model_id",
        "query_text": "abcd",
        "two_phase_settings":
         {					
            "enabled": true
            "window_size_expansion": 5,  
            "pruning_ratio": 0.4,  
         }
      }
    }
  }
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the settings are stable and we already provide user a way to set the settings in the query itself then we don't need to explicitly provide setting at cluster level because the way I look at this is these settings are of no use. They just taking space in cluster state. Hence I would recommend not having the settings. Plus more I look at this, these are not setting these are more of Neural Sparse query attributes. Hence, we should just remove the settings completely.

Copy link
Member

@zhichao-aws zhichao-aws Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The settings is stable for a concrete sparse model. If the sparse model is trained with thresholding in loss function, they can have different hyperparameters to work with. While the default settings (5, 0.4) works best for our sparse models.

Let's make the decision for keeping these settings based on pros and cons:
pros:

  • make it more convinient for users with customized sparse models
  • make it more convinient to disable the 2-phase search, there are use cases to use OpenSearch to benchmark the sparse model. For these cases we need accurate search relevance scores

cons:

  • having 3 more cluster settings, taking dozens of bytes in memory

Many settings in OpenSearch are rarely changed. Most users won't feel this settings, but for users who need the settings, it is very important to be able to configure them. I don't think this is a critical for performance, hence one single embedding vector will take much more space than these settings.

Copy link
Collaborator

@navneet1v navneet1v Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are defining these settings based on the pros you mentioned then these settings are much more suited at index level and not at cluster level. Because if I have 2 indices using 2 different sparse models, then this setting will be of use. But if we make them at index level then atleast they are useful as I can change setting for 1 index.

Moreover if you think little hard on that point, more than 1 model can be used in 2 different sparse fields on same index. In that case not even the index settings be useful. Hence if you see the settings are not very useful in that perspective because for most of the cases default works(as mentioned by @conggguan ) and for cases where default doesn't work user can update the query.

cons:
having 3 more cluster settings, taking dozens of bytes in memory

Replicate this dozen of bytes over more than 100 nodes then you will see the impact. Nevertheless the point for that comment was, if the settings are not useful as I mentioned above then just adding them in cluster setting is just waste of resources. I cannot comment on other settings which customer don't change and still present in Opensearch. But all I will say is if something is not correct in current code then that is not a justification to follow the same wrong pattern.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a setting is supposed to be an index setting and we making it as part of cluster settings just on the name of performance, do we know how much performance boost we are getting here ? Plus index settings are also stored on the node. So, when settings change you might get hit for 1 query but other queries will be fast.

Copy link
Member

@zhichao-aws zhichao-aws Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a setting is supposed to be an index setting and we making it as part of cluster settings just on the name of performance, do we know how much performance boost we are getting here?

The performance impact is very straightforward in code so we haven't benchmark it. For cluster settings, we just do value assignment; and for index settings, we need to iterate the environment settings. And this will happen for every neural sparse query.

Plus index settings are also stored on the node. So, when settings change you might get hit for 1 query but other queries will be fast.

Do you mean we cache these settings for each index? We need to implement another complexed controller module to manage the cache for each index and do update even manage their lifecycle with index create/delete. I think this is completely overdesigned. You just mentioned the overhead to add one setting, implmenting index settings will make this overhead much larger because in every index we have these settings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we cache these settings for each index? We need to implement another complexed controller module to manage the cache for each index and do update even manage their lifecycle with index create/delete. I think this is completely overdesigned. You just mentioned the overhead to add one setting, implmenting index settings will make this overhead much larger because in every index we have these settings.

I am saying they are already cached at the node level. You don't need to cache it. So any time there is change in settings(index/cluster settings) for any index those settings are pushed to all the data nodes.

For cluster settings, we just do value assignment; and for index settings, we need to iterate the environment settings. And this will happen for every neural sparse query.

Not sure what you mean here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I get your point, I checked the settings API and found I misunderstand how it works.
So you put +1 to index settings now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to indexscope.

searchContext.addRescore(rescoreContext);
}

private static float populateQueryWeightsMapAndGetWindowSizeExpansion(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function needs to be split into 2 parts.

  1. Where are are just changing the rescore query/recore context.
  2. Where we are changing the actual Query clause provided by user to set what should be the currentQuery that needs to be executed.

We should also add some warnings which should tell us that query clauses was not boost, bool or neuralsparse hence this two phase optimization is not applied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For First comments, I refactor the function addRescoreContextFromNeuralSparseQuery. You can take a look for further optimization.

For the warn logs, I think, If we want to detect all of compound queries that if there has a neuralSparseQuery which couldn't be speed up, maybe it cost some performance loss, such as when a hybrid query, constant score query, dismax query of a filter boolean query come in, we must to traversal all the sub queries of them and make a warn log.
So I think maybe make a illustration in OpenSearch Document is better for neural-search plugin's performance and log's simplicity.
If you think there is a very strong need to add warn log ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only consider the supported compound queires, we can early exit for unsupported cases.
However, if we want to log warn as you mentioned, we need to go through to the end for every compound queries.
Because here is the common entry for all queries, if users are not using semantic search and need to do many complexed compound queries, going through all of them will have performance impact.

Copy link
Collaborator

@navneet1v navneet1v Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got what you are trying to say. Now help me understand this, if I have a query with multiple level of nesting of bool -> should -> bool -> should -> dis_max -> neuralsparse in this case the optimization will not happen and we will do these iterations to check neural sparse query is present or not.

Moreover now this check is going to happen for every bool query(simple or complex), it has a bigger blast radius and can impact performance of every user query. As the call is recursive so it will iterate over every query clause if it is bool.

Another point, if we are not adding any log around whether twoPhaseOptimization got added or not, how we are expecting admin/operators to Opensearch to figure out whether for a given query optimization worked or not?

Copy link
Member

@zhichao-aws zhichao-aws Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got what you are trying to say. Now help me understand this, if I have a query with multiple level of nesting of bool -> should -> bool -> should -> dis_max -> neuralsparse in this case the optimization will not happen and we will do these iterations to check neural sparse query is present or not.

This is why we need to early exit. We iterate deeper only if its the query clause of specific type. I.e. for a complexed query clause tree, once we found current node is not of specific types we just exit to minimize the performance impact.

With this early exit, we just go through all boolean and boost, and we don't find some use case have deep nested boolean and boost but without nodes we can early exit from.

But give an warning log for every query has large impact for performance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got what you are trying to say. Now help me understand this, if I have a query with multiple level of nesting of bool -> should -> bool -> should -> dis_max -> neuralsparse in this case the optimization will not happen and we will do these iterations to check neural sparse query is present or not.

This is why we need to early exit. We iterate deeper only if its the query clause of specific type. I.e. for a complexed query clause tree, once we found current node is not of specific types we just exit to minimize the performance impact.

With this early exit, we just go through all boolean and boost, and we don't find some use case have deep nested boolean and boost but without nodes we can early exit from.

But give an warning log for every query has large impact for performance.

So in my message there 2 points. Seems like you read only one.

Here is the thing:

  1. The code is not going to do any early exit if there is bool clause every where. I am hoping that is clear.
  2. Comparing the latencies of 2 different queries is not an acceptable user explanation here why the queries didn't speed up. So if want operators to understand why a speed up didn't happen we better figure out a way. Otherwise there can be a flurry of questions around why the query is not speeding up. Thats where helpful log message are useful. If there is a better way than log I would suggest to do that.

Copy link
Member

@zhichao-aws zhichao-aws Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in my message there 2 points. Seems like you read only one.

I gave 2 response for the 2 points.

Copy link
Collaborator

@navneet1v navneet1v Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reply for both of them is added. Do you agree with point 1 or not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it clear for what's the blocker issue now?
If the blocker issue is the performance impact to iterate the queries:
We already have query visitors in OpenSearch and in neural-search (NeuralSearchQueryVisitor)? In these query visitors we go through the full query, too. And we don't see any call out from developers and customers about the performance impact about them.

If the blocker issue is the warn log:
We can use search profile API to get the finalized lucene query. For 2-phase query, we can find NeuralSparseQuery.currentQuery equals NeuralSparseQuery.highScoreTokenQuery. And when we don't use 2-phase, the NeuralSparseQuery.currentQuery is NeuralSparseQuery.highScoreTokenQuery + NeuralSparseQuery.lowScoreTokenQuery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got what you are trying to say. Now help me understand this, if I have a query with multiple level of nesting of bool -> should -> bool -> should -> dis_max -> neuralsparse in this case the optimization will not happen and we will do these iterations to check neural sparse query is present or not.

This is why we need to early exit. We iterate deeper only if its the query clause of specific type. I.e. for a complexed query clause tree, once we found current node is not of specific types we just exit to minimize the performance impact.
With this early exit, we just go through all boolean and boost, and we don't find some use case have deep nested boolean and boost but without nodes we can early exit from.
But give an warning log for every query has large impact for performance.

So in my message there 2 points. Seems like you read only one.

Here is the thing:

1. The code is not going to do any early exit if there is bool clause every where. I am hoping that is clear.

2. Comparing the latencies of 2 different queries is not an acceptable user explanation here why the queries didn't speed up. So if want operators to understand why a speed up didn't happen we better figure out a way. Otherwise there can be a flurry of questions around why the query is not speeding up. Thats where helpful log message are useful. If there is a better way than log I would suggest to do that.

For 1. I think if a deep compound query that starts with a boolean's filter/must/must not , hybrid or dis_max or constant score, there is a early exit existing, to reduce the influence for unsupported compound query.
For 2. I think explain in documentation is a enough way for user, maybe lots of warn log are little extra for user, and I think, different perform time diff in different compound query are indeed exist and normal, such as, a must clause's query are more fast than a should, because Lucene make la lot of optimization work, executing only a fraction of what they need to in different query types, I think that's one of the things that optimization is all about, finding logic that can be avoided and bypassing them as much as possible, but optimization isn't a functional change, and generating a large amount of warn logs can be a little bit of a problem if we're just avoiding some optimization operation due to a system feature rather than a system bug and thus give the impression that neural sparse is not a reliable or mature new feature. If we had to do that, we have to increase the potential burden on the searcher, I think it would be better to delay this change until the RFC I mentioned in the open search core is implemented?

Signed-off-by: conggguan <congguan@amazon.com>
@zhichao-aws
Copy link
Member

Lets ensure that all BWC tests are successful before this change can be merged.

Some BWC test failed because of there is a version check of stream.

// stream in
if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) {
this.neuralSparseTwoPhaseParameters = in.readOptionalWriteable(NeuralSparseTwoPhaseParameters::new);
}
// stream out
if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) {
out.writeOptionalWriteable(this.neuralSparseTwoPhaseParameters);
}
But at now the PR hasn't been backport to V_2_14_0, so the 2.14 node can not deserialize the stream correctly. It can only work after backport success.
One workaround is using Version.Current, but we need to modify it to 2.14 after backport finished.

@conggguan conggguan force-pushed the rescore-context branch 4 times, most recently from cdfa4ca to 047382f Compare April 26, 2024 04:28
Signed-off-by: conggguan <congguan@amazon.com>
Comment on lines 42 to 45
public static volatile Float DEFAULT_WINDOW_SIZE_EXPANSION;
public static volatile Float DEFAULT_PRUNING_RATIO;
public static volatile Boolean DEFAULT_ENABLED;
public static volatile Integer MAX_WINDOW_SIZE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why public and volatile?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if field is static and final then the name should be all caps. But that is not the case here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why public and volatile?

Using a volatile field and settings update consumer is a common approach in OpenSearch to get real-time cluster settings from memory, without accessing the cluster settings object. You can just find many references in other plugins.

The public can be removed.

static final ParseField PRUNING_RATIO = new ParseField("pruning_ratio");
@VisibleForTesting
static final ParseField ENABLED = new ParseField("enabled");
private static final Version MINIMAL_SUPPORTED_VERSION_TWO_PHASE_SEARCH = Version.CURRENT;
Copy link
Collaborator

@navneet1v navneet1v Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version.Current is not right way. Wondering how it worked for other features? Because this a problem for every feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at what I respond you yesterday.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked that, but my question is simple. This is not the first feature who is facing this problem. So can you tell me how this particular use case is special

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bwc framework is introduced after 2.11, and it's test cases(neural query, neural sparse query, and hybrid query) are already exists when we write this framework.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bwc framework is introduced after 2.11, and it's test cases(neural query, neural sparse query, and hybrid query) are already exists when we write this framework.

So what difference does it make?

Copy link
Member

@zhichao-aws zhichao-aws Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of what you think the same case, and we can take a look at whether they are different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure minimal supported version should be current. It mean with each release it will be incremented and not supported from previous release? Say we merge this before 2.15, then in 2.16 branch 2.16 is minimal supported meaning 2.15 is not supported

…ort version check.

Signed-off-by: conggguan <congguan@amazon.com>
@vibrantvarun
Copy link
Member

@conggguan please add BWC test for the change. It gives more confidence. Also I see you have significant changes in constructors that too version specific. BWC test is a must for these type of changes.

cc: @navneet1v

conggguan and others added 2 commits April 29, 2024 10:20
}

@Override
protected Query doToQuery(QueryShardContext context) throws IOException {
protected Query doToQuery(QueryShardContext context) {
if (NeuralSparseTwoPhaseParameters.isClusterOnSameVersionForTwoPhaseSearchSupport()) completeTwoPhasePara(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix formatting, it should be easier to read if you put method call inside the if block into curly braces

@AllArgsConstructor
@Getter
@NonNull
public final class NeuralSparseQuery extends Query {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to have BWC test for higher confidence, can you add it @conggguan? I think there was same ask from Varun in this PR

static final ParseField ENABLED = new ParseField("enabled");
private static final Version MINIMAL_SUPPORTED_VERSION_TWO_PHASE_SEARCH = Version.CURRENT;
private Float window_size_expansion;
private Float pruning_ratio;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use lowerCamelCase for variable names, btw it also affects the getter method name

return builder.build();
}

private void completeTwoPhasePara(IndexSettings indexSettings) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there some letters missing in the method name, or it's a weird name? :)

return builder.build();
}

private void completeTwoPhasePara(IndexSettings indexSettings) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix formatting in this method, every method call should be inside the curly braces and on a new line

static final ParseField PRUNING_RATIO = new ParseField("pruning_ratio");
@VisibleForTesting
static final ParseField ENABLED = new ParseField("enabled");
private static final Version MINIMAL_SUPPORTED_VERSION_TWO_PHASE_SEARCH = Version.CURRENT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure minimal supported version should be current. It mean with each release it will be incremented and not supported from previous release? Say we merge this before 2.15, then in 2.16 branch 2.16 is minimal supported meaning 2.15 is not supported

@conggguan conggguan closed this May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Features Introduces a new unit of functionality that satisfies a requirement v2.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] Enhancing Neural Sparse Query Speed with a Two-Phase Approach
6 participants