Refactor QueryCollectorContext to improve extensibility #11778

martin-gaievski · 2024-01-05T20:30:12Z

Description

The create and postProcess methods in QueryCollectorContext cannot be called from outside of the package. This limits extensions of core OpenSearch in certain scenarios, because those methods need to be called for functionality such as aggregations in search queries.

In this PR I propose changing the modifiers of the create and postProcess methods from default (package-private) to public. The other changes are in the implementing classes, which adopt the new method signatures.
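To illustrate the limitation, here is a minimal sketch that uses reflection to tell whether a class's abstract methods could be implemented from another package. `LockedContext`, `OpenedContext` and `AccessCheck` are hypothetical stand-ins, not the real QueryCollectorContext; only the access modifiers matter.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in: abstract methods with default (package-private) access,
// mirroring the situation described above.
abstract class LockedContext {
    abstract String create(String in);

    abstract void postProcess(String result);
}

// Hypothetical stand-in after the proposed change: the same methods made public.
abstract class OpenedContext {
    public abstract String create(String in);

    public abstract void postProcess(String result);
}

final class AccessCheck {
    private AccessCheck() {}

    // Returns true only if every abstract method declared by the type is public
    // or protected. A class in another package cannot override a package-private
    // method, so it could never supply the missing implementations.
    static boolean implementableOutsidePackage(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            int mods = m.getModifiers();
            boolean visible = Modifier.isPublic(mods) || Modifier.isProtected(mods);
            if (Modifier.isAbstract(mods) && !visible) {
                return false;
            }
        }
        return true;
    }
}
```

The check only looks at methods declared directly on the given type, which is enough for this illustration.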

Related Issues

Not directly related, but this PR came out of the work on opensearch-project/neural-search#509

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
Commits are signed per the DCO using --signoff
Commit changes are listed out in CHANGELOG.md file (See: Changelog)
Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2024-01-05T20:37:05Z

❌ Gradle check result for 04bce0e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-01-05T20:51:45Z

Compatibility status:

Checks if related components are compatible with change 021f593

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

github-actions · 2024-01-05T21:29:49Z

✅ Gradle check result for be0d6c8: SUCCESS

jed326 · 2024-01-10T20:45:02Z

@martin-gaievski Thanks for the details! I don't want to derail the conversation too much from @reta's original concerns but I do want to re-iterate my concern that all aggregations may not just work out of the box even if we do go ahead with the change in this PR. In that case we want to avoid a situation where today we open up these 2 methods and then later on you discover there are additional methods we also need to open up, etc. I think sharing a high level design for opensearch-project/neural-search#509 first would help us understand if this change is truly needed as well as any alternatives considered.

reta · 2024-01-10T20:47:39Z

With CollectorManager design the idea is to add the collector to the manager, and it's manager's responsibility to call that collector. Collector should just get scores and implement reduce logic. But that assumption of having 1-1 relationship between score and doc id is still there.

I should go and reread the RFC (opensearch-project/neural-search#126) but just quick question on that: in general, CollectorManager could be using many collectors (MultiCollectorManager) so each one could produce own scores and reduce phase would return scores per collector type. Wouldn't it address the problem of wrapping up multiple scores per doc?
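The question above can be pictured with a small sketch: a composite manager that wraps several child managers and whose reduce returns one result per child. `SketchCollectorManager` is a simplified stand-in for the newCollector/reduce shape of a collector manager, and `ScoreSumCollector`, `ScoreSumManager` and `CompositeManager` are hypothetical names; none of this is the actual Lucene or OpenSearch code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified stand-in: one collector per slice, then a single reduce over all of them.
interface SketchCollectorManager<C, T> {
    C newCollector();

    T reduce(Collection<C> collectors);
}

// Hypothetical per-sub-query collector that just sums the scores it is given.
final class ScoreSumCollector {
    float sum;

    void collect(float score) {
        sum += score;
    }
}

final class ScoreSumManager implements SketchCollectorManager<ScoreSumCollector, Float> {
    @Override
    public ScoreSumCollector newCollector() {
        return new ScoreSumCollector();
    }

    @Override
    public Float reduce(Collection<ScoreSumCollector> collectors) {
        float total = 0f;
        for (ScoreSumCollector c : collectors) {
            total += c.sum;
        }
        return total;
    }
}

// Hypothetical composite: each slice gets one child collector per child manager,
// and reduce returns one result per child manager, in registration order.
final class CompositeManager<C, T> implements SketchCollectorManager<List<C>, List<T>> {
    private final List<SketchCollectorManager<C, T>> children;

    CompositeManager(List<? extends SketchCollectorManager<C, T>> children) {
        this.children = new ArrayList<>(children);
    }

    @Override
    public List<C> newCollector() {
        List<C> perChild = new ArrayList<>();
        for (SketchCollectorManager<C, T> child : children) {
            perChild.add(child.newCollector());
        }
        return perChild;
    }

    @Override
    public List<T> reduce(Collection<List<C>> sliceCollectors) {
        List<T> results = new ArrayList<>();
        for (int i = 0; i < children.size(); i++) {
            // Gather the i-th child collector from every slice, then let that child reduce.
            List<C> forChild = new ArrayList<>();
            for (List<C> slice : sliceCollectors) {
                forChild.add(slice.get(i));
            }
            results.add(children.get(i).reduce(forChild));
        }
        return results;
    }
}
```

With one child manager per inner query, the reduced list holds a separate result for each inner query, which is the "scores per collector" idea raised in the question.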

github-actions · 2024-01-10T20:48:38Z

❌ Gradle check result for 021f593: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-01-10T20:55:07Z

❌ Gradle check result for 9acc035: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2024-01-10T21:13:21Z

❕ Gradle check result for 27a54b9: UNSTABLE

TEST FAILURES:

      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
      1 org.opensearch.action.admin.indices.create.CreateIndexIT.testCreateAndDeleteIndexConcurrently
      1 org.opensearch.action.admin.indices.create.CreateIndexIT.classMethod

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

martin-gaievski · 2024-01-10T22:40:47Z

> With CollectorManager design the idea is to add the collector to the manager, and it's manager's responsibility to call that collector. Collector should just get scores and implement reduce logic. But that assumption of having 1-1 relationship between score and doc id is still there.
>
> I should go and reread the RFC (opensearch-project/neural-search#126) but just quick question on that: in general, CollectorManager could be using many collectors (MultiCollectorManager) so each one could produce own scores and reduce phase would return scores per collector type. Wouldn't it address the problem of wrapping up multiple scores per doc?

This sounds interesting, I need to explore this option. But before going deep into the details, can you help with a few questions/concerns:

- Is it guaranteed that the reduce method of such a multi collector manager will be called after all child managers are done (or have failed)? We will need scores for all queries before we can compile the final score list at shard level.
- I feel like we'll need to create a new QueryCollectorContext, as managers and collectors are created by calling context methods. Would you be OK with opening the methods then, if it's required? The QueryCollectorContext class is public abstract, but a custom implementation cannot be created because its abstract methods have default access.
- With such an approach, how do we support aggregations? The collector for aggregations will be passed to the query phase searcher, but we probably need to combine it with that new MultiCollectorManager for the hybrid query and then wrap both in another collector manager. Does that make sense?

As a remark, with a multi-collector approach the collectors will be of the same type, just one per inner query. Hybrid query acts like a compound query, where any other query can be an inner query.

reta · 2024-01-11T15:23:35Z

> Is it guaranteed that the reduce method of such a multi collector manager will be called after all child managers are done (or have failed)? We will need scores for all queries before we can compile the final score list at shard level.

Yes

> I feel like we'll need to create a new QueryCollectorContext, as managers and collectors are created by calling context methods. Would you be OK with opening the methods then, if it's required? The QueryCollectorContext class is public abstract, but a custom implementation cannot be created because its abstract methods have default access.

No, you could use SearchContext::queryCollectorManagers to add your own (I pointed it out here: #11778 (comment))

> With such an approach, how do we support aggregations? The collector for aggregations will be passed to the query phase searcher, but we probably need to combine it with that new MultiCollectorManager for the hybrid query and then wrap both in another collector manager. Does that make sense?

I think the answer would be: probably not. Aggregations are essentially a framework on top of collectors / collector managers that has its own API. We may need to design a new type of aggregator; if we have a gap there, we need to understand what is missing.

navneet1v · 2024-01-13T20:39:12Z

> No, you could use SearchContext::queryCollectorManagers to add your own (pointed it here #11778 (comment))

Thanks @reta for providing this alternative. After reading the comments on the PR, my understanding is that the design of aggregations was changed for concurrent segment search and some new interfaces were added. It's worth looking into the new interfaces to see whether they can solve aggregations for the hybrid query clause. As @martin-gaievski pointed out, we will do a deep dive and get back on this.

But at a very high level, the requirement here is:

A plugin that implements a new QueryPhaseSearcher wants to add another collector to the collector list. This ensures that the new collectors added by the plugin's QueryPhaseSearcher are called along with the other collectors. The rest of the flow should remain the same to start with.

Then, with more testing, we keep checking whether changes are required in the already implemented aggregation flow to fix the issues with aggregations under the hybrid query clause. Note: the plugin's QueryPhaseSearcher is called only if the hybrid query clause is the top-level query clause, so the impact will be on aggs with the hybrid query clause and nothing else.

navneet1v · 2024-01-13T22:10:34Z

@reta

I did a deeper dive (looked into the ConcurrentQueryPhaseSearcher class, QueryCollectorManagerContext, CollectorManager and DefaultQueryPhaseSearcher), and here is what I found:

- The query phase searchers get collector contexts as a parameter to their searchWith function. These collector contexts are then converted to a single collector (in DefaultQueryPhaseSearcher) or a single collector manager (in ConcurrentQueryPhaseSearcher), which is then passed to the IndexSearcher class.
- All this is fine because the collector contexts are created in OpenSearch core. This is different in HybridQueryPhaseSearcher: it creates its own Collector (not a collector context) inside the HybridQueryPhaseSearcher[ref].
- The collector context cannot be created in HybridQueryPhaseSearcher because the functions (create, createManager, createQueryCollector, createQueryCollectorWithProfiler, etc.) are not public. This prevents plugins that add a new QueryPhaseSearcher from adding a new collector context. Part of the change in this PR is to make those functions public.
- To work around this, HybridQueryPhaseSearcher as of now is [dropping the collector contexts](https://github.com/opensearch-project/neural-search/blob/ff3862250ccdae41798fe0787d4872a9b07ffe2d/src/main/java/org/opensearch/neuralsearch/search/query/HybridQueryPhaseSearcher.java#L216-L221) that are passed to it and creating a new collector that is passed to the index searcher. Because of this, only search works; other things like aggregations and post filtering do not.
- I do understand that rather than creating a single collector we can create a collector manager, which can then be passed to IndexSearcher. But even for that, the neural-search plugin needs to create a collector context that can be added to the collector contexts list passed to the searchWith function as an argument, and then to IndexSearcher.

Please let me know if there is a gap in my understanding.

Once the interfaces are opened up, this is how I see the HybridQueryPhaseSearcher implementation changing:

- A new class, HybridTopScoreDocCollectorContext, should be created. It will then be added as the first collector context in the collector contexts list passed to the searchWith function.
- The IndexSearcher can be called with either a collector or a collector manager. I think a collector manager should be used, but I have no strong preference here.
- Once the query is executed, call the postProcess function on all the collector contexts.
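The three steps above could look roughly like the following sketch. It assumes the context methods were public, as this PR proposes. `SketchCollectorContext`, `SketchAggregationCollectorContext` and `HybridSearchFlowSketch` are hypothetical stand-ins, collectors are represented by plain strings, and the actual IndexSearcher call is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a collector context with public create/postProcess.
abstract class SketchCollectorContext {
    boolean postProcessed;

    public abstract String create();

    public void postProcess() {
        postProcessed = true;
    }
}

// Name taken from the proposal above; the body is purely illustrative.
final class HybridTopScoreDocCollectorContext extends SketchCollectorContext {
    @Override
    public String create() {
        return "hybrid-top-docs-collector";
    }
}

// Hypothetical context standing in for one that core passes to the searcher.
final class SketchAggregationCollectorContext extends SketchCollectorContext {
    @Override
    public String create() {
        return "aggregation-collector";
    }
}

final class HybridSearchFlowSketch {
    private HybridSearchFlowSketch() {}

    // Returns the collectors in the order they were created.
    static List<String> searchWith(List<SketchCollectorContext> passedIn) {
        // Step 1: put the hybrid context first and keep the contexts passed in by core.
        List<SketchCollectorContext> all = new ArrayList<>();
        all.add(new HybridTopScoreDocCollectorContext());
        all.addAll(passedIn);

        // Step 2: build the collectors; a real searcher would hand them to the index searcher here.
        List<String> created = new ArrayList<>();
        for (SketchCollectorContext context : all) {
            created.add(context.create());
        }

        // Step 3: once the query has run, post-process every context, not only the hybrid one.
        for (SketchCollectorContext context : all) {
            context.postProcess();
        }
        return created;
    }
}
```

The point of the sketch is that the contexts passed in by core are kept rather than dropped, so their post-processing still runs.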

cc: @martin-gaievski , @vamshin

reta · 2024-01-15T19:59:02Z

thanks @navneet1v

> All this is fine because the collector contexts are created in OpenSearch core. This is different in HybridQueryPhaseSearcher: it creates its own Collector (not a collector context) inside the HybridQueryPhaseSearcher.

> I do understand that rather than creating a single collector we can create a collector manager, which can then be passed to IndexSearcher. But even for that, the neural-search plugin needs to create a collector context that can be added to the collector contexts list passed to the searchWith function as an argument, and then to IndexSearcher.

This part I don't understand, any QueryPhaseSearcher could add new collector managers to the context, which is passed as argument, there is no need to do anything with the XxxCollectorContexts here:

context.queryCollectorManagers().put(MyHybridCollectorManager.class, new MyHybridCollectorManager());
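A minimal, self-contained sketch of that registration pattern follows. `SketchSearchContext` and `SketchManager` are hypothetical stand-ins that only model a map keyed by manager class; the real context and manager types are not reproduced here, and `HybridManagerRegistration` is an invented helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a collector manager.
interface SketchManager {
    String name();
}

// Hypothetical stand-in for the search context. Only the map accessor is modelled;
// its name follows the line above.
final class SketchSearchContext {
    private final Map<Class<?>, SketchManager> managers = new LinkedHashMap<>();

    Map<Class<?>, SketchManager> queryCollectorManagers() {
        return managers;
    }
}

final class MyHybridCollectorManager implements SketchManager {
    @Override
    public String name() {
        return "hybrid";
    }
}

// Hypothetical plugin-side helper that a custom query phase searcher could call
// before delegating to the regular search flow.
final class HybridManagerRegistration {
    private HybridManagerRegistration() {}

    static void register(SketchSearchContext context) {
        // Keyed by class, so registering twice does not add a second manager.
        context.queryCollectorManagers()
            .putIfAbsent(MyHybridCollectorManager.class, new MyHybridCollectorManager());
    }
}
```

Using putIfAbsent instead of put is a choice made for this sketch so that repeated registration keeps the first instance.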

Also, which I think is the most important thought to keep in mind: QueryPhaseSearcher was introduced solely to support seamless concurrent / non-concurrent branching of the search flow. It may not fit to what you folks are looking to do - it is perfectly fine, we could think about right abstraction / APIs instead.

martin-gaievski · 2024-01-18T16:59:07Z

I'm checking Andriy's suggestion of using CollectorManager without creating a custom context object. It looks promising so far: I built a small POC, and basic sanity checks are passing for both aggregations and the existing functionality of hybrid query. We may need to tweak a few small things, but overall it looks good. Thanks for your input @reta

cc: @navneet1v

martin-gaievski · 2024-02-14T22:41:06Z

@reta Do you have any suggestions on the following finding?

If we call IndexSearcher.search(Query, CollectorManager), an exception is thrown from DefaultSearchContext.getTargetMaxSliceCount(). This is because two core aggregations, sampler and diversified_sampler, do not support concurrent search (this was added in PR #11087).

My understanding is that in core the system falls back to a non-concurrent search. Is there something we can control from the plugin side to avoid the exception and allow the query to work in the same non-concurrent way?

reta · 2024-02-15T14:30:53Z

> My understanding is that in core the system falls back to a non-concurrent search. Is there something we can control from the plugin side to avoid the exception and allow the query to work in the same non-concurrent way?

@martin-gaievski so the aggregations have this method

    @Override
    protected boolean supportsConcurrentSegmentSearch() {
        // return true or false
    }

When a search request comes in, it is inspected for, among other things, any aggregations that are not compatible with concurrent search; if there are any, the request is delegated to the non-concurrent path. To answer your question: the IndexSearcher.search(Query, CollectorManager) path should never be taken if there are components incompatible with concurrent search. Maybe we are missing some inspection here (for example, of SearchContext::queryCollectorManagers())?
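The inspection described above can be sketched as follows. `SketchAggregatorFactory`, its two subclasses and `SearchPathChooser` are hypothetical; only the method name comes from the snippet above, and the default of false is an assumption made for this sketch.

```java
import java.util.List;

// Hypothetical stand-in for an aggregator factory.
abstract class SketchAggregatorFactory {
    // Assumed default for this sketch: concurrent search support is opt-in.
    protected boolean supportsConcurrentSegmentSearch() {
        return false;
    }
}

// Hypothetical factory that opts in.
final class ConcurrentFriendlyFactory extends SketchAggregatorFactory {
    @Override
    protected boolean supportsConcurrentSegmentSearch() {
        return true;
    }
}

// Hypothetical factory that keeps the default and therefore does not opt in.
final class NonConcurrentFactory extends SketchAggregatorFactory {
}

final class SearchPathChooser {
    enum Path { CONCURRENT, NON_CONCURRENT }

    private SearchPathChooser() {}

    // The concurrent path is taken only if it is enabled and every factory supports it.
    static Path choose(boolean concurrentSearchEnabled, List<SketchAggregatorFactory> factories) {
        if (!concurrentSearchEnabled) {
            return Path.NON_CONCURRENT;
        }
        for (SketchAggregatorFactory factory : factories) {
            if (!factory.supportsConcurrentSegmentSearch()) {
                return Path.NON_CONCURRENT;
            }
        }
        return Path.CONCURRENT;
    }
}
```

A single unsupported aggregation is enough to send the whole request down the non-concurrent path in this sketch.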

martin-gaievski · 2024-02-15T16:33:52Z

@reta my understanding is that you are suggesting something like:

if (concurrent_search enabled AND aggs_present_AND_support_concurrent_search):
     call `IndexSearcher.search(Query, CollectorManager)`
else
     call other `IndexSearcher.search` method

The problem is that for adding aggregation support to hybrid query we can only follow the CollectorManager approach, as direct calls to QueryContext have been locked, as per your initial explanation. So we'll be taking that approach regardless of whether concurrent search is enabled.

Do you mean that SearchContext::queryCollectorManagers() should return different results depending on all aggs in request being compatible/not compatible with concurrent search?

reta · 2024-02-15T16:48:05Z

> @reta my understanding is that you are suggesting something like:

@martin-gaievski no, the decision is taken by QueryPhaseSearcherWrapper

> The problem is that for adding aggregation support to hybrid query we can only follow the CollectorManager approach, as direct calls to QueryContext have been locked, as per your initial explanation. So we'll be taking that approach regardless of whether concurrent search is enabled.

I think this is totally fine. I believe the current implementation does not inspect SearchContext::queryCollectorManagers() as part of the go / no-go decision for the concurrent path. The implementation explicitly checks aggregations, but anything added through SearchContext::queryCollectorManagers() (which is totally fine) is not taken into account. I believe that has to be fixed; the SearchContext could make these checks.

jed326 · 2024-02-21T22:15:06Z

> If we call IndexSearcher.search(Query, CollectorManager), an exception is thrown from DefaultSearchContext.getTargetMaxSliceCount(). This is because two core aggregations, sampler and diversified_sampler, do not support concurrent search (this was added in PR #11087).

@martin-gaievski I'm late to the discussion here but I wanted to mention that aggregation support for concurrent search is opt-in for each aggregator. See:

OpenSearch/server/src/main/java/org/opensearch/search/aggregations/AggregatorFactory.java

Lines 120 to 125 in 247e2ee

    /**
     * Implementation should override this method and return true if the Aggregator created by the factory works with concurrent segment search execution model
     */
    protected boolean supportsConcurrentSegmentSearch() {
        return false;
    }

This is because plugins can provide their own AggregatorFactory classes and we cannot guarantee that the existing concurrent segment search implementation will work correctly for all plugins, so we need to rely on plugin developers to verify on their own if their plugin can support concurrent segment search. For example, the ParentAggregator in the parent-join module also does not support concurrent segment search:

OpenSearch/modules/parent-join/src/main/java/org/opensearch/join/aggregations/ParentAggregatorFactory.java

Lines 122 to 126 in 247e2ee

    @Override
    protected boolean supportsConcurrentSegmentSearch() {
        // See https://github.com/opensearch-project/OpenSearch/issues/9316
        return false;
    }

martin-gaievski · 2024-02-22T01:24:06Z

Got it @jed326. I see that for hybrid query we need to take that into account and have the flexibility to switch between the concurrent and non-concurrent flow, depending on how it's defined in each aggregator.

I worked with @reta on a possible approach, and it looks like we have found a promising one that provides the following:

- it is compatible with existing and future aggregations, whether or not they support concurrent search
- it doesn't interfere with internal search logic, such as re-using collectors for concurrent search cases

The main idea is to add a new collector manager before QueryPhase.searchInternal and, for non-concurrent search, call collectorManager.reduce manually after QueryPhase.searchInternal. I put all the details into the corresponding RFC in the neural-search repo.
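A rough sketch of that idea, under stated assumptions: `HitCountManager`, `HitCountCollector` and `ManualReduceSketch` are hypothetical, and the private `searchInternal` below is only a stand-in for the core search call. In particular, the sketch assumes that the concurrent path calls reduce itself while the non-concurrent path does not.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical collector that counts the documents it sees.
final class HitCountCollector {
    int hits;

    void collect(int docId) {
        hits++;
    }
}

// Hypothetical manager that remembers its collectors and whether reduce has run.
final class HitCountManager {
    final List<HitCountCollector> created = new ArrayList<>();
    Integer reduced; // stays null until reduce is called

    HitCountCollector newCollector() {
        HitCountCollector collector = new HitCountCollector();
        created.add(collector);
        return collector;
    }

    int reduce(Collection<HitCountCollector> collectors) {
        int total = 0;
        for (HitCountCollector c : collectors) {
            total += c.hits;
        }
        reduced = total;
        return total;
    }
}

final class ManualReduceSketch {
    private ManualReduceSketch() {}

    // Stand-in for the core search call; its behaviour is an assumption of this sketch.
    private static void searchInternal(HitCountManager manager, int[] docIds, boolean concurrent) {
        if (concurrent) {
            // Two slices, each with its own collector; reduce is called by the search itself.
            HitCountCollector first = manager.newCollector();
            HitCountCollector second = manager.newCollector();
            for (int i = 0; i < docIds.length; i++) {
                (i < docIds.length / 2 ? first : second).collect(docIds[i]);
            }
            manager.reduce(manager.created);
        } else {
            // Single collector, and nobody calls reduce on this path.
            HitCountCollector only = manager.newCollector();
            for (int docId : docIds) {
                only.collect(docId);
            }
        }
    }

    static int execute(int[] docIds, boolean concurrent) {
        HitCountManager manager = new HitCountManager(); // added before the search runs
        searchInternal(manager, docIds, concurrent);
        if (manager.reduced == null) {
            // Non-concurrent path: call reduce manually after the search.
            manager.reduce(manager.created);
        }
        return manager.reduced;
    }
}
```

Either way the caller ends up with a reduced result, so code that runs after the search does not need to know which path was taken.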

opensearch-trigger-bot · 2024-03-23T15:20:18Z

This PR is stalled because it has been open for 30 days with no activity.

opensearch-trigger-bot · 2024-04-25T15:21:04Z

This PR is stalled because it has been open for 30 days with no activity.

jed326 · 2024-04-25T17:50:10Z

Hey @martin-gaievski since aggregation support for hybrid queries has been implemented are we good to close this PR?

martin-gaievski · 2024-04-25T18:03:45Z

Yes, let me close it

martin-gaievski force-pushed the make-methods-in-doc-collector-classes-public branch from 04bce0e to be0d6c8 Compare January 5, 2024 20:31

deshsidd approved these changes Jan 10, 2024

Made QueryCollectorContext methods create, createManager and postProcess public (021f593)

Signed-off-by: Martin Gaievski <gaievski@amazon.com>

martin-gaievski force-pushed the make-methods-in-doc-collector-classes-public branch from 27a54b9 to 021f593 Compare January 10, 2024 20:37

Merge branch 'main' into make-methods-in-doc-collector-classes-public

9acc035

Signed-off-by: Martin Gaievski <gaievski@amazon.com>

martin-gaievski mentioned this pull request Feb 14, 2024

[RFC] Aggregations and Hybrid query opensearch-project/neural-search#604

Closed

opensearch-trigger-bot bot added and removed the stalled (Issues that have stalled) label Mar 23, 2024

opensearch-trigger-bot bot added the stalled (Issues that have stalled) label Apr 25, 2024

martin-gaievski closed this Apr 25, 2024
