
In hybrid query allow to skip parallel score collection by core TopDocsCollector #729

Closed
martin-gaievski opened this issue May 2, 2024 · 0 comments

martin-gaievski commented May 2, 2024

As part of performance optimization for hybrid query, we need to minimize the time taken by the parallel score collection that runs in core through TopDocsCollector. According to the profiling data below, these calls take 40-80% of CPU time.

[Two profiler screenshots: 325698784-4a1336dd-923a-4f07-bb22-5fb37720dfd5, 325698823-74ac1ec0-f400-4c27-97c4-0228618bf88f]

As a baseline, we take the results from the previous PR related to hybrid query optimization. They are based on version 2.13 and the noaa OSB workload; all times are in ms:

One sub-query that selects 11M documents

Bool: p50 97.9306 | p90 116.299
Hybrid: p50 228.696 | p90 249.665

One sub-query that selects 1.6K documents

Bool: p50 87.3152 | p90 89.3061
Hybrid: p50 89.9654 | p90 92.349

Three sub-queries that select 15M documents

Bool: p50 97.9891 | p90 114.396
Hybrid: p50 353.631 | p90 377.527
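
To make the gap easier to read, here is a small sketch that divides each hybrid latency by the matching bool latency. The class and method names (`HybridOverhead`, `ratio`) are made up for illustration; the numbers are copied from the baseline above. With these inputs the hybrid query comes out at roughly 2.3x the bool query at p50 for the 11M-document case and roughly 3.6x for the three sub-query case, while the 1.6K-document case differs by only about 3%.

```java
import java.util.Locale;

// Illustrative helper, not part of OpenSearch or the neural-search plugin.
final class HybridOverhead {

    // Slowdown factor of hybrid relative to bool for one percentile.
    static double ratio(double hybridMs, double boolMs) {
        return hybridMs / boolMs;
    }

    public static void main(String[] args) {
        String[] labels = {
            "1 sub-query, 11M docs",
            "1 sub-query, 1.6K docs",
            "3 sub-queries, 15M docs"
        };
        // Each row is {p50, p90} in ms, taken from the baseline above.
        double[][] bool = {{97.9306, 116.299}, {87.3152, 89.3061}, {97.9891, 114.396}};
        double[][] hybrid = {{228.696, 249.665}, {89.9654, 92.349}, {353.631, 377.527}};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf(Locale.ROOT, "%s: p50 x%.2f | p90 x%.2f%n",
                labels[i],
                ratio(hybrid[i][0], bool[i][0]),
                ratio(hybrid[i][1], bool[i][1]));
        }
    }
}
```

The overhead grows with the number of matched documents, which is consistent with per-document collection work dominating the cost.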

Most likely this will be a compound change in both core OpenSearch and the neural-search plugin. The preferred approach is to provide a capability to skip or ignore TopDocsCollector in the core QueryPhaseSearcher (core side) and, using that new option, call only HybridQueryDocsCollector (plugin side).
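
The idea can be sketched roughly as below. This is only a toy model under stated assumptions: `DocCollectorStub`, `CountingCollector`, `SkipTopDocsSketch` and the `skipCoreTopDocs` flag are hypothetical stand-ins, not the real Lucene or OpenSearch types, and the actual change in QueryPhaseSearcher may look quite different. It only shows that leaving the core collector out of the chain halves the number of per-document collect calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a collector; NOT the real Lucene Collector API.
interface DocCollectorStub {
    void collect(int docId, float score);
}

// Counts how many documents it was asked to collect.
final class CountingCollector implements DocCollectorStub {
    final String name;
    int collected = 0;

    CountingCollector(String name) {
        this.name = name;
    }

    @Override
    public void collect(int docId, float score) {
        collected++;
    }
}

final class SkipTopDocsSketch {

    // Builds the collector chain; the assumed skipCoreTopDocs option
    // leaves the core collector out so only the plugin collector runs.
    static List<DocCollectorStub> buildChain(DocCollectorStub core,
                                             DocCollectorStub plugin,
                                             boolean skipCoreTopDocs) {
        List<DocCollectorStub> chain = new ArrayList<>();
        if (!skipCoreTopDocs) {
            chain.add(core);
        }
        chain.add(plugin);
        return chain;
    }

    // Feeds every matching doc to each collector and returns the total number of collect calls.
    static int run(List<DocCollectorStub> chain, int matchingDocs) {
        int calls = 0;
        for (int doc = 0; doc < matchingDocs; doc++) {
            float score = 1.0f / (doc + 1); // dummy score
            for (DocCollectorStub c : chain) {
                c.collect(doc, score);
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        int docs = 1_000;
        int both = run(buildChain(new CountingCollector("core"),
                                  new CountingCollector("plugin"), false), docs);
        int pluginOnly = run(buildChain(new CountingCollector("core"),
                                        new CountingCollector("plugin"), true), docs);
        System.out.println("collect calls, both collectors: " + both);
        System.out.println("collect calls, plugin only:     " + pluginOnly);
    }
}
```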
