As part of performance optimization for the hybrid query, we need to find a way to minimize the time spent getting the next matching doc for each sub-query and collecting sub-query scores. According to information collected during profiling, these calls take ~85% of CPU time.
As a baseline we're taking results from the previous PR related to hybrid query optimization. They are based on version 2.13 and the noaa OSB workload; all times are in ms:
One sub-query that selects 11M documents
Bool: p50 77.8893 | p90 78.1916
Hybrid: p50 186.709 | p90 197.739
One sub-query that selects 1.6K documents
Bool: p50 71.0947 | p90 71.691
Hybrid: p50 71.5156 | p90 72.8801
Three sub-queries that select 15M documents
Bool: p50 87.0556 | p90 90.9105
Hybrid: p50 287.255 | p90 313.868
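To make the gap easier to read, the p50 numbers above can be turned into a hybrid-to-bool ratio. This is only a small sketch over the figures quoted in this issue; the dictionary keys and the `slowdown` helper are names made up for illustration.

```python
# p50 latencies in ms, copied from the baseline above: (bool, hybrid).
baseline_p50_ms = {
    "one sub-query, 11M docs": (77.8893, 186.709),
    "one sub-query, 1.6K docs": (71.0947, 71.5156),
    "three sub-queries, 15M docs": (87.0556, 287.255),
}


def slowdown(bool_ms: float, hybrid_ms: float) -> float:
    """How many times slower the hybrid query is than the bool query."""
    return hybrid_ms / bool_ms


ratios = {name: slowdown(*pair) for name, pair in baseline_p50_ms.items()}

for name, ratio in ratios.items():
    print(f"{name}: hybrid p50 is {ratio:.2f}x bool p50")
```

The ratio stays close to 1 for the small result set and grows to roughly 2.4x and 3.3x for the large ones, which suggests the overhead scales with the number of matched documents.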
The current logic for iterating over docs and collecting scores is as follows:
Iterators for the individual sub-queries are wrapped in a compound iterator. The wrapper can position all individual iterators on the same doc id.
The bulk scorer iterates over results by requesting the next doc id. The wrapper iterator returns the collection of sub-query iterators that are positioned on that next (same) doc id.
For each doc id, the collector collects scores. This is done with a collection of priority queues of hits, one per sub-query. A priority queue is needed to sort scores in descending order, because the iterators cannot do that by design: different sub-queries may produce scores in a different order and match different doc ids.
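The flow described above can be modelled roughly as follows. This is a simplified Python sketch, not the actual plugin or Lucene code: `SubQueryIterator`, `CompoundIterator`, `collect_top_hits` and `NO_MORE_DOCS` are hypothetical names, and each sub-query is assumed to be a plain mapping of doc id to score.

```python
import heapq
from typing import Dict, List, Tuple

NO_MORE_DOCS = 2**31 - 1  # sentinel: iterator is exhausted (assumed convention)


class SubQueryIterator:
    """Hypothetical iterator over one sub-query's hits, in ascending doc id order."""

    def __init__(self, hits: Dict[int, float]):
        self._docs = sorted(hits)
        self._scores = hits
        self._pos = -1

    def doc_id(self) -> int:
        if self._pos < 0:
            return -1
        if self._pos >= len(self._docs):
            return NO_MORE_DOCS
        return self._docs[self._pos]

    def next_doc(self) -> int:
        self._pos += 1
        return self.doc_id()

    def score(self) -> float:
        return self._scores[self.doc_id()]


class CompoundIterator:
    """Wraps the sub-query iterators and exposes those positioned on the same doc id."""

    def __init__(self, subs: List[SubQueryIterator]):
        self.subs = subs
        for sub in subs:
            sub.next_doc()  # position every iterator on its first doc
        self.current = -1

    def next_doc(self) -> int:
        # Advance only the iterators that sit on the doc we just processed.
        for sub in self.subs:
            if sub.doc_id() == self.current:
                sub.next_doc()
        self.current = min((s.doc_id() for s in self.subs), default=NO_MORE_DOCS)
        return self.current

    def matching(self) -> List[Tuple[int, SubQueryIterator]]:
        return [(i, s) for i, s in enumerate(self.subs) if s.doc_id() == self.current]


def collect_top_hits(sub_query_hits: List[Dict[int, float]], top_k: int):
    """Return, per sub-query, up to top_k (doc_id, score) pairs, best score first."""
    subs = [SubQueryIterator(hits) for hits in sub_query_hits]
    compound = CompoundIterator(subs)
    queues: List[List[Tuple[float, int]]] = [[] for _ in subs]  # one min-heap per sub-query

    while compound.next_doc() != NO_MORE_DOCS:
        for i, sub in compound.matching():
            entry = (sub.score(), compound.current)
            if len(queues[i]) < top_k:
                heapq.heappush(queues[i], entry)
            elif entry > queues[i][0]:
                heapq.heapreplace(queues[i], entry)  # evict the current weakest hit

    return [
        [(doc, score) for score, doc in sorted(q, key=lambda e: (-e[0], e[1]))]
        for q in queues
    ]
```

For example, `collect_top_hits([{1: 0.5, 3: 0.9, 7: 0.2}, {3: 0.4, 5: 0.8}], top_k=2)` walks doc ids 1, 3, 5, 7 once and keeps the two best hits per sub-query. Even in this toy model, every matched doc costs a scan over all sub-iterators plus a heap operation per matching sub-query, which is the per-doc work the profiling points at.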