Select global top N records with grouping by multiple fields #121
Comments
Hi @ylwu-amzn, I tried a query similar to yours. The sort operator is optimized by pushing it down to the OpenSearch engine, which means all 10k documents are sorted inside the OpenSearch engine before the results are returned to the plugin, so it should be fine for your use case.
The case you mentioned, where the sort operator is not pushed down and sorting is done in the plugin, happens when the optimization cannot be applied, for example when the expression to sort on is a complicated script or function rather than a simple field:
In that case the sort cannot be pushed down and is done in the plugin runtime rather than in the OpenSearch engine. If users do want to run a query like this and need an accurate result that sorts all the documents (rather than fetching part of the index, say 1k documents, and sorting locally), they need to raise the size limit to at least the number of documents in the index, and sometimes also configure the search size limit of the OpenSearch engine. The obvious tradeoff is poor performance, since all the computation is done in the plugin's local JVM.
Hi @chloe-zh, thanks for your answer. From this doc https://github.com/opensearch-project/sql/blob/main/docs/user/optimization/optimization.rst#sort-merge-into-opensearch-aggregation,
In our case we use a composite aggregation, so the SQL plugin can't push the sort down. Because the SQL plugin only sorts the results returned from the composite aggregation, the sorted result is not globally correct. It would be great if SQL could support a global sort on composite aggregations.
Sure, we will evaluate all the limitations of the current aggregation support and see how we can improve. Thanks!
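Until something like that is supported, one possible client-side workaround is to page through every composite aggregation page and keep a running top N. The sketch below only simulates this in plain Python: `fetch_page` is a hypothetical stand-in for one composite aggregation request (it is not part of the SQL plugin or any client library), and the bucket keys `category` and `region` are made up for illustration. It does rely on the documented composite aggregation behaviour that buckets come back in key order and that the next page is requested with the last key of the previous one.

```python
import heapq


def fetch_page(all_buckets, after_key, page_size):
    """Hypothetical stand-in for one composite aggregation request.

    Returns the next page of buckets (in key order) and the key to
    resume from, or an empty page once everything has been read.
    """
    start = 0
    if after_key is not None:
        # Resume right after the bucket whose key equals after_key.
        start = next(i for i, b in enumerate(all_buckets)
                     if b["key"] == after_key) + 1
    page = all_buckets[start:start + page_size]
    next_after = page[-1]["key"] if page else None
    return page, next_after


def paged_top_n(all_buckets, n, page_size):
    """Read every page and keep only the n buckets with the highest doc_count."""
    top = []
    after_key = None
    while True:
        page, after_key = fetch_page(all_buckets, after_key, page_size)
        if not page:
            break
        # Merge the current page into the running top n.
        top = heapq.nlargest(n, top + page, key=lambda b: b["doc_count"])
    return top


# Made-up buckets, listed in key order as a composite aggregation would return them.
buckets = [
    {"key": {"category": "a", "region": "x"}, "doc_count": 5},
    {"key": {"category": "a", "region": "y"}, "doc_count": 1},
    {"key": {"category": "b", "region": "x"}, "doc_count": 9},
    {"key": {"category": "b", "region": "y"}, "doc_count": 2},
    {"key": {"category": "c", "region": "x"}, "doc_count": 7},
    {"key": {"category": "c", "region": "y"}, "doc_count": 3},
]

top_two = paged_top_n(buckets, n=2, page_size=2)
print([b["doc_count"] for b in top_two])
```

The client only ever holds `n + page_size` buckets in memory, but it still has to read every page, so the cost grows with the number of distinct groups.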
Is your feature request related to a problem? Please describe.
We need to query the top N records, grouping by multiple fields and sorting by doc count. I checked the explained DSL and there seems to be no sort logic. Does that mean the SQL plugin takes the query result and sorts it itself? If so, the sorted result may not be the global top records. For example, say we have 10K documents. The query returns 1K documents by default, and the SQL plugin then sorts just these 1K documents without checking the other 9K.
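To make the concern concrete, here is a small simulation in plain Python. It is only a sketch under assumptions: the row layout (`key`, `doc_count`) and the function names are invented for illustration, and it does not call the SQL plugin or OpenSearch. It compares sorting only the first 1K returned rows with sorting all 10K.

```python
def local_top_n(rows, n, fetch_limit):
    """Sort only the first fetch_limit rows, as a non-pushed-down sort would."""
    fetched = rows[:fetch_limit]
    return sorted(fetched, key=lambda r: r["doc_count"], reverse=True)[:n]


def global_top_n(rows, n):
    """Sort every row, which is what a globally correct top N requires."""
    return sorted(rows, key=lambda r: r["doc_count"], reverse=True)[:n]


# 10K made-up rows, returned in key order; doc_count happens to grow with the key.
rows = [{"key": ("field_a", i), "doc_count": i} for i in range(10_000)]

local = local_top_n(rows, n=3, fetch_limit=1_000)
overall = global_top_n(rows, n=3)

print([r["doc_count"] for r in local])    # largest counts among the first 1K rows
print([r["doc_count"] for r in overall])  # largest counts among all 10K rows
```

In this constructed case the two answers share no rows at all, because every large count sits in the 9K rows that were never fetched.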
My testing query