-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add JSON Support to V2 Engine (#217) #1464
Conversation
* Implement json support for V2 engine Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Reverted some changes Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Removed some fields Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * minor fix Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Added a unit test, cleaned up Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Returning raw OpenSearch response when type is json Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Add an integration test, fix checkstyle errors Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Added constructor for empty rawResponse Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added constant for supported formats Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added unit test Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Addressed PR comments Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Addressed PR comments: Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Fixed issue Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added getter for rawResponse in PhysicalPlan Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Legacy fall back with JSON format (#237) * Implement json support for V2 engine Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Reverted some changes Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Removed some fields Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * minor fix Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Added a unit test, cleaned up Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Returning raw OpenSearch response when type is json Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Add an integration test, fix checkstyle errors Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> * Made new engine fallback to legacy for in memory operations for json format Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Address build failures Signed-off-by: MaxKsyunz <maxk@bitquilltech.com> * Added legacy fall back Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Refactored fall back logic to use visitor design pattern Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added unit tests Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Removed unnecessary IT Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Addressed PR feedback Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Removed unnecessary context Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added fall back for Filter functions Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Made new engine fallback to legacy for in memory operations for json format Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Address build failures Signed-off-by: MaxKsyunz <maxk@bitquilltech.com> * Added legacy fall back Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Refactored fall back logic to use visitor design pattern Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added unit tests Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Removed unnecessary IT Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Addressed PR feedback Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Removed unnecessary context Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added fall back for Filter functions Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> --------- Signed-off-by: MaxKsyunz <maxk@bitquilltech.com> * Fixed checkstyle errors Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Addressed PR comments and fixed the visitor Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added comment to visitor class Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Addressed PR comments to improve visitor class Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added unit tests for JsonSupportVisitor Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added helper function for SQLServiceTest Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added expected failures Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Reworked the visitor class to have type Void instead of Boolean Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Fixed typo Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added github link for tracking issue Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> --------- Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> Signed-off-by: MaxKsyunz <maxk@bitquilltech.com> Co-authored-by: Margarit Hakobyan <margarit.hakobyan@improving.com> Co-authored-by: MaxKsyunz <maxk@bitquilltech.com> * Reverted OpenSearchIndexScan changes as it broke IT Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Added unit test Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Removed unused Mock variable Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> --------- Signed-off-by: Margarit Hakobyan <margarit.hakobyan@improving.com> Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> Signed-off-by: MaxKsyunz <maxk@bitquilltech.com> Co-authored-by: Guian Gumpac <guian.gumpac@improving.com> Co-authored-by: MaxKsyunz <maxk@bitquilltech.com>
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #1464 +/- ##
============================================
+ Coverage 98.43% 98.47% +0.04%
- Complexity 3775 3894 +119
============================================
Files 343 348 +5
Lines 9364 9663 +299
Branches 599 620 +21
============================================
+ Hits 9217 9516 +299
Misses 142 142
Partials 5 5
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 32 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
@Override | ||
public Void visit(Node node, JsonSupportVisitorContext context) { | ||
visitChildren(node, context); | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this code using Void
, return null
and raising exceptions instead of just returning boolean
?
The visitor answers a yes or no question. Using boolean
is simpler and more natural here.
Using exceptions for flow control is not a recommended practice. SQL plugin does that with fallback to legacy, but that's no great -- let's not add more.
The only thing it would lose is knowing which element caused validation to fail. That seems ok in this case -- the error message will be slightly worse.
If you do want to maintain the error message, you could return Optional<String>
. Then for supported queries JsonSupportVisitor.visit
will return Optional.empty()
. Otherwise it returns the label of the first element that caused validation to fail. That's enough information to throw exception with the same message. As a bonus, throw will happen in one place now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Boolean return type is a good change.
I'm not convinced that removing the Exception throws in this instance is the right way. Try...Catch blocks are slightly more expensive in processing time, but then we have the satisfaction of knowing that the NodeVisitor is interrupted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 4e692fe
|
||
public String getRawResponse() { | ||
return getChild().stream().map(PhysicalPlan::getRawResponse) | ||
.filter(r -> r != null && !r.isEmpty()).findFirst().orElse(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.filter(r -> r != null && !r.isEmpty()).findFirst().orElse(""); | |
.filter(StringUtils::isNotEmpty).findFirst().orElse(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 4e692fe
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndexScan.java
Show resolved
Hide resolved
#255) * Changed visitor type to Boolean and added LIMIT to fall back to legacy Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> * Addressed PR comment Signed-off-by: Guian Gumpac <guian.gumpac@improving.com> --------- Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a high level question about the implementation which I asked earlier on PoC PR too:
For now this JSON format only makes senses for OpenSearch storage. However, special logic are added in many places, QueryService
, PhysicalPlan
, QueryResponse
, ExecutionEngine
etc. Many of them are very fundamental and on critical code path of our core engine. Can we think about if any more extensible way:
- Either make this JSON format and code changes work for all other storage;
- Or minimize the special logic, ex. use
_raw
meta field and only use a special JSON response formatter to work with it.
Thanks!
* Unsupported features in V2 are ones the produce results that differ from | ||
* legacy results. | ||
*/ | ||
public class JsonSupportVisitor extends AbstractNodeVisitor<Boolean, JsonSupportVisitorContext> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking if any easy way to do this. As I understand, JSON format only makes sense if the physical query plan for a query only has a Project (item is regular field) and OpenSearchIndexScan. If so, can we do this validation later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, As I understand it, all unsupported cases will still fallback to the V1 engine, so there won't be a significant difference, right?
This PR is a first phase of JSON support implementation. Second phase assumes support of JSON response format for all datasources and queries. We have a PoC on test-json-pagination branch, it keep no raw response from OpenSearch/Prometheus/whatever, but it generates a response which looks 1:1 like returned by OpenSearch with resultset of the SQL plugin. $ curl -s -XPOST http://localhost:9200/_plugins/_sql?format=json -H 'Content-Type: application/json' -d '{"query": "SELECT int0 + 1, 2, now() FROM calcs limit 2"}' | jq
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": null,
"_id": null,
"_score": 0,
"fields": {
"2": 2,
"int0 + 1": 2,
"now()": "2023-03-17 13:07:47"
}
},
{
"_index": null,
"_id": null,
"_score": 0,
"fields": {
"2": 2,
"int0 + 1": null,
"now()": "2023-03-17 13:07:47"
}
}
]
}
} It is not completed yet ( UPD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the long-term plan, do we intend to fully deprecate the legacy engine? I believe that the legacy engine excels in supporting the JSON format, correct? What we do eventually is translate SQL statement to DSL as much as possible.
@@ -49,7 +49,8 @@ public void execute(PhysicalPlan physicalPlan, ExecutionContext context, | |||
result.add(plan.next()); | |||
} | |||
|
|||
QueryResponse response = new QueryResponse(physicalPlan.schema(), result); | |||
String rawResponse = physicalPlan.getRawResponse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is json format, result is not been used, only rawResponse will be used in response, right? Does it means the query plan should be different, if the query plan to fetch raw data?
* Unsupported features in V2 are ones the produce results that differ from | ||
* legacy results. | ||
*/ | ||
public class JsonSupportVisitor extends AbstractNodeVisitor<Boolean, JsonSupportVisitorContext> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, As I understand it, all unsupported cases will still fallback to the V1 engine, so there won't be a significant difference, right?
I believe that what we are targeting here is specific to the OpenSearch raw data response. Do other data sources have similar requirements? |
Not yet, but you're welcome to define them in #1450. It is possible to generate OpenSearch-like JSON using result set from Prometheus. Or create OpenSearch response for OpenSearch queries and Prometheus-like for Prometheus. |
IMO, if we touch core logic as what we're doing in this PR, we should clearly define what's the meaning of JSON format for all data sources. Otherwise, we may want to minimize such special logic that only work for OpenSearch. @acarbonetto @Yury-Fridlyand |
Closing this PR because JSON format doesn't currently have a place in the V2 engine. |
Description
Phase 1 of adding JSON Support to V2 Engine
For example: The following request was handled by V2 engine:
Returning the raw response from OpenSearch made IT tests fail due to missing in-memory operations, difference in aggregation type (V2 is composite_bucket), and some bugs were discovered. The solution for these failing IT tests were to fall back to legacy for all functions, expect failures for IT tests with a comment for issues related.
List of expected failures in legacy IT:
DateFormatIT.and (1436)
DateFormatIT.equalTo (1436)
DateFormatIT.greaterThan (1436)
DateFormatIT.greaterThanOrEqualTo (1436)
DateFormatIT.lessThan (1436)
DateFormatIT.lessThanOrEqualTo (1436)
DateFormatIT.or (1436)
DateFormatIT.sortByDateFormat (1436)
QueryFunctionsIT.wildcardQuery (Dependent on PR #1314)
QueryIT.selectAllWithFieldAndOrderBy (785)
QueryIT.selectAllWithFieldReturnsAll (785)
QueryIT.selectAllWithFieldReverseOrder (785)
QueryIT.selectAllWithMultipleFields (785)
QueryIT.notLikeTests (1425)
Proposal for phase 2 of JSON support for V2
#1450
Issues Resolved
#1317
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.