Add Spark application #1723
Conversation
Signed-off-by: Rupal Mahajan <maharup@amazon.com>
Codecov Report
@@            Coverage Diff            @@
##               main    #1723   +/- ##
=========================================
  Coverage     97.29%   97.29%
  Complexity     4408     4408
=========================================
  Files           388      388
  Lines         10944    10944
  Branches        774      774
=========================================
  Hits          10648    10648
  Misses          289      289
  Partials          7        7

Flags with carried forward coverage won't be shown. Click here to find out more.
spark-sql-application/src/main/scala/org/opensearch/sql/SQLJob.scala
  }
}

def getJson(df: DataFrame): DataFrame = {
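The method under review converts the DataFrame's rows into JSON strings. A minimal sketch of one way such a conversion could work, assuming Spark's built-in `Dataset.toJSON` (the actual implementation in this PR may differ):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

// Hypothetical sketch: serialize every row to a JSON string, then collect
// all of them into a single array column named "result", matching the
// document shape discussed later in this thread.
def getJsonSketch(df: DataFrame): DataFrame = {
  df.toJSON                                  // Dataset[String], one JSON object per row
    .toDF("result")                          // back to a single-column DataFrame
    .agg(collect_list(col("result")).as("result")) // global aggregate into one array row
}
```

Because the whole result is collected into one row, the size of that row is bounded by whatever the downstream sink will accept in a single write, which motivates the question below.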
What is the limit on the result size? 100 MB, because OpenSearch limits the maximum size of an HTTP request to 100 MB?
Yes, I think so, but I haven't tested with a large dataset yet.
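For context, the 100 MB figure corresponds to OpenSearch's `http.max_content_length` setting, which caps the body size of a single HTTP request and defaults to 100 MB. It can be raised in `opensearch.yml`, although batching the writes is usually the better fix than raising the cap:

```yaml
# opensearch.yml -- default is 100mb; shown only for illustration
http.max_content_length: 100mb
```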
Can we exclude the field in which we are writing the result from the mapping? We can keep the data in `_source`, and the field shouldn't be analyzed or indexed. I am not sure whether we are specifying the mapping of the index we are writing to.
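One way to achieve this, assuming we control the target index's mapping, is to declare the result field with `"index": false` (and `"doc_values": false` for keyword-like fields), so the value remains retrievable from `_source` but is never analyzed or indexed. A hypothetical mapping sketch (the index name `spark-results` is illustrative):

```json
PUT /spark-results
{
  "mappings": {
    "properties": {
      "result": {
        "type": "keyword",
        "index": false,
        "doc_values": false
      }
    }
  }
}
```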
val expectedSchema = StructType(Seq(
  StructField(name, ArrayType(StringType, containsNull = true), nullable = true)
))
val expectedRows = Seq(
The expected value here differs from the one in the description, which includes triple-quoted strings:

"result" : [
  """{"name":"Tina","age":29,"city":"Bellevue"}""",
  """{"name":"Jane","age":25,"city":"London"}""",
  """{"name":"Mike","age":35,"city":"Paris"}"""
],

Are triple-quoted strings valid in JSON?
When I print the DataFrame I don't see triple quotes, which is why I didn't add them in the test. Also, IIRC, triple-quoted strings are not valid in JSON.
I assume they are being added while writing to the OpenSearch index because the strings contain quotes. I will check more on this.
After replacing `\"` with `'` in all the JSON strings, the triple quotes are gone from the index and the JSON is valid. Updated the doc and description accordingly.
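One reading of the replacement described above, sketched in Scala (not necessarily the exact code in the PR; `jsonString` is a hypothetical name for one serialized row):

```scala
// Hypothetical: swap double quotes inside each serialized row for single
// quotes so the stored string no longer needs escaping in the index.
// Note this trades valid inner JSON for easier storage/display.
val sanitized: String = jsonString.replace("\"", "'")
```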
Updated the readme; the current mapping looks like:
spark-sql-application/src/main/scala/org/opensearch/sql/SQLJob.scala
Description
Schema of final DataFrame
Example (query: `select * from my_table`)
Data written to OpenSearch index
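Based on the sample discussed in the review thread above, the document written to the index presumably looks something like this (illustrative only, after the quote replacement):

```json
{
  "result": [
    "{'name':'Tina','age':29,'city':'Bellevue'}",
    "{'name':'Jane','age':25,'city':'London'}",
    "{'name':'Mike','age':35,'city':'Paris'}"
  ]
}
```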
Issues Resolved
#1722
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.