Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Coral-Spark] Modify coral hive parser and coral spark writer dialect to generate spark compliant escaped string literal #473

Merged
merged 5 commits into from
Nov 7, 2023

Conversation

rzhang10
Copy link
Member

@rzhang10 rzhang10 commented Nov 6, 2023

What changes are proposed in this pull request, and why are they necessary?

Without this patch, a string literal of select 'I'm' will be re-written in spark-sql output of select I''m, the '' double single quote escaping is a ANSI sql escaping standard but is not what is recognized by the spark sql parser. Instead, spark uses the backslash for escaping quotes inside quotes in sql string, i.e. the correct output should be select 'I\'m'.

This patch adds the customizations in the SparkSqlDialect to make the generated output sql consistency use single quote for quoting a outermost level string literal and use \' to quote internal literals occurences of ', for ", it will appear as-is since it will never need to be escaped because the outermost quote is always single quote.

How was this patch tested?

Added 6 unit tests to demonstrate the new behavior:

hive spark
select 'abc' select 'abc'
select "abc" select 'abc'
select 'abc[\'xyz\']' select 'abc[\'xyz\']'
select "abc['xyz']" select 'abc[\'xyz\']'
select 'abc["xyz"]' select 'abc[\'xyz\']'
select "abc[\"xyz\"]" select 'abc["xyz"]'

Run regression tests against all views and results looks fine. Also tested against the view sql which exposed this issue on spark, the spark read result literals are correct now.

@ljfgem
Copy link
Collaborator

ljfgem commented Nov 6, 2023

Thanks Raymond for this pr, could you add regression tests for both spark and trino? And could you also test it on spark shell against the target view to ensure it works as expected?

@rzhang10
Copy link
Member Author

rzhang10 commented Nov 7, 2023

Thanks Raymond for this pr, could you add regression tests for both spark and trino? And could you also test it on spark shell against the target view to ensure it works as expected?

Yeah I just triggered int-test for both trino and spark, let's wait for the restults.

@rzhang10 rzhang10 changed the title Modify coral hive parser and coral spark writer dialect to generate spark compliant escaped string literal [Coral-Spark] Modify coral hive parser and coral spark writer dialect to generate spark compliant escaped string literal Nov 7, 2023
Copy link
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rzhang10 for this PR, left one comment

Copy link
Collaborator

@ljfgem ljfgem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @rzhang10 for the quick fix!

@rzhang10 rzhang10 merged commit d04c55e into linkedin:master Nov 7, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants