Restrain JSON_TABLE table function parsing to MySqlDialect and AnsiDialect #1123
src/parser/mod.rs
Outdated
} else if dialect_of!(self is MySqlDialect)
&& self.parse_keyword(Keyword::JSON_TABLE)
{
The only change is here. Other changes are from rustfmt.
Maybe add a simple test to generic SQL dialect to ensure this behaviour?
src/parser/mod.rs
Outdated
@@ -7506,7 +7506,7 @@ impl<'a> Parser<'a> {
with_offset,
with_offset_alias,
})
- } else if self.parse_keyword(Keyword::JSON_TABLE) {
+ } else if dialect_of!(self is MySqlDialect) && self.parse_keyword(Keyword::JSON_TABLE) {
this is the actual change, right? It seems like the changes to derive/src/lib.rs are unrelated
This sounds a little bit ad hoc. Shouldn't this be a method on the dialect?
JSON_TABLE seems to be implemented on oracle too, for instance: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/JSON_TABLE.html#GUID-3C8E63B5-0B94-4E86-A2D3-3D4831B67C62
@alamb Yea, derive/src/lib.rs was changed by rustfmt. I restored it manually.
We don't have dialects for Oracle and DB2.
> This sounds a little bit ad hoc. Shouldn't this be a method on the dialect?
This can be discussed separately, but this syntax follows the existing ones.
Pull Request Test Coverage Report for Build 7838315622
💛 - Coveralls |
I think it's ok to accept [...] In simpler cases, we may even be able to distinguish between the unquoted table name and the function call directly during parsing, and make everyone happy.
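A minimal sketch of how such a distinction might be made with one token of lookahead. This is not sqlparser's actual code: the `Tok` and `TableFactorKind` types and the `classify_table_factor` function are hypothetical simplifications introduced only for illustration.

```rust
/// Hypothetical, simplified token type (the real sqlparser `Token` is richer).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    LParen,
    Other,
}

/// Hypothetical result: what the parser would build after `FROM`.
#[derive(Debug, PartialEq)]
enum TableFactorKind {
    JsonTableFunction,
    NamedTable(String),
}

/// Classify the word at `tokens[pos]` by peeking at the following token.
/// `json_table` counts as the table function only when `(` comes next;
/// any other word, or `json_table` without `(`, is treated as a table name.
fn classify_table_factor(tokens: &[Tok], pos: usize) -> Option<TableFactorKind> {
    match tokens.get(pos)? {
        Tok::Word(w) => {
            let is_json_table = w.eq_ignore_ascii_case("json_table");
            let followed_by_paren = tokens.get(pos + 1) == Some(&Tok::LParen);
            if is_json_table && followed_by_paren {
                Some(TableFactorKind::JsonTableFunction)
            } else {
                Some(TableFactorKind::NamedTable(w.clone()))
            }
        }
        // Not a word at all: this sketch does not handle it.
        _ => None,
    }
}
```

Under these assumptions, `SELECT * FROM json_table` would resolve to a table named `json_table`, while `FROM JSON_TABLE(` would resolve to the function.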
src/parser/mod.rs
Outdated
@@ -7506,7 +7506,7 @@ impl<'a> Parser<'a> {
with_offset,
with_offset_alias,
})
- } else if self.parse_keyword(Keyword::JSON_TABLE) {
+ } else if dialect_of!(self is MySqlDialect | AnsiDialect) && self.parse_keyword(Keyword::JSON_TABLE) {
Although AnsiDialect targets the SQL:2011 standard according to its comment, it seems to make sense to add it to the supported dialects.
I think it's ok to accept json_table as a table name for specific dialects where it has been tested to be accepted unquoted, but the default behavior should probably be to parse it as the json_table function, not as a table name.
It could be the default behavior for dialects which are likely to follow the standard to support the table function.
As there are only very few dialects supporting this, I think it makes more sense not to add it to GenericDialect for now.
I still don't think the dialect identity should be hardcoded. This should probably be a dialect.supports_json_table. And it should be true in the generic dialect.
I agree it would be nicer to do if self.dialect.supports_json_table() instead. However, I think we could also do this as a follow-on PR.
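A rough sketch of the trait-method style being discussed, as opposed to hard-coding dialect identities with `dialect_of!`. The `Dialect` trait below is a hypothetical stand-in, not sqlparser's real trait; `supports_json_table` is the name proposed in the review, and `should_parse_json_table` is an illustrative helper. The default of `false` is only one of the two positions argued in this thread.

```rust
/// Hypothetical stand-in for a dialect trait (not sqlparser's real `Dialect`).
trait Dialect {
    /// Whether `JSON_TABLE(...)` should be parsed as a table function.
    /// Defaulting to `false` is an assumption; the thread also argues for `true`.
    fn supports_json_table(&self) -> bool {
        false
    }
}

struct GenericDialect;
struct MySqlDialect;
struct AnsiDialect;

// GenericDialect keeps the default in this sketch.
impl Dialect for GenericDialect {}

impl Dialect for MySqlDialect {
    fn supports_json_table(&self) -> bool {
        true
    }
}

impl Dialect for AnsiDialect {
    fn supports_json_table(&self) -> bool {
        true
    }
}

/// Parser-side check: asks the dialect instead of naming concrete dialects.
fn should_parse_json_table(dialect: &dyn Dialect, next_keyword: &str) -> bool {
    dialect.supports_json_table() && next_keyword.eq_ignore_ascii_case("JSON_TABLE")
}
```

With this shape, adding support for another dialect means overriding one method rather than editing the parser condition.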
I don't see why it should be true in the generic dialect. GenericDialect is not a strictly SQL-standard dialect (AnsiDialect serves that purpose). It is for syntax that most dialects use. Currently the table function is supported by only very few dialects.
> I agree it would be nicer to do if self.dialect.supports_json_table() instead. However, I think we could also do this as a follow on PR as well
Agreed. As I mentioned earlier, this can be discussed separately. Although I wonder if it will result in many similar functions.
> Although I wonder if it will result in many similar functions.
Yes I think it would. The codebase has a mix of styles currently -- both dialect_of! and trait methods
I believe CI is failing due to a new Rust version being released (and thus new clippy lints). I will fix that separately.
I merged up from main to pick up #1130
I think @lovasoa's comment on apache/datafusion#9122 (comment) is worth considering (should we perhaps instead make the error message more helpful?)
Sounds good. Let me revise this to improve the error message for unsupported dialects.
tests/sqlparser_common.rs
Outdated
let parsed = all_dialects().parse_sql_statements("SELECT * FROM json_table");
assert_eq!(
ParserError::ParserError(
"Cannot specify a reserved keyword as identifier for table factor".to_string()
),
parsed.unwrap_err()
);
@alamb I think this can have a more meaningful message to end users.
tests/sqlparser_common.rs
Outdated
let parsed = all_dialects().parse_sql_statements("SELECT * FROM json_table");
assert_eq!(
ParserError::ParserError(
"Cannot specify a keyword as identifier for table factor".to_string()
),
parsed.unwrap_err()
);
And, even for the supported dialects, json_table cannot be used as a table identifier either. Previously it could not be used either, but the parsing error looked confusing.
Hmm, there are more tests using keywords as identifiers...
I don't think banning the use of keywords as identifiers is a good idea. It will probably break a lot of people's applications. Currently, keywords are defined globally in sqlparser, even though the list of supported keywords varies from database to database. Banning everything that is a keyword in one database from use as an identifier globally is probably not a good idea. I recently had the case with MySQL, which supports [...]
I have concerns about this too. But I think this is what you first suggested (e.g., apache/datafusion#9122 (comment)), no?
What I was suggesting was just to close apache/datafusion#9122 as "working as intended" :)
Okay. Using a keyword as an identifier is now disallowed only in ANSI mode. Other dialects can still use keywords as identifiers for their custom usage.
I think there are disallowed cases of using a keyword as an identifier in individual dialects, i.e., each engine has its own reserved keyword list.
We can probably consider enhancing dialect behavior in this regard in the future, e.g., by creating a keyword list per dialect.
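One possible shape for such a per-dialect keyword list, sketched under loud assumptions: the `DialectKeywords` trait and its methods are hypothetical and do not exist in sqlparser, and the keyword lists shown are illustrative placeholders, not the actual reserved words of any engine.

```rust
/// Hypothetical trait; sqlparser's real `Dialect` trait has no such methods.
trait DialectKeywords {
    /// Words this dialect refuses as unquoted table names (none by default).
    fn reserved_for_table_name(&self) -> &'static [&'static str] {
        &[]
    }

    /// True if `word` may be used unquoted as a table name in this dialect.
    fn allows_table_name(&self, word: &str) -> bool {
        !self
            .reserved_for_table_name()
            .iter()
            .any(|kw| kw.eq_ignore_ascii_case(word))
    }
}

struct AnsiDialect;
struct GenericDialect;

// Illustrative only: the strict dialect reserves JSON_TABLE.
impl DialectKeywords for AnsiDialect {
    fn reserved_for_table_name(&self) -> &'static [&'static str] {
        &["JSON_TABLE"]
    }
}

// The permissive dialect keeps the empty default list.
impl DialectKeywords for GenericDialect {}
```

The point of the design is that each dialect owns its list, so a keyword reserved by one engine does not become unusable as an identifier everywhere.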
Does the ansi sql dialect still parse [...]
It still parses the SQL, although [...] For column names etc., that is outside the scope of this PR; we can work on this later, I think.
Yes, the problem is, I think [...]
Another example is [...]
So I would like to suggest we take a step back here and review what we are trying to accomplish.

What our DataFusion user hit, and I actually hit as well, is quite confusing behavior. Specifically, while it is fine to create a table named json_table:

DataFusion CLI v35.0.0
❯ create table json_table(x int);
0 rows in set. Query took 0.005 seconds.

If you try to query it you get a very confusing syntax error:

❯ select * from json_table;
🤔 Invalid statement: sql parser error: Expected (, found: EOF

For better or worse I think using [...]

I think the experience would have been far better if the error had instead looked like:

❯ select * from json_table;
🤔 Invalid statement: sql parser error: Can not parse json_table clause. Expected (, found: EOF. Hint: to select from a table named json_table, use double quotes "json_table"

So my suggestion is: [...]
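As a rough illustration of the friendlier error described in this comment, here is a minimal sketch. The function `expect_lparen_after_json_table` is hypothetical and not part of sqlparser; it only reproduces the message wording quoted above, taking the next token as an optional string.

```rust
/// Hypothetical helper: called after the parser has consumed `json_table`.
/// `found` is the next token's text, or `None` at end of input.
fn expect_lparen_after_json_table(found: Option<&str>) -> Result<(), String> {
    match found {
        Some("(") => Ok(()),
        other => {
            // Mirror the parser's convention of reporting end of input as EOF.
            let found = other.unwrap_or("EOF");
            Err(format!(
                "Can not parse json_table clause. Expected (, found: {}. \
                 Hint: to select from a table named json_table, \
                 use double quotes \"json_table\"",
                found
            ))
        }
    }
}
```

The only change relative to the confusing error is the added context and hint; the parsing decision itself stays the same.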
There is a third option, that I had suggested above, and that I think would be the best: [...]
This way, [...] And if I'm not mistaken, implementing that is just a matter of adding [...]
I opened an alternative PR to this one here: #1134
Doesn't that mean you want to block SQL syntax which is not "valid" SQL? I know you changed your mind later. I just don't like going back and forth on this PR, so I don't want to follow up on it. Please move forward with your PR.
My initial comment was not suggesting to block more keywords. I was saying that the current behavior is reasonable. Accepting even more valid syntax is better. Blocking syntax that is valid in most databases and currently works in sqlparser is worse.
Anyway, I think we are all ok with #1134 in the end. And your effort was not in vain, it ended up raising important points.
Thank you @alamb
JSON_TABLE was introduced in SQL:2016 and added to this parser by #1062, but it is not actually implemented by most SQL engines. Having it as general parser support may cause issues like apache/datafusion#9122. Maybe we should restrict JSON_TABLE to MySqlDialect for now.
Btw, PR #1062, which added the support, also only wrote tests for MySQL (tests/sqlparser_mysql.rs).