Revisit case sensitivity of identifiers #1122
Comments
Adding some background from SQL-1999, section 20.1:
The specification opts for using uppercase |
Related, we will need to reify the case-sensitivity of binders in the AST so that when we pretty-print we will know to quote or not to quote (in the case of keywords). However, the quotes should not matter for binders as we use that literal value regardless of quotes. Today's behavior is what one would expect, that is |
TBH, I'm not sure where this is coming from. The SQL-92 specification AFAICT is clear about it in section 5.2:
|
In short: What is currently implemented for identifiers in partiql-lang-kotlin:
Ideally, we implement what SQL prescribes, while taking care of properly abstracting it.
This will surely be backward-incompatible, but everyone's sanity can be worth it.
SQL
In SQL, there are two syntactic variants of identifiers: regular identifiers like foo and delimited/quoted identifiers like "fOO".
foo, Foo, fOo, FOO and, notably, "FOO" are all the same identifier, canonically being FOO. This identifier is distinct from "foo", "Foo", "fOo", all of which are also distinct from each other.
Importantly, it is ok to name an object "FOO" and then refer to it as FOO or foo and, conversely, name an object gOO and then refer to it as "GOO".
partiql-lang-kotlin
In contrast, in the current implementation:
Foo matching binders FOO, Foo, foo) -- apparently, identifiers that are equivalent in case-insensitive sense can bind to different entities, without shadowing.
[Disclaimer: the above is an attempt at a rational reconstruction of the design in place; no claim is made about its fidelity to the intent or to all intricacies of the implementation.]
Known history
Lower case was chosen as the canonical case in order to conform with PostgreSQL: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
Conclusion
While SQL's design is not an exhibit of simplicity, the approach in p-l-k is even more complicated. The two seem to coincide in common usage (say, only non-delimited identifiers and no malicious introduction of ambiguous binders), but the differences are significant and break SQL conformance claims.
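To make the SQL side of this comparison concrete, here is a minimal Kotlin sketch of the normalization rule described in the SQL section. The names Identifier, canonical and sameIdentifier are hypothetical and are not part of partiql-lang-kotlin. The sketch models only the rule itself (regular identifiers fold to upper case, delimited identifiers keep their exact spelling) and assumes that Kotlin's locale-invariant uppercase() is an adequate stand-in for the folding the specification prescribes.

```kotlin
// Hypothetical model of SQL identifier equivalence; not partiql-lang-kotlin API.
sealed class Identifier {
    abstract val text: String

    // A regular (unquoted) identifier, e.g. foo
    data class Regular(override val text: String) : Identifier()

    // A delimited (quoted) identifier, e.g. "fOO"; text holds the part between the quotes
    data class Delimited(override val text: String) : Identifier()
}

// Canonical form: regular identifiers are folded to upper case,
// delimited identifiers are taken verbatim.
fun canonical(id: Identifier): String = when (id) {
    is Identifier.Regular -> id.text.uppercase()
    is Identifier.Delimited -> id.text
}

// Two identifiers denote the same name when their canonical forms are equal.
fun sameIdentifier(a: Identifier, b: Identifier): Boolean =
    canonical(a) == canonical(b)

fun main() {
    val regularFoo = Identifier.Regular("foo")
    println(sameIdentifier(regularFoo, Identifier.Regular("fOo")))   // true
    println(sameIdentifier(regularFoo, Identifier.Delimited("FOO"))) // true
    println(sameIdentifier(regularFoo, Identifier.Delimited("foo"))) // false
    println(
        sameIdentifier(Identifier.Delimited("Foo"), Identifier.Delimited("fOo"))
    ) // false
}
```

Swapping uppercase() for lowercase() in canonical would model the PostgreSQL convention mentioned under Known history, where foo and "foo" coincide instead of foo and "FOO".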