Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump calcite SQL parser identifier max length from default 128 (SqlParser.DEFAULT_IDENTIFIER_MAX_LENGTH) to 1024. #14363

Conversation

jackluo923
Copy link
Contributor

@jackluo923 jackluo923 commented Nov 2, 2024

For Pinot queries that involve extensive flattened JSON log matching, the default SQL parser identifier limit of 128 characters is insufficient. For example, a JSON log like {"aaa": {"bbb": {"ccc": 1}} gets flattened automatically and converted into a structured table with a single column with the column name aaa.bbb.ccc. In this case, the column name is 11 characters long, but in real-world applications, it could easily exceed 128 characters.

This PR raises the identifier length limit from the default 128 (as defined by SqlParser.DEFAULT_IDENTIFIER_MAX_LENGTH) to 1024. Currently, this value cannot be adjusted via configuration, but if necessary in the future, the public static variable that holds this limit can be made configurable with minimal effort.

…rser.DEFAULT_IDENTIFIER_MAX_LENGTH) to 1024.
@codecov-commenter
Copy link

codecov-commenter commented Nov 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.79%. Comparing base (59551e4) to head (aee5f42).
Report is 1277 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14363      +/-   ##
============================================
+ Coverage     61.75%   63.79%   +2.03%     
- Complexity      207     1556    +1349     
============================================
  Files          2436     2660     +224     
  Lines        133233   145902   +12669     
  Branches      20636    22333    +1697     
============================================
+ Hits          82274    93073   +10799     
- Misses        44911    45960    +1049     
- Partials       6048     6869     +821     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.76% <100.00%> (+2.06%) ⬆️
java-21 63.67% <100.00%> (+2.05%) ⬆️
skip-bytebuffers-false 63.78% <100.00%> (+2.04%) ⬆️
skip-bytebuffers-true 55.32% <100.00%> (+27.60%) ⬆️
temurin 63.79% <100.00%> (+2.03%) ⬆️
unittests 63.78% <100.00%> (+2.03%) ⬆️
unittests1 55.51% <100.00%> (+8.62%) ⬆️
unittests2 34.13% <100.00%> (+6.40%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang
Copy link
Contributor

Can you elaborate more on why text search might use long identifier? Identifier is column name, instead of literal. What is the overhead of increasing it?

@jackluo923
Copy link
Contributor Author

Can you elaborate more on why text search might use long identifier? Identifier is column name, instead of literal. What is the overhead of increasing it?

I have updated the PR description which explains the problem better. I don't think there is any overhead of increasing it as it is an artificial limit within the sql parser. For cases where the identifier does not exceed 128 characters, the overhead should be the same. For identifiers that exceeds 128 characters, we haven't noticed any observable overhead since query parsing constitute minimal run-time overhead compared to the query itself. It is extremely rare to encounter identifiers over 128 characters, but if we do encounter it, with this PR, queries will still be able to complete.

@Jackie-Jiang Jackie-Jiang merged commit 10625a3 into apache:master Nov 6, 2024
19 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants