feat(sqllab): add latest partition support for BigQuery #30760

Open
wants to merge 6 commits into
base: master
from

Conversation

mistercrunch
Member

@mistercrunch mistercrunch commented Oct 31, 2024

Adds db_engine_spec-related features that enable SQL Lab to show the latest partition when using time partitioning in BigQuery, and to apply a WHERE clause on the latest partition by default when fetching the sample dataset. It turns out that `SELECT * FROM {{...}} LIMIT {n}` can be costly against large tables in BigQuery, as it results in a full table scan.

closes #17299
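
To illustrate the idea described above, here is a minimal sketch of how a sample query might be restricted to the latest partition. It is not the code from this PR: the helper name `build_preview_query`, its parameters, and the assumption of a DATE-typed partition column are all hypothetical, and a real implementation would go through the engine spec and proper identifier quoting rather than string formatting.

```python
from datetime import date
from typing import Optional


def build_preview_query(
    table: str,
    limit: int,
    partition_column: Optional[str] = None,
    latest_partition: Optional[date] = None,
) -> str:
    """Build a sample-rows query, filtered to one partition when it is known.

    Hypothetical helper for illustration only; not part of Superset.
    """
    query = f"SELECT * FROM `{table}`"
    if partition_column and latest_partition is not None:
        # Assumption: the table is partitioned on a DATE column, so the
        # filter lets BigQuery prune every partition except the latest one.
        query += (
            f" WHERE `{partition_column}` = DATE '{latest_partition.isoformat()}'"
        )
    # LIMIT alone does not reduce the data scanned, so it is applied
    # only after the optional partition filter.
    return f"{query} LIMIT {int(limit)}"


print(build_preview_query("proj.ds.events", 100))
print(build_preview_query("proj.ds.events", 100, "event_date", date(2024, 10, 30)))
```

Without the partition arguments the helper falls back to the plain `SELECT * ... LIMIT n` form, which is the costly case the description refers to.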

)
assert result == expected_result

def test_get_indexes(self):

@mistercrunch mistercrunch marked this pull request as ready for review October 31, 2024 05:32
@dosubot added the data:connect:googlebigquery (Related to BigQuery) and sqllab (Namespace | Anything related to the SQL Lab) labels Oct 31, 2024
@@ -40,6 +40,7 @@
EXAMPLES_HOST = os.getenv("EXAMPLES_HOST")
EXAMPLES_PORT = os.getenv("EXAMPLES_PORT")
EXAMPLES_DB = os.getenv("EXAMPLES_DB")
SHOW_STACKTRACE = True
Member

Was this intended to be part of the PR, or just for local/temporary dev purposes?

Member Author

Let me remove that line. I'm now thinking it may make sense to deprecate this flag, as it's not consistently implemented.

Labels
data:connect:googlebigquery Related to BigQuery preset-io size/L sqllab Namespace | Anything related to the SQL Lab
Development

Successfully merging this pull request may close these issues.

Select * Limit is DANGEROUS in BigQuery
2 participants