refactor(pinot): The `python_date_format` for a temporal column was not being passed to `get_timestamp_expr` #24942
superset/db_engine_specs/pinot.py
Outdated
 if time_grain:
     granularity = cls.get_time_grain_expressions().get(time_grain)
     if not granularity:
         raise NotImplementedError(f"No pinot grain spec for '{time_grain}'")
 else:
-    return TimestampExpression("{{col}}", col)
+    return TimestampExpression("{col}", col)
Would you also mind adding some unit tests for Pinot which cover the `get_timestamp_expr` function? You can find many other examples of this in the other DB engine specs.
Sure, not a problem.
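A test of the kind requested above might start from something like the following minimal sketch. It does not import Superset: `render_timestamp_expr` is a hypothetical stand-in that assumes the `{col}` placeholder in the template is filled in by plain string substitution, which is an assumption about how the expression is compiled rather than something shown in this PR. Under that assumption, it illustrates why the `"{{col}}"` template leaves stray braces around the column name while `"{col}"` does not.

```python
def render_timestamp_expr(template: str, column_sql: str) -> str:
    """Hypothetical stand-in: substitute the column SQL for the `{col}` placeholder."""
    return template.replace("{col}", column_sql)


def test_no_time_grain_old_template() -> None:
    # The doubled braces survive substitution and end up in the query text.
    assert render_timestamp_expr("{{col}}", "tstamp") == "{tstamp}"


def test_no_time_grain_new_template() -> None:
    # With single braces the column name is emitted unchanged.
    assert render_timestamp_expr("{col}", "tstamp") == "tstamp"


if __name__ == "__main__":
    test_no_time_grain_old_template()
    test_no_time_grain_new_template()
    print("ok")
```

A real test would call `PinotEngineSpec.get_timestamp_expr` and compile the result, following the existing tests for the other DB engine specs.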
superset/connectors/sqla/models.py
Outdated
@@ -1011,7 +1013,7 @@ def adhoc_column_to_sqla(  # pylint: disable=too-many-locals
         if is_dttm and has_timegrain:
             sqla_column = self.db_engine_spec.get_timestamp_expr(
                 col=sqla_column,
-                pdf=None,
+                pdf=pdf,
I'm concerned that this change may actually break existing logic, given it was explicitly set to `None`. Would you mind adding a unit test for this, which not only provides code coverage but also helps reviewers et al. grok the consequence of the change?
@zhaoyongjie it seems like you added this logic in #21163 and thus you probably have the most context as to why we historically weren't defining the `pdf` variable.
Unfortunately, in my testing, whenever I tried to create a chart or use a dashboard, a column marked as `temporal` would always call `get_timestamp_expr` via `adhoc_column_to_sqla`, which means that the user-defined date format is never passed to the DB engine spec.
It's possible that the root cause of the issue is that `get_timestamp_expr` is being called through `adhoc_column_to_sqla` when it should be called via `TableColumn.get_timestamp_expression` (the only other call path to `get_timestamp_expr` I could find). But all my tests pointed to `adhoc_column_to_sqla` being the root cause.
@john-bodley "pdf" is shorthand for "date format (seconds or milliseconds)". This code has existed for many years. The "pdf" is only used for calculated columns and columns from the database, not for ad-hoc expressions, so we shouldn't make this change.
The design of the current Pinot DB engine spec is completely incorrect. We maintain our own Pinot driver and db_spec, which should solve your issue:
class PinotEngineSpec(BaseEngineSpec):  # pylint: disable=abstract-method
    engine = "pinot"
    engine_name = "Apache Pinot"
    allows_subqueries = False
    allows_joins = False
    allows_alias_in_select = True
    allows_alias_in_orderby = False

    # https://docs.pinot.apache.org/users/user-guide-query/supported-transformations#datetime-functions
    _time_grain_expressions = {
        None: "{col}",
        "PT1S": "CAST(DATE_TRUNC('second', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
        "PT1M": "CAST(DATE_TRUNC('minute', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
        "PT5M": "CAST(ROUND(DATE_TRUNC('minute', CAST({col} AS TIMESTAMP)), 300000) as TIMESTAMP)",
        "PT10M": "CAST(ROUND(DATE_TRUNC('minute', CAST({col} AS TIMESTAMP)), 600000) as TIMESTAMP)",
        "PT15M": "CAST(ROUND(DATE_TRUNC('minute', CAST({col} AS TIMESTAMP)), 900000) as TIMESTAMP)",
        "PT30M": "CAST(ROUND(DATE_TRUNC('minute', CAST({col} AS TIMESTAMP)), 1800000) as TIMESTAMP)",
        "PT1H": "CAST(DATE_TRUNC('hour', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
        "P1D": "CAST(DATE_TRUNC('day', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
        "P1W": "CAST(DATE_TRUNC('week', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
        "P1M": "CAST(DATE_TRUNC('month', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
        "P3M": "CAST(DATE_TRUNC('quarter', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
        "P1Y": "CAST(DATE_TRUNC('year', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
    }

    @classmethod
    def column_datatype_to_string(
        cls, sqla_column_type: TypeEngine, dialect: Dialect
    ) -> str:
        # The Pinot driver infers a TIMESTAMP column as LONG, so apply a quick fix here.
        # Once the Pinot driver fixes this bug, this method can be removed.
        if isinstance(sqla_column_type, types.TIMESTAMP):
            return sqla_column_type.compile().upper()
        else:
            return super().column_datatype_to_string(sqla_column_type, dialect)
driver at: https://github.com/BurdaForward/pinot-dbapi/tree/bf_release
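The numeric arguments in the `PT5M` to `PT30M` entries above look like bucket widths in milliseconds (this reading is an assumption; the comment does not state the unit). A quick check of that arithmetic, with `bucket_ms` as a hypothetical helper introduced only for this sketch:

```python
from datetime import timedelta


def bucket_ms(minutes: int) -> int:
    """Width of an N-minute bucket expressed in whole milliseconds."""
    return timedelta(minutes=minutes) // timedelta(milliseconds=1)


# Constants copied from the time grain expressions quoted above.
SPEC_CONSTANTS = {5: 300000, 10: 600000, 15: 900000, 30: 1800000}

for minutes, constant in SPEC_CONSTANTS.items():
    # Each constant matches the bucket width for its grain.
    assert bucket_ms(minutes) == constant
```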
@zhaoyongjie I'm aware of what the Python Date Format (PDF) represents, though thanks for clarifying that this shouldn't be used for ad-hoc expressions.
Note we do already have a Pinot DB engine spec, but maybe only adding the `column_datatype_to_string` method is required.
@ege-st I also wondered if this was an underlying issue with the Pinot SQLAlchemy dialect. You might want to look into the `visit_label` method.
@zhaoyongjie I've confirmed that this error happens with the latest versions of Pinot: Superset can't alias a projection to the same name as a column that already exists. I looked at the diff you provided, but it appears to be diffing a version of `models.py` that is not the same as the one in the `master` branch.
@john-bodley could you provide some more detail? Is SQLAlchemy generating the alias name used in the projection? If so, then it could be an issue with the dialect, but if Superset generates the alias label then I'm not sure how the dialect can address this.
@ege-st the git diffs are from the Superset 2.1.0 branch. There aren't many changes, so you should apply them manually 🖨️🖨️🖨️
You should change this part of the code on the master branch:
superset/superset/models/core.py
Lines 965 to 979 in cacad56
    def make_sqla_column_compatible(
        self, sqla_col: ColumnElement, label: str | None = None
    ) -> ColumnElement:
        """Takes a sqlalchemy column object and adds label info if supported by engine.
        :param sqla_col: sqlalchemy column instance
        :param label: alias/label that column is expected to have
        :return: either a sql alchemy column or label instance if supported by engine
        """
        label_expected = label or sqla_col.name
        # add quotes to tables
        if self.db_engine_spec.allows_alias_in_select:
            label = self.db_engine_spec.make_label_compatible(label_expected)
            sqla_col = sqla_col.label(label)
        sqla_col.key = label_expected
        return sqla_col
@zhaoyongjie so I believe I figured out a workaround for the alias issue. If I just set `allows_alias_in_select = False` then the query generated by Superset does not use an alias and the query is then compatible with Pinot. So, I don't think any of the additional changes you kindly suggested are necessary.
One question that I have is: what is the purpose of the `pdf` that gets defined in the dataset configuration? Since it isn't passed into the engine spec when creating a chart, it can't be used in the query generation, so it doesn't seem to serve a purpose?
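The workaround can be pictured with a small sketch. This is not Superset's implementation: `project` is a hypothetical helper, and the column name `tstamp` and the SQL text are made up for illustration. It only shows how a flag like `allows_alias_in_select` decides whether a projection is labelled, and how a label equal to an existing column name arises when it is.

```python
def project(expression: str, label: str, allows_alias_in_select: bool) -> str:
    """Hypothetical helper: render one SELECT projection, aliased only if allowed."""
    if allows_alias_in_select:
        return f"{expression} AS {label}"
    return expression


expr = "CAST(tstamp AS TIMESTAMP)"

# With aliasing, the label repeats the name of the underlying column.
with_alias = project(expr, "tstamp", allows_alias_in_select=True)

# Without aliasing, the bare expression is emitted and no name clash can occur.
without_alias = project(expr, "tstamp", allows_alias_in_select=False)

print(with_alias)
print(without_alias)
```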
> so I believe I figured out a workaround for the alias issue. If I just set `allows_alias_in_select = False` then the query generated by Superset does not use an alias and the query is then compatible with Pinot. So, I don't think any of the additional changes you kindly suggested are necessary.

Sounds good! That should work.

> One question that I have is: what is the purpose of the `pdf` that gets defined in the dataset configuration? Since it isn't passed into the engine spec when creating a chart, it can't be used in the query generation, so it doesn't seem to serve a purpose?

I think the original design of "pdf" was a hard-coded way of getting a timestamp from a string, but a type conversion expression is more graceful: the function should be pushed down and run in the DB rather than calculated in the client.
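To make the client-side reading of `pdf` concrete, here is a minimal sketch. It assumes three kinds of values: `epoch_ms` and a `strftime`-style pattern such as `%Y-%m-%d`, both mentioned in this PR, plus `epoch_s`, which is an assumption based on the "seconds or milliseconds" remark. `parse_with_pdf` is a hypothetical helper, not a Superset function, and treating parsed strings as UTC is a choice made only for this example.

```python
from datetime import datetime, timezone


def parse_with_pdf(value, pdf: str) -> datetime:
    """Hypothetical helper: interpret a raw column value according to its date format."""
    if pdf == "epoch_s":
        # Value is seconds since the Unix epoch.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    if pdf == "epoch_ms":
        # Value is milliseconds since the Unix epoch.
        return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)
    # Otherwise treat pdf as a strftime/strptime pattern for a text column.
    return datetime.strptime(str(value), pdf).replace(tzinfo=timezone.utc)


print(parse_with_pdf(1692057600000, "epoch_ms"))
print(parse_with_pdf("2023-08-15", "%Y-%m-%d"))
```

Pushing the conversion down, as suggested above, would replace this client-side parsing with a SQL expression that the engine spec builds and the database evaluates.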
Thanks, I will test the recommended changes.
@ege-st when you have time, could you fix the unit test and code style? If you need help, feel free to ping me on GitHub or via Slack DM.
The code change LGTM, waiting for CI.
superset/db_engine_specs/pinot.py
Outdated
-    TimeGrain.QUARTER: True,
-    TimeGrain.YEAR: True,
+    None: "{col}",
+    "PT1S": "CAST(DATE_TRUNC('second', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
Can we keep the enums here, e.g. `TimeGrain.SECOND` instead of `PT1S`? It makes it much easier to read.
Done.
Thank you!
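For readers unfamiliar with the pattern, an enum-keyed mapping of the kind requested above could look like the sketch below. The `TimeGrain` class here is a hypothetical stand-in defined locally (its member values are assumed to be the ISO 8601 duration strings used elsewhere in this thread), `expression_for` is an illustrative helper, and only two of the expressions from the earlier comment are reproduced.

```python
from enum import Enum
from typing import Dict, Optional


class TimeGrain(str, Enum):
    """Local stand-in for the time grain enum; values are ISO 8601 durations."""

    SECOND = "PT1S"
    HOUR = "PT1H"


TIME_GRAIN_EXPRESSIONS: Dict[Optional[TimeGrain], str] = {
    None: "{col}",
    TimeGrain.SECOND: "CAST(DATE_TRUNC('second', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
    TimeGrain.HOUR: "CAST(DATE_TRUNC('hour', CAST({col} AS TIMESTAMP)) AS TIMESTAMP)",
}


def expression_for(grain: Optional[TimeGrain]) -> str:
    """Look up the SQL template for a grain, failing loudly if it is unsupported."""
    try:
        return TIME_GRAIN_EXPRESSIONS[grain]
    except KeyError:
        raise NotImplementedError(f"No pinot grain spec for '{grain}'") from None


print(expression_for(TimeGrain.HOUR))
```

The named members carry the same string values as before, so the change is about readability at the definition site rather than behaviour.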
@nytai could you take a look at this PR? I believe your review is the last one required. Thanks!
@zhaoyongjie @john-bodley Is the PR good to merge? Thanks!
I’ll check it out tonight and let you know.
Would you mind changing the title of the pull request? This PR is a refactoring of the Pinot spec rather than a fix.
@zhaoyongjie Done.
@ege-st Thanks for tackling this problem!
Hi @martin-raymond, I'll take a look today and let you know.
@ege-st do you have any news?
…ot being passed to `get_timestamp_expr` (apache#24942)
SUMMARY
Refactoring the Pinot plugin for Superset to bring it in line with how plugins operate in the latest version of Superset.
This refactoring also addresses two bugs in the Apache Pinot DB engine spec:
- If a column was marked as `temporal` and no time grain was provided, the query would be constructed with illegal `{}` around the column name, causing Pinot to reject the query as syntactically invalid. The code has been updated to remove the incorrect `{}`.
- `get_timestamp_expr` will be called by `models.adhoc_column_to_sqla`, and `None` is explicitly passed for the `pdf` parameter. This would cause the Pinot `get_timestamp_expr` to fault. To fix this, the `models.adhoc_column_to_sqla` method is updated to get the `python_date_format` for a column (if that data is available). (Without this fix, queries created for charts simply do not use the date format a user sets for a temporal column.)
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
- … `long` column as `epoch_ms` and tested creating a Bar Chart V2 to confirm that the query would be correctly constructed (this tested the issue where `None` is passed for the `pdf` parameter).
- … `%Y-%m-%d` as the PDF value and confirming that when a chart is created the correct date-time conversion function is used to parse the text date time.
ADDITIONAL INFORMATION