feat: add support for generic series limit #16660

villebro · 2021-09-10T08:06:53Z

SUMMARY

This is the first in a series of PRs that aim to add generic x-axis support. The long-term goals are the following:

- Deprecate is_timeseries: a plugin should explicitly add the temporal column to the groupby if it is used as a dimension. This PR does not change that behavior, but it removes the need to enable is_timeseries in order to limit the series count.
- Replace timeseries_limit and timeseries_limit_metric with series_columns, series_limit and series_limit_metric.
- Deprecate groupby in favor of columns.

Existing charts should not be affected - legacy charts will work as before, but V1 charts will emit deprecation warnings when requesting data using the deprecated field names.

Related PR on superset-ui: apache-superset/superset-ui#1356

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2021-09-10T09:45:28Z

Codecov Report

Merging #16660 (40764d8) into master (092ef5b) will decrease coverage by 0.02%.
The diff coverage is 86.15%.

@@            Coverage Diff             @@
##           master   #16660      +/-   ##
==========================================
- Coverage   76.99%   76.96%   -0.03%     
==========================================
  Files        1007     1007              
  Lines       54133    54155      +22     
  Branches     7374     7374              
==========================================
+ Hits        41678    41681       +3     
- Misses      12215    12234      +19     
  Partials      240      240

Flag	Coverage Δ
hive	`81.31% <86.15%> (-0.05%)`	⬇️
javascript	`71.30% <ø> (ø)`
mysql	`81.66% <86.15%> (-0.05%)`	⬇️
postgres	`81.76% <86.15%> (-0.05%)`	⬇️
presto	`81.65% <86.15%> (-0.01%)`	⬇️
python	`82.27% <86.15%> (-0.06%)`	⬇️
sqlite	`81.33% <86.15%> (-0.05%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
superset/common/query_actions.py	`92.95% <ø> (-0.10%)`	⬇️
superset/common/query_context.py	`90.74% <ø> (ø)`
superset/db_engine_specs/base.py	`88.39% <ø> (ø)`
superset/connectors/sqla/models.py	`87.95% <75.75%> (-1.76%)`	⬇️
superset/common/query_object.py	`90.62% <96.55%> (-0.05%)`	⬇️
superset/charts/schemas.py	`100.00% <100.00%> (ø)`
superset/db_engine_specs/presto.py	`89.95% <0.00%> (-0.42%)`	⬇️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 092ef5b...40764d8.

villebro · 2021-09-10T11:59:31Z

superset/connectors/sqla/models.py

@@ -1087,7 +1096,6 @@ def get_sqla_query(  # pylint: disable=too-many-arguments,too-many-locals,too-ma
            metrics_exprs = []

        time_range_endpoints = extras.get("time_range_endpoints")
-        groupby_exprs_with_timestamp = OrderedDict(groupby_exprs_sans_timestamp.items())


OrderedDict is no longer necessary, as a plain dict is guaranteed to preserve insertion order as of Python 3.7.

https://docs.python.org/3.8/library/stdtypes.html
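
A minimal sketch of the point above, using made-up keys and string placeholders rather than the actual SQLAlchemy expressions from get_sqla_query: copying a plain dict and then adding a timestamp entry keeps the keys in insertion order.

```python
# Hypothetical group-by expressions; in Superset the values are SQLAlchemy columns.
groupby_exprs_sans_timestamp = {"country": "country_expr", "gender": "gender_expr"}

# A plain dict copy keeps insertion order (a language guarantee since Python 3.7),
# so wrapping the items in OrderedDict adds nothing here.
groupby_exprs_with_timestamp = dict(groupby_exprs_sans_timestamp.items())
groupby_exprs_with_timestamp["__timestamp"] = "timestamp_expr"

print(list(groupby_exprs_with_timestamp))
```

The keys print in the order they were inserted, with the timestamp entry last.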

villebro · 2021-09-10T12:00:26Z

superset/connectors/sqla/models.py

-            and groupby_exprs_sans_timestamp
-            and dttm_col
-        ):
+        if db_engine_spec.allows_subqueries and series_limit and groupby_series_columns:


Bycatch: I noticed we weren't checking if the engine allows subqueries

villebro · 2021-09-10T12:02:05Z

superset/connectors/sqla/models.py

-                subq = subq.where(and_(*(where_clause_and + [inner_time_filter])))
+                inner_time_filter = []
+
+                if dttm_col and not db_engine_spec.time_groupby_inline:


I have no idea what db_engine_spec.time_groupby_inline does, but I can only assume it relates to temporal filters in subqueries. If this turns out not to be the case I can improve the documentation in a subsequent PR.

villebro · 2021-09-10T12:09:07Z

superset/common/query_object.py

-    granularity: Optional[str]
+    columns: List[str]
+    datasource: Optional[BaseDatasource]
+    extras: Dict[str, Any]
+    filter: List[QueryObjectFilterClause]
    from_dttm: Optional[datetime]
-    to_dttm: Optional[datetime]
+    granularity: Optional[str]
    inner_from_dttm: Optional[datetime]
    inner_to_dttm: Optional[datetime]
+    is_rowcount: bool
    is_timeseries: bool
-    time_shift: Optional[timedelta]
-    groupby: List[str]
-    metrics: Optional[List[Metric]]
-    row_limit: int
-    row_offset: int
-    filter: List[QueryObjectFilterClause]
-    timeseries_limit: int
-    timeseries_limit_metric: Optional[Metric]
    order_desc: bool
-    extras: Dict[str, Any]
-    columns: List[str]
    orderby: List[OrderBy]
-    post_processing: List[Dict[str, Any]]
-    datasource: Optional[BaseDatasource]
+    metrics: Optional[List[Metric]]
    result_type: Optional[ChartDataResultType]
-    is_rowcount: bool
+    row_limit: int
+    row_offset: int
+    series_columns: List[str]
+    series_limit: int
+    series_limit_metric: Optional[Metric]
    time_offsets: List[str]
+    time_shift: Optional[timedelta]
+    to_dttm: Optional[datetime]
+    post_processing: List[Dict[str, Any]]


In hindsight I should not have reordered these in this PR to keep it as easily reviewable as possible (my sincere apologies to reviewers!), but having these non-ordered annoyed me so much I couldn't help myself. The point here is I've added series_columns, series_limit and series_limit_metric and removed timeseries_limit and timeseries_limit_metric as they're picked up by kwargs and handled in the deprecated mapping methods.
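
The idea of picking up deprecated names from kwargs and mapping them to the new names can be sketched roughly as follows. This is not the code from this PR: the helper name rename_deprecated_fields, the warning text, and the precedence rule are all assumptions made for illustration. Only the field names come from the summary above.

```python
import warnings
from typing import Any, Dict

# Deprecated name -> new name, as listed in the PR summary.
DEPRECATED_FIELDS = {
    "timeseries_limit": "series_limit",
    "timeseries_limit_metric": "series_limit_metric",
    "groupby": "columns",
}


def rename_deprecated_fields(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with deprecated keys mapped to their new names."""
    result: Dict[str, Any] = {}
    for key, value in kwargs.items():
        new_key = DEPRECATED_FIELDS.get(key)
        if new_key is None:
            # Not a deprecated field: pass it through unchanged.
            result[key] = value
            continue
        warnings.warn(
            f"The field `{key}` is deprecated, please use `{new_key}` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Hypothetical precedence rule: an explicitly passed new name wins.
        if new_key not in kwargs:
            result[new_key] = value
    return result
```

Under these assumptions, calling rename_deprecated_fields({"timeseries_limit": 5}) yields {"series_limit": 5} and emits one DeprecationWarning, so callers using the old names keep working while being nudged toward the new ones.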

villebro · 2021-09-13T09:27:22Z

superset/connectors/sqla/models.py

+            "columns": columns,
            "from_dttm": from_dttm.isoformat() if from_dttm else None,
            "groupby": groupby,
            "metrics": metrics,
            "row_limit": row_limit,
            "row_offset": row_offset,
            "to_dttm": to_dttm.isoformat() if to_dttm else None,
+            "table_columns": [col.column_name for col in self.columns],
            "filter": filter,
-            "columns": [col.column_name for col in self.columns],


Here I'm renaming columns to table_columns, as columns should refer to the actual query object property (I doubt the old columns property is really being actively used).

ktmud

Thanks for tackling this! This is long overdue. I'm super excited to see what we can do with Superset charts by "downgrading" time columns to the place they really deserve: being treated the same as any other column.

ktmud · 2021-09-14T19:59:16Z

superset/common/query_object.py

+        elif is_timeseries and metrics:
+            self.series_columns = columns
+        else:
+            self.series_columns = []


self.series_columns = series_columns or (columns if is_timeseries and metrics else [])

Would this be more Pythonic?

I'm also wondering whether we should have another layer of parameter consolidation before QueryObject to handle all the special overrides and fallbacks for legacy/deprecated parameters (e.g. timeseries_limit vs series_limit, columns vs groupby, sortby vs orderby). By isolating parameter consolidation in the Flask view handler layer, we reduce the number of parameters for QueryObject itself and simplify all downstream functions, which may help clean up deprecated code faster, all without affecting backward compatibility.

villebro · 2021-09-15

> self.series_columns = series_columns or (columns if is_timeseries and metrics else [])
>
> Would this be more Pythonic?

IMO the current implementation is more pythonic/readable, as it doesn't require unnesting the inline logic when reading it. But I'm ok with the proposed snippet, too. I'm curious to hear what @dpgaspar thinks - it would be nice to add something on this to the code style guide.

> I'm also wondering whether we should have another layer of parameter consolidation before QueryObject to handle all the special overrides and fallbacks for legacy/deprecated parameters (e.g. timeseries_limit vs series_limit, columns vs groupby, sortby vs orderby). By isolating parameter consolidation in the Flask view handler layer, we reduce the number of parameters for QueryObject itself and simplify all downstream functions, which may help clean up deprecated code faster, all without affecting backward compatibility.

I wanted to restrict this PR to the bare minimum amount of changes to contain the blast radius (whenever we're touching this type of core logic there's serious risk of regressions). I absolutely agree we should consolidate more (and should finish the consolidation work before cutting 2.0), but I'd propose doing it across multiple PRs to avoid making one big bang PR.

dpgaspar · 2021-09-15

I have no strong opinions here, though I do like to use simple ternaries. In this case the current implementation seems more readable to me, because the alternative needs a nested ternary.
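
For reference, the two forms discussed in this thread can be compared side by side. This is a sketch under one assumption: the leading `if series_columns:` branch is not visible in the diff excerpt above and is inferred from the proposed one-liner. The function names are hypothetical.

```python
from itertools import product
from typing import List, Optional


def explicit(
    series_columns: Optional[List[str]],
    columns: List[str],
    is_timeseries: bool,
    metrics: Optional[List[str]],
) -> List[str]:
    # Assumed first branch; only the elif/else lines appear in the diff.
    if series_columns:
        return series_columns
    elif is_timeseries and metrics:
        return columns
    else:
        return []


def one_liner(
    series_columns: Optional[List[str]],
    columns: List[str],
    is_timeseries: bool,
    metrics: Optional[List[str]],
) -> List[str]:
    # The form proposed in the review comment.
    return series_columns or (columns if is_timeseries and metrics else [])


# Exhaustively compare both forms over a small grid of inputs.
cases = list(product([None, [], ["a"]], [[], ["b"]], [False, True], [None, [], ["count"]]))
mismatches = [case for case in cases if explicit(*case) != one_liner(*case)]
print(len(cases), "cases,", len(mismatches), "mismatches")
```

Both forms return the same result for every input in the grid, so the choice between them is purely one of readability.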

villebro · 2021-09-15T08:55:45Z

superset/connectors/sqla/models.py

+        series_limit_metric: Optional[Metric] = None,
+        row_limit: Optional[int] = None,
+        row_offset: Optional[int] = None,
+        timeseries_limit: Optional[int] = None,


timeseries_limit previously defaulted to 15. However, both viz.py and QueryObject always defaulted it to 0, so this magic number was never actually used.

zhaoyongjie

LGTM

villebro · 2021-09-16T09:09:04Z

Merging this to unblock superset-ui development

john-bodley · 2021-09-23T17:15:39Z

@villebro any reason you marked these fields as deprecated rather than just renaming them and adding a migration? My understanding is that both the old and new names may coexist in the form-data (though the deprecated fields are renamed).

villebro · 2021-09-23T18:05:58Z

> @villebro any reason you marked these fields as deprecated rather than just renaming them and adding a migration? My understanding is that both the old and new names may coexist in the form-data (though the deprecated fields are renamed).

This is mostly in preparation for Superset 2.0. One reason for deprecating these is to avoid introducing breaking changes right now (I assume some people may be calling both the old and new chart data endpoints for custom use cases). Another reason I didn't yet want to add a db migration for these was to avoid having to do extensive refactoring on viz.py. But I fully intend for us to do this later this year when we start preparing in earnest for 2.0.

ofekisr · 2021-10-03T10:27:27Z

superset/common/query_object.py

@@ -254,6 +264,14 @@ def validate(
        """Validate query object"""
        error: Optional[QueryObjectValidationError] = None
        all_labels = self.metric_names + self.column_names
+        missing_series = [col for col in self.series_columns if col not in self.columns]
+        if missing_series:
+            _(


@villebro you forgot to raise the exception here

#16946 fixes it.
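
The bug pattern pointed out here, building an error message without raising anything, can be illustrated with a small standalone sketch. The exception class below is a local stand-in for Superset's QueryObjectValidationError, the function name is hypothetical, and the message text is illustrative. It is not the code from #16946.

```python
from typing import List


class QueryObjectValidationError(Exception):
    """Local stand-in for the Superset exception of the same name."""


def validate_series_columns(series_columns: List[str], columns: List[str]) -> None:
    """Raise if any series column is not also listed in columns."""
    missing_series = [col for col in series_columns if col not in columns]
    if missing_series:
        # Formatting the message alone has no effect; it must be raised.
        raise QueryObjectValidationError(
            "The following entries in `series_columns` are missing in `columns`: "
            + ", ".join(missing_series)
        )


validate_series_columns(["country"], ["country", "gender"])  # passes silently
```

With this shape, validate_series_columns(["city"], ["country"]) raises instead of silently discarding the message.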

* feat: add support for generic series limit
* refine series_columns logic
* update docs
* bump superset-ui
* add note to UPDATING.md
* remove default value for timeseries_limit

superset-github-bot bot added the preset-io label Sep 10, 2021

pull-request-size bot added the size/L label Sep 10, 2021

villebro force-pushed the villebro/generic-series-limit branch from 930dcdd to 90158f7 Compare September 10, 2021 09:03

villebro mentioned this pull request Sep 10, 2021

feat(core): add support for non-temporal series limit apache-superset/superset-ui#1356

Merged

villebro force-pushed the villebro/generic-series-limit branch 2 times, most recently from f8e44f1 to 926b968 Compare September 10, 2021 09:43

villebro force-pushed the villebro/generic-series-limit branch from 926b968 to abc24e7 Compare September 10, 2021 11:54

villebro commented Sep 10, 2021

villebro force-pushed the villebro/generic-series-limit branch from abc24e7 to b72ba42 Compare September 10, 2021 12:07

villebro commented Sep 10, 2021

villebro force-pushed the villebro/generic-series-limit branch from b72ba42 to f8a6ed4 Compare September 10, 2021 12:22

villebro requested review from ktmud and zhaoyongjie September 10, 2021 12:36

villebro mentioned this pull request Sep 10, 2021

Allow different x-axis for line charts #822

Closed

villebro commented Sep 13, 2021

villebro force-pushed the villebro/generic-series-limit branch from d8f3a09 to fe156bb Compare September 14, 2021 07:11

ktmud reviewed Sep 14, 2021

villebro added 5 commits September 15, 2021 09:28

feat: add support for generic series limit

b2eb7b4

refine series_columns logic

928f47a

update docs

c10e28f

bump superset-ui

a9eb5b9

add note to UPDATING.md

dad75b9

villebro force-pushed the villebro/generic-series-limit branch from fe156bb to dad75b9 Compare September 15, 2021 06:37

villebro changed the title ~~[WIP] feat: add support for generic series limit~~ feat: add support for generic series limit Sep 15, 2021

villebro commented Sep 15, 2021

remove default value for timeseries_limit

89819a0

villebro force-pushed the villebro/generic-series-limit branch from 9ebf8bb to 89819a0 Compare September 15, 2021 10:20

Merge branch 'master' into villebro/generic-series-limit

40764d8

zhaoyongjie approved these changes Sep 16, 2021

villebro merged commit 836b5e2 into apache:master Sep 16, 2021

villebro deleted the villebro/generic-series-limit branch September 16, 2021 09:09

villebro mentioned this pull request Sep 29, 2021

fix(sqla): allow series limit without subquery support #16896

Merged

9 tasks

ofekisr mentioned this pull request Oct 3, 2021

fix(query_object): missing series validation not raised an exception #16946

Merged

ofekisr reviewed Oct 3, 2021

john-bodley mentioned this pull request Oct 25, 2021

fix: Set default series limit to zero apache-superset/superset-ui#1428

Closed

siegfriedweber mentioned this pull request Feb 9, 2022

Add Superset 1.4.1 stackabletech/docker-images#47

Closed

villebro mentioned this pull request Aug 23, 2022

fix(explore): Prevent unnecessary series limit subquery #21154

Merged

9 tasks

mistercrunch added the 🏷️ bot and 🚢 1.4.0 labels Mar 13, 2024
