-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for generic series limit #16660
feat: add support for generic series limit #16660
Conversation
930dcdd
to
90158f7
Compare
f8e44f1
to
926b968
Compare
Codecov Report
@@ Coverage Diff @@
## master #16660 +/- ##
==========================================
- Coverage 76.99% 76.96% -0.03%
==========================================
Files 1007 1007
Lines 54133 54155 +22
Branches 7374 7374
==========================================
+ Hits 41678 41681 +3
- Misses 12215 12234 +19
Partials 240 240
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
926b968
to
abc24e7
Compare
@@ -1087,7 +1096,6 @@ def get_sqla_query( # pylint: disable=too-many-arguments,too-many-locals,too-ma | |||
metrics_exprs = [] | |||
|
|||
time_range_endpoints = extras.get("time_range_endpoints") | |||
groupby_exprs_with_timestamp = OrderedDict(groupby_exprs_sans_timestamp.items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OrderedDict
is no longer necessary, as dict
is guaranteed to be ordered as of Python 3.7.
and groupby_exprs_sans_timestamp | ||
and dttm_col | ||
): | ||
if db_engine_spec.allows_subqueries and series_limit and groupby_series_columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bycatch: I noticed we weren't checking if the engine allows subqueries
subq = subq.where(and_(*(where_clause_and + [inner_time_filter]))) | ||
inner_time_filter = [] | ||
|
||
if dttm_col and not db_engine_spec.time_groupby_inline: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what db_engine_spec.time_groupby_inline
does, but I can only assume it relates to temporal filters in subqueries. If this turns out not to be the case I can improve the documentation in a subsequent PR.
abc24e7
to
b72ba42
Compare
granularity: Optional[str] | ||
columns: List[str] | ||
datasource: Optional[BaseDatasource] | ||
extras: Dict[str, Any] | ||
filter: List[QueryObjectFilterClause] | ||
from_dttm: Optional[datetime] | ||
to_dttm: Optional[datetime] | ||
granularity: Optional[str] | ||
inner_from_dttm: Optional[datetime] | ||
inner_to_dttm: Optional[datetime] | ||
is_rowcount: bool | ||
is_timeseries: bool | ||
time_shift: Optional[timedelta] | ||
groupby: List[str] | ||
metrics: Optional[List[Metric]] | ||
row_limit: int | ||
row_offset: int | ||
filter: List[QueryObjectFilterClause] | ||
timeseries_limit: int | ||
timeseries_limit_metric: Optional[Metric] | ||
order_desc: bool | ||
extras: Dict[str, Any] | ||
columns: List[str] | ||
orderby: List[OrderBy] | ||
post_processing: List[Dict[str, Any]] | ||
datasource: Optional[BaseDatasource] | ||
metrics: Optional[List[Metric]] | ||
result_type: Optional[ChartDataResultType] | ||
is_rowcount: bool | ||
row_limit: int | ||
row_offset: int | ||
series_columns: List[str] | ||
series_limit: int | ||
series_limit_metric: Optional[Metric] | ||
time_offsets: List[str] | ||
time_shift: Optional[timedelta] | ||
to_dttm: Optional[datetime] | ||
post_processing: List[Dict[str, Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In hindsight I should not have reordered these in this PR to keep it as easily reviewable as possible (my sincere apologies to reviewers!), but having these non-ordered annoyed me so much I couldn't help myself. The point here is I've added series_columns
, series_limit
and series_limit_metric
and removed timeseries_limit
and timeseries_limit_metric
as they're picked up by kwargs
and handled in the deprecated mapping methods.
b72ba42
to
f8a6ed4
Compare
superset/connectors/sqla/models.py
Outdated
"columns": columns, | ||
"from_dttm": from_dttm.isoformat() if from_dttm else None, | ||
"groupby": groupby, | ||
"metrics": metrics, | ||
"row_limit": row_limit, | ||
"row_offset": row_offset, | ||
"to_dttm": to_dttm.isoformat() if to_dttm else None, | ||
"table_columns": [col.column_name for col in self.columns], | ||
"filter": filter, | ||
"columns": [col.column_name for col in self.columns], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm renaming columns
to table_columns
, as columns
should refer to the actual query object property (I doubt the old columns
property is really being actively used).
d8f3a09
to
fe156bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this! This is long overdue. I'm super excited on seeing what we can do with Superset charts by "downgrading" time columns to a place it really deserves---to be treated the same like any other columns.
elif is_timeseries and metrics: | ||
self.series_columns = columns | ||
else: | ||
self.series_columns = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.series_columns = series_columns or (columns if is_timeseries and metrics else [])
Would this be more Pythonic?
I'm also wondering whether we should have another layer of parameter consolidation before QueryObject
to handle all the special overrides & fallbacks for legacy/deprecated parameters (e.g. timeseries_limit
vs series_limit
, columns
vs groupby
, sortby
vs orderby
). By isolating parameter consolidation in the Flask view handler layer, we reduce the number of parameters for QueryObject
itself and simply all downstream functions, which may help cleaning up deprecated code faster---all without affecting backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.series_columns = series_columns or (columns if is_timeseries and metrics else [])Would this be more Pythonic?
IMO the current implementation is more pythonic/readable, as it doesn't require unnesting the inline logic when reading it. But I'm ok with the proposed snippet, too. I'm curious to hear what @dpgaspar thinks - it would be nice to add something on this to the code style guide.
I'm also wondering whether we should have another layer of parameter consolidation before
QueryObject
so to handle all the special overrides & fallbacks for legacy/deprecated parameters (e.g.timeseries_limit
vsseries_limit
,columns
vsgroupby
,sortby
vsorderby
). By isolating parameter consolidation in the Flask view handler layer, we reduce the number of parameters forQueryObject
itself and simply all downstream functions, which may help cleaning up deprecated code faster---all without affecting backward compatibility.
I wanted to restrict this PR to the bare minimum amount of changes to contain the blast radius (whenever we're touching this type of core logic there's serious risk of regressions). I absolutely agree we should consolidate more (and should finish the consolidation work before cutting 2.0), but I'd propose doing it across multiple PRs to avoid making one big bang PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no strong opinions here, but, I do like to use simple ternaries. But current implementation seems more readable to me because of the nested ternary.
fe156bb
to
dad75b9
Compare
series_limit_metric: Optional[Metric] = None, | ||
row_limit: Optional[int] = None, | ||
row_offset: Optional[int] = None, | ||
timeseries_limit: Optional[int] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeseries_limit
previously defaulted to 15
. However, this was always defaulted to 0
in both viz.py
and QueryObject
, so we never actually used this magic number.
9ebf8bb
to
89819a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merging this to unblock |
@villebro any reason you marked these fields as deprecated—my understanding is both the old and new name may coexist in the form-data (thought the deprecated fields are renamed)—and not just rename them and add a migration? |
This is mostly in preparation for Superset 2.0. One reason for deprecating these is to avoid introducing breaking changes right now (I assume some people may be calling both the old and new chart data endpoints for custom use cases). Another reason I didn't yet want to add a db migration for these was to avoid having to do extensive refactoring on |
@@ -254,6 +264,14 @@ def validate( | |||
"""Validate query object""" | |||
error: Optional[QueryObjectValidationError] = None | |||
all_labels = self.metric_names + self.column_names | |||
missing_series = [col for col in self.series_columns if col not in self.columns] | |||
if missing_series: | |||
_( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* feat: add support for generic series limit * refine series_columns logic * update docs * bump superset-ui * add note to UPDATING.md * remove default value for timeseries_limit
* feat: add support for generic series limit * refine series_columns logic * update docs * bump superset-ui * add note to UPDATING.md * remove default value for timeseries_limit
SUMMARY
This is the first in a series of multiple PRs that aims to add generic x-axis support. The long-term goal is the following:
is_timeseries
- a plugin should explicitly add the temporal column to the groupby if it is used as a dimension. This behavior is not changed in this PR, but removes the need for enablingis_timeseries
to be able to limit series count.timeseries_limit
andtimeseries_limit_metric
withseries_columns
,series_limit
andseries_limit_metric
.groupby
in favor ofcolumns
.Existing charts should not be affected - legacy charts will work as before, but V1 charts will emit deprecation warnings when requesting data using the deprecated field names.
Related PR on
superset-ui
: apache-superset/superset-ui#1356BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION