[missing values] Reverting replacing missing values with zeros #4905
Codecov Report
@@ Coverage Diff @@
## master #4905 +/- ##
==========================================
- Coverage 64.46% 55.77% -8.7%
==========================================
Files 421 344 -77
Lines 20537 9924 -10613
Branches 2247 2245 -2
==========================================
- Hits 13240 5535 -7705
+ Misses 7170 4262 -2908
Partials 127 127
Continue to review full report at Codecov.
🙌 why was filling |
"""Returns the value for use as filler for a specific Column.type""" | ||
if col: | ||
if col.is_string: | ||
return ' NULL' |
I think @xrmx added some of this logic. `null`-labeled series creates issues on some visualizations. I believe the leading space has to do with sorting NULL first. We may want to reuse some of the logic I added recently around filtering that replaces `None` by `<NULL>` and empty strings by `<empty string>`. This should only be applied to dimensions / strings. Note that it applies only to string columns, thus not "zerofying" NULL values.
I'm pretty sure that removing this will break many charts.
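To make the suggestion above concrete, here is a minimal sketch of relabeling missing values in string dimensions only, leaving numeric columns untouched. The helper name `label_missing_dimensions` and the sample frame are hypothetical and are not taken from the Superset code base; only the `<NULL>` and `<empty string>` labels come from the comment.

```python
import pandas as pd

NULL_LABEL = "<NULL>"
EMPTY_LABEL = "<empty string>"


def label_missing_dimensions(df, dimension_cols):
    """Relabel None/NaN and empty strings in the given string columns.

    Hypothetical helper for illustration; numeric columns are not modified,
    so NULL metrics are never turned into zeros.
    """
    out = df.copy()
    for col in dimension_cols:
        out[col] = out[col].map(
            lambda v: NULL_LABEL if pd.isna(v) else (EMPTY_LABEL if v == "" else v)
        )
    return out


df = pd.DataFrame({"country": ["FR", None, ""], "sales": [1.0, None, 3.0]})
labeled = label_missing_dimensions(df, ["country"])
```

Only the `country` dimension is relabeled; the missing `sales` value stays missing rather than becoming `0`.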
@mistercrunch the `' NULL'` (with a leading space) is somewhat misleading and is inconsistent with SQL Lab, which reports `NULL` (or missing values) as `null`. On an unrelated note, SQL Lab doesn't seem to sort `NULL` values correctly, and it would be good to ensure that `NULL` values are rendered/sorted consistently throughout the application.
@@ -961,7 +936,6 @@ def as_floats(field): | |||
return d | |||
|
|||
def get_data(self, df): | |||
df = df.fillna(0) |
Are we sure that this particular visualization handles null nicely?
Some visualizations may not handle NULL properly, I'd double check that they do prior to pushing this. |
@mistercrunch this PR is very much WIP as I need to go through each visualization type and update the logic where necessary. At the moment I've only taken a pass at the box-plot visualization. The reason I put this PR out was to gain more context (like your comment around I think there is merit in having consistency between how data is rendered in SQL Lab and slices, i.e., there are inconsistencies here where SQL Lab reports
Glad you're taking this on! I've had some conversations with @betodealmeida about sorting some of this out. I'm taking a tangent here thinking about NULL handling in time series. Users may want or expect a set of behaviors:
Option 1 is arguably the best visual representation of what is happening, but as far as I know NVD3 doesn't do this well (maybe it didn't in the past but does now...).
Option 2 makes it pretty clear that data is missing, though clearly NULL is not zero and that's somewhat wrong and ugly (steep diagonals).
Option 3 is misleading, especially in the absence of markers: you may care and never know that data is missing.
Recently we introduced ISO notation to
I think that option 1 above is the best^ it's super misleading to have nulls replaced with 0s, and if you interpolate between nulls (ie connect them with a line) I think it's just as misleading. users should be able to make the connection between the line segments interrupted by nulls using line color. |
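As a rough illustration of the three behaviors being discussed, the sketch below applies each treatment to a small series with pandas. The mapping of option numbers to treatments is my reading of the comments above (1: leave a gap, 2: fill with zero, 3: connect across the gap), and the sample data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two missing observations.
ts = pd.Series(
    [1.0, 2.0, np.nan, np.nan, 5.0],
    index=pd.date_range("2018-01-01", periods=5, freq="D"),
)

# Option 1: keep the NaNs so the chart can break the line at the gap.
with_gaps = ts

# Option 2: replace missing values with zero (produces steep diagonals).
zero_filled = ts.fillna(0)

# Option 3: connect the neighbors, here via linear interpolation.
interpolated = ts.interpolate()
```

Only `with_gaps` still carries the information that two observations are missing; the other two series are indistinguishable from fully observed data once plotted without markers.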
Any progress on this update? Any workarounds to get the "proposed" results seen above? More examples using Line Chart and Table View: This makes graphs look awful when the data suddenly drops to 0 because of NULLs. The row below in a table view only contains data in the first estimate column. The other columns are NULL but display 0's.
@DavidHassan apologies for the delay in responding. This is definitely still on my radar, though my time has recently been focused on other Superset-related projects. I do hope to address this some time in June. I need to go through and test all the various chart types and ensure they correctly handle the NULL values.
@john-bodley thank you for taking the time to work on this issue. |
This issue seemed to be resolved for table views in 0.27; however, it is back again in 0.29.rc07.
Here is an update using a toy time-series data set which contains missing values and has the following specific features:
For context the data set is defined as:
Below are examples of the current and proposed charts. Additionally I tested other chart types which could have been potentially impacted by this change.
Line Chart (current)
Line Chart (proposed)
Note that there is an issue with the NVD3 line chart where the siloed data point does not render other than when one hovers over it. This issue will be fixed with the
Pie Chart (current)
Pie Chart (proposed)
Note there is no change here, though the code has been changed to ensure that the NULL values are not dropped in the pivot-table. Internally there was some debate whether this was the right approach, but I sense i) it ensures consistency with other charts, and ii) it accurately represents the underlying result set, i.e., it informs the user that a dimension exists however the value is either zero or undefined.
Bar Chart (current)
Bar Chart (proposed)
Note this remains unchanged like the Pie Chart and ensures consistency in terms of defining either zero or undefined values.
Pivot Table (current)
Pivot Table (proposed)
Note there seems to be an issue with any value other than an empty string being rendered as
Sounds good to follow up on max's earlier comments about whether null labeled series creates issues on some visualizations but otherwise this lgtm. |
# find the closest value above the lower outer limit | ||
series = series[series >= lower_outer_lim] | ||
return series[np.abs(series - lower_outer_lim).argmin()] | ||
return series[series >= lower_outer_lim].min() |
Per Wikipedia there are several representations of the whiskers, including the common Tukey boxplot:
"The lowest datum still within 1.5 IQR of the lower quartile, and the highest datum still within 1.5 IQR of the upper quartile."
The definition above is the same as the more complex `argmin` approach and doesn't require a second lookup. The resulting value is in agreement with the definition above.
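A small sketch of why the two forms agree: once the series is restricted to values at or above the lower limit, the value closest to that limit is simply the minimum. The sample data and variable names below are made up for illustration, and `.iloc` is used for the positional lookup in the `argmin` variant, which differs slightly from the label-based indexing in the diff.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one low outlier (1.0) and one high outlier (40.0).
series = pd.Series([1.0, 12.0, 13.0, 14.0, 15.0, 16.0, 40.0])

q1, q3 = np.percentile(series, [25, 75])
iqr = q3 - q1
lower_outer_lim = q1 - 1.5 * iqr

# Values still within 1.5 IQR of the lower quartile.
inside = series[series >= lower_outer_lim]

# Older approach: locate the value closest to the limit, then look it up.
closest_pos = np.abs(inside - lower_outer_lim).to_numpy().argmin()
whisker_argmin = inside.iloc[closest_pos]

# Simplified approach: every remaining value is >= the limit,
# so the closest one is the minimum.
whisker_min = inside.min()
```

Both expressions select the lowest datum that is not below the limit, which matches the Tukey definition quoted above.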
The proposed changes LGTM. I can add a patch to the Have you tried passing |
@@ -1350,7 +1350,7 @@ export const controls = { | |||
'mean', | |||
'min', | |||
'max', | |||
'stdev', | |||
'std', |
Note `stdev` is not a valid Pandas function.
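For illustration, a quick check of the two aggregation names against a grouped pandas series. The data frame is made up; the exact exception raised for an unknown name can vary by pandas version, so the sketch catches it broadly.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"group": ["a", "a", "b", "b"], "value": [1.0, 3.0, 2.0, 6.0]}
)

# "std" resolves to the pandas sample standard deviation.
std_by_group = df.groupby("group")["value"].agg("std")

# "stdev" is not a pandas aggregation name, so this is expected to fail.
try:
    df.groupby("group")["value"].agg("stdev")
    stdev_accepted = True
except Exception:
    stdev_accepted = False
```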
Happy to see this getting fixed. 🔥 |
(cherry picked from commit 61add60)
* Sparkline dates aren't formatting in Time Series Table (#6976) * Exclude venv for python linter to ignore * Fix NaN error * Fix the white background shown in SQL editor on drag (#7021) This PR sets the background-color css property on `.ace_scroller` instead of `.ace_content` to prevent the white background shown during resizing of the SQL editor before drag ends. * Show tooltip with time frame (#6979) * Fix time filter control (#6978) * Enhancement of query context and object. (#6962) * added more functionalities for query context and object. * fixed cache logic * added default value for groupby * updated comments and removed print (cherry picked from commit d5b9795) * [fix] /superset/slice/id url is too long (#6989) (cherry picked from commit 6a4d507) * [WIP] fix user specified JSON metadata not updating dashboard on refresh (#7027) (cherry picked from commit cc58f0e) * feat: add ability to change font size in big number (#7003) * Add ability to change font sizes in Big Number * rename big number to header * Add comment to clarify font size values * Allow LIMIT to be specified in parameters (#7052) * [fix] Cursor jumping when editing chart and dashboard titles (#7038) (cherry picked from commit fc1770f) * Changing time table viz to pass formatTime a date (#7020) (cherry picked from commit 7f3c145) * [db-engine-spec] Aligning Hive/Presto partition logic (#7007) (cherry picked from commit 05be866) * [fix] explore chart from dashboard missed slice title (#7046) (cherry picked from commit a6d48d4) * fix inaccurate data calculation with adata rolling and contribution (#7035) (cherry picked from commit 0782e83) * Adding warning message for sqllab save query (#7028) (cherry picked from commit ead3d48) * [datasource] Ensuring consistent behavior of datasource editing/saving. 
(#7037) * Update datasource.py * Update datasource.py (cherry picked from commit c771625) * [csv-upload] Fixing message encoding (#6971) (cherry picked from commit 48431ab) * [sql-parse] Fixing LIMIT exceptions (#6963) (cherry picked from commit 3e076cb) * Adding custom control overrides (#6956) * Adding extraOverrides to line chart * Updating extraOverrides to fit with more cases * Moving extraOverrides to index.js * Removing webpack-merge in package.json * Fixing metrics control clearing metric (cherry picked from commit e619405) * [sqlparse] Fixing table name extraction for ill-defined query (#7029) (cherry picked from commit 07c340c) * [missing values] Removing replacing missing values (#4905) (cherry picked from commit 61add60) * [SQL Lab] Improved query and results tabs rendering reliability (#7082) closes #7080 (cherry picked from commit 9b58e9f) * Fix filter_box migration PR #6523 (#7066) * Fix filter_box migration PR #6523 * Fix druid-related bug (cherry picked from commit b210742) * SQL editor layout makeover (#7102) This PR includes the following layout and css tweaks: - Using flex to layout the north and south sub panes of query pane so resizing works properly in both Chrome and Firefox - Removal of necessary wrapper divs and tweaking of css in sql lab so we can scroll to the bottom of both the table list and the results pane - Make sql lab's content not overflow vertically and layout the query result area to eliminate double scroll bars - css tweaks on the basic.html page so the loading animation appears in the center of the page across the board (cherry picked from commit 71f1bbd) * [forms] Fix handling of NULLs (cherry picked from commit e83a07d) * handle null column_name in sqla and druid models (cherry picked from commit 2ff721a) * Use metric name instead of metric in filter box (#7106) (cherry picked from commit 003364e) * Bump python lib croniter to an existing version (#7132) Package maintainers should really never delete packages, but it 
appears this happened with croniter and resulted in breaking our builds. This PR bumps to a more recent existing version of the library (cherry picked from commit 215ed39) * Revert PR #6933 (#7162) * [bugfix] SQL Lab 'Filter Results' doesn't stick (#7104) When using a "Search Results" criteria, the subset of rows that match the criteria get displayed. While this the filter is applied, if another query is run, the filter is still active, but not displayed in the input text box. After this change, the state of the input box sticks after subsequent queries. (cherry picked from commit d5e8d66) * Injectable statsd client (#7138) * Add ability to inject statsd client; some py test/reqs updates - Updated the metrics logger to allow construction with an existing statsd client, so that it can be configured by external systems or libs. - added requirements to requirements-dev.txt which are needed to run tests-eg coverage, nose - removed dependency on mock lib, it is in python stdlib now - updated tox.ini to remove the now-superfluous deps * add license to test file, and remove blank line at EOF (cherry picked from commit ba19a62) * [Lyft-GA] Enable color consistency in a dashboard (#7135) * Enable color consistency in a dashboard Moved actions, minor UI, allowed dashboard copy Fix linting errors Undo unintentional change Updated and added unit tests Fail quietly if package has not been updated Fail quietly on dashboard copy if package is old * Update packages * Remove unnecessary code * Addressed Grace's comments * Small fix for item key * Reset chart's color during exploration * Do not reset chart form data when exploring chart * Fix double scroll bars when content of sql result table overflows horizontally (#7168) The PR substracts the scrollbar height from the height of the container of the react virtualized table so we don't see double scrollbars. 
(cherry picked from commit 7ffcabd) * Change number format default * Use smart formatter instead * fix merge issues * Use SMART_NUMBER
…equests (#7032) * Sparkline dates aren't formatting in Time Series Table (#6976) * Exclude venv for python linter to ignore * Fix NaN error * Fix the white background shown in SQL editor on drag (#7021) This PR sets the background-color css property on `.ace_scroller` instead of `.ace_content` to prevent the white background shown during resizing of the SQL editor before drag ends. * Show tooltip with time frame (#6979) * Fix time filter control (#6978) * Enhancement of query context and object. (#6962) * added more functionalities for query context and object. * fixed cache logic * added default value for groupby * updated comments and removed print (cherry picked from commit d5b9795) * [fix] /superset/slice/id url is too long (#6989) (cherry picked from commit 6a4d507) * [WIP] fix user specified JSON metadata not updating dashboard on refresh (#7027) (cherry picked from commit cc58f0e) * feat: add ability to change font size in big number (#7003) * Add ability to change font sizes in Big Number * rename big number to header * Add comment to clarify font size values * Allow LIMIT to be specified in parameters (#7052) * [fix] Cursor jumping when editing chart and dashboard titles (#7038) (cherry picked from commit fc1770f) * Changing time table viz to pass formatTime a date (#7020) (cherry picked from commit 7f3c145) * [db-engine-spec] Aligning Hive/Presto partition logic (#7007) (cherry picked from commit 05be866) * [fix] explore chart from dashboard missed slice title (#7046) (cherry picked from commit a6d48d4) * fix inaccurate data calculation with adata rolling and contribution (#7035) (cherry picked from commit 0782e83) * Adding warning message for sqllab save query (#7028) (cherry picked from commit ead3d48) * [datasource] Ensuring consistent behavior of datasource editing/saving. 
(#7037) * Update datasource.py * Update datasource.py (cherry picked from commit c771625) * [csv-upload] Fixing message encoding (#6971) (cherry picked from commit 48431ab) * [sql-parse] Fixing LIMIT exceptions (#6963) (cherry picked from commit 3e076cb) * Adding custom control overrides (#6956) * Adding extraOverrides to line chart * Updating extraOverrides to fit with more cases * Moving extraOverrides to index.js * Removing webpack-merge in package.json * Fixing metrics control clearing metric (cherry picked from commit e619405) * [sqlparse] Fixing table name extraction for ill-defined query (#7029) (cherry picked from commit 07c340c) * [missing values] Removing replacing missing values (#4905) (cherry picked from commit 61add60) * [SQL Lab] Improved query and results tabs rendering reliability (#7082) closes #7080 (cherry picked from commit 9b58e9f) * Fix filter_box migration PR #6523 (#7066) * Fix filter_box migration PR #6523 * Fix druid-related bug (cherry picked from commit b210742) * SQL editor layout makeover (#7102) This PR includes the following layout and css tweaks: - Using flex to layout the north and south sub panes of query pane so resizing works properly in both Chrome and Firefox - Removal of necessary wrapper divs and tweaking of css in sql lab so we can scroll to the bottom of both the table list and the results pane - Make sql lab's content not overflow vertically and layout the query result area to eliminate double scroll bars - css tweaks on the basic.html page so the loading animation appears in the center of the page across the board (cherry picked from commit 71f1bbd) * [forms] Fix handling of NULLs (cherry picked from commit e83a07d) * handle null column_name in sqla and druid models (cherry picked from commit 2ff721a) * Use metric name instead of metric in filter box (#7106) (cherry picked from commit 003364e) * Bump python lib croniter to an existing version (#7132) Package maintainers should really never delete packages, but it 
appears this happened with croniter and resulted in breaking our builds. This PR bumps to a more recent existing version of the library (cherry picked from commit 215ed39) * Revert PR #6933 (#7162) * Add decorator for etag cache * Fetch charts with GET * Small fixes * Fix typo * Compute correct cache key; fix logging * Check perms on cached response * Revert change * If perms fail, return naked response * Fix lint * Compute cache key from all form data * Pass extra_filters in GET request * Fix pylint * Fix flake8 * Use ETags even if no cache is set * Handle adhoc filters * Raise in debug mode * Rename actions * Fix integration tests * Do POST request on new charts * Set extra/adhoc filters only in GET requests * Raise if check_perms fails * Refactor auth * Fix flake8 * Fix js unit tests * Fix js unit tests that fail in lyftga * Fix js * Sparkline dates aren't formatting in Time Series Table (#6976) * Exclude venv for python linter to ignore * Fix NaN error * Changing time table viz to pass formatTime a date (#7020) (cherry picked from commit 7f3c145) * SQL editor layout makeover (#7102) This PR includes the following layout and css tweaks: - Using flex to layout the north and south sub panes of query pane so resizing works properly in both Chrome and Firefox - Removal of necessary wrapper divs and tweaking of css in sql lab so we can scroll to the bottom of both the table list and the results pane - Make sql lab's content not overflow vertically and layout the query result area to eliminate double scroll bars - css tweaks on the basic.html page so the loading animation appears in the center of the page across the board (cherry picked from commit 71f1bbd) * Add decorator for etag cache * Fetch charts with GET * Small fixes * Fix typo * Compute correct cache key; fix logging * Check perms on cached response * Revert change * If perms fail, return naked response * Fix lint * Compute cache key from all form data * Pass extra_filters in GET request * Fix pylint * Fix 
flake8 * Use ETags even if no cache is set * Handle adhoc filters * Raise in debug mode * Rename actions * Fix integration tests * Do POST request on new charts * Set extra/adhoc filters only in GET requests * Raise if check_perms fails * Refactor auth * Fix flake8 * Fix js unit tests * Fix js unit tests that fail in lyftga * Fix js * Fix bad merge * Use far future when max_age=0
* Sparkline dates aren't formatting in Time Series Table (#6976) * Exclude venv for python linter to ignore * Fix NaN error * Fix the white background shown in SQL editor on drag (#7021) This PR sets the background-color css property on `.ace_scroller` instead of `.ace_content` to prevent the white background shown during resizing of the SQL editor before drag ends. * Show tooltip with time frame (#6979) * Fix time filter control (#6978) * Enhancement of query context and object. (#6962) * added more functionalities for query context and object. * fixed cache logic * added default value for groupby * updated comments and removed print (cherry picked from commit d5b9795) * [fix] /superset/slice/id url is too long (#6989) (cherry picked from commit 6a4d507) * [WIP] fix user specified JSON metadata not updating dashboard on refresh (#7027) (cherry picked from commit cc58f0e) * feat: add ability to change font size in big number (#7003) * Add ability to change font sizes in Big Number * rename big number to header * Add comment to clarify font size values * Allow LIMIT to be specified in parameters (#7052) * [fix] Cursor jumping when editing chart and dashboard titles (#7038) (cherry picked from commit fc1770f) * Changing time table viz to pass formatTime a date (#7020) (cherry picked from commit 7f3c145) * [db-engine-spec] Aligning Hive/Presto partition logic (#7007) (cherry picked from commit 05be866) * [fix] explore chart from dashboard missed slice title (#7046) (cherry picked from commit a6d48d4) * fix inaccurate data calculation with adata rolling and contribution (#7035) (cherry picked from commit 0782e83) * Adding warning message for sqllab save query (#7028) (cherry picked from commit ead3d48) * [datasource] Ensuring consistent behavior of datasource editing/saving. 
(#7037) * Update datasource.py * Update datasource.py (cherry picked from commit c771625) * [csv-upload] Fixing message encoding (#6971) (cherry picked from commit 48431ab) * [sql-parse] Fixing LIMIT exceptions (#6963) (cherry picked from commit 3e076cb) * Adding custom control overrides (#6956) * Adding extraOverrides to line chart * Updating extraOverrides to fit with more cases * Moving extraOverrides to index.js * Removing webpack-merge in package.json * Fixing metrics control clearing metric (cherry picked from commit e619405) * [sqlparse] Fixing table name extraction for ill-defined query (#7029) (cherry picked from commit 07c340c) * [missing values] Removing replacing missing values (#4905) (cherry picked from commit 61add60) * [SQL Lab] Improved query and results tabs rendering reliability (#7082) closes #7080 (cherry picked from commit 9b58e9f) * Fix filter_box migration PR #6523 (#7066) * Fix filter_box migration PR #6523 * Fix druid-related bug (cherry picked from commit b210742) * SQL editor layout makeover (#7102) This PR includes the following layout and css tweaks: - Using flex to layout the north and south sub panes of query pane so resizing works properly in both Chrome and Firefox - Removal of necessary wrapper divs and tweaking of css in sql lab so we can scroll to the bottom of both the table list and the results pane - Make sql lab's content not overflow vertically and layout the query result area to eliminate double scroll bars - css tweaks on the basic.html page so the loading animation appears in the center of the page across the board (cherry picked from commit 71f1bbd) * [forms] Fix handling of NULLs (cherry picked from commit e83a07d) * handle null column_name in sqla and druid models (cherry picked from commit 2ff721a) * Use metric name instead of metric in filter box (#7106) (cherry picked from commit 003364e) * Bump python lib croniter to an existing version (#7132) Package maintainers should really never delete packages, but it 
appears this happened with croniter and resulted in breaking our builds. This PR bumps to a more recent existing version of the library (cherry picked from commit 215ed39) * Revert PR #6933 (#7162) * Celery worker for warming up cache * Remove testing changes * Add documentation * Fix lint * WIP dashboard filters * Use new cache so it works with dashboards * Add more unit tests, fix old ones * Fix flake8 and docs * Sparkline dates aren't formatting in Time Series Table (#6976) * Exclude venv for python linter to ignore * Fix NaN error * Changing time table viz to pass formatTime a date (#7020) (cherry picked from commit 7f3c145) * SQL editor layout makeover (#7102) This PR includes the following layout and css tweaks: - Using flex to layout the north and south sub panes of query pane so resizing works properly in both Chrome and Firefox - Removal of necessary wrapper divs and tweaking of css in sql lab so we can scroll to the bottom of both the table list and the results pane - Make sql lab's content not overflow vertically and layout the query result area to eliminate double scroll bars - css tweaks on the basic.html page so the loading animation appears in the center of the page across the board (cherry picked from commit 71f1bbd) * Celery worker for warming up cache * Remove testing changes * Add documentation * Fix lint * WIP dashboard filters * Use new cache so it works with dashboards * Add more unit tests, fix old ones * Fix flake8 and docs * Fix bad merge and pylint
Apologies for not having full context of this code, but from a numerical standpoint replacing missing values with zero (or other values) is rarely ever a good idea as this leads to inaccuracies, which surely violates the core tenet of a data analysis tool. Note Pandas (implicitly) and NumPy (explicitly) correctly handle missing values, e.g. `mean` and `nanmean` respectively.
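A minimal sketch of the numerical point, using made-up values: pandas skips missing values by default, NumPy offers an explicit NaN-aware variant, and zero-filling changes the result.

```python
import numpy as np
import pandas as pd

# Hypothetical metric with one missing observation.
values = pd.Series([10.0, np.nan, 30.0])

# pandas skips NaN implicitly (skipna=True is the default).
mean_skipping_nan = values.mean()

# NumPy requires the explicit NaN-aware function.
nanmean = np.nanmean(values.to_numpy())

# Plain np.mean propagates the NaN instead of silently ignoring it.
plain_mean = np.mean(values.to_numpy())

# Zero-filling treats the missing observation as a real 0 and lowers the mean.
mean_zero_filled = values.fillna(0).mean()
```

Here the mean of the observed values is 20, while the zero-filled mean is 40 / 3, roughly 13.3.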
This PR is still WIP as I've only remedied a couple of the visualization types and still need to add a number of unit tests to ensure numerical correctness with missing values. I felt there was merit in sharing this now in order for me to better understand the context of replacing missing values and potential corner cases I need to be aware of.
For context here are a few examples where replacing missing values with `0` leads to incorrect results:
Time-series (current):
Time-series (proposed):
Box-plot (current):
Box-plot (proposed):
Closes #3603
to: @jeffreythewang @mistercrunch @williaster @xrmx