[druid] fix bug around handling NULLs #4358

mistercrunch · 2018-02-06T16:57:07Z

`fillna` failed to identify STRING columns for Druid and replaced `None` in string columns with a numeric `0`. The resulting mixed-type column would confuse pandas down the line in operations like `df.pivot_table`.
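
The failure mode can be sketched without pandas at all. The snippet below is only an illustration with made-up values, not code from this PR: it fills NULLs in a string column with the integer `0`, then tries to sort the result, which is presumably the kind of comparison that operations such as `df.pivot_table` end up performing on group keys.

```python
# Values of a string dimension as a query might return them; None marks a NULL.
values = ["girl", None, "boy"]

# Blanket numeric fill (the buggy behavior): None becomes the integer 0,
# so the column now mixes str and int.
mixed = [0 if v is None else v for v in values]

try:
    sorted(mixed)
    error = None
except TypeError as exc:
    # Python 3 refuses to order str against int.
    error = str(exc)

# Type-aware fill: a string column gets a string placeholder instead.
filled = ["NULL" if v is None else v for v in values]
ordered = sorted(filled)

print(error)
print(ordered)
```

The error text is the same family of message quoted in the title of #4236.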

supersedes #4236

@xrmx 👀

mistercrunch · 2018-02-06T17:02:01Z

Note that `get_fillna_for_columns` assumes that the dataframe's column names are bound to table column names and their related types. While this is generally the case, it is not enforced, and the bug could show up again if a string column in a dataframe is not named after a database column.

Visualizations like the Sankey may use specific columns as a source but rename the dataframe columns to `source` and `target`, for instance. In this case the current logic would fail to identify the df columns as strings, and the mixed-type issue may reappear.
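
A minimal sketch of that caveat, assuming a name-based lookup. The helper `fill_value_for` and the `column_types` mapping are hypothetical and only stand in for whatever `get_fillna_for_columns` consults; the column names are invented.

```python
# Hypothetical mapping from table column names to their database types.
column_types = {"origin": "STRING", "destination": "STRING", "trips": "LONG"}


def fill_value_for(df_column_name, types=column_types):
    """Pick a NULL replacement based on the column *name* (illustrative only)."""
    if types.get(df_column_name) == "STRING":
        return "NULL"
    # Unknown or non-string columns fall back to a numeric fill.
    return 0


# Dataframe column still named after the table column: identified as a string.
by_table_name = fill_value_for("origin")

# Same data after the viz renamed the column: the lookup misses,
# and the string column would be filled with a numeric 0 again.
by_renamed = fill_value_for("source")

print(repr(by_table_name), repr(by_renamed))
```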

xrmx · 2018-02-06T17:01:57Z

superset/viz.py

-                return ' NULL'
+        if col:
+            if col.is_string:
+                return 'NULL'


The space was added on purpose so that NULL entries sort before other values.

Gotcha, added it back.
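
To illustrate the point about the leading space: under Python's default string ordering (by code point), a space sorts before digits and letters, so `' NULL'` lands first while plain `'NULL'` ends up in the middle. The sample values below are made up.

```python
labels = ["beta", "Alpha", "123"]

# Without the leading space, 'NULL' sorts among the other values.
without_space = sorted(labels + ["NULL"])

# With the leading space, ' NULL' sorts ahead of everything else.
with_space = sorted(labels + [" NULL"])

print(without_space)
print(with_space)
```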

mistercrunch · 2018-02-06T17:04:54Z

A more final solution (outside the scope of this bugfix PR) would be to get better at typing dataframes: currently we end up with a lot of `object` dtypes depending on the db engine, and pandas is much slower when operating over `object` dtypes. We should infer types earlier in the pipeline and base the fillna logic on dataframe types instead of column types (dataframe types can be inferred from the database types in the first place, though drivers/pandas may fail at that). @betodealmeida and I had a conversation about that last week.
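
As a rough sketch of what dtype-based filling could look like (not what this PR implements), the hypothetical helper `fillna_by_dtype` below picks the fill value from each dataframe column's dtype rather than from its name. It assumes string columns arrive as `object` dtype and leaves other dtypes, such as datetimes, untouched.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype


def fillna_by_dtype(df, string_fill=" NULL", numeric_fill=0):
    """Fill NULLs per column based on the dataframe dtype (illustrative only)."""
    fill_values = {}
    for name, dtype in df.dtypes.items():
        if is_numeric_dtype(dtype):
            fill_values[name] = numeric_fill
        elif is_object_dtype(dtype):
            # Assumption: object columns hold strings.
            fill_values[name] = string_fill
        # Other dtypes (e.g. datetimes) are left as they are.
    return df.fillna(fill_values)


df = pd.DataFrame({
    # Column names need not match any table column for this to work.
    "source": pd.Series(["a", None], dtype=object),
    "value": [1.0, None],
})
print(fillna_by_dtype(df))
```

Because the decision is driven by the dataframe itself, renamed columns such as `source` and `target` would be handled the same way as columns named after the table.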


mistercrunch mentioned this pull request Feb 6, 2018

Fix msg "'<' not supported between instances of 'str' and 'int'" #4236

Closed

xrmx suggested changes Feb 6, 2018


[druid] fix bug around handling NULLs

ae4ca3a


mistercrunch force-pushed the fix_cannot_compare branch from 3cf2a93 to ae4ca3a February 6, 2018 17:38

xrmx approved these changes Feb 6, 2018


mistercrunch merged commit 31a0b6e into apache:master Feb 7, 2018

mistercrunch deleted the fix_cannot_compare branch February 7, 2018 16:19

jeffreythewang mentioned this pull request Mar 6, 2018

'Null' value is being displayed as '0' in the data retrieved. #3603

Closed

m-ajay mentioned this pull request Feb 12, 2024

[Snyk] Fix for 1 vulnerabilities m-ajay/superset#263

Open

mistercrunch added the 🏷️ bot and 🚢 0.23.0 labels Feb 27, 2024
