DES: Q about SparseDataFrame._combine_match_columns behavior #28025

jbrockmendel · 2019-08-20T01:49:52Z

There is some idiosyncratic behavior that I think is unintentional, but I need to double-check. cc @jreback, who introduced the relevant code in 8ee0a89.

SparseDataFrame._combine_match_index and SparseDataFrame._combine_match_columns are both for arithmetic ops with a Series object, but they treat alignment differently. When calling the constructor, _combine_match_index passes default_fill_value = self._get_op_result_fill_value(other, func), which matches the _combine_frame behavior. _combine_match_columns just passes default_fill_value = self.default_fill_value.

I think that _get_op_result_fill_value should be called in both cases. Can anyone confirm?

(BTW this is relevant even though the class is deprecated because consolidating the behavior will allow us to streamline code in ops)
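
To make the difference concrete, here is a minimal toy sketch. It does not use pandas: `ToySparseFrame` and its two `*_fill` methods are hypothetical stand-ins, and the rule inside the helper (NaN on either side gives NaN, otherwise apply the op to the two fill values) is an assumption made for illustration, not a statement of what the real method does.

```python
import operator

import numpy as np


class ToySparseFrame:
    """Hypothetical stand-in for a sparse frame; not the pandas class."""

    def __init__(self, default_fill_value=np.nan):
        self.default_fill_value = default_fill_value

    def _get_op_result_fill_value(self, other_fill_value, func):
        # Assumed rule for this sketch: NaN on either side gives NaN,
        # otherwise apply the operator to the two fill values.
        if np.isnan(other_fill_value) or np.isnan(self.default_fill_value):
            return np.nan
        return func(np.float64(self.default_fill_value), np.float64(other_fill_value))

    def combine_match_index_fill(self, other_fill_value, func):
        # Mirrors the description: the result fill value depends on the op.
        return self._get_op_result_fill_value(other_fill_value, func)

    def combine_match_columns_fill(self, other_fill_value, func):
        # Mirrors the description: the frame's own default is reused as-is.
        return self.default_fill_value


frame = ToySparseFrame(default_fill_value=0.0)
index_fill = frame.combine_match_index_fill(1.0, operator.add)
columns_fill = frame.combine_match_columns_fill(1.0, operator.add)
print(index_fill, columns_fill)
```

Under these assumptions the same op yields two different fill values for the result depending on which path is taken, which is the inconsistency being asked about.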


jorisvandenbossche · 2019-08-20T06:38:44Z

Can you give an example of the difference in behaviour?

(side question: what does DES mean?)

jbrockmendel · 2019-08-20T13:59:40Z

I was thinking "Design", maybe there's a better abbrev

> Can you give an example of the difference in behaviour?

Well, if I change _combine_match_columns to use _get_op_result_fill_value like the others, exactly one test fails, with a NotImplementedError raised in _get_op_result_fill_value because it doesn't know what to do when other is a Series. Editing _get_op_result_fill_value to return self.default_fill_value in that case makes sense.
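
A rough sketch of that fallback, again as a toy and not the pandas code: `ToySparseSeries`, `ToyPlainSeries`, and the free function `get_op_result_fill_value` are hypothetical names, and the sparse branch uses an assumed rule purely so the example has something to compare against.

```python
import operator

import numpy as np


class ToySparseSeries:
    """Hypothetical sparse operand that carries its own fill value."""

    def __init__(self, fill_value):
        self.fill_value = fill_value


class ToyPlainSeries:
    """Hypothetical dense operand with no fill value of its own."""


def get_op_result_fill_value(own_default, other, func):
    if isinstance(other, ToySparseSeries):
        # Assumed rule for this sketch: NaN on either side gives NaN,
        # otherwise apply the operator to the two fill values.
        if np.isnan(other.fill_value) or np.isnan(own_default):
            return np.nan
        return func(np.float64(own_default), np.float64(other.fill_value))
    if isinstance(other, ToyPlainSeries):
        # The suggested edit: fall back to the frame's own default
        # instead of raising for a plain Series.
        return own_default
    raise NotImplementedError(type(other))


plain_fill = get_op_result_fill_value(0.0, ToyPlainSeries(), operator.add)
sparse_fill = get_op_result_fill_value(0.0, ToySparseSeries(2.0), operator.add)
print(plain_fill, sparse_fill)
```

With the fallback branch in place, a plain Series operand gives the same fill value that _combine_match_columns passes today, so routing that method through the helper would not change its result in this toy model.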

jorisvandenbossche · 2019-08-20T15:22:56Z

> (BTW this is relevant even though the class is deprecated because consolidating the behavior will allow us to streamline code in ops)

But not sure we should do that? If this changes the behaviour, or has the risk to do so, why not leave the deprecated code alone?
If this is a bottleneck in cleaning up the ops code (because it is now shared between normal and sparse frame), then I would rather isolate the code that is needed for sparse and keep that as is, and refactor the rest as you want without needing to care about SparseDataFrame ops.

jbrockmendel · 2019-08-20T19:44:15Z

> But not sure we should do that? If this changes the behaviour, or has the risk to do so, why not leave the deprecated code alone?

We should definitely streamline the code in ops, for performance reasons. So whether to consolidate or to copy/paste depends on the degree of invasiveness, and in this particular case the sharing approach is much less invasive.

jbrockmendel mentioned this issue Aug 20, 2019

REF: standardize usage in DataFrame vs SparseDataFrame ops #28027

Merged

jorisvandenbossche added the Sparse (Sparse Data Type) label Aug 20, 2019

TomAugspurger mentioned this issue Sep 16, 2019

Remove SparseSeries and SparseDataFrame #28425

Merged

TomAugspurger closed this as completed in #28425 Sep 18, 2019
