Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DES: Q about SparseDataFrame._combine_match_columns behavior #28025

Closed
jbrockmendel opened this issue Aug 20, 2019 · 4 comments · Fixed by #28425
Closed

DES: Q about SparseDataFrame._combine_match_columns behavior #28025

jbrockmendel opened this issue Aug 20, 2019 · 4 comments · Fixed by #28425
Labels
Sparse Sparse Data Type

Comments

@jbrockmendel
Copy link
Member

There is some idiosyncratic behavior that I think is non-intentional, need to double check: cc @jreback introduced the relevant code 8ee0a89

SparseDataFrame._combine_match_index and SparseDataFrame._combine_match_columns are both for arithmetic ops with a Series object, but with different alignment treatment. When calling the constructor, combine_match_index passes default_fill_value = self._get_op_result_fill_value(other, func), which matches _combine_frame behavior. _combine_match_columns just passes default_fill_value = self.default_fill_value.

I think that _get_op_result_fill_value should be called in both cases. Can anyone confirm?

(BTW this is relevant even though the class is deprecated because consolidating the behavior will allow us to streamline code in ops)

@jorisvandenbossche
Copy link
Member

Can you give an example of the difference in behaviour?

(side question: what does DES mean?)

@jorisvandenbossche jorisvandenbossche added the Sparse Sparse Data Type label Aug 20, 2019
@jbrockmendel
Copy link
Member Author

I was thinking "Design", maybe there's a better abbrev

Can you give an example of the difference in behaviour?

Well if I change _combine_match_columns to use _get_op_result_fill_value like the others, exactly one test fails and that is with a NotImplementedError in _get_op_result_fill_value because it doesn't know what to do when other is a Series. Editing _get_op_result_fill_value to return self.default_fill_value in that case makes sense.

@jorisvandenbossche
Copy link
Member

(BTW this is relevant even though the class is deprecated because consolidating the behavior will allow us to streamline code in ops)

But not sure we should do that? If this changes the behaviour, or has the risk to do so, why not leave the deprecated code alone?
If this is a bottleneck in cleaning up the ops code (because it is now shared between normal and sparse frame), then I would rather isolate the code that is needed for sparse and keep that as is, and refactor the rest as you want without needing to care about SparseDataFrame ops.

@jbrockmendel
Copy link
Member Author

But not sure we should do that? If this changes the behaviour, or has the risk to do so, why not leave the deprecated code alone?

Streamlining the code in ops we should definitely do for performance reasons. So whether to do the consolidating vs the copy/pasting depends on the degree of invasiveness, and in this particular case the sharing approach is much less invasive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants