DEPR: combine_first (replace with update(..., join='outer'); for both Series/DF) #21859
Comments
cc @jreback |
One downside of
Just as a point of reference, in xarray we implement

    def combine_first(self, other):
        """Combine two DataArray objects, with union of coordinates.

        This operation follows the normal broadcasting and alignment rules of
        ``join='outer'``. Default to non-null values of array calling the
        method. Use np.nan to fill in vacant cells after alignment.

        Parameters
        ----------
        other : DataArray
            Used to fill all matching missing values in this array.

        Returns
        -------
        DataArray
        """
        return ops.fillna(self, other, join="outer")

|
And for historical fun, |
My impression is that we're making more of an effort to avoid adding new keywords than we were in 2018. Could we just tell users to align explicitly before calling fillna? |
This probably works in most cases. It doesn't work well for non-nullable integers, where aligning first may introduce NaNs and convert the data to float. |
Fair enough. That's a price I'd be willing to live with, especially as non-nullable integers are going away eventually, but I don't care enough to push on this too hard. |
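To make the dtype concern in this exchange concrete, here is a minimal sketch of the "align explicitly, then fillna" route on two small integer Series. The data and variable names are made up for illustration; only `Series.align` and `Series.fillna` are used. The dtype that `combine_first` itself returns for the same inputs may differ between pandas versions, so it is deliberately not shown.

```python
import pandas as pd

# Two non-nullable int64 Series with partially overlapping labels
# (illustrative data, not taken from the issue).
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([20, 30], index=["b", "c"])

# Outer alignment gives both Series the union of labels. Labels missing on
# one side become NaN, which int64 cannot hold, so both are upcast to float64.
a1, a2 = s1.align(s2, join="outer")

# Fill the gaps in the caller from the other Series.
filled = a1.fillna(a2)

print(a1.dtype, a2.dtype)  # both float64 after alignment
print(filled)
```

The values come out as expected (the caller's values win, gaps are filled from the other Series), but the integer dtype is lost along the way.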
I always found the mechanics of combine_first very unintuitive, and constantly need to look into the docs to see what's happening. I haven't checked the git history, but it seems that the method was a direct response from wesm to a SO question (https://stackoverflow.com/a/9794891).

In particular, I think this would be much more intuitive to do with df.update, which is a subset of what #21855 proposes: it introduces join='outer' for DataFrame.update (currently, only 'left' is supported, but even the source code notes # TODO: Support other joins).

With that new option, df1.combine_first(df2) would be the same as df1.update(df2, join='outer', overwrite=False), only that combine_first has far fewer options and controls (i.e. filter_func and raise_conflict). The only difference is that df.update currently returns None, see #21858.

Since it's quite a well-established function, the deprecation cycle would maybe have to be longer than usual, but I think the update variant is much cleaner, as well as more versatile, than this single-purpose function.
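For illustration, the claimed equivalence can be approximated with today's API. Note that `join='outer'` is only the proposal here and is not an existing `DataFrame.update` option, so the sketch below emulates it by reindexing the caller to the union of labels first and then calling the existing left-join `update` with `overwrite=False`. The frames are made-up float data, chosen so that no dtype changes get in the way.

```python
import numpy as np
import pandas as pd

# Illustrative frames: df2 has an extra row (2) and an extra column ("C").
df1 = pd.DataFrame({"A": [1.0, np.nan], "B": [np.nan, 4.0]}, index=[0, 1])
df2 = pd.DataFrame({"A": [10.0, 20.0, 30.0], "C": [7.0, 8.0, 9.0]}, index=[0, 1, 2])

# What combine_first does today: union of labels, df1's non-null values win.
expected = df1.combine_first(df2)

# Emulation of the proposed update(..., join='outer', overwrite=False):
# 1) expand df1 to the union of row and column labels (new cells are NaN),
# 2) update in place, only filling cells that are still NaN.
emulated = df1.reindex(
    index=df1.index.union(df2.index),
    columns=df1.columns.union(df2.columns),
)
emulated.update(df2, overwrite=False)  # modifies emulated, returns None

print(expected)
print(emulated)
```

On this data the two frames hold the same values. The emulation also shows the remaining ergonomic gap mentioned above: `update` works in place and returns None, so it cannot be chained the way `combine_first` can.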