Add drop_duplicates for dims #5239

Merged
merged 15 commits into from
May 15, 2021
1 change: 1 addition & 0 deletions doc/api.rst
@@ -292,6 +292,7 @@ DataArray contents
   DataArray.swap_dims
   DataArray.expand_dims
   DataArray.drop_vars
   DataArray.drop_duplicates
   DataArray.reset_coords
   DataArray.copy

4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -21,6 +21,10 @@ v0.18.1 (unreleased)

New Features
~~~~~~~~~~~~

- Implement :py:meth:`DataArray.drop_duplicates`
  to remove duplicate dimension values (:pull:`5239`).
  By `Andrew Huang <https://github.com/ahuang11>`_.
- allow passing ``combine_attrs`` strategy names to the ``keep_attrs`` parameter of
  :py:func:`apply_ufunc` (:pull:`5041`)
  By `Justus Magin <https://github.com/keewis>`_.
27 changes: 27 additions & 0 deletions xarray/core/dataarray.py
@@ -4573,6 +4573,33 @@ def curvefit(
            kwargs=kwargs,
        )

    def drop_duplicates(
        self,
        dim: Hashable,
        keep: Union[
            str,
            bool,
        ] = "first",
    ):
        """Returns a new DataArray with duplicate dimension values removed.

        Parameters
        ----------
        dim : Hashable
            Name of the dimension whose duplicate index values are dropped.
        keep : {"first", "last", False}, default: "first"
            Determines which duplicates (if any) to keep.

            - ``"first"`` : Drop duplicates except for the first occurrence.
            - ``"last"`` : Drop duplicates except for the last occurrence.
            - False : Drop all duplicates.

        Returns
        -------
        DataArray
        """
        if dim not in self.dims:
            raise ValueError(f"'{dim}' not found in dimensions")
        # Boolean mask that is True for the positions to keep along ``dim``.
        indexes = {dim: ~self.get_index(dim).duplicated(keep=keep)}
        return self.isel(indexes)

    # this needs to be at the end, or mypy will confuse with `str`
    # https://mypy.readthedocs.io/en/latest/common_issues.html#dealing-with-conflicting-names
    str = utils.UncachedAccessor(StringAccessor)
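
The new method relies on `pandas.Index.duplicated` to mark the entries to drop and then inverts that mask before passing it to `isel`. The following is a minimal sketch of just that masking step, using the same coordinate values as the test in this PR; the variable names `index` and `masks` are illustrative and do not appear in the diff.

```python
import pandas as pd

# Same coordinate values as the "time" coordinate in test_drop_duplicates.
index = pd.Index([0, 0, 1, 2], name="time")

# duplicated() returns True for entries considered duplicates under `keep`;
# inverting it with ~ gives a mask of the positions to keep.
masks = {keep: ~index.duplicated(keep=keep) for keep in ("first", "last", False)}

for keep, mask in masks.items():
    # Boolean indexing on the index shows which labels survive.
    print(repr(keep), mask.tolist(), index[mask].tolist())
```

With `keep=False` both entries labelled `0` are flagged as duplicates, so neither survives, which is why that case yields a shorter result than `"first"` or `"last"`.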
21 changes: 21 additions & 0 deletions xarray/tests/test_dataarray.py
@@ -7434,3 +7434,24 @@ def test_clip(da):
    # Unclear whether we want this to work, OK to adjust the test when we have decided.
    with pytest.raises(ValueError, match="arguments without labels along dimension"):
        result = da.clip(min=da.mean("x"), max=da.mean("a").isel(x=[0, 1]))


@pytest.mark.parametrize("keep", ["first", "last", False])
def test_drop_duplicates(keep):
    ds = xr.DataArray(
        [0, 5, 6, 7], dims="time", coords={"time": [0, 0, 1, 2]}, name="test"
    )

    if keep == "first":
        data = [0, 6, 7]
        time = [0, 1, 2]
    elif keep == "last":
        data = [5, 6, 7]
        time = [0, 1, 2]
    else:
        data = [6, 7]
        time = [1, 2]

    expected = xr.DataArray(data, dims="time", coords={"time": time}, name="test")
    result = ds.drop_duplicates("time", keep=keep)
    assert_equal(expected, result)
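
For readers who want to try the method outside the test suite, here is a short usage sketch. It assumes an xarray version that includes this PR and reuses the data from the test above; the names `da`, `first`, `last`, and `unique_only` are illustrative.

```python
import xarray as xr

# A 1-D array whose "time" coordinate contains the label 0 twice.
da = xr.DataArray(
    [0, 5, 6, 7], dims="time", coords={"time": [0, 0, 1, 2]}, name="test"
)

# keep="first" is the default: the first value at each repeated label survives.
first = da.drop_duplicates("time")

# keep="last" retains the final value at each repeated label instead.
last = da.drop_duplicates("time", keep="last")

# keep=False discards every label that occurs more than once.
unique_only = da.drop_duplicates("time", keep=False)

print(first.values, last.values, unique_only.values)
```

The method returns a new DataArray in each case, so `da` itself is left unchanged.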