Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add drop_isel #4819

Merged
merged 3 commits into from
Jan 18, 2021
Merged

Add drop_isel #4819

merged 3 commits into from
Jan 18, 2021

Conversation

mesejo
Copy link
Contributor

@mesejo mesejo commented Jan 17, 2021

- Use get_index(dim) in drop_sel
- Add drop_isel
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent! Thanks @mesejo

I added a couple of minor comments.

It would be great to do this for DataArray too — it can be a very simple wrapper of this method.

xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Show resolved Hide resolved
xarray/tests/test_dataset.py Show resolved Hide resolved
@mathause
Copy link
Collaborator

LGTM!

It would be great to do this for DataArray too — it can be a very simple wrapper of this method.

Just to add on to this - the wrapper can be written as:

class DataArray:
    ...

    def drop_isel(self, indexers=None, **indexers_kwargs):
        """Drop index positions from this dataset.
        ...
        """
        dataset = self._to_temp_dataset()
        dataset = dataset.drop_isel(indexers=indexers, **indexers_kwargs)
        return self._from_temp_dataset(dataset)

    ...

@mesejo
Copy link
Contributor Author

mesejo commented Jan 18, 2021

This is excellent! Thanks @mesejo

I added a couple of minor comments.

It would be great to do this for DataArray too — it can be a very simple wrapper of this method.

Thanks for the feedback. I've addressed the issues.

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks @mesejo !

Copy link
Collaborator

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I added two suggestions but it's also fine to merge as is.

xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/tests/test_dataset.py Outdated Show resolved Hide resolved
@max-sixty max-sixty merged commit 2956067 into pydata:master Jan 18, 2021
Copy link
Collaborator

@keewis keewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a few more issues we should address in a new PR

A (x, y) int64 0 2 3 5
"""

indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "drop")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "drop")
indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "drop_isel")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one we should change — @mesejo would you be up for putting the one-line PR in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can do it. No problem, but just to be sure, drop_sel also has:
labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")

Should I change it also? BTW drop_sel also uses np.random.randn in the docstrings should I change this also?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good spot, that's a mistake, it should be "drop_sel")

We're trying to move away from random values in the docstrings, but it's not urgent — if you can that would be appreciated, but correcting this & #4819 (comment) is fine to do alone.

Thank you!

@@ -2327,6 +2327,12 @@ def test_drop_index_labels(self):
with pytest.warns(DeprecationWarning):
arr.drop([0, 1, 3], dim="y", errors="ignore")

def test_drop_index_positions(self):
arr = DataArray(np.random.randn(2, 3), dims=["x", "y"])
actual = arr.drop_sel(y=[0, 1])
Copy link
Collaborator

@keewis keewis Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it intentional that you use drop_sel here? This succeeds because of the change to self.get_index(dim) in drop_sel, but it really should fail. Edit: as it turns out, drop_sel is consistent with sel now, so I guess it makes sense that this does not fail. However, since the test is called test_drop_index_positions I still think this is a typo:

Suggested change
actual = arr.drop_sel(y=[0, 1])
actual = arr.drop_isel(y=[0, 1])

xarray/core/dataset.py Show resolved Hide resolved
xarray/tests/test_dataset.py Show resolved Hide resolved
@max-sixty
Copy link
Collaborator

Thanks for spotting these @keewis, sorry I missed them.

On the sel fallback to isel, though:

I might be missing something, but I don't think drop_sel should work for dimensions without coordinates: sel as well as drop_sel work on labels, so they need the coordinates

I tend to agree, but .sel does fall back to .isel for dimensions without coordinates, so the now-current code is more consistent. What are your thoughts?

@mathause
Copy link
Collaborator

I suggested using index = self.get_index(dim) in #4819. Mostly for consistency with sel & because there was no drop_isel. I don't have a strong opinion. I agree it's inconsistent - but it's also convenient and IMHO not surprising.

@keewis
Copy link
Collaborator

keewis commented Jan 19, 2021

I agree it's important to be consistent with sel. This did come as a surprise to me, though: I assumed that sel would only work in coordinate space, so having it switch to index space when there's no coordinate is somewhat inconsistent with that.

Also, neither drop_sel nor sel explain how they work on dimensions without coordinates in their docstrings. If we decide to keep this I think we should make sure to update the documentation.

dcherian added a commit to TomNicholas/xarray that referenced this pull request Jan 29, 2021
* upstream/master:
  Bugfix in list_engine (pydata#4811)
  Add drop_isel (pydata#4819)
  Fix RST.
  Remove the references to `_file_obj` outside low level code paths, change to `_close` (pydata#4809)
dcherian added a commit to dcherian/xarray that referenced this pull request Jan 29, 2021
* upstream/master:
  speed up the repr for big MultiIndex objects (pydata#4846)
  dim -> coord in DataArray.integrate (pydata#3993)
  WIP: backend interface, now it uses subclassing  (pydata#4836)
  weighted: small improvements (pydata#4818)
  Update related-projects.rst (pydata#4844)
  iris update doc url (pydata#4845)
  Faster unstacking (pydata#4746)
  Allow swap_dims to take kwargs (pydata#4841)
  Move skip ci instructions to contributing guide (pydata#4829)
  fix issues in drop_sel and drop_isel (pydata#4828)
  Bugfix in list_engine (pydata#4811)
  Add drop_isel (pydata#4819)
  Fix RST.
  Remove the references to `_file_obj` outside low level code paths, change to `_close` (pydata#4809)
dcherian added a commit to dcherian/xarray that referenced this pull request Feb 3, 2021
* master: (458 commits)
  Add units if "unit" is in the attrs. (pydata#4850)
  speed up the repr for big MultiIndex objects (pydata#4846)
  dim -> coord in DataArray.integrate (pydata#3993)
  WIP: backend interface, now it uses subclassing  (pydata#4836)
  weighted: small improvements (pydata#4818)
  Update related-projects.rst (pydata#4844)
  iris update doc url (pydata#4845)
  Faster unstacking (pydata#4746)
  Allow swap_dims to take kwargs (pydata#4841)
  Move skip ci instructions to contributing guide (pydata#4829)
  fix issues in drop_sel and drop_isel (pydata#4828)
  Bugfix in list_engine (pydata#4811)
  Add drop_isel (pydata#4819)
  Fix RST.
  Remove the references to `_file_obj` outside low level code paths, change to `_close` (pydata#4809)
  fix decode for scale/ offset list (pydata#4802)
  Expand user dir paths (~) in open_mfdataset and to_zarr. (pydata#4795)
  add a version info step to the upstream-dev CI (pydata#4815)
  fix the ci trigger action (pydata#4805)
  scatter plot by order of the first appearance of hue (pydata#4723)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

drop_sel indices in dimension that doesn't have coordinates?
5 participants