Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add drop_duplicates for dims #5239

Merged
merged 15 commits into from
May 15, 2021
Merged

Conversation

ahuang11
Copy link
Contributor

@ahuang11 ahuang11 commented May 1, 2021

Ruined #5089 with reverting so remaking the PR for just dims

  • Closes #xxxx
  • Tests added
  • Passes pre-commit run --all-files
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@pep8speaks
Copy link

pep8speaks commented May 1, 2021

Hello @ahuang11! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-13 17:47:30 UTC

@ahuang11 ahuang11 mentioned this pull request May 1, 2021
5 tasks
Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually -- there is still one unresolved design concern here.

If you drop duplicates over multiple variables at once, what dimensions should the result have? In particular -- should it have the original dimensions, or should all dimensions involved be combined into one?

The latter might seem a little crazy now, but would make more sense once we allow dropping over multi-dimensional variables.

If we really want to avoid any possible controversy here, it might be best to stick to only supporting one variable for now, too.

@max-sixty
Copy link
Collaborator

The docs are failing with:

/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/5239/xarray/core/dataarray.py:docstring of xarray.core.dataarray.DataArray.drop_duplicates:3: WARNING: Unexpected indentation.
/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/5239/xarray/core/dataarray.py:docstring of xarray.core.dataarray.DataArray.drop_duplicates:4: WARNING: Block quote ends without a blank line; unexpected unindent.
looking for now-outdated files... none found

@max-sixty
Copy link
Collaborator

If you drop duplicates over multiple variables at once, what dimensions should the result have? In particular -- should it have the original dimensions, or should all dimensions involved be combined into one?

The latter might seem a little crazy now, but would make more sense once we allow dropping over multi-dimensional variables.

If we really want to avoid any possible controversy here, it might be best to stick to only supporting one variable for now, too.

For me this case supports consistent behavior over any number of dims! Which would mean rejecting the "stack only if you supply exactly ndims" proposal.

Completely fine with restricting to one dim for the moment and seeing how that goes (it's always possible to pass .drop_duplicates_coords(dim1).drop_duplicates_coords(dim2).

Were we going with drop_duplicates or drop_duplicates_coords? No strong view from me at all.

xarray/core/dataarray.py Outdated Show resolved Hide resolved
Co-authored-by: keewis <keewis@users.noreply.github.com>
@keewis
Copy link
Collaborator

keewis commented May 1, 2021

the docs build can probably be fixed by adding newlines before the section headers ("Parameters" / "Returns")

@max-sixty
Copy link
Collaborator

@ahuang11 we discussed this on the core team call. People are excited to merge this, and appreciate you bearing with the changes.

I suggested that we narrow this even further to one dimension and merge, so we can benefit from this now and consider additions from there. Would that be OK with you?

@ahuang11
Copy link
Contributor Author

ahuang11 commented May 12, 2021 via email

@max-sixty
Copy link
Collaborator

@shoyer could I confirm that limiting to a single dimension satisfies this? You say "multiple variables" here. I mentioned a single dimension on the call but maybe wasn't clear.

If you drop duplicates over multiple variables at once, what dimensions should the result have? In particular -- should it have the original dimensions, or should all dimensions involved be combined into one?

The latter might seem a little crazy now, but would make more sense once we allow dropping over multi-dimensional variables.

If we really want to avoid any possible controversy here, it might be best to stick to only supporting one variable for now, too.

@ahuang11
Copy link
Contributor Author

Dont understand this

xarray/core/dataarray.py:4605: error: Argument 1 to "isel" of "DataArray" has incompatible type "Dict[str, Any]"; expected "Optional[Mapping[Hashable, Any]]"  [arg-type]
Found 1 error in 1 file (checked 140 source files)

self.isel(indexes)

xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
ahuang11 and others added 3 commits May 12, 2021 23:06
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
@max-sixty max-sixty added the plan to merge Final call for comments label May 13, 2021
* upstream/master:
  Update release guide (pydata#5274)
  combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041)
  Explained what a deprecation cycle is (pydata#5289)
  Code cleanup (pydata#5234)
  FacetGrid docstrings (pydata#5293)
  Add whats new for dataset interpolation with non-numerics (pydata#5297)
  Allow dataset interpolation with different datatypes (pydata#5008)
@dcherian
Copy link
Contributor

@shoyer could I confirm that limiting to a single dimension satisfies this? You say "multiple variables" here. I mentioned a single dimension on the call but maybe wasn't clear.

@shoyer can you kindly merge this if you're happy with it?

@dcherian dcherian mentioned this pull request May 13, 2021
9 tasks
@max-sixty
Copy link
Collaborator

Merging but I or @ahuang11 can do any follow-ups.

Thank you v much @ahuang11 — appreciate this was a lot of work and happy to have you contribute such a good feature.

@max-sixty max-sixty merged commit b1bd6c8 into pydata:master May 15, 2021
@keewis keewis mentioned this pull request May 16, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants