Control attrs of result in merge(), concat(), combine_by_coords() and combine_nested() #3877

Merged
12 commits merged into pydata:master on Mar 24, 2020

Conversation

johnomotani
Contributor

Adds a combine_attrs argument to merge(), concat(), combine_by_coords() and combine_nested() that controls which attributes the result is given. The defaults maintain the current behaviour. Possible values (named after the options of the existing compat arguments) are:

  • 'drop': empty attrs on returned Dataset.
  • 'identical': all attrs must be the same on every object.
  • 'no_conflicts': attrs from all objects are combined, any that have the same name must also have the same value.
  • 'override': skip comparing and copy attrs from the first dataset to the result.
  • Closes #3865 (merge drops attributes)
  • Tests added
  • Passes isort -rc . && black . && mypy . && flake8
  • Fully documented, including whats-new.rst for all changes and api.rst for new API
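
The four combine_attrs options listed above can be sketched in plain Python, with ordinary dicts standing in for the attrs of each object. This is only an illustration of the described semantics, not the code added in this PR: the function name combine_attrs_sketch is hypothetical, and values are compared with plain ==, which would not be adequate for array-valued attributes.

```python
from typing import Any, Dict, Mapping, Sequence


def combine_attrs_sketch(
    attrs_list: Sequence[Mapping[str, Any]], combine_attrs: str
) -> Dict[str, Any]:
    """Combine attribute dicts according to one of the four described modes.

    Hypothetical helper for illustration; it is not part of xarray.
    """
    if combine_attrs == "drop" or not attrs_list:
        # 'drop': the result carries no attributes at all.
        return {}
    if combine_attrs == "override":
        # 'override': skip comparison, copy attrs from the first object.
        return dict(attrs_list[0])
    if combine_attrs == "identical":
        # 'identical': every object must have exactly the same attrs.
        first = dict(attrs_list[0])
        for attrs in attrs_list[1:]:
            if dict(attrs) != first:
                raise ValueError("combine_attrs='identical', but attrs differ")
        return first
    if combine_attrs == "no_conflicts":
        # 'no_conflicts': union of all attrs; shared keys must agree.
        result: Dict[str, Any] = {}
        for attrs in attrs_list:
            for key, value in attrs.items():
                if key in result and result[key] != value:
                    raise ValueError(f"conflicting values for attribute {key!r}")
                result[key] = value
        return result
    raise ValueError(f"unknown combine_attrs value: {combine_attrs!r}")


a = {"units": "m", "source": "A"}
b = {"units": "m", "history": "x"}
print(combine_attrs_sketch([a, b], "drop"))          # {}
print(combine_attrs_sketch([a, b], "override"))      # {'units': 'm', 'source': 'A'}
print(combine_attrs_sketch([a, b], "no_conflicts"))  # {'units': 'm', 'source': 'A', 'history': 'x'}
```

With these inputs, 'identical' would raise, because the two dicts differ even though their shared key agrees.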

Adds option 'promote_attrs' to DataArray.to_dataset(). By default
promote_attrs=False, maintaining current behaviour. If
promote_attrs=True, the attrs of the DataArray are shallow-copied to the
Dataset returned by to_dataset().
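
What "shallow-copied" implies for promote_attrs=True can be illustrated with plain dicts standing in for the attrs. This is a sketch under stated assumptions, not xarray code: the helper name is hypothetical, and it assumes the default (promote_attrs=False) leaves the Dataset-level attrs empty.

```python
import copy
from typing import Any, Dict


def to_dataset_attrs_sketch(
    array_attrs: Dict[str, Any], promote_attrs: bool = False
) -> Dict[str, Any]:
    """Return the attrs a Dataset would receive (hypothetical helper)."""
    if not promote_attrs:
        # Assumed default behaviour: nothing is promoted to the Dataset.
        return {}
    # Shallow copy: a new dict whose values are the same objects.
    return copy.copy(array_attrs)


array_attrs = {"units": "K", "flags": ["raw"]}
dataset_attrs = to_dataset_attrs_sketch(array_attrs, promote_attrs=True)

# Rebinding a key on the copy does not touch the original dict...
dataset_attrs["units"] = "degC"
# ...but mutating a shared value is visible through both dicts.
dataset_attrs["flags"].append("calibrated")

print(array_attrs)    # {'units': 'K', 'flags': ['raw', 'calibrated']}
print(dataset_attrs)  # {'units': 'degC', 'flags': ['raw', 'calibrated']}
```
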
If the values of any shared key are not equivalent, then raises an
error.
Provides several options for how to combine the attributes of the passed
objects and give them to the returned Dataset.
Provides several options for how to combine the attributes of the passed
objects and give them to the returned DataArray or Dataset.
Collaborator

@max-sixty max-sixty left a comment

This looks excellent! Thanks @johnomotani !

Any thoughts from anyone re the kwarg name combine_attrs or its options? They seem reasonable to me.

johnomotani and others added 2 commits March 23, 2020 09:48
Apply suggestions from code review

Co-Authored-By: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
No need for these arguments to be MutableMapping rather than just
Mapping.
@max-sixty
Collaborator

Any other thoughts from anyone before we hit the big green button?

Do not use OrderedDicts any more, so name did not make sense.
@TomNicholas
Member

Oh sweet!

Should there perhaps be another option to specify which object to get the attrs from? I'm just thinking by analogy to how open_mfdataset now lets you specify which file you want the attrs from.

@johnomotani
Contributor Author

Should there perhaps be another option to specify which object to get the attrs from? I'm just thinking by analogy to how open_mfdataset now lets you specify which file you want the attrs from.

That would actually be nice to have in concat() for a use-case I have. It's not immediately obvious to me how to implement it though. For merge() or concat() you could give an integer index. I think open_mfdataset used the file-name (?), but there's no equivalent for combine_by_coords or combine_nested is there?

@johnomotani
Contributor Author

For specifying which object, one possibility would be to pass an int to combine_attrs in merge(), concat() or combine_by_coords(), or a tuple of int to combine_nested, giving the index of the object to use attributes from. This feature would need new tests writing though, so I'd suggest implementing it in a new PR.
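
A rough sketch of how such an index could select the attrs, using plain lists of dicts in place of the objects. Everything here is hypothetical (the helper name, and the idea of an int or tuple index itself); it only illustrates the proposal and is not implemented in this PR.

```python
from typing import Any, Dict, Sequence, Tuple, Union


def attrs_from_index(
    objs_attrs: Sequence[Any], index: Union[int, Tuple[int, ...]]
) -> Dict[str, Any]:
    """Pick the attrs of one object by position (hypothetical helper).

    An int indexes a flat list, as for merge() or concat(); a tuple of int
    walks a nested list, as for combine_nested().
    """
    if isinstance(index, int):
        index = (index,)
    selected: Any = objs_attrs
    for i in index:
        # Descend one nesting level per index entry.
        selected = selected[i]
    return dict(selected)


flat = [{"run": 0}, {"run": 1}]
nested = [[{"id": "00"}, {"id": "01"}], [{"id": "10"}, {"id": "11"}]]

print(attrs_from_index(flat, 1))         # {'run': 1}
print(attrs_from_index(nested, (1, 0)))  # {'id': '10'}
```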

@TomNicholas
Member

TomNicholas commented Mar 24, 2020

For specifying which object, one possibility would be to pass an int

I'm not in favour of an integer argument, for the reasons discussed for open_mfdataset here. We could maybe just pass the actual dataset/array object though...

This feature would need new tests writing though, so I'd suggest implementing it in a new PR.

Fine to discuss this on a separate PR though.

- Control over attributes of result in :py:func:`merge`, :py:func:`concat`,
:py:func:`combine_by_coords` and :py:func:`combine_nested` using
combine_attrs keyword argument. (:issue:`3865`, :pull:`3877`)
By `John Omotani <https://github.com/johnomotani>`_
Collaborator

Could we merge master and move this to 0.16.0?

@max-sixty
Collaborator

max-sixty commented Mar 24, 2020

Let's merge later today unless there are other comments?

@max-sixty
Collaborator

Merging — any other feedback please post and we can follow-up on.

Thanks a lot @johnomotani , great PR!

@max-sixty max-sixty merged commit d8bb620 into pydata:master Mar 24, 2020
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 28, 2020
* upstream/master: (54 commits)
  Limit repr of arrays containing long strings (pydata#3900)
  expose a few zarr backend functions as semi-public api (pydata#3897)
  Use drawstyle instead of linestyle in plot.step. (pydata#3274)
  Implementation of polyfit and polyval (pydata#3733)
  misplaced quote in whatsnew (pydata#3889)
  Rename ordered_dict_intersection -> compat_dict_intersection (pydata#3887)
  Control attrs of result in `merge()`, `concat()`, `combine_by_coords()` and `combine_nested()` (pydata#3877)
  xfail test_uamiv_format_write (pydata#3885)
  Use `fixes` in PR template (pydata#3886)
  Tweaks to "how_to_release" (pydata#3882)
  whatsnew section for 0.16.0
  Release v0.15.1
  whatsnew for 0.15.1 (pydata#3879)
  update panel documentation (pydata#3880)
  reword the whats-new entry for unit support (pydata#3878)
  Raise error when assigning to IndexVariable.values & IndexVariable.data (pydata#3862)
  Re-enable tests xfailed in pydata#3808 and fix new CFTimeIndex failures due to upstream changes (pydata#3874)
  add spacing in the versions section of the issue report (pydata#3876)
  map_blocks: allow user function to add new unindexed dimension. (pydata#3817)
  Delete associated indexes when deleting coordinate variables. (pydata#3840)
  ...
dcherian added a commit to MeraX/xarray that referenced this pull request Mar 29, 2020
* upstream/master: (75 commits)
  Implement idxmax and idxmin functions (pydata#3871)
  Update pre-commit-config.yaml (pydata#3911)
  Revert "Use `fixes` in PR template (pydata#3886)" (pydata#3912)
  update the docstring of diff (pydata#3909)
  Un-xfail test_dayofyear_after_cftime_range (pydata#3907)
  Limit repr of arrays containing long strings (pydata#3900)
  expose a few zarr backend functions as semi-public api (pydata#3897)
  Use drawstyle instead of linestyle in plot.step. (pydata#3274)
  Implementation of polyfit and polyval (pydata#3733)
  misplaced quote in whatsnew (pydata#3889)
  Rename ordered_dict_intersection -> compat_dict_intersection (pydata#3887)
  Control attrs of result in `merge()`, `concat()`, `combine_by_coords()` and `combine_nested()` (pydata#3877)
  xfail test_uamiv_format_write (pydata#3885)
  Use `fixes` in PR template (pydata#3886)
  Tweaks to "how_to_release" (pydata#3882)
  whatsnew section for 0.16.0
  Release v0.15.1
  whatsnew for 0.15.1 (pydata#3879)
  update panel documentation (pydata#3880)
  reword the whats-new entry for unit support (pydata#3878)
  ...
Labels
topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)