Fix deepcopy of Variables and DataArrays #7089

Merged (9 commits) on Sep 29, 2022.
48 changes: 23 additions & 25 deletions doc/whats-new.rst

@@ -42,70 +42,68 @@ Deprecations
 Bug fixes
 ~~~~~~~~~

-- Allow reading netcdf files where the 'units' attribute is a number(:pull:`7085`)
+- Allow reading netcdf files where the 'units' attribute is a number. (:pull:`7085`)
   By `Ghislain Picard <https://github.com/ghislainp>`_.
-- Allow decoding of 0 sized datetimes(:issue:`1329`, :pull:`6882`)
+- Allow decoding of 0 sized datetimes. (:issue:`1329`, :pull:`6882`)
   By `Deepak Cherian <https://github.com/dcherian>`_.
-- Make sure DataArray.name is always a string when used as label for plotting.
-  (:issue:`6826`, :pull:`6832`)
+- Make sure DataArray.name is always a string when used as label for plotting. (:issue:`6826`, :pull:`6832`)
   By `Jimmy Westling <https://github.com/illviljan>`_.
-- :py:attr:`DataArray.nbytes` now uses the ``nbytes`` property of the underlying array if available.
-  (:pull:`6797`)
+- :py:attr:`DataArray.nbytes` now uses the ``nbytes`` property of the underlying array if available. (:pull:`6797`)
   By `Max Jones <https://github.com/maxrjones>`_.
 - Rely on the array backend for string formatting. (:pull:`6823`).
   By `Jimmy Westling <https://github.com/illviljan>`_.
-- Fix incompatibility with numpy 1.20 (:issue:`6818`, :pull:`6821`)
+- Fix incompatibility with numpy 1.20. (:issue:`6818`, :pull:`6821`)
   By `Michael Niklas <https://github.com/headtr1ck>`_.
 - Fix side effects on index coordinate metadata after aligning objects. (:issue:`6852`, :pull:`6857`)
   By `Benoît Bovy <https://github.com/benbovy>`_.
-- Make FacetGrid.set_titles send kwargs correctly using `handle.udpate(kwargs)`.
-  (:issue:`6839`, :pull:`6843`)
+- Make FacetGrid.set_titles send kwargs correctly using `handle.udpate(kwargs)`. (:issue:`6839`, :pull:`6843`)
   By `Oliver Lopez <https://github.com/lopezvoliver>`_.
-- Fix bug where index variables would be changed inplace (:issue:`6931`, :pull:`6938`)
+- Fix bug where index variables would be changed inplace. (:issue:`6931`, :pull:`6938`)
   By `Michael Niklas <https://github.com/headtr1ck>`_.
 - Allow taking the mean over non-time dimensions of datasets containing
-  dask-backed cftime arrays (:issue:`5897`, :pull:`6950`). By `Spencer Clark
-  <https://github.com/spencerkclark>`_.
-- Harmonize returned multi-indexed indexes when applying ``concat`` along new dimension (:issue:`6881`, :pull:`6889`)
+  dask-backed cftime arrays. (:issue:`5897`, :pull:`6950`)
+  By `Spencer Clark <https://github.com/spencerkclark>`_.
+- Harmonize returned multi-indexed indexes when applying ``concat`` along new dimension. (:issue:`6881`, :pull:`6889`)
   By `Fabian Hofmann <https://github.com/FabianHofmann>`_.
 - Fix step plots with ``hue`` arg. (:pull:`6944`)
   By `András Gunyhó <https://github.com/mgunyho>`_.
-- Avoid use of random numbers in `test_weighted.test_weighted_operations_nonequal_coords` (:issue:`6504`, :pull:`6961`).
+- Avoid use of random numbers in `test_weighted.test_weighted_operations_nonequal_coords`. (:issue:`6504`, :pull:`6961`)
   By `Luke Conibear <https://github.com/lukeconibear>`_.
 - Fix multiple regression issues with :py:meth:`Dataset.set_index` and
-  :py:meth:`Dataset.reset_index` (:pull:`6992`)
+  :py:meth:`Dataset.reset_index`. (:pull:`6992`)
   By `Benoît Bovy <https://github.com/benbovy>`_.
 - Raise a ``UserWarning`` when renaming a coordinate or a dimension creates a
   non-indexed dimension coordinate, and suggest the user creating an index
-  either with ``swap_dims`` or ``set_index`` (:issue:`6607`, :pull:`6999`). By
-  `Benoît Bovy <https://github.com/benbovy>`_.
-- Use ``keep_attrs=True`` in grouping and resampling operations by default (:issue:`7012`).
+  either with ``swap_dims`` or ``set_index``. (:issue:`6607`, :pull:`6999`)
+  By `Benoît Bovy <https://github.com/benbovy>`_.
+- Use ``keep_attrs=True`` in grouping and resampling operations by default. (:issue:`7012`)
   This means :py:attr:`Dataset.attrs` and :py:attr:`DataArray.attrs` are now preserved by default.
   By `Deepak Cherian <https://github.com/dcherian>`_.
-- ``Dataset.encoding['source']`` now exists when reading from a Path object (:issue:`5888`, :pull:`6974`)
+- ``Dataset.encoding['source']`` now exists when reading from a Path object. (:issue:`5888`, :pull:`6974`)
   By `Thomas Coleman <https://github.com/ColemanTom>`_.
 - Better dtype consistency for ``rolling.mean()``. (:issue:`7062`, :pull:`7063`)
   By `Sam Levang <https://github.com/slevang>`_.
-- Allow writing NetCDF files including only dimensionless variables using the distributed or multiprocessing scheduler
-  (:issue:`7013`, :pull:`7040`).
+- Allow writing NetCDF files including only dimensionless variables using the distributed or multiprocessing scheduler. (:issue:`7013`, :pull:`7040`)
   By `Francesco Nattino <https://github.com/fnattino>`_.
-- Fix bug where subplot_kwargs were not working when plotting with figsize, size or aspect (:issue:`7078`, :pull:`7080`)
+- Fix deepcopy of attrs and encoding of DataArrays and Variables. (:issue:`2835`, :pull:`7089`)
+  By `Michael Niklas <https://github.com/headtr1ck>`_.
+- Fix bug where subplot_kwargs were not working when plotting with figsize, size or aspect. (:issue:`7078`, :pull:`7080`)
   By `Michael Niklas <https://github.com/headtr1ck>`_.

 Documentation
 ~~~~~~~~~~~~~
-- Update merge docstrings (:issue:`6935`, :pull:`7033`).
+- Update merge docstrings. (:issue:`6935`, :pull:`7033`)
   By `Zach Moon <https://github.com/zmoon>`_.
 - Raise a more informative error when trying to open a non-existent zarr store. (:issue:`6484`, :pull:`7060`)
   By `Sam Levang <https://github.com/slevang>`_.
-- Added examples to docstrings for :py:meth:`DataArray.expand_dims`, :py:meth:`DataArray.drop_duplicates`, :py:meth:`DataArray.reset_coords`, :py:meth:`DataArray.equals`, :py:meth:`DataArray.identical`, :py:meth:`DataArray.broadcast_equals`, :py:meth:`DataArray.bfill`, :py:meth:`DataArray.ffill`, :py:meth:`DataArray.fillna`, :py:meth:`DataArray.dropna`, :py:meth:`DataArray.drop_isel`, :py:meth:`DataArray.drop_sel`, :py:meth:`DataArray.head`, :py:meth:`DataArray.tail`. (:issue:`5816`, :pull:`7088`).
+- Added examples to docstrings for :py:meth:`DataArray.expand_dims`, :py:meth:`DataArray.drop_duplicates`, :py:meth:`DataArray.reset_coords`, :py:meth:`DataArray.equals`, :py:meth:`DataArray.identical`, :py:meth:`DataArray.broadcast_equals`, :py:meth:`DataArray.bfill`, :py:meth:`DataArray.ffill`, :py:meth:`DataArray.fillna`, :py:meth:`DataArray.dropna`, :py:meth:`DataArray.drop_isel`, :py:meth:`DataArray.drop_sel`, :py:meth:`DataArray.head`, :py:meth:`DataArray.tail`. (:issue:`5816`, :pull:`7088`)
   By `Patrick Naylor <https://github.com/patrick-naylor>`_.
 - Add missing docstrings to various array properties. (:pull:`7090`)
   By `Tom Nicholas <https://github.com/TomNicholas>`_.

 Internal Changes
 ~~~~~~~~~~~~~~~~
-- Added test for DataArray attrs deepcopy recursion/nested attrs (:issue:`2835`).
+- Added test for DataArray attrs deepcopy recursion/nested attrs. (:issue:`2835`, :pull:`7086`)
   By `Paul hockett <https://github.com/phockett>`_.

 .. _whats-new.2022.06.0:
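Beyond the changelog cleanups, the new whats-new entry records the PR's one behavioral change: `copy(deep=True)` must deep-copy `attrs` and `encoding` rather than share them with the original. A minimal sketch of those semantics, using a hypothetical `MiniVariable` class (illustrative only, not xarray's implementation):

```python
import copy


class MiniVariable:
    """Hypothetical stand-in for an xarray Variable: data plus two
    metadata dicts. Only the copy semantics matter here."""

    def __init__(self, data, attrs=None, encoding=None):
        self.data = data
        self.attrs = {} if attrs is None else attrs
        self.encoding = {} if encoding is None else encoding

    def copy(self, deep=True):
        # Mirrors the rule the PR establishes: metadata follows `deep`.
        attrs = copy.deepcopy(self.attrs) if deep else copy.copy(self.attrs)
        encoding = copy.deepcopy(self.encoding) if deep else copy.copy(self.encoding)
        data = copy.deepcopy(self.data) if deep else self.data
        return MiniVariable(data, attrs, encoding)


v = MiniVariable([1, 2, 3], attrs={"meta": {"units": "m"}})

deep = v.copy(deep=True)
deep.attrs["meta"]["units"] = "km"     # must not leak into v

shallow = v.copy(deep=False)
shallow.attrs["extra"] = True          # top-level dict is still independent
```

With `deep=True` even nested attrs are independent of the original; with `deep=False` only the top-level dict is new and nested values remain shared, matching the `copy.deepcopy`/`copy.copy` split used throughout the diff below.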
8 changes: 3 additions & 5 deletions xarray/core/dataarray.py

@@ -853,25 +853,23 @@ def loc(self) -> _LocIndexer:
         return _LocIndexer(self)

     @property
-    # Key type needs to be `Any` because of mypy#4167
     def attrs(self) -> dict[Any, Any]:
         """Dictionary storing arbitrary metadata with this array."""
         return self.variable.attrs

     @attrs.setter
     def attrs(self, value: Mapping[Any, Any]) -> None:
-        # Disable type checking to work around mypy bug - see mypy#4167
-        self.variable.attrs = value  # type: ignore[assignment]
+        self.variable.attrs = dict(value)

     @property
-    def encoding(self) -> dict[Hashable, Any]:
+    def encoding(self) -> dict[Any, Any]:
         """Dictionary of format-specific settings for how this array should be
         serialized."""
         return self.variable.encoding

     @encoding.setter
     def encoding(self, value: Mapping[Any, Any]) -> None:
-        self.variable.encoding = value
+        self.variable.encoding = dict(value)

     @property
     def indexes(self) -> Indexes:
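The two setter changes in this file replace direct assignment with `dict(value)`: the object now owns its top-level metadata dict instead of aliasing whatever mapping the caller passed in, which also removes the need for the `# type: ignore` workaround. A small sketch of why that matters (`AttrsHolder` is a hypothetical illustration, not xarray's class):

```python
class AttrsHolder:
    """Hypothetical mini-class showing only the setter pattern."""

    def __init__(self):
        self._attrs = {}

    @property
    def attrs(self):
        return self._attrs

    @attrs.setter
    def attrs(self, value):
        # dict(value) accepts any Mapping and copies its top level,
        # so later mutation of the caller's dict cannot leak in.
        self._attrs = dict(value)


h = AttrsHolder()
user_attrs = {"units": "m"}
h.attrs = user_attrs
user_attrs["units"] = "km"  # does not change h.attrs
```

Note this is a top-level copy only; deep independence is the job of `copy(deep=True)`, handled elsewhere in the PR.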
9 changes: 5 additions & 4 deletions xarray/core/dataset.py

@@ -633,7 +633,7 @@ def variables(self) -> Frozen[Hashable, Variable]:
         return Frozen(self._variables)

     @property
-    def attrs(self) -> dict[Hashable, Any]:
+    def attrs(self) -> dict[Any, Any]:
         """Dictionary of global attributes on this dataset"""
         if self._attrs is None:
             self._attrs = {}
@@ -644,7 +644,7 @@ def attrs(self, value: Mapping[Any, Any]) -> None:
         self._attrs = dict(value)

     @property
-    def encoding(self) -> dict[Hashable, Any]:
+    def encoding(self) -> dict[Any, Any]:
         """Dictionary of global encoding attributes on this dataset"""
         if self._encoding is None:
             self._encoding = {}
@@ -1123,7 +1123,7 @@ def _overwrite_indexes(
         return replaced

     def copy(
-        self: T_Dataset, deep: bool = False, data: Mapping | None = None
+        self: T_Dataset, deep: bool = False, data: Mapping[Any, ArrayLike] | None = None
     ) -> T_Dataset:
         """Returns a copy of this dataset.

@@ -1252,8 +1252,9 @@ def copy(
             variables[k] = v.copy(deep=deep, data=data.get(k))

         attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
+        encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)

-        return self._replace(variables, indexes=indexes, attrs=attrs)
+        return self._replace(variables, indexes=indexes, attrs=attrs, encoding=encoding)

     def as_numpy(self: T_Dataset) -> T_Dataset:
         """
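The substantive change to `Dataset.copy` is one added line: `encoding` is now copied with the same deep/shallow rule as `attrs` and threaded through `_replace`, where it was previously dropped. That rule in isolation (`copy_metadata` is a hypothetical helper written for this sketch, not part of xarray):

```python
import copy


def copy_metadata(attrs, encoding, deep):
    """Hypothetical helper mirroring the copy rule in Dataset.copy:
    both dicts are deep-copied when deep=True, shallow-copied otherwise."""
    new_attrs = copy.deepcopy(attrs) if deep else copy.copy(attrs)
    new_encoding = copy.deepcopy(encoding) if deep else copy.copy(encoding)
    return new_attrs, new_encoding


attrs = {"title": "demo", "history": ["created"]}
encoding = {"zlib": True}

a_deep, e_deep = copy_metadata(attrs, encoding, deep=True)
a_deep["history"].append("modified")  # nested list in the original untouched

a_shallow, e_shallow = copy_metadata(attrs, encoding, deep=False)
```

Without the added line, a dataset produced by `copy` silently lost its encoding; carrying it through `_replace` keeps round-trips like copy-then-serialize faithful.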
49 changes: 27 additions & 22 deletions xarray/core/variable.py

@@ -894,7 +894,7 @@ def __setitem__(self, key, value):
         indexable[index_tuple] = value

     @property
-    def attrs(self) -> dict[Hashable, Any]:
+    def attrs(self) -> dict[Any, Any]:
         """Dictionary of local attributes on this variable."""
         if self._attrs is None:
             self._attrs = {}
@@ -905,7 +905,7 @@ def attrs(self, value: Mapping[Any, Any]) -> None:
         self._attrs = dict(value)

     @property
-    def encoding(self):
+    def encoding(self) -> dict[Any, Any]:
         """Dictionary of encodings on this variable."""
         if self._encoding is None:
             self._encoding = {}
@@ -918,7 +918,7 @@ def encoding(self, value):
         except ValueError:
             raise ValueError("encoding must be castable to a dictionary")

-    def copy(self, deep=True, data=None):
+    def copy(self, deep: bool = True, data: ArrayLike | None = None):
         """Returns a copy of this object.

         If `deep=True`, the data array is loaded into memory and copied onto
@@ -929,7 +929,7 @@ def copy(self, deep=True, data=None):

         Parameters
         ----------
-        deep : bool, optional
+        deep : bool, default: True
             Whether the data array is loaded into memory and copied onto
             the new object. Default is True.
         data : array_like, optional
@@ -975,28 +975,29 @@ def copy(self, deep=True, data=None):
         pandas.DataFrame.copy
         """
         if data is None:
-            data = self._data
+            ndata = self._data

-            if isinstance(data, indexing.MemoryCachedArray):
+            if isinstance(ndata, indexing.MemoryCachedArray):
                 # don't share caching between copies
-                data = indexing.MemoryCachedArray(data.array)
+                ndata = indexing.MemoryCachedArray(ndata.array)

             if deep:
-                data = copy.deepcopy(data)
+                ndata = copy.deepcopy(ndata)

         else:
-            data = as_compatible_data(data)
-            if self.shape != data.shape:
+            ndata = as_compatible_data(data)
+            if self.shape != ndata.shape:
                 raise ValueError(
                     "Data shape {} must match shape of object {}".format(
-                        data.shape, self.shape
+                        ndata.shape, self.shape
                     )
                 )

-        # note:
-        # dims is already an immutable tuple
-        # attributes and encoding will be copied when the new Array is created
-        return self._replace(data=data)
+        attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
+        encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
+
+        # note: dims is already an immutable tuple
+        return self._replace(data=ndata, attrs=attrs, encoding=encoding)

     def _replace(
         self: T_Variable,
@@ -2877,7 +2878,7 @@ def concat(

         return cls(first_var.dims, data, attrs)

-    def copy(self, deep=True, data=None):
+    def copy(self, deep: bool = True, data: ArrayLike | None = None):
         """Returns a copy of this object.

         `deep` is ignored since data is stored in the form of
@@ -2889,7 +2890,7 @@ def copy(self, deep=True, data=None):

         Parameters
         ----------
-        deep : bool, optional
+        deep : bool, default: True
             Deep is ignored when data is given. Whether the data array is
             loaded into memory and copied onto the new object. Default is True.
         data : array_like, optional
@@ -2902,16 +2903,20 @@ def copy(self, deep=True, data=None):
             data copied from original.
         """
         if data is None:
-            data = self._data.copy(deep=deep)
+            ndata = self._data.copy(deep=deep)
         else:
-            data = as_compatible_data(data)
-            if self.shape != data.shape:
+            ndata = as_compatible_data(data)
+            if self.shape != ndata.shape:
                 raise ValueError(
                     "Data shape {} must match shape of object {}".format(
-                        data.shape, self.shape
+                        ndata.shape, self.shape
                     )
                 )
-        return self._replace(data=data)
+
+        attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
+        encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
+
+        return self._replace(data=ndata, attrs=attrs, encoding=encoding)

     def equals(self, other, equiv=None):
         # if equiv is specified, super up
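Besides copying `attrs`/`encoding` explicitly, both `copy` methods rename the working array to `ndata` so the newly typed `data` parameter is never reassigned to a different type mid-function. A rough stand-alone sketch of that control flow (`copy_with_data` is hypothetical; plain lists and `len` stand in for arrays, `as_compatible_data`, and `.shape`):

```python
import copy


def copy_with_data(orig_data, data=None, deep=True):
    """Hypothetical sketch of the renamed-variable flow in Variable.copy:
    the incoming `data` argument is left untouched and all work happens
    on `ndata`."""
    if data is None:
        ndata = copy.deepcopy(orig_data) if deep else orig_data
    else:
        ndata = list(data)  # stand-in for as_compatible_data(data)
        if len(orig_data) != len(ndata):  # stand-in for the shape check
            raise ValueError(
                f"Data shape ({len(ndata)},) must match "
                f"shape of object ({len(orig_data)},)"
            )
    return ndata


replaced = copy_with_data([1, 2, 3], data=[4, 5, 6])  # same length: accepted
```

As in the real method, replacement data of the wrong shape raises `ValueError` before any copy is constructed.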
1 change: 0 additions & 1 deletion xarray/tests/test_dataarray.py

@@ -6458,7 +6458,6 @@ def test_delete_coords() -> None:
     assert set(a1.coords.keys()) == {"x"}


-@pytest.mark.xfail
 def test_deepcopy_nested_attrs() -> None:
     """Check attrs deep copy, see :issue:`2835`"""
     da1 = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"), coords={"x": [10, 20]})
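Removing the `@pytest.mark.xfail` marker means the nested-attrs deepcopy test is now expected to pass. What the fix relies on can be sketched with plain `copy.deepcopy`, whose memo dict handles both nested dicts and self-referencing attrs (the `attrs` structure below is illustrative, not the test's exact data):

```python
import copy

# Nested metadata plus a deliberate self-reference: a naive recursive
# copy would loop forever, but deepcopy's memo dict breaks the cycle.
attrs = {"flat": "a", "nested": {"level1": {"level2": "value"}}}
attrs["self"] = attrs

copied = copy.deepcopy(attrs)
copied["nested"]["level1"]["level2"] = "changed"  # original stays intact
```

The cycle is preserved structurally in the copy (`copied["self"]` points back at `copied`, not at the original), which is exactly the behavior the un-xfailed test exercises for `DataArray.attrs`.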