
New deep copy behavior in 2022.9.0 causes maximum recursion error #7111

Closed
4 tasks done
djhoese opened this issue Sep 30, 2022 · 23 comments · Fixed by #7112

@djhoese (Contributor) commented Sep 30, 2022

What happened?

I have a case where a Dataset to be written to a NetCDF file has "ancillary_variables" with a circular dependency. For example, variable A has .attrs["ancillary_variables"] containing variable B, and B has .attrs["ancillary_variables"] containing A.

What did you expect to happen?

Circular dependencies are detected and avoided. No maximum recursion error.

Minimal Complete Verifiable Example

In [1]: import xarray as xr

In [2]: a = xr.DataArray(1.0, attrs={})

In [3]: b = xr.DataArray(2.0, attrs={})

In [4]: a.attrs["other"] = b

In [5]: b.attrs["other"] = a

In [6]: a_copy = a.copy(deep=True)
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
Cell In [6], line 1
----> 1 a_copy = a.copy(deep=True)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1172, in DataArray.copy(self, deep, data)
   1104 def copy(self: T_DataArray, deep: bool = True, data: Any = None) -> T_DataArray:
   1105     """Returns a copy of this array.
   1106 
   1107     If `deep=True`, a deep copy is made of the data array.
   (...)
   1170     pandas.DataFrame.copy
   1171     """
-> 1172     variable = self.variable.copy(deep=deep, data=data)
   1173     indexes, index_vars = self.xindexes.copy_indexes(deep=deep)
   1175     coords = {}

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/variable.py:996, in Variable.copy(self, deep, data)
    989     if self.shape != ndata.shape:
    990         raise ValueError(
    991             "Data shape {} must match shape of object {}".format(
    992                 ndata.shape, self.shape
    993             )
    994         )
--> 996 attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
    997 encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
    999 # note: dims is already an immutable tuple

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1190, in DataArray.__deepcopy__(self, memo)
   1187 def __deepcopy__(self: T_DataArray, memo=None) -> T_DataArray:
   1188     # memo does nothing but is required for compatibility with
   1189     # copy.deepcopy
-> 1190     return self.copy(deep=True)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1172, in DataArray.copy(self, deep, data)
   1104 def copy(self: T_DataArray, deep: bool = True, data: Any = None) -> T_DataArray:
   1105     """Returns a copy of this array.
   1106 
   1107     If `deep=True`, a deep copy is made of the data array.
   (...)
   1170     pandas.DataFrame.copy
   1171     """
-> 1172     variable = self.variable.copy(deep=deep, data=data)
   1173     indexes, index_vars = self.xindexes.copy_indexes(deep=deep)
   1175     coords = {}

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/variable.py:996, in Variable.copy(self, deep, data)
    989     if self.shape != ndata.shape:
    990         raise ValueError(
    991             "Data shape {} must match shape of object {}".format(
    992                 ndata.shape, self.shape
    993             )
    994         )
--> 996 attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
    997 encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
    999 # note: dims is already an immutable tuple

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1190, in DataArray.__deepcopy__(self, memo)
   1187 def __deepcopy__(self: T_DataArray, memo=None) -> T_DataArray:
   1188     # memo does nothing but is required for compatibility with
   1189     # copy.deepcopy
-> 1190     return self.copy(deep=True)

    [... skipping similar frames: DataArray.copy at line 1172 (495 times), DataArray.__deepcopy__ at line 1190 (494 times), _deepcopy_dict at line 231 (494 times), Variable.copy at line 996 (494 times), deepcopy at line 146 (494 times), deepcopy at line 153 (494 times)]

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/variable.py:996, in Variable.copy(self, deep, data)
    989     if self.shape != ndata.shape:
    990         raise ValueError(
    991             "Data shape {} must match shape of object {}".format(
    992                 ndata.shape, self.shape
    993             )
    994         )
--> 996 attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
    997 encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
    999 # note: dims is already an immutable tuple

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:146, in deepcopy(x, memo, _nil)
    144 copier = _deepcopy_dispatch.get(cls)
    145 if copier is not None:
--> 146     y = copier(x, memo)
    147 else:
    148     if issubclass(cls, type):

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:231, in _deepcopy_dict(x, memo, deepcopy)
    229 memo[id(x)] = y
    230 for key, value in x.items():
--> 231     y[deepcopy(key, memo)] = deepcopy(value, memo)
    232 return y

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:153, in deepcopy(x, memo, _nil)
    151 copier = getattr(x, "__deepcopy__", None)
    152 if copier is not None:
--> 153     y = copier(memo)
    154 else:
    155     reductor = dispatch_table.get(cls)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1190, in DataArray.__deepcopy__(self, memo)
   1187 def __deepcopy__(self: T_DataArray, memo=None) -> T_DataArray:
   1188     # memo does nothing but is required for compatibility with
   1189     # copy.deepcopy
-> 1190     return self.copy(deep=True)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/dataarray.py:1172, in DataArray.copy(self, deep, data)
   1104 def copy(self: T_DataArray, deep: bool = True, data: Any = None) -> T_DataArray:
   1105     """Returns a copy of this array.
   1106
   1107     If `deep=True`, a deep copy is made of the data array.
   (...)
   1170     pandas.DataFrame.copy
   1171     """
-> 1172     variable = self.variable.copy(deep=deep, data=data)
   1173     indexes, index_vars = self.xindexes.copy_indexes(deep=deep)
   1175     coords = {}

File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/xarray/core/variable.py:985, in Variable.copy(self, deep, data)
    982         ndata = indexing.MemoryCachedArray(ndata.array)
    984     if deep:
--> 985         ndata = copy.deepcopy(ndata)
    987 else:
    988     ndata = as_compatible_data(data)

File ~/miniconda3/envs/satpy_py310/lib/python3.10/copy.py:137, in deepcopy(x, memo, _nil)
    134 if memo is None:
    135     memo = {}
--> 137 d = id(x)
    138 y = memo.get(d, _nil)
    139 if y is not _nil:

RecursionError: maximum recursion depth exceeded while calling a Python object

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I have at least one other issue related to the new xarray release, but I'm still tracking it down. I think it is also related to the deep copy behavior change, which was merged a day before the release, so our CI didn't have time to test the "unstable" version of xarray.

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 5.19.0-76051900-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1

xarray: 2022.9.0
pandas: 1.5.0
numpy: 1.23.3
scipy: 1.9.1
netCDF4: 1.6.1
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.13.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.2
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.9.1
distributed: 2022.9.1
matplotlib: 3.6.0
cartopy: 0.21.0
seaborn: None
numbagg: None
fsspec: 2022.8.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.4.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: 8.5.0
sphinx: 5.2.3
@djhoese added the bug and needs triage labels on Sep 30, 2022
@djhoese (Contributor, Author) commented Sep 30, 2022

CC @headtr1ck any idea if this is supposed to work with your new #7089?

@djhoese (Contributor, Author) commented Sep 30, 2022

I get a similar error for different structures, for example if I do data_arr.where(data_arr > 5, drop=True). In this case I have dask-array-based DataArrays, and dask ends up trying to hash the object, looping between dask hashing the DataArray and xarray hashing the DataArrays inside .attrs.

In [9]: import dask.array as da

In [15]: a = xr.DataArray(da.zeros(5.0), attrs={}, dims=("a_dim",))

In [16]: b = xr.DataArray(da.zeros(8.0), attrs={}, dims=("b_dim",))

In [20]: a.attrs["other"] = b

In [24]: lons = xr.DataArray(da.random.random(8), attrs={"ancillary_variables": [b]})

In [25]: lats = xr.DataArray(da.random.random(8), attrs={"ancillary_variables": [b]})

In [26]: b.attrs["some_attr"] = [lons, lats]

In [27]: cond = a > 5

In [28]: c = a.where(cond, drop=True)
...
File ~/miniconda3/envs/satpy_py310/lib/python3.10/site-packages/dask/utils.py:1982, in _HashIdWrapper.__hash__(self)
   1981 def __hash__(self):
-> 1982     return id(self.wrapped)

RecursionError: maximum recursion depth exceeded while calling a Python object

@headtr1ck (Collaborator)

I basically copied the behavior of Dataset.copy, which should already show this problem.
In principle we are doing new_attrs = copy.deepcopy(attrs).

I would claim that the new behavior is correct, but maybe other devs can confirm this.

Coming from netCDF, it does not really make sense to put complex objects in attrs, but I guess it works for in-memory use.

@djhoese (Contributor, Author) commented Sep 30, 2022

I'd have to check, but I think this structure was originally produced by xarray reading a CF-compliant NetCDF file. That is my memory at least. It could be that our library (Satpy) is doing this as a convenience, replacing the name of an ancillary variable with the DataArray of that ancillary variable.

My other new issue seems to be related to .copy() doing a .copy() on dask arrays, which then makes them no longer equivalent. Working on an MVCE now.

@max-sixty (Collaborator)

Hmmm, Python seems to deal with this reasonably for its builtins:

In [1]: a = [1]

In [2]: b = [a]

In [3]: a.append(b)

In [4]: import copy

In [5]: copy.deepcopy(a)
Out[5]: [1, [[...]]]

I doubt this gets hit much, given that it requires a recursive data structure, but it does seem like a gnarly error.

Is there some feature Python uses to check whether a data structure is recursive while copying that we're not taking advantage of? I can look more later.

@rhkleijn (Contributor) commented Sep 30, 2022

Is there some feature that python uses to check whether a data structure is recursive when it's copying, which we're not taking advantage of? I can look more later.

Yes, def __deepcopy__(self, memo=None) has the memo argument exactly for the purpose of dealing with recursion; see https://docs.python.org/3/library/copy.html.
Currently, xarray's __deepcopy__ methods do not pass the memo argument on when deep-copying their components.
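
The fix rhkleijn describes can be sketched with a toy stand-in (the Node class below is hypothetical, not xarray's actual code): a memo-aware __deepcopy__ registers the new object in memo before copying its children, so cycles resolve to the already-created copy instead of recursing.

```python
import copy

class Node:
    """Toy stand-in for a DataArray-like object with an attrs dict."""

    def __init__(self, value):
        self.value = value
        self.attrs = {}

    def __deepcopy__(self, memo=None):
        if memo is None:
            memo = {}
        new = Node(self.value)
        # Register the copy *before* descending into attrs so that any
        # cycle back to `self` is resolved from the memo, not recopied.
        memo[id(self)] = new
        new.attrs = copy.deepcopy(self.attrs, memo)
        return new

# The circular structure from the MVCE no longer blows the stack:
a, b = Node(1.0), Node(2.0)
a.attrs["other"] = b
b.attrs["other"] = a
a_copy = copy.deepcopy(a)
```

Here a_copy.attrs["other"].attrs["other"] is a_copy: the cycle is reproduced within the copy rather than shared with the original.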

@headtr1ck (Collaborator) commented Oct 1, 2022

I think our implementations of copy(deep=True) and __deepcopy__ are inverted: the former should call the latter, not the other way around, so that the memo dict can be passed through.

This will lead to a bit of duplicated code between __copy__ and __deepcopy__, but it would be the correct way.

@rhkleijn (Contributor) commented Oct 1, 2022

To avoid code duplication, you could move all the logic from the copy methods into new _copy methods, extend those with an optional memo argument, and make the copy, __copy__, and __deepcopy__ methods thin wrappers around them.
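
A minimal sketch of that wrapper layout, using a hypothetical Container class rather than xarray's real classes: all logic lives in _copy, which threads an optional memo dict, and the three public entry points are one-liners.

```python
import copy

class Container:
    """Hypothetical stand-in: one _copy method, three thin wrappers."""

    def __init__(self, data, attrs=None):
        self.data = data
        self.attrs = attrs if attrs is not None else {}

    def _copy(self, deep=False, memo=None):
        new = type(self).__new__(type(self))
        if memo is not None:
            # Register the copy before descending into children so that
            # cycles back to `self` are resolved from the memo.
            memo[id(self)] = new
        if deep:
            new.data = copy.deepcopy(self.data, memo)
            new.attrs = copy.deepcopy(self.attrs, memo)
        else:
            new.data = self.data
            new.attrs = dict(self.attrs)
        return new

    def copy(self, deep=True):
        return self._copy(deep=deep, memo={} if deep else None)

    def __copy__(self):
        return self._copy(deep=False)

    def __deepcopy__(self, memo=None):
        return self._copy(deep=True, memo={} if memo is None else memo)

# A recursive pair no longer blows the stack:
x = Container([1.0])
y = Container([2.0])
x.attrs["other"] = y
y.attrs["other"] = x
x2 = x.copy(deep=True)
```

The only duplication left is the trivial wrappers; the memo threading lives in exactly one place.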

@headtr1ck (Collaborator)

I will set up a PR for that.
Another issue has arisen: the repr is also broken for recursive data. With your example, shouldn't plain Python also raise a RecursionError when displaying this data?

@headtr1ck (Collaborator)

Ok, even xarray.testing.assert_identical fails with recursive definitions.
Are we sure that it is a good idea to support this?

@djhoese (Contributor, Author) commented Oct 1, 2022

I'm a little torn on this. Obviously I'm not an xarray maintainer so I'm not the one who would have to maintain it or answer support questions about it. We actually had the user-side of this discussion in the Satpy library group a while ago which is leading to this whole problem for us now. In Satpy we don't typically use or deal with xarray Datasets (the new DataTree library is likely what we'll move to) so when we have relationships between DataArrays we'll use something like ancillary variables to connect them. For example, a data quality flag that is used by the other variables in a file. Our users don't usually care about the DQF but we don't want to stop them from being able to easily access it. I was never a huge fan of putting a DataArray in the attrs of another DataArray, but nothing seemed to disallow it so I ultimately lost that argument.

So on one hand I agree it seems like there shouldn't be a need in most cases to have a DataArray inside a DataArray, especially a circular dependency. On the other hand, I'm not looking forward to the updates I'll need to make to Satpy to fix this. Note, we don't do this everywhere in Satpy, just something we use for a few formats we read.

@djhoese (Contributor, Author) commented Oct 1, 2022

Also note the other important change in this new behavior: dask arrays are now copied (.copy()) when they weren't before. This is causing some equality issues for us in Satpy, but I agree with the change on xarray's side (xarray should be able to call .copy() on whatever array it has).

dask/dask#9533

@headtr1ck (Collaborator)

I added a PR that fixes the broken reprs and deepcopys.
The other issues are not addressed yet.

@djhoese (Contributor, Author) commented Oct 1, 2022

It looks like that PR fixes all of my Satpy unit tests. I'm not sure how that is possible if it doesn't also change when dask arrays are copied.

@headtr1ck removed the needs triage label on Oct 1, 2022
@djhoese (Contributor, Author) commented Oct 2, 2022

Sorry, false alarm. I was running with an old environment. With this new PR it seems the ancillary_variables tests that were failing now pass, but the dask .copy() related ones still fail...which is expected so I'm ok with that.

Edit: I hacked variable.py so it had this:

            if deep:
                if is_duck_dask_array(ndata):
                    ndata = ndata  # leave dask arrays as-is instead of deep-copying
                else:
                    ndata = copy.deepcopy(ndata, memo)

and that fixed a lot of my dask related tests, but also seems to have introduced two new failures from what I can tell. So 🤷‍♂️

@TomNicholas (Member)

I was never a huge fan of putting a DataArray in the attrs of another DataArray, but nothing seemed to disallow it so I ultimately lost that argument.

Out of curiosity, why do you need to store a DataArray object as opposed to merely the values in one?

@djhoese (Contributor, Author) commented Oct 3, 2022

@TomNicholas Do you mean the "name" of the sub-DataArray? Or the numpy/dask array of the sub-DataArray?

This is what I was trying to describe in #7111 (comment). In Satpy we have our own Dataset-like/DataTree-like object where the user explicitly says "I want to load X from input files". As a convenience we put any ancillary variables (ex. data quality flags) in the DataArray .attrs for easier access. In Satpy there is no other direct connection between one DataArray and another. They are overall independent objects on a processing level so there may not be access to this higher-level Dataset-like container object in order to get ancillary variables by name.

@mraspaud was one of the original people who proposed our current design so maybe he can provide more context.

@dcherian (Contributor) commented Oct 3, 2022

The ancillary variables stuff doesn't really fit the DataArray data model, so you have to do something.

Here's an example with Dataset and cf_xarray using the ancillary_variables attribute https://cf-xarray.readthedocs.io/en/latest/selecting.html#associated-variables

@djhoese (Contributor, Author) commented Oct 3, 2022

@dcherian Thanks for the feedback. When these decisions were made in Satpy, xarray was not able to hold dask arrays as coordinates, and we depend heavily on dask for our use cases. Putting some of these datasets in coordinates as cf-xarray does might have caused extra unnecessary loading/computation. I'm not sure that would be the case with modern xarray.

Note that ancillary_variables are not the only case of "embedded" DataArrays in our code. We also needed something for CRS + bounds or other geolocation information. As you know I'm very much interested in CRS and geolocation handling in xarray, but for backwards compatibility we also have pyresample AreaDefinition and SwathDefinition objects in our DataArray .attrs["area"] attributes. A SwathDefinition is able to contain two DataArray objects for longitude and latitude. These also get copied with this new deep copy behavior.

We have a monthly Pytroll/Satpy meeting tomorrow so if you have any other suggestions or points for or against our usage please comment here and we'll see what we can do.

@mraspaud (Contributor) commented Oct 4, 2022

Thanks for pinging me. Regarding the ancillary variables, this comes from the CF conventions, which allow "linking" two or more arrays together. For example, we might have a radiance array with a quality_flags ancillary variable array that characterises the quality of each radiance pixel. Now, in netCDF/CF, ancillary variables are just references, but the logical way to do this in xarray is to use an ancillary_variables attribute on a DataArray. I'm not sure how we could do it in another way.

@headtr1ck (Collaborator)

I think the behavior of deepcopy in #7112 is correct.
If you really want to prevent the ancillary_variables attrs from being deep-copied as well, you can try adding them to the memo dict in deepcopy, e.g.:

memo = {id(da.attrs["ancillary_variables"]): da.attrs["ancillary_variables"]}
da_new = deepcopy(da, memo)

(untested!)
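
The memo pre-seeding trick is generic copy.deepcopy behavior and can be verified with plain stdlib objects: mapping id(obj) to obj itself makes deepcopy treat obj as "already copied", so the result shares it instead of duplicating it.

```python
import copy

payload = {"name": "var_a", "ancillary_variables": [1, 2, 3]}
shared = payload["ancillary_variables"]

# Pre-seed the memo: deepcopy sees `shared` as already copied (to itself),
# so the new dict references the very same list object.
memo = {id(shared): shared}
dup = copy.deepcopy(payload, memo)

assert dup is not payload
assert dup["ancillary_variables"] is shared
# Without the memo entry, the list would be deep-copied as usual:
assert copy.deepcopy(payload)["ancillary_variables"] is not shared
```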

@djhoese (Contributor, Author) commented Oct 4, 2022

@mraspaud See the cf-xarray link from Deepak. We could make them coordinates. Or we could reference them by name:

ds = xr.open_dataset(...)
anc_name = ds["my_var"].attrs["ancillary_variables"][0]
anc_var = ds[anc_name]

Edit: Let's talk more in the pytroll meeting today.

@djhoese (Contributor, Author) commented Oct 4, 2022

We talked about this today in our Pytroll/Satpy meeting. We're not sure we agree with cf-xarray putting ancillary variables in coordinates, or that it will work for us, so we think we could eventually remove any "automatic" ancillary variable loading and require that users explicitly request any ancillary variables they want from Satpy's readers.

That said, this will take a lot of work to change. Since #7112 seems to fix the majority of our issues, I hope it can still be merged. I'd hope that the memo logic when deep-copying will still protect against other recursive objects (and possibly optimize?) even if they can't be directly serialized to NetCDF.

Side note: I feel like there is a difference between the NetCDF model and serializing/saving to a NetCDF file.
