Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX: deep-copy wasn't copying coords, bug fixed within IndexVariable #2936

Merged
merged 1 commit into from
May 8, 2019

Conversation

pletchm
Copy link
Contributor

@pletchm pletchm commented May 2, 2019

This pull request fixes a bug that prevented making a complete deep copy of a DataArray or Dataset, because the coords weren't being deep copied. It took a small fix in the IndexVariable.copy method. This method now allows both deep and shallow copies of coords to be made.

This pull request corresponds to this issue #1463.

  • Tests added
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

@pletchm
Copy link
Contributor Author

pletchm commented May 2, 2019

Tests and documentation forthcoming. I had already opened a PR, but was perplexed when the build and tests weren't passing -- suspected it was related to my commits.

@pletchm pletchm force-pushed the bugfix/deep-copy-coords branch from df2a875 to f8a0569 Compare May 2, 2019 23:53
@max-sixty
Copy link
Collaborator

@pletchm thanks for the PR!

I think the test failuers are dependencies rather than your changes! I kicked off a build of master here: https://travis-ci.org/pydata/xarray/builds/524754660 to confirm - if it's the same failures then we'll know it's not this code

@pletchm
Copy link
Contributor Author

pletchm commented May 6, 2019

@pletchm thanks for the PR!

I think the test failuers are dependencies rather than your changes! I kicked off a build of master here: https://travis-ci.org/pydata/xarray/builds/524754660 to confirm - if it's the same failures then we'll know it's not this code

Thanks for looking into this, @max-sixty! It looks like the errors are the same, so the problem is dependencies, right? How do we go about fixing that?

@max-sixty
Copy link
Collaborator

Then it's fine! They'll be fixed separately.

But did we lose the tests on this branch? Can we add those back?

@pletchm pletchm force-pushed the bugfix/deep-copy-coords branch from f8a0569 to 1db1673 Compare May 7, 2019 02:05
@pletchm
Copy link
Contributor Author

pletchm commented May 7, 2019

Yep, the tests are back now.

@max-sixty
Copy link
Collaborator

Perfect @pletchm !

Sorry for the test failures on our end - we had some dependency issues.

I'll merge tomorrow unless anyone has other comments.

@max-sixty max-sixty merged commit c04234d into pydata:master May 8, 2019
@max-sixty
Copy link
Collaborator

Thanks @pletchm !

@shoyer
Copy link
Member

shoyer commented May 9, 2019

Can you take a look at the test failures on Windows? These seem to be due to your charge:

================================== FAILURES ===================================
____________ TestDataArray.test_copy_coords[False-expected_orig1] _____________
self = <xarray.tests.test_dataarray.TestDataArray object at 0x000000AAE0637F28>
deep = False
expected_orig = <xarray.DataArray 'a' (a: 2)>
array([999,   2])
Coordinates:
  * a        (a) int32 999 2
    @pytest.mark.parametrize('deep, expected_orig', [
        [True,
         xr.DataArray(xr.IndexVariable('a', np.array([1, 2])),
                      coords={'a': [1, 2]}, dims=['a'])],
        [False,
         xr.DataArray(xr.IndexVariable('a', np.array([999, 2])),
                      coords={'a': [999, 2]}, dims=['a'])]])
    def test_copy_coords(self, deep, expected_orig):
        da = xr.DataArray(
            np.ones([2, 2, 2]),
            coords={'a': [1, 2], 'b': ['x', 'y'], 'c': [0, 1]},
            dims=['a', 'b', 'c'])
        da_cp = da.copy(deep)
        da_cp['a'].data[0] = 999
    
        expected_cp = xr.DataArray(
            xr.IndexVariable('a', np.array([999, 2])),
            coords={'a': [999, 2]}, dims=['a'])
>       assert_identical(da_cp['a'], expected_cp)
xarray\tests\test_dataarray.py:3342: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
a = <xarray.DataArray 'a' (a: 2)>
array([1, 2])
Coordinates:
  * a        (a) int32 1 2
b = <xarray.DataArray 'a' (a: 2)>
array([999,   2])
Coordinates:
  * a        (a) int32 999 2
    def assert_identical(a, b):
>       xarray.testing.assert_identical(a, b)
E       AssertionError: Left and right DataArray objects are not identical
E       
E       Differing values:
E       L
E           array([1, 2])
E       R
E           array([999,   2])
E       Differing coordinates:
E       L * a        (a) int32 1 2
E       R * a        (a) int32 999 2
xarray\tests\__init__.py:195: AssertionError
_____________ TestDataset.test_copy_coords[False-expected_orig1] ______________
self = <xarray.tests.test_dataset.TestDataset object at 0x000000AAE05256A0>
deep = False
expected_orig = <xarray.DataArray 'a' (a: 2)>
array([999,   2])
Coordinates:
  * a        (a) int32 999 2
    @pytest.mark.parametrize('deep, expected_orig', [
        [True,
         xr.DataArray(xr.IndexVariable('a', np.array([1, 2])),
                      coords={'a': [1, 2]}, dims=['a'])],
        [False,
         xr.DataArray(xr.IndexVariable('a', np.array([999, 2])),
                      coords={'a': [999, 2]}, dims=['a'])]])
    def test_copy_coords(self, deep, expected_orig):
        ds = xr.DataArray(
            np.ones([2, 2, 2]),
            coords={'a': [1, 2], 'b': ['x', 'y'], 'c': [0, 1]},
            dims=['a', 'b', 'c'],
            name='value').to_dataset()
        ds_cp = ds.copy(deep=deep)
        ds_cp.coords['a'].data[0] = 999
    
        expected_cp = xr.DataArray(
            xr.IndexVariable('a', np.array([999, 2])),
            coords={'a': [999, 2]}, dims=['a'])
>       assert_identical(ds_cp.coords['a'], expected_cp)
xarray\tests\test_dataset.py:1993: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
a = <xarray.DataArray 'a' (a: 2)>
array([1, 2])
Coordinates:
  * a        (a) int32 1 2
b = <xarray.DataArray 'a' (a: 2)>
array([999,   2])
Coordinates:
  * a        (a) int32 999 2
    def assert_identical(a, b):
>       xarray.testing.assert_identical(a, b)
E       AssertionError: Left and right DataArray objects are not identical
E       
E       Differing values:
E       L
E           array([1, 2])
E       R
E           array([999,   2])
E       Differing coordinates:
E       L * a        (a) int32 1 2
E       R * a        (a) int32 999 2
xarray\tests\__init__.py:195: AssertionError
============================== warnings summary ===============================

@shoyer
Copy link
Member

shoyer commented May 9, 2019

It seems xarray (mistakenly?) relies on the immutability of index data somewhere else, and hence reuses the same pandas.Index object for data in both data and coords.

@pletchm
Copy link
Contributor Author

pletchm commented May 9, 2019

Sure. I'm looking into it now.

Sorry about that.

@shoyer
Copy link
Member

shoyer commented May 9, 2019 via email

@pletchm
Copy link
Contributor Author

pletchm commented May 9, 2019

@shoyer, I spent sometime looking into it and it looks like the test fails for the shallow copy, and apparently only on Windows for some reason. In Windows coords seem to be immutable unless it's one dataarray deep copied from another (which is why only the deep=False test fails). So I decided to just mark the tests as xfail for now (but I'd be happy to create an issue and look into it more in the future).

I'll open a new PR marking the tests with xfail shortly.

@max-sixty
Copy link
Collaborator

Ah, thanks @pletchm , and @shoyer for spotting this

I should have caught this - I didn't separately check the windows tests; I expected the same unrelated failures as on Travis and didn't consider this would be an area that would be different between platforms. Mea culpa. It's always the avenues you least suspect...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants