Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xarray.Dataset.reproject doesn't modify the crs attribute #488

Closed
GCBallesteros opened this issue Mar 23, 2022 · 11 comments · Fixed by #490
Closed

xarray.Dataset.reproject doesn't modify the crs attribute #488

GCBallesteros opened this issue Mar 23, 2022 · 11 comments · Fixed by #490
Labels
bug Something isn't working

Comments

@GCBallesteros
Copy link
Contributor

Code Sample, a copy-pastable example if possible

print("target crs: ", best_crs)
print("current crs: ", ds.crs)
reprojected = ds.rio.reproject(best_crs, resampling=0)
print("crs after reprojection: ", reprojected.crs)

The code above prints

target crs:  +init=epsg:32631
current crs:  +init=epsg:32630
crs after reprojection:  +init=epsg:32630

Problem description

I would expect that the dataset being output after the reprojection has a modified CRS. Since this is not happening the reality of the data and the metadata are out of sync causing problems downstream. This behaviour is also observed on merge_datasets which will generate a dataset with nominally the CRS of the first dataset that it received, regardless of the crs parameter.

That being said, the output looks as it has been correctly reprojected.

Expected Output

A reprojected dataset with a CRS modified to be that of the destiny CRS.

Environment Information

rioxarray (0.10.2) deps:
rasterio: 1.2.10
xarray: 2022.3.0
GDAL: 3.3.2

Other python deps:
scipy: 1.7.1
pyproj: 3.2.1

System:
python: 3.9.7 (default, Sep 15 2021, 08:09:16) [GCC 10.3.0]
executable: /home/guillem/.pyenv/versions/3.9.7/envs/GISEnv/bin/python3.9
machine: Linux-5.11.0-7633-generic-x86_64-with-glibc2.33

Installation method

  • pip
@GCBallesteros GCBallesteros added the bug Something isn't working label Mar 23, 2022
@snowman2
Copy link
Member

Check: reprojected.rio.crs

See: https://corteva.github.io/rioxarray/stable/getting_started/crs_management.html

@GCBallesteros
Copy link
Contributor Author

reprojected.rio.crs does change. Thanks! Would it be worthwhile adding a note on the documentation in this regard?

@snowman2
Copy link
Member

It looks like the crs attribute is removed for DataArray objects when re-projecting:

for unwanted_attr in FILL_VALUE_NAMES + UNWANTED_RIO_ATTRS:
new_attrs.pop(unwanted_attr, None)

new_attrs = _generate_attrs(self._obj, dst_nodata)

However, it keeps all attributes on the Dataset object:

resampled_dataset = xarray.Dataset(attrs=self._obj.attrs)

@snowman2
Copy link
Member

Would it be worthwhile adding a note on the documentation in this regard?

What were your thoughts on what to add & where to put it?

@GCBallesteros
Copy link
Contributor Author

I presume there is a high risk of breaking existing code if the behaviors described above are changed to have the ds.crs attribute be modified whenever ds.rio.crs changes. Not being able to do that I would:

  1. Change the docs at https://corteva.github.io/rioxarray/stable/getting_started/crs_management.html#Accessing-the-CRS-object to explicitly state that some rioxarray operations may lead to a crs that is out of sync with ds.crs.
  2. Restating this on the getting started page as a note (one of those with a big orange box around)
  3. Potentially adding a link to Accessing the CRS object to the docs of every function that has the potential to modify the CRS.

In my opinion implementing any combination of three points above would reduce the potential for confusion from new users as myself.

There is also the question of rasterio modifying the CRS causing rio.crs to be out of sync. Perhaps on the note introduced on point 1 and 2 above a comment to this effect should be added too, perhaps with a recommendation to try to use either rioxarray or rasterio but not both if possible.

@snowman2
Copy link
Member

I presume there is a high risk of breaking existing code if the behaviors described above are changed to have the ds.crs attribute be modified whenever ds.rio.crs changes.

Actually, I think this change would be a welcome one to remove the potentially conflicting attributes on the Dataset. Only the attributes on the DataArray has been considered so far as it was what was returned by xarray.open_rasterio (deprecated).

If you feel up to it, a PR would be welcome to update this functionality.

@GCBallesteros
Copy link
Contributor Author

I'll give it a go

@snowman2
Copy link
Member

One potential solution could be to remove crs from the .attrs whenever rio.write_crs is called:

def write_crs(self, input_crs=None, grid_mapping_name=None, inplace=False):

@GCBallesteros
Copy link
Contributor Author

I was thinking on just making sure that crs and rio.crs remain consistent by updating the underlying xarray attributes. Quite a few people will be getting the CRS without going through rioxarray and having it disappear from under their feet wouldn't be nice.

@snowman2
Copy link
Member

snowman2 commented Mar 24, 2022

I was thinking on just making sure that crs and rio.crs remain consistent by updating the underlying xarray attributes.

I am glad you clarified that. rioxarray currently follows the CF conventions for storing the CRS mentioned here to provide compatibility across GIS software.

Reasons why writing the crs attribute likely won't be supported:

  • crs attribute is non-standard
  • Writing the CRS in multiple locations/formats has the potential to get confusing and inconsistent.
  • Attributes are easily lost ref
  • Encourage more usage of standard mechanisms of CRS storage

Quite a few people will be getting the CRS without going through rioxarray and having it disappear from under their feet wouldn't be nice.

For users who don't use rioxarray, this is a useful guide: https://pyproj4.github.io/pyproj/stable/build_crs_cf.html

@GCBallesteros
Copy link
Contributor Author

I made a tiny pull request. Instead of modifying write_crs it removes the crs from the original xarray once we have rioxarray's one ready. Any attempt to access my_dataset.crs will fail with a KeyError as crs will be removed from the attrs.

snowman2 added a commit that referenced this issue Mar 30, 2022
* BUG: use rasterio CRS or rioxarray CRS in open_rasterio

Co-authored-by: snowman2 <alansnow21@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants