Should we make "rasterio" an engine option? #4142

Closed
WeatherGod opened this issue Jun 10, 2020 · 6 comments

Comments

@WeatherGod
Contributor

In a similar vein to how #4003 is going for zarr files, I would like to see if a rasterio engine could be created so that GeoTIFF files could be opened through open_mfdataset() and friends. I am willing to put some cycles into putting this together. It has been a long time since I did the initial prototype for the pynio backend back in the xray days. Does anyone see any immediate pitfalls or gotchas for doing this?

@WeatherGod
Contributor Author

So, one important difference I see right off the bat is that zarr already had a DataStore implementation, while rasterio does not. I take it that implementing one would be the preferred approach?

@jhamman
Member

jhamman commented Jun 11, 2020

Thanks @WeatherGod for opening this issue. It is good to have you back around these parts.

The main thing to consider at the moment is that we are about to start a major backend refactor (#1970). So I'd suggest holding off on any major work that builds on our current DataStore implementation.

From an API design perspective, I think it makes sense to revisit the rasterio->DataArray convention we currently have and instead treat rasterio like most other drivers, returning a Dataset by default. For most applications, the following would be true:

open_rasterio(...) == open_dataarray(..., engine='rasterio') == open_dataset(..., engine='rasterio')[default_var]

There's also the question of where the rasterio backend should live once #1970 is implemented. I think we can make a pretty strong argument that it should move out of xarray and into a third-party package: perhaps rasterio itself or, more likely, something like rio-xarray (cc @snowman2). See #3166 for a demo of how this may work.

As you can see, there are a few big questions up in the air right now. Work on #1970 will begin this summer, so now is a great time to game out the various options for the rasterio backend.

@fmaussion
Member

As a not-so-active-but-still-interested xarray dev, my opinion doesn't count for much, but I would be a proponent of having the rasterio backend live outside of xarray proper.

At the time we wrote the rasterio->DataArray conversion, we already noticed a lot of issues caused by differences between the two data models, and rio-xarray shows that a lot of logic and development work is needed for this to go smoothly, work that would be better handled outside of xarray.

@snowman2
Contributor

snowman2 commented Jun 11, 2020

In rioxarray we updated open_rasterio so that it opens a TIFF file as a DataArray and netCDF-like files as a Dataset. It also loads the metadata/tags:

NetCDF file as Dataset

import rioxarray
xds = rioxarray.open_rasterio("PLANET_SCOPE_3D.nc")
<xarray.Dataset>
Dimensions:      (time: 2, x: 10, y: 10)
Coordinates:
  * y            (y) float64 8.085e+06 8.085e+06 ... 8.085e+06 8.085e+06
  * x            (x) float64 4.663e+05 4.663e+05 ... 4.663e+05 4.663e+05
  * time         (time) object 2016-12-19 10:27:29 2016-12-29 12:52:41.659696
    spatial_ref  int64 0
Data variables:
    blue         (time, y, x) float64 6.611 5.581 0.3996 ... 3.491 5.056 3.368
    green        (time, y, x) float64 7.921 66.15 30.1 ... 21.76 27.29 18.41
Attributes:
    coordinates:  spatial_ref

GeoTIFF as DataArray

rds = rioxarray.open_rasterio("test_albedo.tif")
<xarray.DataArray (band: 1, y: 936, x: 745)>
[697320 values with dtype=int8]
Coordinates:
  * band         (band) int64 1
  * y            (y) float64 4.225e+06 4.225e+06 ... 4.216e+06 4.216e+06
  * x            (x) float64 6.168e+05 6.168e+05 ... 6.242e+05 6.242e+05
    spatial_ref  int64 0
Attributes:
    transform:     (10.0, 0.0, 616800.0, 0.0, -10.0, 4225260.0)
    _FillValue:    -1.0
    scale_factor:  1.0
    add_offset:    0.0
    long_name:     Albedo dry - the estimated ratio of the incident short-wav...
    grid_mapping:  spatial_ref

So, in its current state, it could be adapted for use as an engine.
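For readers wondering how the `transform` attribute in the GeoTIFF output relates to the `x`/`y` coordinates, here is a minimal pure-Python sketch. It assumes the tuple is in rasterio's `affine` coefficient order `(a, b, c, d, e, f)`, that there is no rotation or shear, and that coordinates are taken at pixel centres; `pixel_centers` is a hypothetical helper for illustration, not a rioxarray API. Under those assumptions the endpoints it computes match the rounded values shown in the repr above (6.168e+05 to 6.242e+05 for `x`, 4.225e+06 to 4.216e+06 for `y`).

```python
# Assumed affine coefficient order (a, b, c, d, e, f), as in rasterio's `affine`:
#   x = a * col + b * row + c
#   y = d * col + e * row + f
TRANSFORM = (10.0, 0.0, 616800.0, 0.0, -10.0, 4225260.0)


def pixel_centers(transform, width, height):
    """Return 1-D x and y coordinates of pixel centres (hypothetical helper).

    Only valid for north-up rasters (b == d == 0); rotated or sheared
    transforms cannot be expressed as independent 1-D x/y coordinates.
    """
    a, b, c, d, e, f = transform
    if b != 0.0 or d != 0.0:
        raise ValueError("rotated/sheared transforms need 2-D coordinates")
    # Offset by half a pixel to land on the centre of each cell.
    xs = [c + a * (col + 0.5) for col in range(width)]
    ys = [f + e * (row + 0.5) for row in range(height)]
    return xs, ys


# Dimensions taken from the DataArray repr: y: 936, x: 745.
xs, ys = pixel_centers(TRANSFORM, width=745, height=936)
print(xs[0], xs[-1])  # 616805.0 624245.0
print(ys[0], ys[-1])  # 4225255.0 4215905.0
```

Because `e` is negative, `y` decreases with row index, which is why the repr lists `y` from 4.225e+06 down to 4.216e+06.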

@alexamici
Collaborator

I started a PR against rioxarray to add an engine via the plugin interface; see corteva/rioxarray#281.
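To see which engines the plugin interface would pick up in a given environment, one can list the installed package entry points. This sketch assumes xarray discovers backends through the `xarray.backends` entry-point group, which is how a third-party package such as rioxarray can register an engine named `rasterio` without xarray importing it directly. `installed_engines` is a hypothetical helper, not part of xarray's API, and it only reads package metadata rather than loading any backend.

```python
from importlib.metadata import entry_points


def installed_engines(group="xarray.backends"):
    """Return sorted entry-point names registered under `group` (hypothetical helper)."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the returned object supports .select(group=...).
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    # Deduplicate in case the same name is registered by more than one distribution.
    return sorted({ep.name for ep in selected})


# The result depends entirely on which backend packages are installed.
print(installed_engines())
```

Listing names rather than calling `ep.load()` keeps the check cheap and avoids importing optional dependencies such as rasterio.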

@keewis
Collaborator

keewis commented May 27, 2021

open_dataset("RGB.byte.tif", engine="rasterio") works if rioxarray>=0.4 is installed, so I guess we can close this. For the deprecation, see #4697.

@keewis keewis closed this as completed May 27, 2021
6 participants