opening zarr dataset doesn't return as expected #521
Comments
|
Ahh, sorry. The reading in rioxarray is where I'm seeing an issue.
|
That was just notes for the investigation. The subdatasets are why there are troubles:
It is strange that the X/Y coordinates have their own subdataset ... |
https://gdal.org/drivers/raster/zarr.html#particularities-of-the-classic-raster-api
|
The tests for the Zarr driver mainly use the multidimensional API: This is currently not supported by rasterio, unfortunately. Related: #174 |
Usually there isn't data in the main raster when subdatasets exist ... I wonder if that is the intended behavior? |
A simple fix might be to only switch to reading subsets if they exist and there is no data in the main raster. |
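As a rough illustration of that rule (this is not the actual patch in #522), the decision could be sketched as below. The helper name `should_read_subdatasets` is hypothetical, and treating "no data in the main raster" as a band count of zero is an assumption. With rasterio, the two inputs would come from the `subdatasets` and `count` attributes of an open dataset.

```python
from typing import Sequence


def should_read_subdatasets(subdatasets: Sequence[str], band_count: int) -> bool:
    """Hypothetical helper: decide whether to read subdatasets instead of the main raster.

    Assumption: "no data in the main raster" is approximated by band_count == 0.
    """
    has_subdatasets = len(subdatasets) > 0
    main_raster_is_empty = band_count == 0
    # Only switch to the subdatasets when they exist AND the main raster has nothing to read.
    return has_subdatasets and main_raster_is_empty
```

Under this rule, a dataset that exposes subdatasets but also carries its own band would still be read from the main raster.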
This seems to be a driver-specific issue. When opening
>>> import rasterio
>>> rds = rasterio.open("tmmx_20190121.nc")
>>> rds_sub = rasterio.open('NETCDF:"tmmx_20190121.nc":air_temperature')
>>> assert (rds.read() == rds_sub.read()).all()
>>> assert rds.crs == rds_sub.crs
Though, most of the time, there isn't data in the main raster when there are subdatasets. |
#522 should hopefully address this specific scenario without impacting other datasets. |
@rouault, do you have thoughts on this (since you implemented the Zarr driver in GDAL)? Is there something we're missing? Is the behavior expected? |
There's no well defined rule on whether a main dataset that has a band (that is not just a place holder for subdatasets) should expose itself in the subdatasets it exposes. This can vary between drivers. But here the driver should probably hide the X and Y variables as it recognizes them to build a geotransform, and they don't really bring any value as being exposed as subdatasets. You can open a ticket in OSGeo/gdal about that |
That fix seems very specific. This would be problematic for maybe In contrast to my test setup above, I'm using
This could be done at zarr creation time but xarray.to_zarr throws an error when writing attributes that are dictionaries as GDAL wants for the CRS. |
It appears the same data in the base dataset is the one in the subdataset, so no issues there. |
Thanks @rouault 👍, I opened a ticket: OSGeo/gdal#5681 |
Is there a scenario in the GDAL test suite where there is a band in the main dataset with data and has subdatasets? |
For example, the NITF and GTiff drivers can do that when there are multiple images in a file. GDALOpen() on such files will return a dataset with the first image, and a list of subdatasets listing all images (_SUBDATASET_1 will be the first image)
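For context, GDAL reports subdatasets in the SUBDATASETS metadata domain as SUBDATASET_n_NAME / SUBDATASET_n_DESC pairs. A minimal sketch of collecting the names in index order, assuming the metadata has already been fetched into a plain dict (the helper name `subdataset_names` is made up for illustration):

```python
import re
from typing import Dict, List

# Matches keys such as "SUBDATASET_1_NAME" and captures the index.
_NAME_KEY = re.compile(r"^SUBDATASET_(\d+)_NAME$")


def subdataset_names(metadata: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return subdataset names sorted by their numeric index."""
    indexed = []
    for key, value in metadata.items():
        match = _NAME_KEY.match(key)
        if match:
            indexed.append((int(match.group(1)), value))
    # Sort numerically so that index 10 comes after index 2; _DESC keys are ignored.
    return [name for _, name in sorted(indexed)]
```

For a multi-image file like the ones described above, the first returned name would refer to the same image that the main dataset exposes.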
|
If I understand correctly, defaulting to only reading subdatasets in this scenario is safe as as |
Reverting the patch as it is best fixed in GDAL: #523 |
Thanks. |
Problem description
I've been experimenting with the GDAL Zarr driver, and rioxarray.open_rasterio doesn't return what I expected. Rather than getting the expected DataArray, the output is a list of 2 xarray.Datasets that represent the x and y dimensions.
Maybe I'm missing something, but I can't see a way to get the actual data.
However if I remove this section of code it works fine.
rioxarray/rioxarray/_io.py
Lines 854 to 868 in dc8efe3
This should reproduce the issue.
Before you ask: yes, I know there are other ways of working with Zarr stores in xarray, but doing it through rioxarray has some advantages, such as being able to wrap it in a WarpedVRT, which incidentally works (I guess because GDAL ignores the subdatasets when it wraps the Zarr in a VRT).
Environment Information
rioxarray (0.11.1) deps:
rasterio: 1.2.10
xarray: 2022.3.0
GDAL: 3.4.1
GEOS: None
PROJ: None
PROJ DATA: None
GDAL DATA: None
Other python deps:
scipy: 1.8.0
pyproj: 3.3.1
System:
python: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
executable: /env/bin/python
machine: Linux-5.4.181-99.354.amzn2.x86_64-x86_64-with-glibc2.29