Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

open_rasterio does not read coordinates from netCDF file properly with netCDF4>=1.4.2 #3185

Closed
weiji14 opened this issue Aug 5, 2019 · 12 comments

Comments

@weiji14
Copy link
Contributor

weiji14 commented Aug 5, 2019

MCVE Code Sample

Adapted from the test_serialization unit test at here.

import xarray as xr
from xarray.tests.test_backends import assert_identical, create_tmp_geotiff, create_tmp_file

with create_tmp_geotiff(additional_attrs={}) as (tmp_file, expected):
    # write it to a netcdf and read again (roundtrip) using open_rasterio
    with xr.open_rasterio(tmp_file) as rioda:
        with create_tmp_file(suffix='.nc') as tmp_nc_file:
            # Write geotiff to netcdf file
            rioda.to_netcdf(tmp_nc_file)
            
            # Read using open_dataarray works
            with xr.open_dataarray(tmp_nc_file) as ncds:
                assert_identical(rioda, ncds)
            
            # Read using open_rasterio doesn't work!!
            with xr.open_rasterio(tmp_nc_file) as ncds:
                assert_identical(rioda, ncds)

Actual Output (using netCDF4>=1.4.2)

AssertionError: Left and right DataArray objects are not identical

Differing coordinates:
L * x        (x) float64 5.5e+03 6.5e+03 7.5e+03 8.5e+03
R * x        (x) float64 0.5 1.5 2.5 3.5
L * y        (y) float64 7.9e+04 7.7e+04 7.5e+04
R * y        (y) float64 0.5 1.5 2.5
Differing attributes:
L   transform: (1000.0, 0.0, 5000.0, 0.0, -2000.0, 80000.0)
R   transform: (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
L   res: (1000.0, 2000.0)
R   res: (1.0, -1.0)
Attributes only on the left object:
    crs: +init=epsg:32618

Expected Output (using netCDF4==1.4.1)

AssertionError: Left and right DataArray objects are not identical

Differing attributes:
L   nodatavals: (nan, nan, nan)
R   nodatavals: (nan, nan, nan)
Attributes only on the left object:
    crs: +init=epsg:32618

Problem Description

I have a script which takes in either NetCDF or GeoTIFF files as an input and I've been using xr.open_rasterio to read them quite successfully before, as I'm basically just interested in parsing the correct XY coordinates out. However, upgrading from NetCDF 1.4.1 to 1.4.2 breaks this behaviour (xr.open_rasterio no longer parses the coordinates properly from a .nc file).

My hunch is that it has something to do with different NetCDF4 formats (e.g. ‘NETCDF4’, 'NETCDF4_CLASSIC’, ‘NETCDF3_64BIT’) but looking at the xr.open_rasterio code, I'm not too sure what's going on... Also not very sure if this is an upstream netcdf4-python or rasterio issue but I thought I'd report it here first.

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.127+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.8.18
libnetcdf: 4.4.1.1

xarray: 0.12.3
pandas: 0.25.0
numpy: 1.17.0rc2
scipy: 1.3.0
netCDF4: 1.4.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.24
cfgrib: None
iris: None
bottleneck: None
dask: 2.1.0
distributed: None
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.0.1
pip: 19.2.1
conda: None
pytest: 5.0.1
IPython: 7.6.1
sphinx: None

@jhamman
Copy link
Member

jhamman commented Aug 6, 2019

Thanks @weiji14 for opening this issue. When we implemented open_rasterio, I don't think we intended this to be used for netCDF data so I'm surprised it works (worked) at all. I think we'll need someone to dig into the xarray implementation a bit to see if we can pull out the correct coordinates (metadata) using rasterio.

@weiji14
Copy link
Contributor Author

weiji14 commented Aug 6, 2019

Well open_rasterio did have an "experimental" warning on it in the docs 😆, but it was really nice having it work on GeoTIFFs and NetCDF files. I've forked the repo and will try to debug the situation a bit more. If anyone who's worked on that part of the codebase before has any pointers on what might be the cause of this issue / where to start that would be great.

@weiji14
Copy link
Contributor Author

weiji14 commented Aug 7, 2019

Hold on, the coordinates seems to be parsed out correctly from the netCDF file (even with netCDF==1.5.1.2) when I have a clean conda installation created following the instructions at https://xarray.pydata.org/en/latest/contributing.html#creating-a-python-environment.

I've isolated the issue and think the problem arises when I also import salem (an xarray accessor)... Will try to narrow this down before I close this issue.

@shoyer
Copy link
Member

shoyer commented Aug 10, 2019

I guess GDAL can also read netCDF files :).

Indeed, we did not anticipate people using open_rasterio this way.

@jhamman
Copy link
Member

jhamman commented Aug 11, 2019

GDAL can open so many formats (https://gdal.org/drivers/raster/index.html). We've really only tested against TIFF like formats though. I think the assumption will need to be is that rasterio will provide standard metadata for each of these drivers.

@weiji14
Copy link
Contributor Author

weiji14 commented Aug 13, 2019

Yes, there's https://gdal.org/drivers/raster/netcdf.html 😄 I've done a bit more debugging (having temporarily isolated salem from my script) and am still having issues with my setup.

The clean xarray-tests conda environment that works with netcdf==1.5.1.2 has libnetcdf: 4.6.2, but for some strange reason, running xr.show_versions() on my setup shows libnetcdf: 4.6.3 even though conda list | grep libnetcdf shows that I've installed libnetcdf 4.6.2 h056eaf5_1002 conda-forge.

Not sure if this libnetcdf 4.6.3 version is the problem, but it stands out the most (to me at least) when looking at the diff between my setup and the clean one. Is there a way to check the order in which xarray looks for the netcdf binaries as I feel it might be a PATH related issue. Also not sure if this issue fits here in xarray or somewhere else now...

@shoyer
Copy link
Member

shoyer commented Aug 13, 2019

The clean xarray-tests conda environment that works with netcdf==1.5.1.2 has libnetcdf: 4.6.2, but for some strange reason, running xr.show_versions() on my setup shows libnetcdf: 4.6.3 even though conda list | grep libnetcdf shows that I've installed libnetcdf 4.6.2 h056eaf5_1002 conda-forge.

xr.show_versions() gets its libnetcdf version from netCDF4 (specially netCDF4.__netcdf4libversion__). So I'm guessing that somehow netCDF4 is picking up libnetcdf from somewhere else -- maybe you pip installed it? It might be worth trying another fresh conda environment...

@weiji14
Copy link
Contributor Author

weiji14 commented Aug 13, 2019

xr.show_versions() gets its libnetcdf version from netCDF4 (specially netCDF4.__netcdf4libversion__). So I'm guessing that somehow netCDF4 is picking up libnetcdf from somewhere else -- maybe you pip installed it? It might be worth trying another fresh conda environment...

I'm not sure where it's picking up the libnetcdf 4.6.3 version from, but I found your comment at #2535 (comment) and think it might indeed be an incompatibility issue with rasterio and netCDF4 binary wheels (do rasterio wheels include netcdf binaries?). Probably somewhat related to rasterio/rasterio#1574 too.

Managed to get things to work by combining the workaround in this Pull Request and StackOverflow post, basically having pip compile the netcdf python package from source instead of using the wheel:

HDF5_DIR=$CONDA_PREFIX pip install --no-binary netCDF4 netCDF4==1.4.2

where $CONDA_PREFIX is the path to the conda environment e.g. /home/jovyan/.conda/envs/name-of-env. I've tested my MCVE code sample above and it works up to the latest netCDF4==1.5.1.2 version!

@naught101
Copy link

I'm seeing a problem with geotiffs that I think might be related:

$ gdalinfo /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif
Driver: GTiff/GeoTIFF
Files: /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif
       /home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif.aux.xml
Size is 10, 10
Coordinate System is:
PROJCS["GDA94 / MGA zone 55",
    GEOGCS["GDA94",
        DATUM["Geocentric_Datum_of_Australia_1994",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            TOWGS84[0,0,0,0,0,0,0],
            AUTHORITY["EPSG","6283"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4283"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",147],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",10000000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","28355"]]
Origin = (-406507.954209543997422,7588834.152862589806318)
Pixel Size = (263.500000000000000,-265.500000000000000)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( -406507.954, 7588834.153) (138d16' 6.99"E, 21d34'24.95"S)
Lower Left  ( -406507.954, 7586179.153) (138d16' 1.84"E, 21d35'50.30"S)
Upper Right ( -403872.954, 7588834.153) (138d17'37.56"E, 21d34'29.73"S)
Lower Right ( -403872.954, 7586179.153) (138d17'32.42"E, 21d35'55.09"S)
Center      ( -405190.454, 7587506.653) (138d16'49.70"E, 21d35'10.02"S)
Band 1 Block=10x10 Type=Float32, ColorInterp=Gray
  Min=0.069 Max=8.066 
  Minimum=0.069, Maximum=8.066, Mean=2.556, StdDev=1.749
  NoData Value=-3.40282306073709653e+38
  Metadata:
    STATISTICS_MAXIMUM=8.06591796875
    STATISTICS_MEAN=2.5563781395387
    STATISTICS_MINIMUM=0.068740844726562
    STATISTICS_STDDEV=1.7493082797107
    STATISTICS_VALID_PERCENT=89
import xarray as xr
from osgeo.gdal import Open

ras = Open('/home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif')
ras.GetGeoTransform()
# (-406507.954209544, 263.5, 0.0, 7588834.15286259, 0.0, -265.5)

ds = xr.open_rasterio('/home/nedcr/cr/software/ana/lib/geo/tests/data/Urandangi_MGA55.tif')
ds.transform
# (263.5, 0.0, -406507.954209544, 0.0, -265.5, 7588834.15286259)

The transform in the xarray dataset is transposed, and not really useful anymore.

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.0.0-37-lowlatency machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: en_AU.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2

xarray: 0.14.1
pandas: 0.25.1
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.22
cfgrib: None
iris: None
bottleneck: None
dask: 1.2.0
distributed: 1.27.1
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: 4.3.1
IPython: 7.3.0
sphinx: 2.1.2

@kmuehlbauer
Copy link
Contributor

@naught101 Looks like those coordinates are now in the order as declared in the affine package (since rasterio >= 1.0). See also: https://rasterio.readthedocs.io/en/latest/topics/migrating-to-v1.html.

All coordinates are created correctly, so from my point of view, only the docs here have to be changed.

from affine import Affine
da = xr.open_rasterio('path_to_file.tif')
transform = Affine(*da.attrs['transform'])
nx, ny = da.sizes['x'], da.sizes['y']
x, y = np.meshgrid(np.arange(nx)+0.5, np.arange(ny)+0.5) * transform

@gabriel-abrahao
Copy link
Contributor

Maybe this can be closed now, since #5021 changed the docs?

@naught101 Looks like those coordinates are now in the order as declared in the affine package (since rasterio >= 1.0). See also: https://rasterio.readthedocs.io/en/latest/topics/migrating-to-v1.html.

All coordinates are created correctly, so from my point of view, only the docs here have to be changed.

from affine import Affine
da = xr.open_rasterio('path_to_file.tif')
transform = Affine(*da.attrs['transform'])
nx, ny = da.sizes['x'], da.sizes['y']
x, y = np.meshgrid(np.arange(nx)+0.5, np.arange(ny)+0.5) * transform

@keewis
Copy link
Collaborator

keewis commented Mar 15, 2021

yes, I would think so. Thanks for the reminder, @gabriel-abrahao.

The original issue was that libnetcdf picked up by netCDF4 didn't match the version installed by the netCDF4 wheels, which is not really xarray's fault.

@keewis keewis closed this as completed Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants