can only read 32 layers from .hdf files before returning a FileNotFound error #544

Closed
jamie-sgro opened this issue Jul 5, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@jamie-sgro

jamie-sgro commented Jul 5, 2022

Code Sample, a copy-pastable example if possible

I've created a small repo with the necessary code to recreate the below error:
https://github.com/jamie-sgro/xarray-recreate-bug

Problem description

In Docker environments only, reading .hdf files raises the error below. It occurs only
when the cumulative number of layers read exceeds 32: the read always fails on the 33rd layer, regardless of the order of the files
or their contents. Note that we use a separate copy of the file
for each iteration, and it still fails.

rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-5/test_can_open_hdf4_closer_to_e0/file3:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory
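
Since the failure depends on how many layers have been opened so far, one way to check whether file handles are accumulating is to count the process's open file descriptors before and after each open. The following is a minimal diagnostic sketch, not part of the reproduction repo: `count_open_fds` and `leaked_fds` are hypothetical helper names, and the approach assumes a Linux container (or another system that exposes `/dev/fd`).

```python
import os
from typing import Callable


def count_open_fds() -> int:
    """Return the number of file descriptors currently open in this process.

    Assumption: the platform exposes /proc/self/fd (Linux) or /dev/fd.
    The listing itself briefly uses one descriptor, so compare differences
    between two calls rather than relying on the absolute value.
    """
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


def leaked_fds(action: Callable[[], object]) -> int:
    """Run `action` and return how many descriptors it left open."""
    before = count_open_fds()
    action()
    return count_open_fds() - before
```

Wrapping each file open in `leaked_fds` and printing the result would show whether the count grows by one or more per layer. This only observes OS-level descriptors; any limit enforced inside a library would not show up directly.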

I believe this error comes from the interaction between xarray, rioxarray, and rasterio. See these two other issues for more details:

Full Error
Last login: Tue Jul  5 12:28:09 on ttys003
docker exec -it 9763aa865198baad81e9e25fd70580f20cb3d4fb0b83ef64edc2f3fba60c9e92 /bin/sh
(base) jamiesgro@Jamies-MacBook-Pro ~ % docker exec -it 9763aa865198baad81e9e25fd70580f20cb3d4fb0b83ef64edc2f3fba60c9e92 /bin/sh
# pytest
========================================================================================================================================== test session starts ==========================================================================================================================================
platform linux -- Python 3.9.2, pytest-7.1.2, pluggy-1.0.0
rootdir: /app
collected 3 items                                                                                                                                                                                                                                                                                       

tests/test_rasterio_open.py .                                                                                                                                                                                                                                                                     [ 33%]
tests/test_xarray_open_hdf4.py .F                                                                                                                                                                                                                                                                 [100%]

=============================================================================================================================================== FAILURES ================================================================================================================================================
____________________________________________________________________________________________________________________________________ test_using_xarray_via_rioxarray ____________________________________________________________________________________________________________________________________

>   ???

rasterio/_base.pyx:261: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???

rasterio/_shim.pyx:78: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   rasterio._err.CPLE_OpenFailedError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory

rasterio/_err.pyx:216: CPLE_OpenFailedError

During handling of the above exception, another exception occurred:

tmp_path = PosixPath('/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0')

    def test_using_xarray_via_rioxarray(tmp_path: Path):
        """Same as above but using the rioxaray library to open via rasterio
        """
    
        num_files = 4
        filepaths = [tmp_path / f"file{i}" for i in range(num_files)]
    
        for i in range(num_files):
            shutil.copyfile(FILEPATH, filepaths[i])
    
        for filepath in filepaths:
>           with xr.open_dataset(filepath, engine="rasterio") as _:

tests/test_xarray_open_hdf4.py:57: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.9/site-packages/xarray/backends/api.py:496: in open_dataset
    backend_ds = backend.open_dataset(
/usr/local/lib/python3.9/site-packages/rioxarray/xarray_plugin.py:55: in open_dataset
    rds = _io.open_rasterio(
/usr/local/lib/python3.9/site-packages/rioxarray/_io.py:855: in open_rasterio
    return _load_subdatasets(
/usr/local/lib/python3.9/site-packages/rioxarray/_io.py:619: in _load_subdatasets
    with rasterio.open(subdataset) as rds:
/usr/local/lib/python3.9/site-packages/rasterio/env.py:437: in wrapper
    return f(*args, **kwds)
/usr/local/lib/python3.9/site-packages/rasterio/__init__.py:220: in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory

rasterio/_base.pyx:263: RasterioIOError
======================================================================================================================================== short test summary info ========================================================================================================================================
FAILED tests/test_xarray_open_hdf4.py::test_using_xarray_via_rioxarray - rasterio.errors.RasterioIOError: HDF4_EOS:EOS_GRID:/tmp/pytest-of-root/pytest-0/test_using_xarray_via_rioxarra0/file2:MODIS_Grid_16DAY_1km_VI:1 km 16 days blue reflectance: No such file or directory
====================================================================================================================================== 1 failed, 2 passed in 9.07s =============================================================================================================

Expected Output

The expected output is that all layers are read into memory (in this case, as an xr.Dataset) without errors.

Environment Information

  • python -c "import rioxarray; rioxarray.show_versions()"
  • rioxarray version (python -c "import rioxarray; print(rioxarray.__version__)"): 0.11.1
  • rasterio version (rio --version): 1.2.10
  • GDAL version (rio --gdal-version): 3.5.0
  • Python version (python -c "import sys; print(sys.version.replace('\n', ' '))"): 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110]
  • Operating System Information (python -c "import platform; print(platform.platform())"): Linux-5.10.25-linuxkit-aarch64-with-glibc2.33

Installation method

@jamie-sgro jamie-sgro added the bug Something isn't working label Jul 5, 2022
@snowman2
Member

snowman2 commented Jul 7, 2022

If you have time to find the most recent version of rasterio/xarray/rioxarray where this wasn't an issue, that would be very helpful.

@J-Levitt

J-Levitt commented Jul 8, 2022

A quick note, as referenced in rasterio/rasterio#2490: the issue persists with the newer GDAL 3.5.1 and rasterio 1.3.0.

@ShengpeiWang

Inspired by @snowman2's comment in rasterio/rasterio#2490 (comment), I found that the target files are kept open while the data is read in _load_subdatasets (rioxarray/_io.py:619).

After I updated the method to load the data into memory and then close the file, the test passed:

        if subdataset_filter is not None and not subdataset_filter.match(subdataset):
            continue
        with rasterio.open(subdataset) as rds:
            shape = rds.shape
        rioda: DataArray
        with open_rasterio(  # type: ignore
            subdataset,
            parse_coordinates=shape not in dim_groups and parse_coordinates,
            chunks=chunks,
            cache=cache,
            lock=lock,
            masked=masked,
            mask_and_scale=mask_and_scale,
            default_name=subdataset.split(":")[-1].lstrip("/").replace("/", "_"),
            decode_times=decode_times,
            decode_timedelta=decode_timedelta,
            **open_kwargs,
        ) as rioda:
            rioda.load()
        if shape not in dim_groups:
            dim_groups[shape] = {rioda.name: rioda}
        else:
            dim_groups[shape][rioda.name] = rioda
I'm happy to open a PR to address the issue.

@snowman2
Member

We don't always want all of the data loaded into memory, as there are scenarios with larger files where you only want to load a subset of the data. If you add a rioda.close() after open_rasterio without loading the data, it should work fine. xarray should re-open the file and load the data when requested.
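
A minimal sketch of that suggestion, written against a generic opener so it does not depend on rioxarray internals. `open_then_close` and its parameters are hypothetical names, not rioxarray API; the only assumption is that the opener returns an object with a `close()` method, as xarray's `DataArray` and `Dataset` do.

```python
from typing import Any, Callable, Dict, Iterable


def open_then_close(
    subdatasets: Iterable[str],
    opener: Callable[[str], Any],
) -> Dict[str, Any]:
    """Open each subdataset, release its file handle, and keep the object.

    No pixel data is read here. Whether the returned objects can still
    load data later depends on the library re-opening the file on demand.
    """
    opened: Dict[str, Any] = {}
    for name in subdatasets:
        obj = opener(name)
        # Close immediately so that at most one handle is held at a time.
        obj.close()
        opened[name] = obj
    return opened
```

With rioxarray installed, `rioxarray.open_rasterio` could be passed as `opener` together with a list of single-subdataset names. The point of the design is that the number of simultaneously open handles stays at one no matter how many layers are opened.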

@snowman2
Member

Running into this in #606. Seems it was fine with GDAL 3.4 and the problem was introduced in GDAL 3.5.

Investigation here: OSGeo/gdal#6665
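
For anyone wanting to guard against this in their own code, a rough version check could look like the sketch below. `parse_version` and `may_be_affected` are hypothetical helpers. The thread does not say which GDAL release contains the fix, so the check has no upper bound and only flags 3.5 and later as possibly affected.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as '3.5.0' into a tuple of up to three ints."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def may_be_affected(gdal_version: str) -> bool:
    """Return True for GDAL 3.5 or later, where the problem was reported.

    Assumption: no upper bound is applied, because the release that
    contains the fix is not named in this thread.
    """
    return parse_version(gdal_version) >= (3, 5)
```

The version string can be taken from `rio --gdal-version`, as in the environment information above.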

@snowman2
Member

Fix identified in GDAL.

@snowman2
Member

#607 should help as well.


4 participants