enable reading of file-like HDF5 objects #2781

Closed
scottyhq opened this issue Feb 20, 2019 · 2 comments

@scottyhq
Contributor

xarray 0.11.3 currently won't read HDF5 file-like objects:

import xarray as xr
import gcsfs
fs = gcsfs.GCSFileSystem()
images = fs.ls('pangeo-data/grfn-v2/137/')
fileObj = fs.open('pangeo-data/grfn-v2/137/S1-GUNW-A-R-137-tops-20181129_20181123-020010-43220N_41518N-PP-e2c7-v2_0_0.nc')
# but, can we open this w/ xarray anyway? Yes! with modifications to xarray and h5netcdf
da = xr.open_dataset(fileObj, group='/science/grids/data', engine='h5netcdf')
da 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-22e0010de1f2> in <module>()
      1 # but, can we open this w/ xarray anyway? Yes! with modifications to xarray and h5netcdf
----> 2 da = xr.open_dataset(fileObj, group='/science/grids/data', engine='h5netcdf')
      3 da

/srv/conda/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs)
    347     else:
    348         if engine is not None and engine != 'scipy':
--> 349             raise ValueError('can only read file-like objects with '
    350                              "default engine or engine='scipy'")
    351         # assume filename_or_obj is a file-like object

ValueError: can only read file-like objects with default engine or engine='scipy'
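
The check in the traceback rejects every file-like object unless the engine is scipy. As a rough illustration of an alternative, the sketch below picks an engine by peeking at the first bytes of the object. It is not xarray's actual code: `guess_engine` is a hypothetical helper, and it assumes the HDF5 signature sits at offset 0 (files with a user block are not handled).

```python
import io

# The 8-byte signature that starts an HDF5 file (assumed here to be at offset 0).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# netCDF classic (netCDF3) files start with the ASCII bytes "CDF".
NETCDF3_PREFIX = b"CDF"


def guess_engine(file_obj):
    """Hypothetical helper: choose a backend name from a file-like object's header."""
    start = file_obj.tell()
    try:
        header = file_obj.read(8)
    finally:
        # Restore the position so the backend can read the object from the same place.
        file_obj.seek(start)

    if header.startswith(HDF5_SIGNATURE):
        return "h5netcdf"
    if header.startswith(NETCDF3_PREFIX):
        return "scipy"
    raise ValueError("unrecognized header: {!r}".format(header))


# Usage with in-memory stand-ins for remote file objects.
hdf5_like = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 8)
netcdf3_like = io.BytesIO(b"CDF\x01" + b"\x00" * 8)
print(guess_engine(hdf5_like))
print(guess_engine(netcdf3_like))
```

Anything returned by `fs.open(...)` that supports `read`, `seek`, and `tell` could be passed in the same way.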

Problem description

This is now possible with h5py >= 2.9.0 (see h5py/h5py#1105). It would be a useful feature because a lot of NASA data is distributed as HDF5. It could allow reading such files without first writing them to disk (possibly in order to translate them to Zarr or other formats). There seem to be several related issues:
fsspec/s3fs#144
#2535
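
For reference, a minimal sketch of the h5py side of this, assuming h5py >= 2.9.0 and numpy are installed. An in-memory `io.BytesIO` buffer stands in for the object returned by `fs.open(...)`, and the dataset contents are made up for illustration; only the group path mirrors the example above.

```python
import io

import h5py
import numpy as np

# Build a small HDF5 file entirely in memory (a stand-in for a remote object).
buf = io.BytesIO()
with h5py.File(buf, "w") as f:
    # Intermediate groups in the path are created automatically.
    f.create_dataset(
        "science/grids/data/coherence",
        data=np.arange(6, dtype="float32").reshape(2, 3),
    )

# Reopen the same bytes through the file-like interface, without touching disk.
buf.seek(0)
with h5py.File(buf, "r") as f:
    coherence = f["science/grids/data/coherence"][...]

print(coherence.shape, coherence.dtype)
```

h5py only needs the object to behave like a binary file with `read` and `seek` (plus `write` and `truncate` when writing), which is what makes remote file objects candidates here.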

I'm guessing that adding this functionality won't fix many of the performance issues related to HDF5 and Dask:
dask/dask#2488
dask/distributed#2319

Expected Output

<xarray.Dataset>
Dimensions:              (latitude: 2045, longitude: 4158)
Coordinates:
  * longitude            (longitude) float64 -123.1 -123.1 ... -119.6 -119.6
  * latitude             (latitude) float64 43.22 43.22 43.22 ... 41.52 41.52
Data variables:
    crs                  int32 ...
    unwrappedPhase       (latitude, longitude) float32 ...
    coherence            (latitude, longitude) float32 ...
    connectedComponents  (latitude, longitude) float32 ...
    amplitude            (latitude, longitude) float32 ...

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 03:09:43) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.14.65+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.11.3
pandas: 0.24.1
numpy: 1.16.1
scipy: 1.2.0
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.6.2
h5py: 2.9.0
Nio: None
zarr: 2.2.0
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: 1.0.18
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.0
distributed: 1.25.2
matplotlib: 3.0.2
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.7.1
pip: 19.0.2
conda: 4.6.3
pytest: None
IPython: 7.1.1
sphinx: None

@shoyer
Member

shoyer commented Feb 22, 2019

Yes, this would be a welcome feature addition!

It looks like you've already gotten started -- let me know if you have any further questions.

@scottyhq
Contributor Author

Just noting here that I've gotten this to work when reading a netCDF4/HDF5 file via gcsfs, but not for the same file accessed via s3fs:
fsspec/s3fs#168
