
Add support for querying netCDF4 file for groups #3538

Closed
fergu opened this issue Nov 15, 2019 · 4 comments

@fergu

fergu commented Nov 15, 2019

As the title says, it would be nice to be able to query netCDF4 files for the groups they contain.

I can do something similar using the netCDF4 library directly:

import netCDF4 as nc

myDset = nc.Dataset("/path/to/dataset.nc", "r")
# myDset.groups is a dict mapping top-level group names to netCDF4.Group objects;
# each Group has its own dict-like .groups, so subgroups of subgroups work the same way
print(list(myDset.groups))
myDset.close()

If implementing this really is that simple, I can take a swing at it and submit a PR.

@shoyer
Member

shoyer commented Nov 15, 2019

This has come up a couple of times before; please see #2916 and #1092.

The basic issue is that xarray.Dataset doesn't have any notion of groups, so it isn't clear how this would work in xarray.

If it's just a matter of listing groups that can be found in a file, it seems like netCDF4 works fine for that.

@lamorton

I hacked together a quick solution for exploring HDF5 files that might be of interest.

import h5py


def explore_file(filepath, show="arrays"):
    """
    View the internal structure of an HDF5 file.
    Returns a dictionary mapping entity names to descriptions of their values.
    Arguments:
        filepath: string
        show: one of ('groups', 'arrays', 'all')
            groups: the number of direct array-type (Dataset) members of each group/subgroup
            arrays: the shape & dtype of each non-scalar array
            all: the shape & dtype of every array, including scalars
    """
    with h5py.File(filepath, mode='r') as f:
        descriptions = {}
        if show == "groups":
            def visitor(name, obj):
                if isinstance(obj, h5py.Group):
                    arrays = [key for key in obj.keys() if isinstance(obj[key], h5py.Dataset)]
                    if len(arrays) > 0:
                        descriptions[name] = len(arrays)
            visitor("/", f)  # visititems() never visits the root group itself
        elif show == "arrays":
            def visitor(name, obj):
                if isinstance(obj, h5py.Dataset) and len(obj.shape) > 0:
                    descriptions[name] = "{},{}".format(obj.shape, obj.dtype)
        elif show == "all":
            def visitor(name, obj):
                if isinstance(obj, h5py.Dataset):
                    descriptions[name] = "{},{}".format(obj.shape, obj.dtype)
        else:
            raise ValueError("show must be one of 'groups', 'arrays', 'all'")
        # Call visitor(name, obj) on every group and dataset below the root
        f.visititems(visitor)
    return descriptions

@stale

stale bot commented May 2, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label May 2, 2022
@dcherian
Contributor

dcherian commented May 2, 2022

Closing as a duplicate of #2916. Eventually this will be fixed by datatree.

@dcherian dcherian closed this as completed May 2, 2022