Adding dictionaries to point to filter classes - not IDs #207

Closed
rayosborn opened this issue Nov 22, 2022 · 6 comments · Fixed by #212

Comments


rayosborn commented Nov 22, 2022

I am interested in adding GUI support for hdf5plugin filters to NeXpy, but it would be useful if there were a way to access both the installed filters and their default options. At the moment, I can find which filters are installed.

>>> import hdf5plugin
>>> hdf5plugin.FILTERS
{'blosc': 32001,
 'bshuf': 32008,
 'lz4': 32004,
 'zfp': 32013,
 'zstd': 32015,
 'fcidecomp': 32018}

However, I can only find the compression options if I instantiate their respective classes.

>>> hdf5plugin.Blosc()['compression']
32001
>>> hdf5plugin.Blosc()['compression_opts']
(0, 0, 0, 0, 5, 1, 1)

It would be helpful (to me) if there were an additional dictionary defined in hdf5plugin.__init__ that linked the filters to the classes, e.g.,

INSTALLED_FILTERS = {'blosc': Blosc,
                    'bshuf': Bitshuffle,
                    'bzip2': BZip2,
                    'lz4': LZ4,
                    'zfp': Zfp,
                    'zstd': Zstd,
                    'fcidecomp': FciDecomp
                    }

This would allow, e.g.,

>>> for name, filter_class in INSTALLED_FILTERS.items():
...     options = filter_class()
...     print(name, options['compression'], options['compression_opts'])
blosc 32001 (0, 0, 0, 0, 5, 1, 1)
bshuf 32008 (0, 2)
lz4 32004 (0,)
zfp 32013 ()
zstd 32015 (3,)
fcidecomp 32018 ()

I wanted to get feedback before issuing a PR in case alternative solutions are already being considered or there are any objections.

@vasole vasole mentioned this issue Nov 22, 2022
Member

vasole commented Nov 22, 2022

Just to clarify: do you intend to write compressed files from within NeXpy?

I am asking because those compression options are defaults, but the datasets in a file can be compressed with different values, so one has to get the information from the dataset itself and not from the hdf5plugin defaults.

@rayosborn
Author

@vasole, the idea was to allow people to set compression filters and their associated parameters when initializing a dataset. After posting, I realized that the example I gave is not really how I would implement this. Instead, I would use the inspect module to find out what keyword arguments are defined for each filter and their defaults, and then add them to the initialization form. This would still need the dictionary pointing to the classes themselves.
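The inspect-based approach described above could look roughly like the sketch below. The two classes are hypothetical stand-ins so that the example runs without hdf5plugin installed; their argument names and defaults are assumptions and may not match the real filter classes. FILTER_CLASSES is likewise a placeholder for the requested name-to-class dictionary.

```python
import inspect


class FakeBlosc:
    """Hypothetical stand-in for a filter class (assumed signature)."""

    def __init__(self, cname="lz4", clevel=5, shuffle=1):
        self.cname, self.clevel, self.shuffle = cname, clevel, shuffle


class FakeZstd:
    """Hypothetical stand-in with a single keyword argument."""

    def __init__(self, clevel=3):
        self.clevel = clevel


def keyword_defaults(cls):
    """Return {argument name: default} for the keyword arguments of cls."""
    params = inspect.signature(cls).parameters
    return {
        name: param.default
        for name, param in params.items()
        if param.default is not inspect.Parameter.empty
    }


# Placeholder for the requested dictionary linking filter names to classes.
FILTER_CLASSES = {"blosc": FakeBlosc, "zstd": FakeZstd}

# One entry per filter, ready to populate the fields of an initialization form.
form_fields = {name: keyword_defaults(cls) for name, cls in FILTER_CLASSES.items()}
```

With a mapping like this, a GUI would only need the class objects; the argument names and defaults come from the signatures rather than from instantiating each filter.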

I know very little about optimizing compression options, so I am willing to be persuaded that it would be better to let power users do this on the command line. This is not my top priority, but it would be helpful if the hdf5plugin module supported it before I implement something, and I think it could be useful in other contexts.

Member

vasole commented Nov 23, 2022

> This would still need the dictionary pointing to the classes themselves.

I see the convenience.

A function that retrieves the class from a case-insensitive filter name would also do the job, but that is just a detail. One can have both.

@t20100 What's your opinion on this?

Member

t20100 commented Nov 23, 2022

It makes sense to me too.

It should not be called INSTALLED_FILTERS, however, because not all filters supported by hdf5plugin (listed in FILTERS) may be available, depending on the environment (e.g., no C++11 compiler or other constraints). Maybe FILTER_CLASSES?

For information, you can get the list of hdf5plugin filters currently available at runtime with:
hdf5plugin.get_config().registered_filters

A function (e.g., get_filter_class()) sounds more versatile: it can be case-insensitive and can also accept a filter ID as input. Or possibly a get_filter_classes() that takes multiple names/IDs (defaulting to hdf5plugin.get_config().registered_filters) and returns a tuple, so one can write:

available_filter_classes = hdf5plugin.get_filter_classes()
blosc, bitshuffle = hdf5plugin.get_filter_classes('blosc', 32008)
blosc = hdf5plugin.get_filter_classes('blosc')[0]
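To make the proposed behaviour concrete, here is a minimal sketch of such a lookup. It is not hdf5plugin's implementation: the classes and the two registries are hypothetical stand-ins, only the names and IDs are taken from the FILTERS output earlier in this thread, and the no-argument default returns everything in the stand-in registry rather than the filters registered at runtime.

```python
class Blosc:
    """Hypothetical stand-in class; only the ID comes from the thread."""
    filter_id = 32001


class Bitshuffle:
    filter_id = 32008


class Zstd:
    filter_id = 32015


# Stand-in registries keyed by lower-case name and by numeric filter ID.
_CLASSES_BY_NAME = {"blosc": Blosc, "bshuf": Bitshuffle, "zstd": Zstd}
_CLASSES_BY_ID = {cls.filter_id: cls for cls in _CLASSES_BY_NAME.values()}


def get_filter_classes(*filters):
    """Return a tuple of classes for the given filter names and/or IDs.

    Names are matched case-insensitively. With no arguments, every class
    in the stand-in registry is returned.
    """
    if not filters:
        return tuple(_CLASSES_BY_NAME.values())
    found = []
    for item in filters:
        if isinstance(item, str):
            registry, key = _CLASSES_BY_NAME, item.lower()
        else:
            registry, key = _CLASSES_BY_ID, item
        if key not in registry:
            raise ValueError(f"Unknown filter: {item!r}")
        found.append(registry[key])
    return tuple(found)


blosc, bitshuffle = get_filter_classes("Blosc", 32008)
```

Returning a tuple in every case keeps unpacking predictable, at the cost of the `[0]` indexing for a single filter.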

Member

vasole commented Nov 24, 2022

@rayosborn

Associating filter names with classes is going to be implemented (most likely via a function).

That will not be enough, when reading datasets, to go from (0, 0, 0, 0, 5, 1, 1) back to the arguments that need to be passed to the Blosc filter class.

That latter functionality would require a lot more work.
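For illustration only, the reverse mapping for a single filter might be sketched as below. Everything here is an assumption rather than something stated in this thread: the layout of the Blosc option tuple (four reserved slots, then compression level, shuffle mode, and compressor code), the compressor code table, and the keyword names cname, clevel, and shuffle. Each filter would need its own decoder along these lines.

```python
# Assumed Blosc compressor codes (not taken from this thread).
BLOSC_COMPRESSORS = {
    0: "blosclz",
    1: "lz4",
    2: "lz4hc",
    3: "snappy",
    4: "zlib",
    5: "zstd",
}


def decode_blosc_opts(compression_opts):
    """Map a 7-element Blosc option tuple back to assumed keyword arguments.

    Assumed layout: indices 0-3 are reserved, index 4 is the compression
    level, index 5 the shuffle mode, index 6 the compressor code.
    """
    if len(compression_opts) != 7:
        raise ValueError("expected 7 values for Blosc options")
    clevel, shuffle, code = compression_opts[4:7]
    if code not in BLOSC_COMPRESSORS:
        raise ValueError(f"unknown compressor code: {code}")
    return {"cname": BLOSC_COMPRESSORS[code], "clevel": clevel, "shuffle": shuffle}


decoded = decode_blosc_opts((0, 0, 0, 0, 5, 1, 1))
```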

@rayosborn
Author

@vasole, @t20100, thanks for producing #212 so quickly. It looks great.
