
zstandard filter is not built or installed? #2294

Closed
edwardhartnett opened this issue Apr 19, 2022 · 22 comments

@edwardhartnett
Contributor

I just tried to add a test for zstandard to the code. But it fails.

It turns out that we are not building and installing the zstandard filter code itself. This is a small piece of C code that we must compile and install on the user's system; without it, zstandard cannot work for them.

We build (but do not install) the bzip2 filter.

We need to build the zstandard one as well.

And we should install both filters in the HDF5 filter directory, so that netCDF can use them.

@WardF I believe this should be resolved before the release.

As it is now, the zstandard feature is non-functional. Users would still have to install CCR to use it (because CCR builds and installs the filter code). So this makes putting zstandard in netCDF pretty pointless. ;-)

@WardF
Member

WardF commented Apr 19, 2022

I am home from a week off and wading through emails; I will return to this as I review the others. In the meantime, @DennisHeimbigner, any thoughts?

@WardF WardF added this to the 4.9.0 milestone Apr 19, 2022
@WardF WardF self-assigned this Apr 19, 2022
@DennisHeimbigner
Collaborator

This problem is with current main, correct?

@DennisHeimbigner
Collaborator

I just built the main branch and ran the test that exercises various filters, and it seems to work fine. Note that the relevant test, nc_test4/tst_specific_filters.sh, has some things turned off; use the attached version.

Did you look at the output of configure to see if it found libzstd?

tst_specific_filters.txt

@edwardhartnett
Contributor Author

It found libzstd.

Could it be that you already have the zstandard filter installed in your HDF5 filter path? ;-)

@DennisHeimbigner
Collaborator

I guess I assumed you had set HDF5_PLUGIN_PATH to point to your own directory
of plugins.

@edwardhartnett
Contributor Author

Well imagine the poor netCDF user who has never heard of HDF5 plugins but wants to use this new compression. It won't work! :-)

So we need to build the zstandard filter, and install it in HDF5_PLUGIN_PATH, or else the nc_def_var_zstandard() function cannot work. This is what CCR does. It's easy and works fine. Then the user just installs netcdf-c, and they can start using nc_def_var_zstandard().

Also, if some other user sends them a netCDF file with zstandard compression, it will not be readable. For zstandard files to be readable by all users, as zlib files are, we must also install the filter in HDF5_PLUGIN_PATH.
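To make the failure mode concrete, here is a minimal sketch of the user experience being described, using the nc_def_var_zstandard() call mentioned in this thread (filename, dimension sizes, and the compression level are illustrative, not taken from an actual test):

```c
/* Sketch: a plain netCDF-4 program trying to use zstd compression.
 * If the zstd HDF5 plugin shared library has not been installed
 * where HDF5 can find it, the filter call fails at runtime even
 * though netcdf-c itself was built with zstd support. */
#include <stdio.h>
#include <netcdf.h>
#include <netcdf_filter.h>

int main(void)
{
    int ncid, dimid, varid, ret;

    if ((ret = nc_create("test_zstd.nc", NC_NETCDF4, &ncid)))
        return ret;
    if ((ret = nc_def_dim(ncid, "x", 1024, &dimid)))
        return ret;
    if ((ret = nc_def_var(ncid, "data", NC_FLOAT, 1, &dimid, &varid)))
        return ret;

    /* This is the call that returns a "filter not found" style error
     * when the plugin is missing from the HDF5 plugin directory. */
    if ((ret = nc_def_var_zstandard(ncid, varid, 4)))
        fprintf(stderr, "zstd filter unavailable: %s\n", nc_strerror(ret));

    return nc_close(ncid);
}
```

The point of the issue is that this program should work out of the box after `make install` of netcdf-c, with no knowledge of HDF5 plugins required.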

@DennisHeimbigner
Collaborator

I am lost. Let me make sure that I understand:

  1. your HDF5_PLUGIN_PATH points to some ccr created directory that holds the zstd HDF5 filter wrapper.
  2. Your attempt to create a file with zstd compression fails, or
  3. Your attempt to open a file with zstd compression fails.

Correct?

@edwardhartnett
Contributor Author

No I do not have the HDF5 filter wrapper installed. This is a fresh machine.

So I install zlib and zstandard with apt-get, and build HDF5.

When netcdf-c is built, zstandard is found and built. But when I try to use it, I get a filter not found error.

@DennisHeimbigner
Collaborator

What is the value of HDF5_PLUGIN_PATH ?

@DennisHeimbigner
Collaborator

I think I get it. You want to bypass the HDF5_PLUGIN_PATH altogether, right?

@edwardhartnett
Contributor Author

What CCR does is include this self-contained autoconf project:
https://github.com/ccr/ccr/tree/master/hdf5_plugins/ZSTANDARD

This builds the code here: https://github.com/ccr/ccr/blob/master/hdf5_plugins/ZSTANDARD/src/H5Zzstandard.c

This generates the plugin shared library for zstandard.

During testing, CCR manipulates the HDF5_PLUGIN_PATH to get the tests to work. During install, the plugin is installed in HDF5_PLUGIN_DIR.

Once that is done, then of course netcdf-c can use the zstandard plugin. But without the plugin, putting zstandard into netCDF is incomplete.

IIRC you have a copy of the bzip2 filter code; you could just do the same for zstandard. Then install them in HDF5_PLUGIN_DIR, and everything will magically work for readers and writers.
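For context, an HDF5 filter plugin like CCR's H5Zzstandard.c is just a shared library exposing two well-known entry points that HDF5 discovers via dlopen() on its plugin path. The following is a sketch of that shape only; the filter callback body and struct comments are illustrative, not the actual CCR source (32015 is the registered HDF5 filter id for zstd):

```c
/* Skeleton of an HDF5 dynamically loaded filter plugin. HDF5 loads
 * the shared library from its plugin search path and calls
 * H5PLget_plugin_type() and H5PLget_plugin_info() to register the
 * filter. */
#include <hdf5.h>
#include <H5PLextern.h>

#define H5Z_FILTER_ZSTD 32015 /* registered filter id for zstd */

static size_t
zstd_filter(unsigned flags, size_t cd_nelmts, const unsigned cd_values[],
            size_t nbytes, size_t *buf_size, void **buf)
{
    /* A real implementation calls ZSTD_compress()/ZSTD_decompress()
     * here, depending on (flags & H5Z_FLAG_REVERSE). Returning 0
     * signals failure to HDF5; this stub is a placeholder. */
    (void)flags; (void)cd_nelmts; (void)cd_values;
    (void)nbytes; (void)buf_size; (void)buf;
    return 0;
}

static const H5Z_class2_t H5Z_ZSTD[1] = {{
    H5Z_CLASS_T_VERS,             /* H5Z_class_t version */
    (H5Z_filter_t)H5Z_FILTER_ZSTD,
    1, 1,                         /* encoder and decoder present */
    "zstd",                       /* filter name, for diagnostics */
    NULL, NULL,                   /* can_apply, set_local callbacks */
    zstd_filter,                  /* compress/decompress callback */
}};

/* The two entry points HDF5 looks for when it loads the library. */
const void *H5PLget_plugin_info(void) { return H5Z_ZSTD; }
H5PL_type_t H5PLget_plugin_type(void) { return H5PL_TYPE_FILTER; }
```

Building this into, say, lib__nczstd.so and copying it into the plugin directory is all the "install" step amounts to.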

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Apr 20, 2022

I did some experimenting with this idea. One problem that crops up
is that HDF5_PLUGIN_PATH is an actual path, not a single directory.
So it could be of the form "<dir1>:<dir2>: ..." or "<dir1>;<dir2>; ..."
In this case, it is not clear where to install the wrappers.
I think it would be best if we had a ./configure option of this form:

--with-standard-filter-install=<dir>

so that a specific install location is defined.

@DennisHeimbigner
Collaborator

Another point: to protect the user, I would propose that the installation not overwrite existing filter shared libraries. Rather, the user would be required to manually remove any existing installed filter wrappers. The idea is to prevent inadvertent overwrites.

@edwardhartnett
Contributor Author

edwardhartnett commented Apr 20, 2022

I agree with the extra option.

CCR has an option:

```
# If the env. variable HDF5_PLUGIN_PATH is set, or if
# --with-hdf5-plugin-path=<directory> is given, use it as the plugin
# directory. This is necessary at the top level to provide the help
# message and --with option. If used, the option will be passed to the
# subdirs, and also handled by the configure in each filter
# subdirectory.
AC_MSG_CHECKING([where to put HDF5 plugins])
AC_ARG_WITH([hdf5-plugin-path],
            [AS_HELP_STRING([--with-hdf5-plugin-path=<directory>],
                            [specify HDF5 plugin path (defaults to /usr/local/hdf5/lib/plugin, or value of HDF5_PLUGIN_PATH, if set)])],
            [HDF5_PLUGIN_PATH=$with_hdf5_plugin_path])
HDF5_PLUGIN_PATH=${HDF5_PLUGIN_PATH-.}
AC_MSG_RESULT($HDF5_PLUGIN_PATH)
```

As for not overwriting: OK, if you want, but I note we don't apply that rule to any other binary in the package. That is, we will happily install our netcdf.so file on top of an existing one, without even warning the user. When they do a "make install" it gets installed, and we trust they know what they are doing.

@edwardhartnett
Contributor Author

@DennisHeimbigner do you want me to make a PR or are you going to do it?

I sense that @WardF wants to release soon, so we should get this done. It needs to be in the next release...

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Apr 29, 2022
re: Unidata#2294

Ed Hartnett suggested that the netcdf library installation process
be extended to install the standard filters into a user specified
location. The user can then set HDF5_PLUGIN_PATH to that location.

This PR provides that capability using:
```
configure option: --with-plugin-dir=<absolute directory path>
cmake option: -DPLUGIN_INSTALL_DIR=<absolute directory path>
```

Currently, the following plugins are always installed, if
available: bzip2, zstd, blosc.
If NCZarr is enabled, then additional plugins are installed:
fletcher32, shuffle, deflate, szip.

Additionally, the necessary codec support is installed
for each of the above filters that is installed.

## Changes:
1. Cleanup handling of built-in bzip2.
2. Add documentation to docs/filters.md
3. Re-factor the NCZarr codec libraries
4. Add a test, although it can only be exercised after
   the library is installed, so it cannot be used during
   normal testing.
5. Cleanup use of HDF5_PLUGIN_PATH in the filter test cases.
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue May 14, 2022
…ript

re: Unidata#2338
re: Unidata#2294

In issue Unidata#2338,
Ed Hartnett suggested a better way to install filters to a user
defined location -- for Automake, anyway.

This PR implements that suggestion. It turns out to be more
complicated than it appears, so there are fair number of changes;
mostly to shell scripts. Most of the change is in plugins/Makefile.am.

NOTE: this PR still does NOT address the use of HDF5_PLUGIN_PATH
as the default; this turns out to be complex when dealing with NCZarr.
So this will be addressed in a subsequent post 4.9.0 PR.

## Misc. Changes
1. Record the occurrences of incomplete codecs in libnczarr so that
   they can be included in _Codecs attribute correctly. This allows
   users to see what missing filters are referenced in the Zarr file.
   Primarily affects libnczarr/zfilter.[ch]. Also required creating a
   new no-effect filter: H5Zunknown.c.
2. Move the unknown filter test to a separate test file.
3. Incorporates PR Unidata#2343
@WardF WardF modified the milestones: 4.9.0, 4.9.1 Jun 15, 2022
@mkitti

mkitti commented Aug 17, 2022

I'm coming from the outside, but it seems you are considering setting HDF5_PLUGIN_PATH. Environment variables can be quite problematic, especially on Windows, where multiple copies of the C runtime library often exist. I would like to discourage that and instead encourage the use of the H5PL family of functions, which lets you add a list of paths via an API.

https://docs.hdfgroup.org/hdf5/develop/group___h5_p_l.html
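A minimal sketch of the approach being suggested, using H5PLappend() (available since HDF5 1.10.1, as noted below); the directory shown is just an example location:

```c
/* Add a plugin directory to HDF5's search path programmatically,
 * instead of relying on the HDF5_PLUGIN_PATH environment variable. */
#include <hdf5.h>

int main(void)
{
    /* Append a directory to the end of HDF5's plugin search path. */
    if (H5PLappend("/usr/local/hdf5/lib/plugin") < 0)
        return 1;

    /* ...subsequent dataset reads/writes can now find filter
     * plugins installed in that directory... */
    return 0;
}
```

The H5PL API also offers H5PLprepend() and H5PLreplace() for finer control over search order.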

@DennisHeimbigner
Collaborator

One problem for us is that until HDF5 version 1.13 is in very general use,
we are stuck with HDF5_PLUGIN_PATH.
But in the end, the client has to specify the search path for finding plugins, and that specification must potentially be different for every use of the library.
The new 1.13 API would give us the freedom to get the path from places
other than an environment variable. What do you propose we use to replace
HDF5_PLUGIN_PATH?

@mkitti

mkitti commented Aug 17, 2022

H5PLappend was introduced in 1.10.1:
https://portal.hdfgroup.org/display/HDF5/H5PL_APPEND

1.13 is not required for this functionality.

@mkitti

mkitti commented Aug 17, 2022

The plugin interface might even be available in HDF5 1.8 series:
https://portal.hdfgroup.org/display/HDF5/Software+Changes+from+Release+to+Release+for+HDF5-1.8

@DennisHeimbigner
Collaborator

I stand corrected about the append operation. Still, the question remains: what alternative do we use? As I see it, we have the following options:

  1. HDF5_PLUGIN_PATH
  2. The .netrc/.dodsrc file

Remember that the issue is being able to pass the path info
to the netcdf-c library at run-time independent of the program
that is using the library.

@WardF
Member

WardF commented Aug 17, 2022

I think retaining the first option is a good idea, as we've already started down that route. Adding further options (to the extent it remains practical) is a great idea, but I'm hesitant to remove ones we have already introduced.

@WardF WardF modified the milestones: 4.9.1, 4.9.2 Feb 13, 2023
@WardF WardF modified the milestones: 4.9.2, 4.9.3 May 16, 2023
@edwardhartnett
Contributor Author

OK, I am going to close this issue. The zstd filter is built and installed correctly, although the user still has to know what to do with HDF5_PLUGIN_PATH if they install plugins somewhere other than the HDF5 default directory.
