Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-implement the nc_get/put_vars operations for netcdf-4 using the corresponding HDF5 operations. #1001

Merged
merged 4 commits into from
Jun 8, 2018

Conversation

DennisHeimbigner
Copy link
Collaborator

re: github issue #908
also in reference to pydata/xarray#2004

The netcdf-c library has implemented the nc_get_vars and nc_put_vars
operations as element at a time. This has resulted in very slow
operation.

This pr attempts to improve the situation for netcdf-4/hdf5 files
by using the slab operations provided by the hdf5 library. The new
implementation passes the get/put vars stride information down to
the hdf5 slab operations.

The result appears to improve performance significantly. Some simple
tests on large 2-D arrays shows speedups in excess of 150.

Misc. other changes:

  1. fix bug in ncgen/semantics.c; using a list's allocated length
    instead of actual length.
  2. Added a temporary hook in the netcdf library plus a performance
    test case (tst_varsperf.c) to estimate the speedup. After users
    have had some experience with this, I will remove it, probably
    after the 4.7 release.

corresponding HDF5 operations.

re: github issue #908
also in reference to pydata/xarray#2004

The netcdf-c library has implemented the nc_get_vars and nc_put_vars
operations as element at a time. This has resulted in very slow
operation.

This pr attempts to improve the situation for netcdf-4/hdf5 files
by using the slab operations provided by the hdf5 library. The new
implementation passes the get/put vars stride information down to
the hdf5 slab operations.

The result appears to improve performance significantly. Some simple
tests on large 2-D arrays shows speedups in excess of 150.

Misc. other changes:
1. fix bug in ncgen/semantics.c; using a list's allocated length
   instead of actual length.
2. Added a temporary hook in the netcdf library plus a performance
   test case (tst_varsperf.c) to estimate the speedup. After users
   have had some experience with this, I will remove it, probably
   after the 4.7 release.
@DennisHeimbigner DennisHeimbigner changed the title Re-Implement the nc_get/put_vars operations for netcdf-4 using the corresponding HDF5 operations. Re-implement the nc_get/put_vars operations for netcdf-4 using the corresponding HDF5 operations. May 22, 2018
nc4_set_default_vars(int tf)
{
defaultvars = (tf ? 1 : 0);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this temporary flag be removed?

Changing to the old default methods can easily be done by changing the dispatch table.

#endif
int retval = NC_NOERR, range_error = 0, i, d2;
void *bufr = NULL;
#ifndef HDF5_CONVERT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where this code came from. It looks like I wrote this code, but if so, why does it seem like it was just added?

In any case, the HDF5_CONVERT macro can never be defined. I have removed it from other code. I can be removed from this code as well. HDF5 never was able to convert data and generate the same output and error results that netCDF does, and HDF5_CONVERT is never set in the build. When it is set, it inevitably will result in failed tests.

@DennisHeimbigner
Copy link
Collaborator Author

Please leave it. When this gets released and actually used, I anticipate problems.
I may need to build test cases that switch between the two cases programmatically
to track down bugs.

@edhartnett
Copy link
Contributor

@DennisHeimbigner you are referring to the defaultvars, not the HDF_CONVERT, correct?

I also believe their will be problems with this code, as it does not seem to do the correct thing for parallel I/O in all cases. After it is merged I will add some tests and probably submit a PR with additional code for parallel cases.

@DennisHeimbigner
Copy link
Collaborator Author

Yes, NCDEFAULT_get/put_vars.
I am not actually worried about the parallel case
WRT to debugging.
But the test case probably needs to be conditional
on not parallel.

@edhartnett
Copy link
Contributor

You don't have to make the test case conditional on not parallel. The parallel issue is not one that any sequentially written code will active - the test has to do something trickier than that. So please just let your test run on parallel builds as normal.

@DennisHeimbigner
Copy link
Collaborator Author

I am a bit confused. Are you saying that NC4_put/get_vars code will not operate
correctly when parallel is used? If so, then that is a problem that needs to be fixed.

@edhartnett
Copy link
Contributor

Yes, that is what I am saying. I believe they will not operate correctly in some cases.

I have some test code for a similar case for the vara get/put which is up in a PR right now. Once it gets merged I will add some tests to check similar behavior on vars.

Would you like me to make the necessary changes? I can take a branch and merge this branch, then add the tests and extra code needed.

@DennisHeimbigner
Copy link
Collaborator Author

I have some test code for a similar case for the vara get/put which is up in a PR right now.
I was about to ask about vara because the vars code is taken directly from the var code.
Would I be able to apply the vara changes to fix vars?

@DennisHeimbigner
Copy link
Collaborator Author

On second thought I think it would be best if you made the changes
since you understand the parallel code.

@edhartnett
Copy link
Contributor

OK I'll have something tomorrow or the next day...

#define DIMSIZE1 1024
#define TOTALSIZE (DIMSIZE0*DIMSIZE1)

#define CHECK(expr) assert((expr) == NC_NOERR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we cannot use asserts in tests like this.

The reason is that this will cause the test to fail to compile with -DNDEBUG builds. The -DNDEBUG is usual in HPC builds (and also in code coverage builds, which is how my CI system caught this.)

So I have made minor modifications to the test to no longer use asserts, but instead to use the macros in err_macros.h.

@edhartnett
Copy link
Contributor

OK, I have a version of this code which seems to work well for paralllel I/O. It depends on coord checking being relaxed. If we cannot eliminate the relax coord bound issue, then relaxed coord bounds will have to be turned on for all parallel builds anyway.

As soon as we get my current PR merged, we can move on with this issue.

@DennisHeimbigner I can confirm that this results in ~600 times speedup, which is great.

As part of my vars changes I removed the vara functions, which are just a special case of the vars functions. Otherwise this leads to a large amount of duplicated code.

@WardF WardF added this to the 4.7.0 milestone Jun 8, 2018
@WardF WardF self-assigned this Jun 8, 2018
@WardF
Copy link
Member

WardF commented Jun 8, 2018

Wires got crossed while merging master back in but I think that's fixed. Waiting for tests to finish, should be merged by EOD.

@shoyer
Copy link
Contributor

shoyer commented Feb 6, 2019

Just a heads up, this change seems to have slowed down a different path for indexing in netCDF4-Python:
Unidata/netcdf4-python#870

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants