NetCDF 4.9.0 fails to create a NetCDF4 file on Windows (x86_64-w64-mingw32-gcc) with HDF5 1.12.1 #2248

Closed
Alexander-Barth opened this issue Mar 15, 2022 · 10 comments

@Alexander-Barth
Contributor

  • the version of the software with which you are encountering an issue

NetCDF 4.8.1

  • environmental information (i.e. Operating System, compiler info, java version, python version, etc.)

Windows, mingw compiler (x86_64-w64-mingw32-gcc (GCC) 4.8.5)

  • a description of the issue with the steps needed to reproduce it

NetCDF 4.8.1 fails to create a NetCDF4 file on Windows with HDF5 1.12.1 (binary from mingw).

The issue has been reported here (in the context of julia) by @visr
Alexander-Barth/NCDatasets.jl#164

The Julia code in the issue corresponds to the following C code:

retval = nc_create("test.nc4", NC_NETCDF4, &ncid);

So just creating a NetCDF4 file triggers the issue.

The error message is:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7905cc0c -- nc4_create_file at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:124
in expression starting at C:\Users\visser_mn\.julia\packages\NCDatasets\TCrQh\test\runtests.jl:10
nc4_create_file at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:124
NC4_create at /workspace/srcdir/netcdf-c-4.8.1/libhdf5\hdf5create.c:313
NC_create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:1926
nc__create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:464
nc_create at /workspace/srcdir/netcdf-c-4.8.1/libdispatch\dfile.c:391
nc_create at C:\Users\visser_mn\.julia\packages\NCDatasets\TCrQh\src\netcdf_c.jl:255
[...]

This corresponds to the following line:
https://github.com/Unidata/netcdf-c/blob/v4.8.1/libhdf5/hdf5create.c#L124

To build NetCDF4 on Windows I have to apply these patches:
https://github.com/Alexander-Barth/Yggdrasil/tree/NetCDF-v4.8.1/N/NetCDF/bundled/patches

The first patch is based on #2138.

NetCDF 4.8.1 works on all other tested platforms (Linux, Mac OS, even Mac OS-M1). We also had this issue with NetCDF 4.7.4 and HDF5 1.12.1 on Windows.

In Julia, all libraries are cross-compiled from a Linux-x86_64 environment targeting the different OS and CPU architectures.
I am not sure where the problem actually is. It could also be in HDF5, the mingw compiler, ...

Any help would be greatly appreciated :-)

As a Linux user, I am not too familiar with Windows. I just want to get our software to work for our students, who are primarily Windows users.

Ref:
Alexander-Barth/NCDatasets.jl#164
JuliaPackaging/Yggdrasil#4511
JuliaGeo/NetCDF.jl#151
#2138
#2124

CC: @visr, @giordano

@WardF
Member

WardF commented Mar 15, 2022

Thank you for the very comprehensive report! I'll take a look. I work primarily on Linux/OSX myself, with Windows being a secondary environment. I will try to duplicate this environment and issue, and hopefully be able to provide some insight. Hopefully I will be able to track this down without going into Julia, as I unfortunately have no experience there.

@WardF WardF self-assigned this Mar 15, 2022
@WardF WardF added this to the 4.9.1 milestone Mar 15, 2022
@edwardhartnett
Contributor

Could this be a file permission problem? Do you have permission to create the file, or is there already a file of that name that you are attempting to overwrite without using NC_CLOBBER?

@Alexander-Barth
Contributor Author

In our case, we also have this issue with randomly generated filenames in the Windows temporary directory (the Windows equivalent of /tmp) when running our test suite. I guess that @visr also checked that the file test.nc4 did not exist in his tests.

@Alexander-Barth Alexander-Barth changed the title from "NetCDF 4.8.1 fails to create a NetCDF4 files on Windows with HDF5 1.12.1" to "NetCDF 4.8.1 fails to create a NetCDF4 file on Windows (x86_64-w64-mingw32-gcc) with HDF5 1.12.1" on Mar 16, 2022
@visr

visr commented Mar 16, 2022

Yes indeed, I don't think it is a file permission issue. The filename didn't exist, and I could create a netcdf3-classic file in the same place with the same library.

@visr

visr commented Aug 3, 2022

Just noting that we still see this on netCDF 4.9.0 built against HDF5 1.12.2 (ref Alexander-Barth/NCDatasets.jl#164 (comment)), and only when cross-compiling netCDF for Windows.

@Alexander-Barth
Contributor Author

As a test, I added a simple test function to libnetcdf which only creates an HDF5 file access property list:

int my_test_function() {
    hid_t fapl_id = -1;
    int retval = NC_NOERR;
    printf("start\n");
    fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    printf("end\n");
    return retval;
}

Calling this function reproduces this crash:

$ /c/Users/Alexander\ Barth/AppData/Local/Programs/Julia-1.8.0-rc3/bin/julia.exe --eval ' using NetCDF_jll; ccall((:my_test_function, libnetcdf), Cint, ())'

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x66fb1758 -- my_test_function at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:45
in expression starting at none:1
my_test_function at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:45
top-level scope at .\none:1
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:897
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
ijl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:965
eval at .\boot.jl:368 [inlined]
exec_options at .\client.jl:276
_start at .\client.jl:522
jfptr__start_37025.clone_1 at C:\Users\Alexander Barth\AppData\Local\Programs\Julia-1.8.0-rc3\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:575
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:719
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:59
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 2903 (Pool: 2891; Big: 12); GC: 0
start

@WardF WardF changed the title from "NetCDF 4.8.1 fails to create a NetCDF4 file on Windows (x86_64-w64-mingw32-gcc) with HDF5 1.12.1" to "NetCDF 4.9.0 fails to create a NetCDF4 file on Windows (x86_64-w64-mingw32-gcc) with HDF5 1.12.1" on Aug 9, 2022
@WardF
Member

WardF commented Aug 9, 2022

Updated issue title to reflect it is still observed in latest version.

@WardF
Member

WardF commented Aug 9, 2022


This adds an interesting wrinkle, insofar as this is an HDF5 function being called absent any netCDF function calls. Because it's not completely separated from libnetcdf I can't say with absolute certainty that it's an HDF5 issue, but given the simple nature of the test program and the reliance on pure HDF5 code (the single HDF5 function call), it would suggest to me that this is an issue in libhdf5.

I'm not unfamiliar with cross-compilation, but it is not part of my regular workflow. Let me take a look and see if I can replicate this in a stand-alone test program that only uses HDF5 and is also cross-compiled. I hate to ask, but if this is something you can easily test on your end, @Alexander-Barth, it might be worth doing, so that we can really nail down whether this is in the netCDF layer or not.

@Alexander-Barth
Contributor Author

Indeed, I was wondering the same thing:

JuliaPackaging/Yggdrasil#4511 (comment)

The small example program using just HDF5 also failed when using the gcc compiler from the julia build environment (x86_64-w64-mingw32-gcc 4.8.5). Surprisingly, I can cross-compile the example program using the cross-compiler from Ubuntu 20.04 (with a more up-to-date version "9.3-win32 20200320").

So I am wondering if this could be a compiler bug triggered by some changes in HDF5 since version 1.12.1 (completely unrelated to NetCDF).

@Alexander-Barth
Contributor Author

I am closing this issue since the problem did not show up again after upgrading the GCC version. Thanks to all who contributed to the discussion!
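For anyone hitting the same crash, one way to confirm which compiler built a binary is to record the GCC version macros at compile time. The sketch below is illustrative only: the function names are hypothetical, and the cutoff is an assumption taken from this thread, where 4.8.5 failed and 9.3 worked. Versions in between were not tested here.

```c
#include <stdio.h>

/* Pack major.minor.patch into one comparable integer, e.g. 4.8.5 -> 40805. */
int version_code(int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

/* Version of the GCC that compiled this file, or 0 if it is not GCC.
 * Clang also defines __GNUC__, so it is excluded explicitly. */
int compiler_version_code(void)
{
#if defined(__GNUC__) && !defined(__clang__)
    return version_code(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    return 0;
#endif
}

/* Assumption from this thread: only 4.8.5 is known to fail, so flag
 * that release and anything older. Returns 1 if flagged, 0 otherwise. */
int built_with_suspect_gcc(void)
{
    int v = compiler_version_code();
    return v != 0 && v <= version_code(4, 8, 5);
}

/* Print a one-line summary suitable for a build or test log. */
void report_compiler(void)
{
    int v = compiler_version_code();
    if (v == 0)
        printf("compiler: not GCC (or unknown)\n");
    else
        printf("compiler: GCC %d.%d.%d%s\n", v / 10000, (v / 100) % 100, v % 100,
               built_with_suspect_gcc() ? " (at or below the version that failed here)" : "");
}
```

Calling `report_compiler()` at the start of a test program makes the compiler version visible in the same log as any crash trace.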
