
Segfault in H5F_addr_decode due to dereferencing of null valued f->shared #3649

Closed
hmaarrfk opened this issue Oct 9, 2023 · 2 comments

hmaarrfk commented Oct 9, 2023

Describe the bug
I'm still trying to track down exactly what causes it, but I wanted to get some input from others.

For now, all I have is pseudocode for a reproducer:

  1. Open a file read-only with xarray, using h5netcdf or libnetcdf's Python bindings as the backend. Call this reference f1.
  2. In a loop, re-open the same file with xarray. Call this reference f2.
  3. Do a bunch of math.
  4. Close the reference f2.
  5. Try to open the file again with a new reference, f3.
  6. Hit a segfault.
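A minimal Python sketch of the steps above, under several assumptions that are not from the original report: the files are opened through xarray's `open_dataset` with the `h5netcdf` or `netcdf4` engine, step 3 is stood in for by `Dataset.load()` (which reads every variable into memory) rather than the real math, step 5 happens once after the loop, and `reproduce`, `path`, and `iterations` are hypothetical names and values. The opener is a parameter so the open/close sequence can be exercised without xarray installed.

```python
import functools
from typing import Any, Callable, Optional


def reproduce(
    path: str,
    iterations: int = 100,
    engine: str = "h5netcdf",
    open_dataset: Optional[Callable[[str], Any]] = None,
) -> int:
    """Hypothetical sketch of the reported open/reopen/close pattern.

    Returns the number of completed loop iterations.
    """
    if open_dataset is None:
        # Imported lazily so the sequence can be tested without xarray.
        import xarray as xr

        # engine="h5netcdf" or engine="netcdf4" (libnetcdf's Python bindings).
        open_dataset = functools.partial(xr.open_dataset, engine=engine)

    f1 = open_dataset(path)  # step 1: long-lived read-only reference
    completed = 0
    try:
        for _ in range(iterations):
            f2 = open_dataset(path)  # step 2: re-open the same file
            f2.load()  # step 3: assumed stand-in for "a bunch of math"
            f2.close()  # step 4: close f2
            completed += 1
        f3 = open_dataset(path)  # step 5: where the segfault was reported
        f3.close()
    finally:
        f1.close()  # release the long-lived reference last
    return completed
```

Usage would look like `reproduce("example.nc", engine="netcdf4")`, where `example.nc` is a placeholder for whatever file triggers the crash.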

Expected behavior
Not to segfault.

Platform (please complete the following information)

  • HDF5 version: 1.14.2
  • OS and version: Linux (64-bit)
  • Compiler and version: conda-forge
  • Build system (e.g. CMake, Autotools) and version: autotools
  • Any configure options you specified: direct virtual file driver (direct VFD) enabled
  • MPI library and version (parallel HDF5): none

Additional context

I think I've tracked it down to:

https://github.com/HDFGroup/hdf5/blob/hdf5-1_14_2/src/H5Fint.c#L2947

hdf5/src/H5Fint.c, line 2947 in 2d926cf:

H5F_addr_decode_len(H5F_SIZEOF_ADDR(f), pp, addr_p);

For some reason, f->shared (i.e., H5F_SHARED(f)) cannot be dereferenced. It isn't always NULL; sometimes it holds a large, invalid "pointer value".

I took the conda-forge build scripts, built HDF5 locally in debug mode, and obtained the following from gdb:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffad7190b4 in H5F_addr_decode (f=0x55555817a830, pp=pp@entry=0x7fffffff8368, addr_p=addr_p@entry=0x7fffffff8370) at H5Fint.c:2948
2948	   H5F_addr_decode_len(H5F_SIZEOF_ADDR(f), pp, addr_p);
(gdb) print(f->shared)
$1 = (H5F_shared_t *) 0x6500000065
(gdb) print(f->shared->sizeof_addr)
Cannot access memory at address 0x65000000a5
(gdb) print(f)
$2 = (const H5F_t *) 0x55555817a830
(gdb) print(f->shared)
$3 = (H5F_shared_t *) 0x6500000065
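The two addresses in this gdb session can be cross-checked with a little arithmetic, sketched in Python below with the values copied from the session. gdb reads a struct member at (base pointer + member offset), so the unreadable address sitting 0x40 bytes past the bogus f->shared value suggests sizeof_addr lives at offset 0x40 of H5F_shared_t in this particular debug build. The pointer value itself splits into two identical 32-bit words of 0x65, which looks more like overwritten data than a real heap address; one possible (unconfirmed) explanation is that the H5F_t at f was freed and its memory reused before this call.

```python
# Values copied from the gdb session above.
bad_shared = 0x6500000065  # value printed for f->shared
fault_addr = 0x65000000A5  # address gdb could not read for f->shared->sizeof_addr

# gdb reads a member at (base + member offset), so the difference is the
# offset of sizeof_addr inside H5F_shared_t in this build.
field_offset = fault_addr - bad_shared

# Split the 64-bit "pointer" into its high and low 32-bit halves.
high_word = bad_shared >> 32
low_word = bad_shared & 0xFFFFFFFF

print(f"offset of sizeof_addr: {field_offset:#x}")  # prints "offset of sizeof_addr: 0x40"
print(f"high word: {high_word:#x}, low word: {low_word:#x}")  # prints "high word: 0x65, low word: 0x65"
```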

Here is the backtrace:

#0  0x00007fffad7190b4 in H5F_addr_decode (f=0x55555817a830, pp=pp@entry=0x7fffffff8368, addr_p=addr_p@entry=0x7fffffff8370) at H5Fint.c:2948
#1  0x00007fffad932e15 in H5VL__native_blob_specific (obj=<optimized out>, blob_id=<optimized out>, args=0x7fffffff83d0) at H5VLnative_blob.c:156
#2  0x00007fffad9183c4 in H5VL__blob_specific (obj=<optimized out>, blob_id=<optimized out>, args=args@entry=0x7fffffff83d0, cls=<optimized out>) at H5VLcallback.c:7460
#3  0x00007fffad92b183 in H5VL_blob_specific (vol_obj=<optimized out>, blob_id=<optimized out>, args=args@entry=0x7fffffff83d0) at H5VLcallback.c:7489
#4  0x00007fffad912790 in H5T__vlen_disk_isnull (file=<optimized out>, _vl=<optimized out>, isnull=<optimized out>) at H5Tvlen.c:764
#5  0x00007fffad89fee3 in H5T__conv_vlen (src_id=src_id@entry=216172782113805003, dst_id=dst_id@entry=216172782113805002, cdata=cdata@entry=0x555558197e80, nelmts=nelmts@entry=1, 
    buf_stride=buf_stride@entry=0, bkg_stride=bkg_stride@entry=0, buf=0x555558789110, bkg=0x0) at H5Tconv.c:3277
#6  0x00007fffad888fac in H5T_convert (tpath=tpath@entry=0x555558197e10, src_id=src_id@entry=216172782113805003, dst_id=dst_id@entry=216172782113805002, nelmts=nelmts@entry=1, buf_stride=buf_stride@entry=0, 
    bkg_stride=bkg_stride@entry=0, buf=0x555558789110, bkg=0x0) at H5T.c:5306
#7  0x00007fffad6d6ba5 in H5D_get_create_plist (dset=<optimized out>) at H5Dint.c:3659
#8  0x00007fffad933f26 in H5VL__native_dataset_get (obj=<optimized out>, args=0x7fffffff9000, dxpl_id=<optimized out>, req=<optimized out>) at H5VLnative_dataset.c:468
#9  0x00007fffad919c47 in H5VL__dataset_get (obj=<optimized out>, args=args@entry=0x7fffffff9000, dxpl_id=dxpl_id@entry=792633534417207304, req=req@entry=0x0, cls=<optimized out>) at H5VLcallback.c:2426
#10 0x00007fffad91f906 in H5VL_dataset_get (vol_obj=<optimized out>, args=args@entry=0x7fffffff9000, dxpl_id=792633534417207304, req=req@entry=0x0) at H5VLcallback.c:2457
#11 0x00007fffad6a9407 in H5Dget_create_plist (dset_id=360287970189646678) at H5D.c:774
#12 0x00007fffa3108336 in __pyx_f_4h5py_4defs_H5Dget_create_plist () from /home/mark/mambaforge/envs/mcam_dev/lib/python3.10/site-packages/h5py/defs.cpython-310-x86_64-linux-gnu.so
#13 0x00007fffa2e3b21e in __pyx_pw_4h5py_3h5d_9DatasetID_15get_create_plist () from /home/mark/mambaforge/envs/mcam_dev/lib/python3.10/site-packages/h5py/h5d.cpython-310-x86_64-linux-gnu.so
#14 0x000055555569b2e4 in method_vectorcall_NOARGS (func=0x7fffa3033240, args=0x7fff4ca721e8, nargsf=<optimized out>, kwnames=0x0) at /usr/local/src/conda/python-3.10.12/Objects/descrobject.c:432
#15 0x00007fffa2fc9fa9 in __pyx_pw_4h5py_8_objects_9with_phil_1wrapper () from /home/mark/mambaforge/envs/mcam_dev/lib/python3.10/site-packages/h5py/_objects.cpython-310-x86_64-linux-gnu.so
#16 0x000055555569171b in _PyObject_MakeTpCall (tstate=tstate@entry=0x555555908cf0, callable=callable@entry=0x7fffa303d220, args=args@entry=0x7fff981b7618, nargs=1, keywords=0x0)
    at /usr/local/src/conda/python-3.10.12/Objects/call.c:215

I'll try to trim my reproducer before sharing it, since it is a bit much right now, but I wanted to share my findings in case this is a known issue.

I've also tested this with Ubuntu's libhdf5 package, which appears to be version 1.10.8, and it reproduces the issue as well, though sometimes not on the first try.


hmaarrfk commented Oct 9, 2023

Sorry about opening this here. This might very well be a bug in xarray / h5netcdf / libnetcdf4 (the Python and C libraries built on top of HDF5), but I feel a segfault is something HDF5 should want to avoid, even when the library is used improperly.

hmaarrfk closed this as completed Oct 9, 2023

hmaarrfk commented Oct 9, 2023

Sorry, let me find out exactly what is causing this before you spend your time on it.
