segfault with a particular netcdf4 file #8289
Comments
Ok, this was bugging me enough, but here is a reproducer that just "runs" without any data:

import xarray

engine = 'netcdf4'

dataset = xarray.Dataset()
dataset.coords['x'] = ['a']
dataset.to_netcdf('mrc.nc')

dataset = xarray.open_dataset('mrc.nc', engine=engine)
for i in range(10):
    print(f"i={i}")
    xarray.open_dataset('mrc.nc', engine=engine)

The key was making the coordinate a string.
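For contrast, a minimal sketch of the same loop with a numeric coordinate instead of a string one. Going by the observation above that the string coordinate is the key ingredient, this variant would not be expected to crash; that expectation (and the mrc_numeric.nc filename) is an assumption for illustration, not something verified in this thread.

import xarray

engine = 'netcdf4'

dataset = xarray.Dataset()
dataset.coords['x'] = [0]  # numeric coordinate instead of the string 'a'
dataset.to_netcdf('mrc_numeric.nc')

dataset = xarray.open_dataset('mrc_numeric.nc', engine=engine)
for i in range(10):
    print(f"i={i}")
    xarray.open_dataset('mrc_numeric.nc', engine=engine)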
Sorry to rapid-fire post, but the following "hack" seems to resolve the issues I am observing:

diff --git a/xarray/backends/netCDF4_.py b/xarray/backends/netCDF4_.py
index f21f15bf..8f1243da 100644
--- a/xarray/backends/netCDF4_.py
+++ b/xarray/backends/netCDF4_.py
@@ -394,8 +394,8 @@ class NetCDF4DataStore(WritableCFDataStore):
         kwargs = dict(
             clobber=clobber, diskless=diskless, persist=persist, format=format
         )
-        manager = CachingFileManager(
-            netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
+        manager = DummyFileManager(
+            netCDF4.Dataset(filename, mode=mode, **kwargs)
         )
         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

I have a feeling some reference isn't being kept, and the file is being freed somehow during garbage collection. While this "hack" somewhat "works", if I try to open the same file with two different backends, it really likes to complain. It may be that libnetcdf4 just expects to be in control of the file at all times.
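A related user-level workaround in the same spirit (keeping a strong reference to an already-open netCDF4.Dataset instead of going through CachingFileManager) is to open the handle yourself and hand it to xarray, which wraps raw handles in a DummyFileManager internally. This is only a sketch; whether it sidesteps this particular crash is an assumption.

import netCDF4
import xarray

# Open the file handle explicitly and keep our own reference to it, rather
# than letting the caching file manager create and track the handle.
nc = netCDF4.Dataset('mrc.nc', mode='r')
store = xarray.backends.NetCDF4DataStore(nc)
dataset = xarray.open_dataset(store)

# ... work with dataset ...

# Closing the xarray Dataset should close the store and the underlying handle.
dataset.close()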
Running a similar segfaulting benchmark on xarray's main branch:

import xarray
import numpy as np

write_engine = 'h5netcdf'
hold_engine = 'h5netcdf'
read_engine = 'netcdf4'

filename = f'{write_engine}_mrc.nc'

# %%
dataset = xarray.Dataset()
dataset.coords['x'] = ['a']
dataset.coords['my_version'] = '1.2.3.4.5.6'
dataset['images'] = (('x', ), np.zeros((1,)))
dataset.to_netcdf(filename, engine=write_engine)

# %%
dataset = xarray.open_dataset(filename, engine=hold_engine)
for i in range(100):
    print(f"i={i}")
    xarray.open_dataset(filename, engine=read_engine)
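One way to take garbage collection out of benchmarks like this is to close every handle deterministically, either with explicit close() calls or with the context-manager form of open_dataset. A sketch of that variation follows; the thread does not establish whether it avoids the segfault, so treat it as an experiment rather than a fix.

import xarray

filename = 'h5netcdf_mrc.nc'  # the file written by the benchmark above

# Hold the first handle in a context manager and close each repeated open
# explicitly, so no file object is left for the garbage collector to free.
with xarray.open_dataset(filename, engine='h5netcdf') as held:
    for i in range(100):
        print(f"i={i}")
        with xarray.open_dataset(filename, engine='netcdf4'):
            pass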
While I know these issues are hard, can anybody else confirm that this happens on their system as well? Maybe my machine is really weird... As a final reproducer:

import xarray

xarray.set_options(warn_for_unclosed_files=True)

# Also needs a small patch....
"""
diff --git a/xarray/backends/file_manager.py b/xarray/backends/file_manager.py
index df901f9a..a2e8af03 100644
--- a/xarray/backends/file_manager.py
+++ b/xarray/backends/file_manager.py
@@ -252,11 +252,10 @@ class CachingFileManager(FileManager):
                     self._lock.release()

             if OPTIONS["warn_for_unclosed_files"]:
-                warnings.warn(
+                print(
                     f"deallocating {self}, but file is not already closed. "
                     "This may indicate a bug.",
-                    RuntimeWarning,
-                    stacklevel=2,
+                    flush=True
                 )

     def __getstate__(self):
"""

dataset = xarray.Dataset()
dataset.coords['x'] = ['a']
dataset.to_netcdf('mrc.nc', engine='netcdf4')

dataset = xarray.open_dataset('mrc.nc', engine='netcdf4')
for i in range(100):
    print(f"i={i}")
    xarray.open_dataset('mrc.nc', engine='netcdf4')

Gives the output:
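As an aside: instead of patching xarray to replace the warning with a print, the warnings filter can be told to always display RuntimeWarnings. This is only a sketch; warnings raised while objects are collected during interpreter shutdown may still be swallowed, which is presumably why the print-based patch above was used.

import warnings
import xarray

xarray.set_options(warn_for_unclosed_files=True)

# Show every RuntimeWarning instead of the default "once per location"
# behaviour, so repeated "deallocating ..." warnings are not suppressed.
warnings.simplefilter("always", RuntimeWarning)

dataset = xarray.open_dataset('mrc.nc', engine='netcdf4')
for i in range(100):
    print(f"i={i}")
    xarray.open_dataset('mrc.nc', engine='netcdf4')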
I can confirm that your original reproducer segfaults on my system (Linux/x86_64). I also agree with your diagnosis that this seems to be an issue with the caching file manager. FWIW, adding ...
Thanks for confirming. It has been puzzling me to no end.
I'm also struggling with this problem, and I have simplified the code a bit more:

import xarray
import numpy as np
import os
import sys

filename = 'test_mrc.nc'
if not os.path.exists(filename):
    dataset_w = xarray.Dataset()
    dataset_w['x'] = ['a']
    dataset_w.to_netcdf(filename)

print("try open 1", file=sys.stderr)
dataset = xarray.open_dataset(filename)
print("try open 2", file=sys.stderr)
dataset2 = xarray.open_dataset(filename)
dataset2 = None
print("try open 3", file=sys.stderr)
dataset3 = xarray.open_dataset(filename)
print("success")

The problem only occurs if certain features of netCDF-4 are used in the file (e.g. superblock 2, strings), but those are common. I've tested with ...
The above example succeeds in my case (...). No conda; just a regular Python virtual environment.
I just managed to upgrade my xarray to 2024.03.0 (pinning the version) and still get the error, though it works sometimes?
(This was independent of whether the test_mrc.nc file existed or not...)
From a quick ..., we can observe, though, that the file manager id in the least-recently-used cache changes every time we open it, but that the underlying ...
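For reference, a sketch of how that observation can be reproduced from the outside, using xarray's private file-manager internals. FILE_CACHE and REF_COUNTS live in xarray.backends.file_manager in the versions discussed here, but they are not public API and may change without notice.

import xarray
from xarray.backends.file_manager import FILE_CACHE, REF_COUNTS

filename = 'test_mrc.nc'

for i in range(3):
    ds = xarray.open_dataset(filename, engine='netcdf4')
    # Each cache key embeds the CachingFileManager's identity, so watching the
    # keys between opens shows whether a fresh manager/cache entry was created.
    print(f"open {i}: cache keys = {list(FILE_CACHE.keys())}")
    print(f"         ref counts = {dict(REF_COUNTS)}")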
I think it is related to #7359 (comment).
I'm curious whether there has been any recent progress on this issue? I'm running into this problem even with ... Anyway, my code is currently long and complex, but the minimal example of @heikoklein does segfault in my current environment. Running my code with ... I can provide more detail if helpful, but I don't really know how to tackle these kinds of library errors and would be grateful for any assistance. It's a bit unfortunate because the last time I ran my code using ...
We hit the same bug independently on this setup:

>>> xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.10 (main, Mar 15 2022, 15:56:56)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.4-development
xarray: 2024.7.0
pandas: 2.2.2
numpy: 2.0.2
scipy: 1.13.1
netCDF4: 1.7.2
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.9.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 58.1.0
pip: 24.3.1
conda: None
pytest: 8.3.3
mypy: 1.13.0
IPython: 8.18.1
sphinx: None
What happened?
The following code yields a segfault on my machine (and on many other machines with a similar environment).
tiny.nc.txt
mrc.nc.txt
What did you expect to happen?
Not to segfault.
Minimal Complete Verifiable Example
Hand-crafting the file from start to finish seems not to segfault:
MVCE confirmation
Relevant log output
Anything else we need to know?
At first I thought it was deep in HDF5, but I am less convinced now.
xref: HDFGroup/hdf5#3649
Environment