Segmentation fault when differencing cubes #2063
Comments
As @ajdawson notes "Interestingly, it is sufficient to load only one of the cubes data payloads for this bug to disappear... This should be reported as a bug on the iris issue tracker." I'm pretty sure the solution is that Iris needs to use a thread lock when accessing data from netCDF4, because the HDF5 library is not thread safe. We use such a thread lock in xarray/dask. |
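The locking idea described above can be sketched roughly as follows. This is only an illustration, not Iris or xarray code: `LockedProxy`, `FakeVariable` and `_GLOBAL_READ_LOCK` are hypothetical names, and `FakeVariable` stands in for a netCDF4 variable so the example runs without any netCDF file. The assumption is that every read goes through one process-wide lock, so a non-thread-safe HDF5 build never sees two reads at once.

```python
import threading
import time

# Assumption: a single process-wide lock, shared by every wrapped variable,
# because a non-thread-safe HDF5 build cannot handle concurrent calls at all.
_GLOBAL_READ_LOCK = threading.Lock()


class LockedProxy:
    """Hypothetical wrapper that serialises indexing on an array-like object."""

    def __init__(self, variable, lock=_GLOBAL_READ_LOCK):
        self._variable = variable
        self._lock = lock

    def __getitem__(self, keys):
        # Only one thread at a time may touch the underlying variable.
        with self._lock:
            return self._variable[keys]


class FakeVariable:
    """Stand-in for a netCDF4 variable; records how many reads overlap."""

    def __init__(self, values):
        self._values = values
        self._active = 0
        self.max_active = 0

    def __getitem__(self, keys):
        self._active += 1
        self.max_active = max(self.max_active, self._active)
        time.sleep(0.001)  # pretend the read takes a little while
        self._active -= 1
        return self._values[keys]


fake = FakeVariable(list(range(10)))
proxy = LockedProxy(fake)
results = []


def read():
    results.append(proxy[2:5])


# Eight threads read the same slice through the locked proxy.
threads = [threading.Thread(target=read) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("maximum overlapping reads:", fake.max_active)
```

A single global lock trades parallel read throughput for safety; that seems a reasonable price when the alternative is a segmentation fault with no Python traceback.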
Thanks @shoyer this is useful information. I did a quick test by adding a global lock to data reads within a |
https://github.com/SciTools/iris/blob/master/INSTALL#L89 also suggests that there may be an issue here if the HDF5 build is not thread safe. I have tripped over this in the past. @PAGWatson, is this still an open issue from your point of view? |
Well I can work around this, so it's not a huge problem for me for now. I've not checked if the bug is still present in the latest version of Iris, though. |
There's no harm in adding a defensive |
I've run into the same issue over the last couple of days (iris 1.10.0). Any updates on this? I can work around it as well for now, but it seems like something that people are very prone to run into, and it is hard to make sense of since all you get is the segmentation fault. |
Hi @mheikenfeld, thanks for letting us know you're also encountering this issue. Our advice remains to ensure that you're using a thread-safe install of hdf5 (see the Iris install note). For more information, see §4.3.11 of the HDF5 INSTALL document. |
We should work towards actually solving this (in biggus) because many people are using a pre-built binary HDF5 and have no control over whether it is thread-safe or not. |
See SciTools/biggus#194. |
conda-forge now has a thread-safe version of hdf5 available |
I think this would be a neat thing to deliver. However, there is ongoing work to replace biggus with dask for Iris v2. Do you think that:
could fit into that structure? @bjlittle @dkillick @pp-mo @lbdreyer
|
Is this issue still current? It looks like it has not been solved in Unidata/netcdf4-python#844. I can see that the proposed lock has not been implemented yet: iris/lib/iris/fileformats/netcdf.py Lines 434 to 442 in f5feb28
We seem to be running into this with iris 2.4, see ESMValGroup/ESMValCore#644. An example of the crash happening: https://app.circleci.com/pipelines/github/ESMValGroup/ESMValCore/2482/workflows/f8e73729-c4cf-408c-bdae-beec24238ac1/jobs/10300/steps This is using these libraries installed from conda:
|
I just downloaded OP's sample files and could not reproduce the problem with Iris 2.2. (The files won't load in the Iris 2.4 environment I have access to, as they have invalid variable names such as "pearson's_r".) |
It might be useful for you guys to know the rate at which a segfault occurs at the point of realizing the data (have a look at this comment and this comment): a segfault happens in 1-2% of the calls that realize a cube's data. This may seem low, but it is the rate for a single event (a single call to realize), and since these segfaults are Poisson distributed, if you have 100 such calls in a script then, statistically, your script will segfault on most runs. Have you guys looked into this for Iris 3 by any chance? |
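As a rough check of the arithmetic in the comment above, here is a minimal sketch that assumes each call to realize data is independent and fails with a fixed probability `p` (both assumptions, and `prob_at_least_one_crash` is a hypothetical helper name). The chance that a script with `n` such calls hits at least one segfault is then `1 - (1 - p)**n`.

```python
def prob_at_least_one_crash(p, n):
    """Probability of at least one failure in n independent calls,
    each failing with probability p."""
    return 1.0 - (1.0 - p) ** n


# Per-call rates of 1% and 2%, for a script that realizes data 100 times.
for p in (0.01, 0.02):
    chance = prob_at_least_one_crash(p, 100)
    print(f"p = {p:.2f}: chance of at least one segfault = {chance:.1%}")
```

Under these assumptions the figure comes out at roughly 63% for a 1% per-call rate and roughly 87% for 2%, so most runs of such a script would crash, though not literally every one.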
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity. If this issue is still important to you, then please comment on this issue and the stale label will be removed. Otherwise this issue will be automatically closed in 28 days time. |
@valeriupredoi Have you seen any segfaults in our CI recently? |
The OP’s example here has not been reproducible for some time (#2063 (comment)), and significant recent work has gone into thread safety (#5095). So I reckon this issue could be closed. I’ll leave it to @trexfeathers’ decision as he is assignee. |
Hello,
I get a seg fault when I take the difference between two cubes and then try to access the data. I've attached the files containing the data at https://groups.google.com/forum/#!topic/scitools-iris/OgFbHKtNGqU .
However, replacing the last two lines with the line below works fine.
(I'm using Iris 1.9.2 in ipython 4.0.3.)
Andrew Dawson reproduced the issue and posted the full traceback at the above link.