SharedArray access in @parallel for causes Segmentation fault #14764
Comments
I've reproduced it a few times on 0.4.2 but not on master. Can you give 0.4.3 and the nightly binary a try and see if that makes any difference? |
In versions 0.4.3 and 0.4.4-pre+2 the bug still appears.
In the latest version (versioninfo below) I can't reproduce the bug.
|
I also cannot reproduce this on a newish version.

```
julia> versioninfo()
Julia Version 0.5.0-dev+749
Commit 83eac1e* (2015-10-13 16:00 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY NEHALEM)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```
|
FWIW, the same result with 3 runs on OS X and 0.4.3-pre+6
|
I also have a similar issue when using a SharedArray with remotecall_wait (with 10 workers). The code runs for a while but then crashes (the timing of the crash is random). I got the following message:
The Julia version is:
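For context, the pattern described above can be sketched as follows. This is a hypothetical reconstruction (the original code was not preserved in this thread), so all names and sizes here are made up; it only illustrates workers filling a SharedArray via remotecall_wait, using the 0.4-era argument order.

```julia
# Hypothetical sketch of the reported pattern: start with `julia -p 10`.
@everywhere function fill_chunk!(S, w)
    for i in localindexes(S)   # indices owned by the calling worker
        S[i] = w * i
    end
end

S = SharedArray(Float64, 10_000)          # 0.4-era constructor syntax
@sync for (w, p) in enumerate(workers())
    @async remotecall_wait(p, fill_chunk!, S, w)
end
```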
Is this issue still open? |
Yes. |
On master it now hangs after some time, and a Ctrl-C results in:
The master process is in a busy loop state at the time of interruption. |
Hi everyone, I think I have a similar issue using the parallelism tools. First, I found an issue in my code arising from the use of the object. versioninfo():
a) when the code is stuck, at some point I lose patience and hit Ctrl+C; and with another version I must change the code a little, but basically it is the same. To me, it looks like the program is in an infinite loop (not exactly). So, from here, I started mining information from the forums and ended up on this issue.
I see the following behavior, which is really similar to what @pgawron shows in his post:
- if N=10 and nprocs=1, at some point there is a memory problem (gc?);
- if I set N=1000, even with all the RAM available, I cannot reach the end of the run;
- in multi-process mode (julia -p 7), the code with N=10 crashes after a while.
So, my question is the following: is there any way to avoid these issues?
Thx, Matthew |
Probably the same underlying cause as #15923 . |
Indeed, it looks really similar to #15923 . I tried to disable the gc, but I ran out of memory really quickly. |
Have exactly this issue, and am experiencing all three types of failures described by @matthewozon:
Version info is:
A bit about the code: I'm passing vectors of SharedArrays to workers via calls that look like

```julia
@sync for i = 1:n_chunks
    @async remotecall_wait(procs[i], func, vector_of_shared_arrays, i)
end
```
|
On 0.4.6, with the test script of #14764 (comment), I at times see the first error pasted here: #14764 (comment). But I also saw this:
which killed one worker (of 4). And sometimes this somewhat different segfault:
|
@yuyichao I think this looks like the array/finalizer issue you've dissected? |
Hard to tell from the backtrace but likely. |
Just browsing WeakKeyDict issues: could this be related to #3002? |
On julia-0.7-beta2.0, neither the original example nor #14764 (comment) errors for me (I ran each case twice). Maybe someone else can check as well? |
I can confirm that they both work on |
The following code always causes a segmentation fault when run in parallel with julia -p 20. When run on a single process it does not crash. Unfortunately, the moment of the crash may vary.
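The original snippet did not survive this page. A minimal sketch of the reported pattern, with all names and sizes hypothetical, would exercise concurrent SharedArray writes from a @parallel for loop:

```julia
# Hypothetical minimal reproduction sketch (the original code is not
# preserved here). Run with: julia -p 20
S = SharedArray(Float64, 1_000_000)   # 0.4-era constructor syntax
@sync @parallel for i = 1:length(S)
    S[i] = sqrt(i)                    # concurrent writes from all workers
end
```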
The stack trace I get is the following: