synchronize(blocking = false) hangs in julia 1.7 eventually #1350

Closed
anj00 opened this issue Feb 4, 2022 · 32 comments
Labels
bug Something isn't working


@anj00

anj00 commented Feb 4, 2022

I have a general pattern like this

function work()
   do_work_on_gpu()
   synchronize(blocking = false)
end

while true
   work()
end

It has been working 24/7 with Julia 1.6 for months with no issues (I restart it about once a week due to new data I need to add; over roughly the past 2 years / 2 billion calls I think I have seen only one unexplained hang, so I am very happy with CUDA.jl's stability). However, with Julia 1.7, after a while (between 10,000 and 500,000 calls/loops, which in my case typically means once every 10-60 minutes) it just hangs. The hang happens on different input data and on different cards in the server (if I let it run, eventually all 6 cards hang), and it happens on my development PC as well.
"Hang" means the code is stuck on the synchronize(blocking = false) line: the GPU stops showing any load, i.e. the GPU does nothing, yet the call doesn't return. If I press Ctrl+C to get out of the loop and then call work() again with the same input parameters as the call that just hung, it works just fine.

I am trying to create a simple snippet to reproduce the bug, but as you can imagine it has been difficult so far. And if the issue is timing-related, it is not guaranteed to reproduce on different hardware anyway.

So I wonder if there are any tips on how to debug this?
Or do the CUDA.jl developers have ideas on how Julia 1.7.x could be affecting this pattern? (Again, the same code/CUDA works fine on Julia 1.6.5.)

Meanwhile, here is maybe an interesting hint about what is going on: running the processing as a task seems to reduce the probability of a hang dramatically. Maybe it will give ideas.
This code is about 5-10x less likely to hang, but I still managed to make it fail in my tests:

function work()
   do_work_on_gpu()
   fetch(@async synchronize(blocking = false))
end

I currently run this code

function work()
   do_work_on_gpu()
   synchronize(blocking = false)
end

while true
   fetch(@async work())
end

It has already done 10M+ calls with no issues, which suggests the problem is probably not in the data I ship but somewhere else.

Any input on how to debug this is appreciated. (The task workaround is a bit ugly and appears to cost a fair amount of extra CPU when I run 60-100 calls a second, and it obviously just hides a problem that shouldn't be there in the first place.)

This is what my dev PC looks like:

julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_CUDA_NSYS = C:\Program Files\NVIDIA Corporation\Nsight Systems 2021.2.1\target-windows-x64\nsys.exe

julia> CUDA.versioninfo()
CUDA toolkit 11.6, artifact installation
Unknown NVIDIA driver, for CUDA 11.6
CUDA driver 11.6

Libraries:
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: missing
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.7.1
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

Environment:
- JULIA_CUDA_NSYS: C:\Program Files\NVIDIA Corporation\Nsight Systems 2021.2.1\target-windows-x64\nsys.exe

1 device:
  0: NVIDIA GeForce RTX 2070 (sm_75, 7.013 GiB / 8.000 GiB available)
@anj00 added the bug label on Feb 4, 2022
@guyvdbroeck
Contributor

guyvdbroeck commented Feb 4, 2022

I can confirm I see the exact same problem in my experiments. It is highly stochastic and happens on different machines and cards, but about half of my runs eventually get stuck at different synchronization points, sometimes hours into an experiment. I have been trying all week to make sense of it and extract a minimal example, but that has proved difficult. On Julia 1.6.5 the problem does not arise.

For example, Ctrl+C on a single-process, single-thread CUDA.jl application that has been stuck doing nothing for an hour gives:

^C
signal (2): Interrupt
in expression starting at /home/guy/.julia/dev/ProbabilisticCircuits/example/bug0.jl:2         
epoll_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)                                   
uv__io_poll at /workspace/srcdir/libuv/src/unix/epoll.c:240                                    
uv_run at /workspace/srcdir/libuv/src/unix/core.c:383                                          
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:481                  
poptask at ./task.jl:827
wait at ./task.jl:836
task_done_hook at ./task.jl:544
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]                
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429                    
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]               
jl_finish_task at /buildworker/worker/package_linux64/build/src/task.c:218                     
start_task at /buildworker/worker/package_linux64/build/src/task.c:888                         
unknown function (ip: (nil))
Allocations: 909734470 (Pool: 909281029; Big: 453441); GC: 322                    

Another one (after 1300 epochs of training) looks like this:

^CERROR: InterruptException:      
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))                                       
    @ Base ./task.jl:777
  [2] wait()
    @ Base ./task.jl:837        
  [3] wait(c::Base.GenericCondition{ReentrantLock})                                            
    @ Base ./condition.jl:123
  [4] wait(e::Base.Event)      
    @ Base ./lock.jl:366
  [5] nonblocking_synchronize
    @ ~/space/.julia/packages/CUDA/bki2w/lib/cudadrv/stream.jl:162 [inlined]
  [6] (::CUDA.var"#207#208"{Float32, Vector{Float32}, Int64, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Int64, Int64})()                                                                     
    @ CUDA ~/space/.julia/packages/CUDA/bki2w/src/array.jl:406                                 
  [7] #context!#59
    @ ~/space/.julia/packages/CUDA/bki2w/lib/cudadrv/state.jl:164 [inlined]                    
  [8] context!                                                                                 
    @ ~/space/.julia/packages/CUDA/bki2w/lib/cudadrv/state.jl:161 [inlined]                    
  [9] unsafe_copyto!(dest::Vector{Float32}, doffs::Int64, src::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, soffs::Int64, n::Int64)                                                            
    @ CUDA ~/space/.julia/packages/CUDA/bki2w/src/array.jl:402
 [10] copyto!
    @ ~/space/.julia/packages/CUDA/bki2w/src/array.jl:356 [inlined]
 [11] getindex
    @ ~/space/.julia/packages/GPUArrays/umZob/src/host/indexing.jl:89 [inlined]
 [12] #25
    @ ~/space/.julia/packages/GPUArrays/umZob/src/host/indexing.jl:75 [inlined]
 [13] task_local_storage(body::GPUArrays.var"#25#28"{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, key::Symbol, val::Bool)
    @ Base ./task.jl:281
 [14] macro expansion
    @ ~/space/.julia/packages/GPUArrays/umZob/src/host/indexing.jl:74 [inlined]
 [15] _mapreduce(f::typeof(identity), op::typeof(Base.add_sum), As::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}; dims::Colon, init::Nothing)
    @ GPUArrays ~/space/.julia/packages/GPUArrays/umZob/src/host/mapreduce.jl:65
 [16] #mapreduce#20
    @ ~/space/.julia/packages/GPUArrays/umZob/src/host/mapreduce.jl:28 [inlined]
 [17] mapreduce
    @ ~/space/.julia/packages/GPUArrays/umZob/src/host/mapreduce.jl:28 [inlined]
 [18] #_sum#735
    @ ./reducedim.jl:894 [inlined]
 [19] _sum
    @ ./reducedim.jl:894 [inlined]
 [20] #_sum#734
    @ ./reducedim.jl:893 [inlined]
 [21] _sum
    @ ./reducedim.jl:893 [inlined]
 [22] #sum#732
    @ ./reducedim.jl:889 [inlined]
 [23] sum
    @ ./reducedim.jl:889 [inlined]
 [24] mini_batch_em(bpc::CuBitsProbCircuit, raw_data::CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}, num_epochs::Int64; batch_size::Int64, pseudocount::Float64, softness::Float64, param_inertia::Float64, param_inertia_end::Float64, flow_memory::Int64, flow_memory_end::Int64, shuffle::Symbol, mars_mem::Nothing, flows_mem::Nothing, node_aggr_mem::Nothing, edge_aggr_mem::Nothing, mine::Int64, maxe::Int64, debug::Bool)
    @ ProbabilisticCircuits ~/space/.julia/dev/ProbabilisticCircuits/src/bit_circuits/em.jl:296
 [25] macro expansion
    @ ./timing.jl:220 [inlined]
 [26] experiment(train::CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}, test::CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}, epochs1::Int64, epochs2::Int64, epochs3::Int64, latents::Int64; batch_size::Int64, latent_heuristic::String, pseudocount::Float64, softness::Float64, param_inertia1::Float64, param_inertia_end1::Float64, param_inertia2::Float64, param_inertia_end2::Float64, shuffle::Symbol)
    @ Main /scratch/guyvdb/.julia/dev/ProbabilisticCircuits/example/single_experiment.jl:25
 [27] top-level scope
    @ REPL[6]:1
 [28] top-level scope
    @ ~/space/.julia/packages/CUDA/bki2w/src/initialization.jl:52

@anj00
Author

anj00 commented Feb 6, 2022

I can confirm now that moving from this pattern

function work()
   do_work_on_gpu()
   synchronize(blocking = false)
end

while true
   work()
end

to this

while true
   fetch(@async work())
end

"Fixes" the problem in Julia 1.7. Run a test with 70 million calls +. All good. Whereas calling work() directly consistently hangs after 30k-500k calls/iterations.

@maleadt
Member

maleadt commented Feb 7, 2022

That's concerning. Can you confirm you are using the exact same packages across Julia versions?

Another interesting datapoint would be to disable the nonblocking synchronization by commenting out:

# perform as much of the sync as possible without blocking in CUDA.
# XXX: remove this using a yield callback, or by synchronizing on a dedicated stream?
nonblocking_synchronize(stream)

Of course, if you rely on multitasking (to perform other GPU operations while the sync is happening and blocking the thread) this will change the dynamics of your application.
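In case it is useful, here is a rough sketch of how to make that local edit (this assumes the standard Pkg.develop workflow; the path below is just the default dev directory):

using Pkg
Pkg.develop("CUDA")     # checks the package sources out into ~/.julia/dev/CUDA
# then open ~/.julia/dev/CUDA/lib/cudadrv/stream.jl, comment out the
# nonblocking_synchronize(stream) call quoted above, and restart Julia
# so the modified package gets recompiled and picked up

Whether the hang still occurs with that edit in place is the datapoint of interest.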

@guyvdbroeck
Contributor

This suggestion is coming from a place of complete ignorance, but I wonder whether it could be related to JuliaLang/julia#44019.
For context, I was randomly getting the exact same script to sometimes:

  • crash with signal (6): Aborted
  • crash with segfault
  • deadlock as shown above.

So it seems that these are all random outcomes of the same bug.

@maleadt
Member

maleadt commented Feb 8, 2022

Are you using multiple threads? If so, it's possible there are some bugs lurking. But with plain multitasking we shouldn't be locking up.

@guyvdbroeck
Contributor

I'm running a single process with a single thread. I am running different Julia instances on different GPUs if that matters.

@maleadt
Member

maleadt commented Feb 8, 2022

The issue you linked to is about use of @threads, so it's unlikely to be related.

  • crash with signal (6): Aborted

  • crash with segfault

Those are very different from a deadlock. Can you post the error messages and backtraces?

@roflmaostc

I'll just jump on the train: I see similar deadlocks in an iterative algorithm where I mainly use operations like abs2., sqrt. and broadcasting :/
Mean time until a lock-up is probably ~5-10 minutes.

@maleadt
Member

maleadt commented Feb 8, 2022

My questions remain though:

  • is this caused by an upgrade of Julia 1.6 to 1.7?
  • or is this caused by an upgrade of CUDA.jl, or any other package?
  • does disabling non-blocking synchronization help?
  • does this only deadlock, or also abort/segfault (as reported by @guyvdbroeck)?

Ideally an MWE or reproducer would be most helpful, but if that doesn't work, a bisect of CUDA.jl (assuming it's an upgrade of the package that causes this) could also shed some light on the issue.
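For what it's worth, a bare-bones reproducer following the pattern reported above could look something like this (the GPU workload is only a placeholder broadcast, and whether it actually triggers the hang is timing-dependent and unverified):

using CUDA

function work(x)
    y = x .^ 2 .+ 1                  # placeholder GPU workload
    synchronize(blocking = false)    # the call that eventually hangs
    return y
end

x = CUDA.rand(Float32, 1024)
for i in 1:1_000_000
    work(x)
    i % 100_000 == 0 && println("completed $i iterations")
end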

@guyvdbroeck
Contributor

I am not seeing the issue on Julia 1.6 with CUDA v3.8.0, so it is not a pure CUDA.jl bug.
Sorry I cannot be more helpful; my example takes hours to reach the bug. @roflmaostc's 5-minute example is the way to go.
I don't know how to get backtraces (--bug-report is too slow and I couldn't run a Julia debug instance, see the other report), but one error message from a run that aborted instead of locking up is:

Mini-batch EM iter 309; train LL -677.6212

signal (6): Aborted
in expression starting at REPL[2]:1
Allocations: 1006391627 (Pool: 1005937998; Big: 453629); GC: 416
Aborted (core dumped)

@roflmaostc

I'll try to execute it in the REPL; maybe it exposes some log.

@roflmaostc

For some reason it occurred only in my Jupyter notebook so far but not in the REPL, despite executing the same code.

@anj00
Author

anj00 commented Feb 9, 2022

Maybe we should focus on the hang issue in this bug (with no threading/tasking)?

As for the original case:

  • is this caused by an upgrade of Julia 1.6 to 1.7
    • yes, the very same CUDA.jl in both. In fact I have been trying to migrate to 1.7 for a while already. I think I had CUDA.jl 3.6.4 and already had issues with Julia 1.7, then tried several CUDA.jl versions between 3.6.4 and 3.8.0, all with the same results. It was happening in Julia 1.7.0 and 1.7.1, and still happens in 1.7.2.
  • or is this caused by an upgrade of CUDA.jl, or any other package?
    • I think some time ago I ran a test where I had exactly the same versions of all the packages (I have a dedicated env folder as part of my git repo) and just a different Julia version, and I still had the issue.

Now I am trying to disable the nonblocking synchronization, but I somehow get this error while switching to package dev mode (Julia 1.7.2). Any hints on what I am doing wrong?

dev CUDA
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package LLVM [929cbde3]:
 LLVM [929cbde3] log:
 ├─possible versions are: 0.9.0-4.7.1 or uninstalled
 ├─restricted to versions 1.5.2-2 by CUDA [052768ef], leaving only versions 1.5.2-2.0.0
 │ └─CUDA [052768ef] log:
 │   ├─possible versions are: 1.2.0 or uninstalled
 │   └─CUDA [052768ef] is fixed to version 1.2.0
 └─restricted by julia compatibility requirements to versions: 4.0.0-4.7.1 or uninstalled — no versions left

Here are the top-level packages I have:

  [6e4b80f9] BenchmarkTools v1.3.0
  [336ed68f] CSV v0.10.2
  [052768ef] CUDA v3.8.0
  [a93c6f00] DataFrames v1.3.2
  [5789e2e9] FileIO v1.13.0
  [708ec375] Gumbo v0.8.0
  [cd3eb016] HTTP v0.9.17
  [033835bb] JLD2 v0.4.20
  [682c06a0] JSON v0.21.2
  [bdcacae8] LoopVectorization v0.12.101
  [f0f68f2c] PlotlyJS v0.18.8
  [91a5bcdd] Plots v1.25.8
  [2913bbd2] StatsBase v0.33.14
  [f269a46b] TimeZones v1.7.1

and here is what the CUDA.jl dependencies look like:

CUDA : 3.8.0 
  RandomNumbers        : 1.5.3
    Requires             : 1.3.0
  AbstractFFTs         : 1.1.0
    ChainRulesCore       : 1.12.0
      Compat               : 3.41.0
  TimerOutputs         : 0.5.15
    ExprTools            : 0.1.8
  GPUCompiler          : 0.13.11
    LLVM                 : 4.7.1
      CEnum                : 0.4.1
      LLVMExtra_jll        : 0.0.13+1
        JLLWrappers          : 1.4.1
          Preferences          : 1.2.3
    ExprTools            : 0.1.8
    TimerOutputs         : 0.5.15
      ExprTools            : 0.1.8
  LLVM                 : 4.7.1
    CEnum                : 0.4.1
    LLVMExtra_jll        : 0.0.13+1
      JLLWrappers          : 1.4.1
        Preferences          : 1.2.3
  CEnum                : 0.4.1
  BFloat16s            : 0.2.0
  GPUArrays            : 8.2.1
    LLVM                 : 4.7.1
      CEnum                : 0.4.1
      LLVMExtra_jll        : 0.0.13+1
        JLLWrappers          : 1.4.1
          Preferences          : 1.2.3
    Adapt                : 3.3.3
  SpecialFunctions     : 2.1.2
    IrrationalConstants  : 0.1.1
    ChainRulesCore       : 1.12.0
      Compat               : 3.41.0
    LogExpFunctions      : 0.3.6
      IrrationalConstants  : 0.1.1
      ChainRulesCore       : 1.12.0
        Compat               : 3.41.0
      ChangesOfVariables   : 0.1.2
        ChainRulesCore       : 1.12.0
          Compat               : 3.41.0
      DocStringExtensions  : 0.8.6
      InverseFunctions     : 0.1.2
    OpenSpecFun_jll      : 0.5.5+0
      JLLWrappers          : 1.4.1
        Preferences          : 1.2.3
  ExprTools            : 0.1.8
  Requires             : 1.3.0
  Reexport             : 1.2.2
  Adapt                : 3.3.3
  Random123            : 1.4.2
    RandomNumbers        : 1.5.3
      Requires             : 1.3.0

@maleadt
Member

maleadt commented Feb 9, 2022

 LLVM [929cbde3] log:
 ├─possible versions are: 0.9.0-4.7.1 or uninstalled
 ├─restricted to versions 1.5.2-2 by CUDA [052768ef], leaving only versions 1.5.2-2.0.0

Do you have an old CUDA.jl clone in your dev folder?

@anj00
Author

anj00 commented Feb 9, 2022

Indeed, sorry about that. Had an old CUDA.jl in the dev folder. Forgot about it.

Now I commented out the line you suggested.

And it looks like the code no longer hangs. It has been running for 2.1 million loops (at least 10-20x longer than with that line), so at least we can say it is a significant improvement. I will let the test run a bit more.

Of course, as I understand it, commenting out that line makes the CPU busy-wait at 100% for GPU results, which I hope won't be the final solution. This test is one process, but in production I run 6-12 Julia processes (1-4 per card), so with such a solution the CPU would be 100% busy and would actually slow down the process that generates the data for the GPU to work on :) Kind of ironic that the GPU starves because other GPU processes use the CPU to wait :) But it is at least a hint about where to look for the solution.

@maleadt
Member

maleadt commented Feb 10, 2022

Of course, as I understand commenting that line is causing a CPU being 100% busy waiting for GPU results.

Correct. We can make the synchronization not consume CPU by blocking on an OS primitive instead, but that still blocks other Julia tasks from making progress.

@luraess's testing seems to imply this may be related to Julia 1.7.1 -- could you verify the nonblocking_sync hangs on that version but still works on 1.7.0?

@anj00
Author

anj00 commented Feb 10, 2022

The problem appeared in 1.7.0, and I tried 1.7.1 and 1.7.2, all with the same result.

@maleadt
Member

maleadt commented Feb 10, 2022

OK, we'll have to debug this then. What could be useful is a backtrace of all the live tasks during the hang. That isn't easy to come by though, and needs some gdb wrangling using a custom Julia build. I've prepared an appropriate build here, https://drive.google.com/file/d/1C3wtlaIzAw6kQuZ8JqubA4BCbOJwLkl8/view?usp=sharing, which is just Julia 1.7.3-pre (from release-1.7) with the necessary patch applied.

Please try to reproduce the hang with this build of Julia. Once the deadlock happens, attach gdb to the process (it may be useful to note the output of getpid() from Julia before launching your application):

sudo gdb --pid 83937

Alternatively, if you don't have sudo on that machine, you could run the custom Julia under gdb (gdb --args ./julia $ANY_OTHER_ARGS and then run). To get back to GDB once the deadlock happens, hit Ctrl-C. If GDB breaks before that, because of another unrelated signal (e.g. SIGSEGV as used by the GC) you can tell it to ignore that signal using handle SIGSEGV nostop and continue to continue.

Once you have GDB at the point of deadlock, we first need to find a thread that we can use to print the backtraces from. Typically that will just be thread 1, but if that thread happens to be doing GC (ptls->gc_state != 0) you can't use it (in that case, either continue and try again later, or try another thread, taking care to only go up to the number of threads Julia was launched with and not use the OpenBLAS threads):

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f3ccc3cab80 (LWP 83937))]
#0  0x00007f3ccc4eb92e in epoll_wait () from /usr/lib/libc.so.6
(gdb) print (int8_t) ((jl_ptls_t)jl_get_ptls_states())->gc_state
$6 = 0 '\000'

So here thread 1 isn't doing GC and can be used to dump the task backtraces. First check how many live tasks there are:

(gdb) print jl_live_tasks()->length
$7 = 3

Now we can print the backtraces for each of these (numbering starts at 0 and goes up to one less than the length reported above):

(gdb) call jlbacktracet(jl_arrayref(jl_live_tasks(), 0))

This will print a back-trace in the process' terminal. For example, if I do a simple wait(Condition()) from the REPL I get:

jl_unw_swapcontext at /tmp/julia/src/task.c:958 [inlined]
jl_swap_fiber at /tmp/julia/src/task.c:970
ctx_switch at /tmp/julia/src/task.c:437
jl_switch at /tmp/julia/src/task.c:502
try_yieldto at ./task.jl:767
wait at ./task.jl:837
wait at ./condition.jl:123
#134 at /tmp/julia/usr/share/julia/stdlib/v1.7/Distributed/src/remotecall.jl:281 [inlined]
lock at ./lock.jl:190
lock at ./condition.jl:78 [inlined]
macro expansion at /tmp/julia/usr/share/julia/stdlib/v1.7/Distributed/src/remotecall.jl:279 [inlined]
#133 at ./threadingconstructs.jl:178
jfptr_YY.133_50259 at /tmp/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /tmp/julia/src/julia.h:1788 [inlined]
start_task at /tmp/julia/src/task.c:877

Please report those here for all live tasks. If you have any troubles with this, contact me on Slack.

@anj00
Author

anj00 commented Feb 10, 2022

Unfortunately, I run Windows for this project. I have Linux in Docker, but GPUs don't get exposed correctly there (at least with VirtualBox and my limited knowledge).

Any chance of doing something similar on Windows?
A quick search shows there is a gdb port for Windows (I have zero experience with it, but fingers crossed it works). If you can build that special Julia for Windows as well, I can try to run it.

@maleadt
Member

maleadt commented Feb 10, 2022

What about WSL2? That should be easier than running gdb in Windows, I think.

@luraess

luraess commented Feb 10, 2022

Following up on #1350 (comment), after more testing on 1.6.5, 1.7.0, 1.7.1 and 1.7.2, it seems that both for Spack-built binaries and binaries downloaded from julialang.org:
1.6.5 - pass
1.7.0 - pass
1.7.1 - fail (freezing)
1.7.2 - pass (no Spack-build available yet)

@anj00
Author

anj00 commented Feb 10, 2022

Cool that WSL2 now supports GPUs. I managed to get it working.

Good news: the test hangs in WSL2 as well. If that helps, when I press Ctrl+C I get the following:

signal (2): Interrupt
in expression starting at /mnt/c/Src/test.jl:54
epoll_wait at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
uv__io_poll at /workspace/srcdir/libuv/src/unix/epoll.c:240
uv_run at /workspace/srcdir/libuv/src/unix/core.c:383
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:481
poptask at ./task.jl:827
wait at ./task.jl:836
task_done_hook at ./task.jl:544
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
jl_finish_task at /buildworker/worker/package_linux64/build/src/task.c:218
start_task at /buildworker/worker/package_linux64/build/src/task.c:888
unknown function (ip: (nil))
Allocations: 2378839661 (Pool: 2378701340; Big: 138321); GC: 1885

The bad news is that I can't seem to start the Julia version you sent me. I unzipped it and just try to start julia from the bin folder, and I get the following error:

ERROR: Unable to load dependent library /_path_to_unzipped_julia_debug_/../lib/julia/libjulia-internal.so.1
Message:/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /_path_to_unzipped_julia_debug_/../lib/julia/libjulia-internal.so.1)

A quick web search says one shouldn't mess with this library but should instead ask the developer to build for the correct OS version. But I know very little about Linux; maybe there are other ways to make it work.

For reference, here is the setup I have: Windows WSL2, Ubuntu 20.04.3.

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
CUDA toolkit 11.6, artifact installation
NVIDIA driver 511.65.0, for CUDA 11.6
CUDA driver 11.6

Libraries:
- CUBLAS: 11.8.1
- CURAND: 10.2.9
- CUFFT: 10.7.0
- CUSOLVER: 11.3.2
- CUSPARSE: 11.7.1
- CUPTI: 16.0.0
- NVML: 11.0.0+510.47.3
  Downloaded artifact: CUDNN
- CUDNN: 8.30.2 (for CUDA 11.5.0)
  Downloaded artifact: CUTENSOR
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.7.2
- LLVM: 12.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: NVIDIA GeForce RTX 2070 (sm_75, 7.843 GiB / 8.000 GiB available)

@maleadt
Member

maleadt commented Feb 10, 2022

Ah yes, I'm building on a fairly recent Linux distro. Can you build it yourself? The patch you need:

diff --git a/src/julia_threads.h b/src/julia_threads.h
index 5727083212..9832fa9ac4 100644
--- a/src/julia_threads.h
+++ b/src/julia_threads.h
@@ -45,10 +45,10 @@ typedef win32_ucontext_t jl_ucontext_t;
 #endif
 #if 0
 // very slow, but more debugging
-//#elif defined(_OS_DARWIN_)
-//#define JL_HAVE_UNW_CONTEXT
-//#elif defined(_OS_LINUX_)
-//#define JL_HAVE_UNW_CONTEXT
+#elif defined(_OS_DARWIN_)
+#define JL_HAVE_UNW_CONTEXT
+#elif defined(_OS_LINUX_)
+#define JL_HAVE_UNW_CONTEXT
 #elif defined(_OS_EMSCRIPTEN_)
 #define JL_HAVE_ASYNCIFY
 #elif !defined(JL_HAVE_ASM)

If not, I can have the Julia buildbots generate a build instead.

@luraess

luraess commented Feb 10, 2022

@maleadt I will give it a try now as well since it turns out that 1.7.0, 1.7.1 and 1.7.2 hang.

Using your debug Julia build, I also hit the missing GLIBC_2.32. I'll try to install it locally on my system, which has GLIBC_2.31, and see how far I can get.

@luraess

luraess commented Feb 10, 2022

@maleadt getting:

LD_LIBRARY_PATH=/home/luraess/scratch/julia_tmp/glibc/glibc-2.32-install/lib ./julia
Segmentation fault (core dumped)

when running your Julia build using GLIBC_2.32 installed in my tmp.

@anj00
Author

anj00 commented Feb 11, 2022

Ah yes, I'm building on a fairly recent Linux distro. Can you build it yourself?
....
If not, I can have the Julia buildbots generate a build instead.

Could you please make a build for Ubuntu 20.04.3? Who knows how long it would take me to get the whole Julia build chain running.

@maleadt
Member

maleadt commented Feb 11, 2022

@anj00
Author

anj00 commented Feb 11, 2022

Thanks! Here is what I am getting; hopefully I followed the instructions correctly.

Attaching to process 5038
[New LWP 5039]
[New LWP 5048]
[New LWP 5049]
[New LWP 5050]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f52535715ce in epoll_wait (epfd=3, events=0x7f523a5978c0, maxevents=1024, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f525322db80 (LWP 5038))]
#0  0x00007f52535715ce in epoll_wait (epfd=3, events=0x7f523a5978c0, maxevents=1024, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30      in ../sysdeps/unix/sysv/linux/epoll_wait.c
(gdb) print jl_live_tasks()->length
$1 = 3


(gdb) call jlbacktracet(jl_arrayref(jl_live_tasks(), 0))
jl_start_fiber_swap at /buildworker/worker/package_linux64/build/src/task.c:1064 [inlined]
ctx_switch at /buildworker/worker/package_linux64/build/src/task.c:465
jl_switch at /buildworker/worker/package_linux64/build/src/task.c:502
try_yieldto at ./task.jl:767
wait at ./task.jl:837
wait at ./condition.jl:123
wait at ./lock.jl:366
nonblocking_synchronize at /home/wls_user/.julia/packages/CUDA/bki2w/lib/cudadrv/stream.jl:162 [inlined]
#synchronize#12 at /home/wls_user/.julia/packages/CUDA/bki2w/lib/cudadrv/stream.jl:128
synchronize##kw at /home/wls_user/.julia/packages/CUDA/bki2w/lib/cudadrv/stream.jl:122 [inlined]
synchronize##kw at /home/wls_user/.julia/packages/CUDA/bki2w/lib/cudadrv/stream.jl:122 [inlined]
...
user code pointing to synchronize(blocking = false)
...
unknown function (ip: 0x7f523a0aa851)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
....
user code 
....
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:876
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
include_string at ./loading.jl:1196
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
_include at ./loading.jl:1253
include at ./Base.jl:418
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
exec_options at ./client.jl:292
_start at ./client.jl:495
jfptr__start_34903.clone_1 at /_julia_debug_install_/bin/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at /buildworker/worker/package_linux64/build/cli/loader_exe.c:42
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at bin/bin/julia (unknown line)


(gdb) call jlbacktracet(jl_arrayref(jl_live_tasks(), 1))
jl_unw_swapcontext at /buildworker/worker/package_linux64/build/src/task.c:958 [inlined]
jl_swap_fiber at /buildworker/worker/package_linux64/build/src/task.c:970
ctx_switch at /buildworker/worker/package_linux64/build/src/task.c:437
jl_switch at /buildworker/worker/package_linux64/build/src/task.c:502
try_yieldto at ./task.jl:767
wait at ./task.jl:837
wait at ./condition.jl:123
#134 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Distributed/src/remotecall.jl:281 [inlined]
lock at ./lock.jl:190
lock at ./condition.jl:78 [inlined]
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Distributed/src/remotecall.jl:279 [inlined]
#133 at ./threadingconstructs.jl:178
jfptr_YY.133_50291.clone_1 at //_julia_debug_install_/bin/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:877


(gdb) call jlbacktracet(jl_arrayref(jl_live_tasks(), 2))
jl_rec_backtrace at /buildworker/worker/package_linux64/build/src/stackwalk.c:700 [inlined]
jlbacktracet at /buildworker/worker/package_linux64/build/src/stackwalk.c:770
unknown function (ip: 0x7f523a59775e)

@maleadt
Member

maleadt commented Feb 11, 2022

Thanks for the backtraces! They don't reveal anything new, or at least they confirm that the problem is strictly within nonblocking_synchronize (and not a complicated deadlock between different tasks). #1366 made me realize how such a hang can occur though, so could you try #1369?

@anj00
Author

anj00 commented Feb 12, 2022

I have run 4M loops with CUDA.jl from the tb/async_errors branch and it is running OK, so this looks very promising.
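For reference, in case others want to test the same branch, it can be installed straight from the repository (a sketch; the URL is the standard JuliaGPU repository):

using Pkg
Pkg.add(url="https://github.com/JuliaGPU/CUDA.jl", rev="tb/async_errors")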

@maleadt
Member

maleadt commented Feb 14, 2022

With that and #1369 (comment) I hope we can close this. Please re-open if the issue remains.

@maleadt maleadt closed this as completed Feb 14, 2022
@roflmaostc

Thanks!
Right now (not on the master version) it just got stuck again. I still only observe it inside Jupyter notebooks.
I'll report back on whether it is better with master.
