Disk caching; mixed compute config; single machine #23
Comments
Ah, it looks like a GC finalizer issue again; will need to catch the process with gdb.
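For context on what a "GC finalizer issue" usually means in Julia: code running inside a finalizer must not yield to the scheduler, so ordinary I/O or lock waits there can fail with "task switch not allowed from inside gc finalizer". The sketch below is only an illustration of a finalizer that avoids yielding; `Handle`, `make_handle`, and `FINALIZED` are hypothetical names, not anything from Dagger or DTables, and this is not a claim about where the bug in this issue lives.

```julia
# Counts finalizer runs; an atomic update never yields to the scheduler.
const FINALIZED = Threads.Atomic{Int}(0)

# Hypothetical resource wrapper, used only for illustration.
mutable struct Handle
    name::String
end

function make_handle(name::AbstractString)
    h = Handle(String(name))
    finalizer(h) do x
        Threads.atomic_add!(FINALIZED, 1)
        # jl_safe_printf writes without a task switch, so it is safe here.
        ccall(:jl_safe_printf, Cvoid, (Cstring, Cstring), "finalizing %s\n", x.name)
        # Regular I/O such as println may need to wait on the event loop;
        # if it is required, defer it, e.g. `@async println("finalized ", x.name)`.
    end
    return h
end

h = make_handle("chunk-1")
finalize(h)  # run the registered finalizer immediately
```

Anything in a finalizer that may block (printing to `stdout`, taking a contended lock, sending a message to another process) is a candidate for this kind of failure, which is why catching the process in gdb at the moment of the error helps pin down the offending call.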
Another one:
┌ Error: Fatal error on process 1
│ exception =
│ ArgumentError: Cannot serialize a Thunk
│ Stacktrace:
│ [1] serialize(io::Distributed.ClusterSerializer{Sockets.TCPSocket}, t::Dagger.Thunk)
│ @ Dagger ~/.julia/packages/Dagger/vNUsP/src/thunk.jl:96
│ [2] serialize_any(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
│ @ Serialization ~/julia/usr/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:675
│ [3] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
│ @ Serialization ~/julia/usr/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:654
│ [4] serialize_any(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
│ @ Serialization ~/julia/usr/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:675
│ [5] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
│ @ Serialization ~/julia/usr/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:654
│ [6] serialize_any(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
│ @ Serialization ~/julia/usr/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:675
│ [7] serialize
│ @ ~/julia/usr/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:654 [inlined]
│ [8] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, t::Tuple{Bool, Dagger.ThunkFailedException{Dagger.ThunkFailedException{RemoteException}}})
│ @ Serialization ~/julia/usr/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:205
│ [9] serialize_msg(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, o::Distributed.ResultMsg)
│ @ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/messages.jl:78
│ [10] #invokelatest#2
│ @ ./essentials.jl:729 [inlined]
│ [11] invokelatest
│ @ ./essentials.jl:726 [inlined]
│ [12] send_msg_(w::Distributed.Worker, header::Distributed.MsgHeader, msg::Distributed.ResultMsg, now::Bool)
│ @ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/messages.jl:181
│ [13] send_msg_now
│ @ ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/messages.jl:118 [inlined]
│ [14] send_msg_now(s::Sockets.TCPSocket, header::Distributed.MsgHeader, msg::Distributed.ResultMsg)
│ @ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/messages.jl:113
│ [15] deliver_result(sock::Sockets.TCPSocket, msg::Symbol, oid::Distributed.RRID, value::Tuple{Bool, Dagger.ThunkFailedException{Dagger.ThunkFailedException{RemoteException}}})
│ @ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:102
│ [16] macro expansion
│ @ ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:293 [inlined]
│ [17] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
│ @ Distributed ./task.jl:484
└ @ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:106
Worker 3 terminated.
┌ Warning: Worker 3 died, rescheduling work
└ @ Dagger.Sch ~/.julia/packages/Dagger/vNUsP/src/sch/Sch.jl:492
Unhandled Task ERROR: EOFError: read end of file
Stacktrace:
[1] (::Base.var"#wait_locked#680")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base ./stream.jl:944
[2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base ./stream.jl:953
[3] unsafe_read
@ ./io.jl:759 [inlined]
[4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base ./io.jl:758
[5] read!
@ ./io.jl:760 [inlined]
[6] deserialize_hdr_raw
@ ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/messages.jl:167 [inlined]
[7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:172
[8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
[9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
@ Distributed ./task.jl:484
From worker 3: ┌ Error: Fatal error on process 3
From worker 3: │ exception =
From worker 3: │ EOFError: read end of file
From worker 3: │ Stacktrace:
From worker 3: │ [1] (::Base.var"#wait_locked#680")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
From worker 3: │ @ Base ./stream.jl:945
From worker 3: │ [2] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
From worker 3: │ @ Base ./stream.jl:953
From worker 3: │ [3] unsafe_read
From worker 3: │ @ ./io.jl:759 [inlined]
From worker 3: │ [4] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
From worker 3: │ @ Base ./io.jl:758
From worker 3: │ [5] read!
From worker 3: │ @ ./io.jl:760 [inlined]
From worker 3: │ [6] deserialize_hdr_raw
From worker 3: │ @ ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/messages.jl:167 [inlined]
From worker 3: │ [7] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
From worker 3: │ @ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:172
From worker 3: │ [8] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
From worker 3: │ @ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
From worker 3: │ [9] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
From worker 3: │ @ Distributed ./task.jl:484
From worker 3: └ @ Distributed ~/julia/usr/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:229
From worker 2:
From worker 2: signal (15): Terminated
From worker 2: in expression starting at none:0
From worker 2: unknown function (ip: 0x7f30f2c89117)
From worker 2: pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
From worker 2: uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
From worker 2: ijl_task_get_next at /home/krynju/julia/src/partr.c:596
From worker 2: poptask at ./task.jl:921
From worker 2: wait at ./task.jl:930
From worker 2: task_done_hook at ./task.jl:634
From worker 2: jfptr_task_done_hook_26224 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
From worker 2: jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
From worker 2: jl_finish_task at /home/krynju/julia/src/task.c:254
From worker 2: start_task at /home/krynju/julia/src/task.c:942
From worker 2: unknown function (ip: (nil))
From worker 2: unknown function (ip: 0x7f30f2c89117)
From worker 2: pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
From worker 2: uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
From worker 2: ijl_task_get_next at /home/krynju/julia/src/partr.c:596
From worker 2: poptask at ./task.jl:921
From worker 2: wait at ./task.jl:930
From worker 2: task_done_hook at ./task.jl:634
From worker 2: jfptr_task_done_hook_26224 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
From worker 2: jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
From worker 2: jl_finish_task at /home/krynju/julia/src/task.c:254
From worker 2: start_task at /home/krynju/julia/src/task.c:942
From worker 2: unknown function (ip: (nil))
From worker 2: unknown function (ip: 0x7f30f2c89117)
From worker 2: pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
From worker 2: uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
From worker 2: ijl_task_get_next at /home/krynju/julia/src/partr.c:596
From worker 2: poptask at ./task.jl:921
From worker 2: wait at ./task.jl:930
From worker 2: task_done_hook at ./task.jl:634
From worker 2: jfptr_task_done_hook_26224 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
From worker 2: jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
From worker 2: jl_finish_task at /home/krynju/julia/src/task.c:254
From worker 2: start_task at /home/krynju/julia/src/task.c:942
From worker 2: unknown function (ip: (nil))
From worker 2: epoll_wait at /usr/lib/libc.so.6 (unknown line)
From worker 2: uv__io_poll at /workspace/srcdir/libuv/src/unix/epoll.c:236
From worker 2: uv_run at /workspace/srcdir/libuv/src/unix/core.c:400
From worker 2: ijl_task_get_next at /home/krynju/julia/src/partr.c:565
From worker 2: poptask at ./task.jl:921
From worker 2: wait at ./task.jl:930
From worker 2: task_done_hook at ./task.jl:634
From worker 2: jfptr_task_done_hook_26224 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
From worker 2: jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
From worker 2: jl_finish_task at /home/krynju/julia/src/task.c:254
From worker 2: start_task at /home/krynju/julia/src/task.c:942
From worker 2: unknown function (ip: (nil))
From worker 2: Allocations: 26411673 (Pool: 26402943; Big: 8730); GC: 31
signal (15): Terminated
in expression starting at /home/krynju/Downloads/bench1/dtables_caching_test_tuple.jl:29
unknown function (ip: 0x7f4507689117)
pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /home/krynju/julia/src/partr.c:596
poptask at ./task.jl:921
wait at ./task.jl:930
task_done_hook at ./task.jl:634
jfptr_task_done_hook_26224 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
jl_finish_task at /home/krynju/julia/src/task.c:254
start_task at /home/krynju/julia/src/task.c:942
unknown function (ip: (nil))
unknown function (ip: 0x7f4507689117)
pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /home/krynju/julia/src/partr.c:596
poptask at ./task.jl:921
wait at ./task.jl:930
task_done_hook at ./task.jl:634
jfptr_task_done_hook_26224 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
jl_finish_task at /home/krynju/julia/src/task.c:254
start_task at /home/krynju/julia/src/task.c:942
unknown function (ip: (nil))
unknown function (ip: 0x7f4507689117)
pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /home/krynju/julia/src/partr.c:596
poptask at ./task.jl:921
wait at ./task.jl:930
task_done_hook at ./task.jl:634
jfptr_task_done_hook_26224 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
jl_finish_task at /home/krynju/julia/src/task.c:254
start_task at /home/krynju/julia/src/task.c:942
unknown function (ip: (nil))
epoll_wait at /usr/lib/libc.so.6 (unknown line)
uv__io_poll at /workspace/srcdir/libuv/src/unix/epoll.c:236
uv_run at /workspace/srcdir/libuv/src/unix/core.c:400
ijl_task_get_next at /home/krynju/julia/src/partr.c:565
poptask at ./task.jl:921
wait at ./task.jl:930
wait at ./condition.jl:124
#readuntil#681 at ./stream.jl:1012
readuntil##kw at ./stream.jl:996 [inlined]
#readline#397 at ./io.jl:543
readline at ./io.jl:542 [inlined]
macro expansion at /home/krynju/julia/usr/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:283 [inlined]
#37 at ./task.jl:484
unknown function (ip: 0x7f44f0180fdf)
jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
start_task at /home/krynju/julia/src/task.c:931
unknown function (ip: (nil))
Allocations: 51978913 (Pool: 51960682; Big: 18231); GC: 20
schedule: Task not runnable
atexit hook threw an error: ErrorException("task switch not allowed from inside gc finalizer")
ijl_error at /home/krynju/julia/src/rtutils.c:41
ijl_switch at /home/krynju/julia/src/task.c:530
try_yieldto at ./task.jl:861
wait at ./task.jl:931
uv_write at ./stream.jl:1046
unsafe_write at ./stream.jl:1118
write at ./strings/io.jl:244 [inlined]
print at ./strings/io.jl:246
jfptr_print_45001 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
showerror at ./errorshow.jl:144
unknown function (ip: 0x7f44df7693f1)
_atexit at ./initdefs.jl:374
jfptr__atexit_48368 at /home/krynju/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /home/krynju/julia/src/julia.h:1843 [inlined]
ijl_atexit_hook at /home/krynju/julia/src/init.c:219
ijl_exit at /home/krynju/julia/src/jl_uv.c:640
jl_exit_thread0_cb at /home/krynju/julia/src/signals-unix.c:428
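The first trace in the log above fails while a worker tries to send back a nested `ThunkFailedException`, and the serializer reaches a `Dagger.Thunk` somewhere inside it. The following is a minimal sketch of that mechanism using only the `Serialization` stdlib; `LocalHandle`, `FailureReport`, and `try_serialize` are hypothetical stand-ins, and the assumption is that a type refuses serialization through a dedicated `serialize` method, as the `thunk.jl:96` frame suggests.

```julia
using Serialization

# Hypothetical stand-in for a type that must never leave its process.
struct LocalHandle
    id::Int
end

# Refuse serialization outright, mirroring the ArgumentError in the trace.
Serialization.serialize(::AbstractSerializer, ::LocalHandle) =
    throw(ArgumentError("Cannot serialize a LocalHandle"))

# Hypothetical error wrapper that may carry arbitrary context in `origin`.
struct FailureReport
    message::String
    origin::Any
end

# Returns the exception thrown during serialization, or `nothing` on success.
function try_serialize(x)
    try
        serialize(IOBuffer(), x)
        return nothing
    catch err
        return err
    end
end

# Fails: the serializer walks the fields and reaches the LocalHandle.
err_nested = try_serialize(FailureReport("task failed", LocalHandle(1)))
# Succeeds: no unserializable object is reachable from the report.
err_plain = try_serialize(FailureReport("task failed", 1))
```

The point of the sketch is that the failure does not depend on how deeply the object is nested: any error value that keeps a reference to such an object cannot be delivered as a result message, which matches the worker terminating right after the serialization error.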
Lead: Seems like the concurrency violation appearing often happens on
Failure on a simple DTable create and reduce with caching on (it doesn't appear without caching).
Processes: 3
Threads: 2
Caching: on
Platform: Linux, but also observed on macOS and Windows
Julia 1.8.5; Dagger 0.16.3; DTables 0.2.1
Notes:
The process got stuck after this error appeared
The error appears randomly and is not easily reproducible