Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task cannot be serialized error during precompilation disappeared in 1.9 #49513

Closed
KristofferC opened this issue Apr 26, 2023 · 9 comments
Closed
Labels
packages Package management and loading regression Regression in behavior compared to a previous version

Comments

@KristofferC
Copy link
Member

KristofferC commented Apr 26, 2023

Doing the following in the Downloads.jl package (JuliaLang/Downloads.jl#224):

  • add Downloads.download("file://" * @__FILE__) to src/Downloads.jl
  • load the package (remember to change its uuid since it is a stdlib)

causes the following on 1.8:

julia> using Downloads
[ Info: Precompiling Downloads [f43b241f-c20a-4ad4-852c-f6b1247861c6]
fatal: error thrown and no exception handler available.
ErrorException("Task cannot be serialized")
ijl_error at /Users/kristoffercarlsson/.julia/juliaup/julia-1.8.5+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.8.dylib (unknown line)
...
jl_write_compiler_output at /Users/kristoffercarlsson/.julia/juliaup/julia-1.8.5+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.8.dylib (unknown line)
ijl_atexit_hook at /Users/kristoffercarlsson/.julia/juliaup/julia-1.8.5+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.8.dylib (unknown line)
jl_repl_entrypoint at /Users/kristoffercarlsson/.julia/juliaup/julia-1.8.5+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.8.dylib (unknown line)
ERROR: Failed to precompile Downloads [f43b241f-c20a-4ad4-852c-f6b1247861c6] to /Users/kristoffercarlsson/.julia/compiled/v1.8/Downloads/jl_vSquOA.

causing the precompilation to fail.

On 1.9 on the other hand, a precompile file is emitted but it segfaults when Julia tries to load it:

julia> using Downloads

[42067] signal (11.2): Segmentation fault: 11
in expression starting at REPL[1]:1
get_item_for_reloc at /Users/kristoffercarlsson/.julia/juliaup/julia-1.9.0-rc2+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
jl_restore_system_image_from_stream_ at /Users/kristoffercarlsson/.julia/juliaup/julia-1.9.0-rc2+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
jl_restore_package_image_from_stream at /Users/kristoffercarlsson/.julia/juliaup/julia-1.9.0-rc2+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
ijl_restore_package_image_from_file at /Users/kristoffercarlsson/.julia/juliaup/julia-1.9.0-rc2+0.aarch64.apple.darwin14/lib/julia/libjulia-internal.1.9.dylib (unknown line)
_include_from_serialized at ./loading.jl:1011
_require_search_from_serialized at ./loading.jl:1469
_require at ./loading.jl:1750
_require_prelocked at ./loading.jl:1623
macro expansion at ./loading.jl:1611 [inlined]
macro expansion at ./lock.jl:267 [inlined]
require at ./loading.jl:1574

During the precompilation of Downloads there is also some debug output (on 1.9) that I don't know where it comes from:

julia> using Downloads
[ Info: Precompiling Downloads [f43b241f-c20a-4ad4-852c-f6b1247861c6]
Task
Task(next=nothing, queue=Base.IntrusiveLinkedList{Task}(head=<circular reference @-2>, tail=<circular reference @-2>), storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=Base.var"#702#703"{Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}, Base.Timer}(cb=Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}(multi=Downloads.Curl.Multi(lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0)), handle=0x0000000107e0df00, timer=Base.Timer(handle=0x0000600001935480, cond=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=<circular reference @-5>, tail=<circular reference @-5>), lock=Base.Threads.SpinLock(owned=0)), isopen=true, set=false), easies=Array{Downloads.Curl.Easy, (0,)}[], grace=0x0000000000007530)), timer=Base.Timer(handle=0x0000600001935480, cond=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=<circular reference @-4>, tail=<circular reference @-4>), lock=Base.Threads.SpinLock(owned=0)), isopen=true, set=false)), rngState0=0x72e8fe5dbec1fcb8, rngState1=0x1ad4b9517c7908c8, rngState2=0x1f47f16f23eccf1b, rngState3=0x0c14860e7434ccd2, _state=0x00, sticky=true, _isexception=false, priority=0x0000)
Task
Task(next=nothing, queue=Base.IntrusiveLinkedList{Task}(head=<circular reference @-2>, tail=<circular reference @-2>), storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=Base.var"#702#703"{Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}, Base.Timer}(cb=Downloads.Curl.var"#31#35"{Downloads.Curl.Multi}(multi=Downloads.Curl.Multi(lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0)), handle=0x0000000107e0df00, timer=Base.Timer(handle=0x0000600001935480, cond=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=<circular reference @-5>, tail=<circular reference @-5>), lock=Base.Threads.SpinLock(owned=0)), isopen=true, set=false), easies=Array{Downloads.Curl.Easy, (0,)}[], grace=0x0000000000007530)), timer=Base.Timer(handle=0x0000600001935480, cond=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=<circular reference @-4>, tail=<circular reference @-4>), lock=Base.Threads.SpinLock(owned=0)), isopen=true, set=false)), rngState0=0x72e8fe5dbec1fcb8, rngState1=0x1ad4b9517c7908c8, rngState2=0x1f47f16f23eccf1b, rngState3=0x0c14860e7434ccd2, _state=0x00, sticky=true, _isexception=false, priority=0x0000)
@KristofferC KristofferC added packages Package management and loading regression Regression in behavior compared to a previous version labels Apr 26, 2023
@KristofferC
Copy link
Member Author

This seems to work on 1.10 with the precompilation system waiting on the task:

┌ Downloads [f43b241f-c20a-4ad4-852c-f6b1247861c6]
│  
│  [pid 46790] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   timer              @0x6000008b2880->0x11577ca30
│  
│  [pid 46790] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA
│   timer              @0x6000008b2880->0x11577ca30
│  
│  [pid 46790] waiting for IO to finish:
│   TYPE[FD/PID]       @UV_HANDLE_T->DATA

@vtjnash
Copy link
Member

vtjnash commented Apr 26, 2023

Now you know why we made that addition/change.

@KristofferC
Copy link
Member Author

KristofferC commented Apr 26, 2023

Both 1.8 and 1.10 behavior is fine by me (with 1.10 being better of course). 1.9 behavior seems bad.

@NHDaly
Copy link
Member

NHDaly commented Oct 26, 2023

👀 I tried adding some PkgImage precompilation to our main application (RAICode) at RAI, and I saw something very similar.
It seemed to finish going through all the commands in the snoop file, but then it logged something like:

Task
Task(next=nothing, queue=Base.IntrusiveLinkedList{Task}(head=<circular reference @-2>, tail=<circular reference @-2>), storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=HTTP.Servers.var"#12#15"{typeof(HTTP.Servers.TRUE), Int64, Int64, Nothing, Int64, HTTP.Handlers.var"#1#2"{HTTP.Handlers.Router{typeof(HTTP.Handlers.default404), typeof(HTTP.Handlers.default405), Nothing}}, HTTP.Servers.Listener{Nothing, Sockets.TCPServer}, Base.Event, Base.Set{HTTP.Connections.Connection{IO_t} where IO_t<:IO}}(tcpisvalid=typeof(HTTP.Servers.TRUE)(), max_connections=9223372036854775807, readtimeout=0, access_log=nothing, verbose=-1, f=HTTP.Handlers.var"#1#2"{HTTP.Handlers.Router{typeof(HTTP.Handlers.default404), typeof(HTTP.Handlers.default405), Nothing}}(handler=HTTP.Handlers.Router{typeof(HTTP.Handlers.default404), typeof(HTTP.Handlers.default405), Nothing}(_404=typeof(HTTP.Handlers.default404)(), _405=typeof(HTTP.Handlers.default405)(), routes=HTTP.Handlers.Node(segment="*", exact=Array{HTTP.Handlers.Node, (7,)}[
...

and just kept logging and logging for hours and never returns. 😢

I'm not following from the discussion above - do we know what the cause of this is? Is there a way to fix it? Do we have to wait for 1.10 to do precompilation with PkgImages? That would be 😭.

@NHDaly
Copy link
Member

NHDaly commented Oct 26, 2023

🤔 okay on 1.10 i'm also seeing the same kind of log messages, but i have to say that these are mysterious and don't really help me understand what's happening:

[pid 35160] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 tcp[15]            @0x600001028b40->0x2a878b100
 timer              @0x6000cc02dc80->0x4603acc70

...

[pid 35160] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 tcp[15]            @0x600001028b40->0x2a878b100
 timer              @0x6000182c5f00->0x45ea43ee0

[pid 35160] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 tcp[15]            @0x600001028b40->0x2a878b100
 timer              @0x6000cc1be400->0x45ea43e80

  Progress [>                                        ]  0/1
  ◐ RAICode Waiting for background task / IO / timer. Interrupt to inspect

Does this mean that the module is somehow pointing to a running Task object, and that's preventing the precompilation from shutting down? Or that some task(s) are still running, and precompile won't finish until they've all shut down? Is that something new in 1.9 or 1.10?

@NHDaly
Copy link
Member

NHDaly commented Oct 26, 2023

... Okay so by attaching with LLDB and printing out those addresses, i was able to surmise that yes, this means there's a Task that's still running. It turns out our snoop script started up some background tasks for reporting out metrics and one of those was never shut down.

Some thoughts:

  1. I do not think that the current log message is sufficient to explain what's going on. It should more explicitly say something like "Package will not finish until all background tasks have finished." or something like that?
    • Interrupt to inspect didn't print any extra information
    • Is there any way we can make those pointers more helpful?
  2. Why do we wait for all the tasks to finish anyway? PackageCompiler definitely didn't do that. Do we need to be doing that?

@NHDaly
Copy link
Member

NHDaly commented Oct 26, 2023

But also also, i see that it's waiting on a Timer, yet when i print the timer in LLDB, it says isopen=false. So why is it still waiting on it?

[pid 96910] waiting for IO to finish:
 TYPE[FD/PID]       @UV_HANDLE_T->DATA
 timer              @0x6000c81dc980->0x3ab0e0fd0

Base.Timer(handle=0x0000000000000000, cond=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), isopen=false, set=false)
Array{Task, (0,)}[]

I also printed the cond variable's queue and there's nothing in there waiting...

I tried collecting task backtraces, and the only things i see waiting are builtin tasks on profiling and distributed, and the main task which is waiting as part of ijl_write_compiler_output - presumably waiting for the background tasks to end? But there aren't any!

thread (1) ++++ Task backtraces
thread (1) ==== Thread 1 created 4 live tasks
thread (1)     ---- Root task (0x10a09c010)
thread (1)          (sticky: 1, started: 1, state: 0, tid: 1)
thread (1) jl_swap_fiber at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:1346
thread (1) ctx_switch at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:571
thread (1) ijl_switch at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:650
thread (1) try_yieldto at ./task.jl:921
thread (1) wait at ./task.jl:995
thread (1) jfptr_wait_75012 at /Users/nathandaly/builds/julia-1.10+RAI/usr/lib/julia/sys.dylib (unknown line)
thread (1) _jl_invoke at /Users/nathandaly/builds/julia-1.10+RAI/src/gf.c:0 [inlined]
thread (1) ijl_apply_generic at /Users/nathandaly/builds/julia-1.10+RAI/src/gf.c:3074
thread (1) jl_task_wait_empty at /Users/nathandaly/builds/julia-1.10+RAI/src/partr.c:357
thread (1) ijl_write_compiler_output at /Users/nathandaly/builds/julia-1.10+RAI/src/precompile.c:78
thread (1) ijl_atexit_hook at /Users/nathandaly/builds/julia-1.10+RAI/src/init.c:251
thread (1) jl_repl_entrypoint at /Users/nathandaly/builds/julia-1.10+RAI/src/jlapi.c:732
thread (1)     ---- End root task
thread (1)     ---- Task 1 (0x10a09c4c0)
thread (1)          (sticky: 0, started: 1, state: 0, tid: 0)
thread (1) jl_swap_fiber at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:1346
thread (1) ctx_switch at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:571
thread (1) ijl_switch at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:650
thread (1) try_yieldto at ./task.jl:921
thread (1) wait at ./task.jl:995
thread (1) #wait#645 at ./condition.jl:130
thread (1) wait at ./condition.jl:125 [inlined]
thread (1) _trywait at ./asyncevent.jl:138
thread (1) wait at ./asyncevent.jl:155 [inlined]
thread (1) profile_printing_listener at ./Base.jl:566
thread (1) #1055 at ./Base.jl:602
thread (1) jfptr_YY.1055_85336 at /Users/nathandaly/builds/julia-1.10+RAI/usr/lib/julia/sys.dylib (unknown line)
thread (1) _jl_invoke at /Users/nathandaly/builds/julia-1.10+RAI/src/gf.c:0 [inlined]
thread (1) ijl_apply_generic at /Users/nathandaly/builds/julia-1.10+RAI/src/gf.c:3074
thread (1) jl_apply at /Users/nathandaly/builds/julia-1.10+RAI/src/./julia.h:1981 [inlined]
thread (1) start_task at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:1238
thread (1)     ---- End task 1
thread (1)     ---- Task 2 (0x10b7387e0)
thread (1)          (sticky: 0, started: 1, state: 0, tid: 0)
thread (1) jl_start_fiber_swap at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:1433
thread (1) ctx_switch at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:617
thread (1) ijl_switch at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:650
thread (1) try_yieldto at ./task.jl:921
thread (1) wait at ./task.jl:995
thread (1) #wait#645 at ./condition.jl:130
thread (1) wait at ./condition.jl:125 [inlined]
thread (1) #138 at /Users/nathandaly/builds/julia-1.10+RAI/usr/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:281 [inlined]
thread (1) lock at ./lock.jl:229
thread (1) lock at ./condition.jl:78 [inlined]
thread (1) #137 at /Users/nathandaly/builds/julia-1.10+RAI/usr/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:279
thread (1) unknown function (ip: 0x151a28093)
thread (1) _jl_invoke at /Users/nathandaly/builds/julia-1.10+RAI/src/gf.c:0 [inlined]
thread (1) ijl_apply_generic at /Users/nathandaly/builds/julia-1.10+RAI/src/gf.c:3074
thread (1) jl_apply at /Users/nathandaly/builds/julia-1.10+RAI/src/./julia.h:1981 [inlined]
thread (1) start_task at /Users/nathandaly/builds/julia-1.10+RAI/src/task.c:1238
thread (1)     ---- End task 2
thread (1)     ---- Task 3 (0x324675910)
thread (1)          (sticky: 0, started: 1, state: 0, tid: 1)
thread (1) jl_rec_backtrace at /Users/nathandaly/builds/julia-1.10+RAI/src/stackwalk.c:876 [inlined]
thread (1) jlbacktracet at /Users/nathandaly/builds/julia-1.10+RAI/src/stackwalk.c:1148
thread (1) jl_print_task_backtraces at /Users/nathandaly/builds/julia-1.10+RAI/src/stackwalk.c:1216
thread (1)     ---- End task 3
thread (1) ==== End thread 1
thread (1) ++++ Done

Anything obvious that i'm doing wrong? thanks

@NHDaly
Copy link
Member

NHDaly commented Oct 26, 2023

Oh, i am also only just noticing that the pointers for the timers are different in each log message. So does this mean something is creating Timers and then they're being invoked over and over?

@vtjnash vtjnash closed this as completed Oct 27, 2023
@vtjnash
Copy link
Member

vtjnash commented Oct 27, 2023

Addressed by #51895

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
packages Package management and loading regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

3 participants