Precompilation can't handle serialization of Tasks enqueued in Channels #52435

Closed
vchuravy opened this issue Dec 7, 2023 · 5 comments · Fixed by #52445
Labels
compiler:precompilation Precompilation of modules regression Regression in behavior compared to a previous version


@vchuravy
Member

vchuravy commented Dec 7, 2023

Splitting this out into a separate issue.

From @Liozou

On #52405 I have a different crash in a different context. In the Pkg REPL, `generate` a new package (for instance, call it PrecompileCrash), then put the following in its src/PrecompileCrash.jl:

module PrecompileCrash

const channel = Channel{Nothing}(0)
Base.Threads.@spawn take!($channel)

end # module PrecompileCrash

Then try to run:
julia -t1 --startup-file=no -e "using InteractiveUtils; versioninfo(); using PrecompileCrash"

I get:

Julia Version 1.11.0-DEV.1026
Commit 96c4164145 (2023-12-05 18:18 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 4

[22573] signal 6 (-6): Aborted
in expression starting at none:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
get_item_for_reloc at /LionelSSDext4/liozou/julia/src/staticdata.c:1905 [inlined]
jl_read_reloclist at /LionelSSDext4/liozou/julia/src/staticdata.c:1981
jl_restore_system_image_from_stream_ at /LionelSSDext4/liozou/julia/src/staticdata.c:3159
jl_restore_package_image_from_stream at /LionelSSDext4/liozou/julia/src/staticdata.c:3580
jl_restore_incremental_from_buf at /LionelSSDext4/liozou/julia/src/staticdata.c:3627
ijl_restore_package_image_from_file at /LionelSSDext4/liozou/julia/src/staticdata.c:3715
_include_from_serialized at ./loading.jl:1073
_include_from_serialized at ./loading.jl:1048 [inlined]
_require_search_from_serialized at ./loading.jl:1626
_require at ./loading.jl:2039
__require_prelocked at ./loading.jl:1916
jfptr___require_prelocked_63238 at /LionelSSDext4/liozou/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /LionelSSDext4/liozou/julia/src/julia.h:2142 [inlined]
jl_f__call_in_world at /LionelSSDext4/liozou/julia/src/builtins.c:888
#invoke_in_world#3 at ./essentials.jl:989 [inlined]
invoke_in_world at ./essentials.jl:986 [inlined]
_require_prelocked at ./loading.jl:1907
macro expansion at ./loading.jl:1845 [inlined]
macro expansion at ./lock.jl:267 [inlined]
__require at ./loading.jl:1806
jfptr___require_63170 at /LionelSSDext4/liozou/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /LionelSSDext4/liozou/julia/src/julia.h:2142 [inlined]
jl_f__call_in_world at /LionelSSDext4/liozou/julia/src/builtins.c:888
#invoke_in_world#3 at ./essentials.jl:989 [inlined]
invoke_in_world at ./essentials.jl:986 [inlined]
require at ./loading.jl:1799
jfptr_require_63167 at /LionelSSDext4/liozou/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /LionelSSDext4/liozou/julia/src/julia.h:2142 [inlined]
call_require at /LionelSSDext4/liozou/julia/src/toplevel.c:484 [inlined]
eval_import_path at /LionelSSDext4/liozou/julia/src/toplevel.c:521
jl_toplevel_eval_flex at /LionelSSDext4/liozou/julia/src/toplevel.c:757
jl_toplevel_eval_flex at /LionelSSDext4/liozou/julia/src/toplevel.c:884
jl_toplevel_eval_flex at /LionelSSDext4/liozou/julia/src/toplevel.c:884
ijl_toplevel_eval_in at /LionelSSDext4/liozou/julia/src/toplevel.c:992
eval at ./boot.jl:428 [inlined]
exec_options at ./client.jl:291
_start at ./client.jl:525
jfptr__start_64763 at /LionelSSDext4/liozou/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /LionelSSDext4/liozou/julia/src/julia.h:2142 [inlined]
true_main at /LionelSSDext4/liozou/julia/src/jlapi.c:586
jl_repl_entrypoint at /LionelSSDext4/liozou/julia/src/jlapi.c:738
main at /LionelSSDext4/liozou/julia/cli/loader_exe.c:58
unknown function (ip: 0x7f3e53382d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/liozou/julia/usr/bin/julia (unknown line)
Allocations: 757403 (Pool: 757387; Big: 16); GC: 1
Aborted (core dumped)

Originally posted by @Liozou in #52363 (comment)

@vchuravy
Member Author

vchuravy commented Dec 7, 2023

I see this on 1.10-rc1:

JULIA_LOAD_PATH=".:" julia -e "using PrecompileCrash"
Task
Task(next=nothing, queue=Base.IntrusiveLinkedList{Task}(head=<circular reference @-2>, tail=<circular reference @-2>), storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=PrecompileCrash.var"#1#2"{Base.Channel{Nothing}}(##225=Base.Channel{Nothing}(cond_take=Base.GenericCondition{Base.ReentrantLock}(waitq=Base.IntrusiveLinkedList{Task}(head=<circular reference @-4>, tail=<circular reference @-4>), lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0))), cond_wait=Base.GenericCondition{Base.ReentrantLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0))), cond_put=Base.GenericCondition{Base.ReentrantLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0))), state=:open, excp=nothing, data=Array{Nothing, (0,)}[], n_avail_items=0, sz_max=0)), rngState0=0x99bfd8c11cbdd8cd, rngState1=0x9f537b17612fecd9, rngState2=0x2026c34eb687a51c, rngState3=0x0746338c957527f2, rngState4=0x01bfba426c0c2e0e, _state=0x00, sticky=false, _isexception=false, priority=0x0000)
Task
Task(next=nothing, queue=Base.IntrusiveLinkedList{Task}(head=<circular reference @-2>, tail=<circular reference @-2>), storage=nothing, donenotify=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), result=nothing, logstate=nothing, code=PrecompileCrash.var"#1#2"{Base.Channel{Nothing}}(##225=Base.Channel{Nothing}(cond_take=Base.GenericCondition{Base.ReentrantLock}(waitq=Base.IntrusiveLinkedList{Task}(head=<circular reference @-4>, tail=<circular reference @-4>), lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0))), cond_wait=Base.GenericCondition{Base.ReentrantLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0))), cond_put=Base.GenericCondition{Base.ReentrantLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.ReentrantLock(locked_by=nothing, reentrancy_cnt=0x00000000, havelock=0x00, cond_wait=Base.GenericCondition{Base.Threads.SpinLock}(waitq=Base.IntrusiveLinkedList{Task}(head=nothing, tail=nothing), lock=Base.Threads.SpinLock(owned=0)), _=(0, 0, 0))), state=:open, excp=nothing, data=Array{Nothing, (0,)}[], n_avail_items=0, sz_max=0)), rngState0=0x99bfd8c11cbdd8cd, rngState1=0x9f537b17612fecd9, rngState2=0x2026c34eb687a51c, rngState3=0x0746338c957527f2, rngState4=0x01bfba426c0c2e0e, _state=0x00, sticky=false, _isexception=false, priority=0x0000)

[425433] signal (6.-6): Aborted
in expression starting at none:1
unknown function (ip: 0x7f8ec415c83c)
raise at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
get_item_for_reloc at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-10/src/staticdata.c:1797 [inlined]
jl_read_reloclist at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-10/src/staticdata.c:1873
jl_restore_system_image_from_stream_ at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-10/src/staticdata.c:2995
jl_restore_package_image_from_stream at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-10/src/staticdata.c:3417
jl_restore_incremental_from_buf at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-10/src/staticdata.c:3464
ijl_restore_package_image_from_file at /cache/build/builder-amdci4-5/julialang/julia-release-1-dot-10/src/staticdata.c:3548

It does not error on 1.9, though.
A more graceful error would be good here.

@vchuravy vchuravy added compiler:precompilation Precompilation of modules regression Regression in behavior compared to a previous version labels Dec 7, 2023
@vchuravy vchuravy added this to the 1.10 milestone Dec 7, 2023
@vchuravy
Member Author

vchuravy commented Dec 7, 2023

@Liozou this seems like a distinct issue. It is interesting that 1.9 doesn't fail in this case,
so maybe we need to look at where that changed.

@vchuravy vchuravy removed this from the 1.10 milestone Dec 7, 2023
@vchuravy
Member Author

vchuravy commented Dec 7, 2023

@gbaraldi pointed out that in 1.9 the task created by `@spawn` never got a chance to run before serialization.

Adding a `sleep` reproduces this error on 1.9 as well:

vchuravy@odin ~/b/julia> cat PrecompileCrash.jl 
module PrecompileCrash

const channel = Channel{Nothing}(0)
Base.Threads.@spawn take!($channel)
sleep(1)

end # module PrecompileCrash
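One way to sidestep the crash (a sketch, not something proposed in this thread) is to create the channel and spawn the task from the module's `__init__` function, which Julia runs each time the module is loaded, after the package image has been restored, instead of at precompile time. The `listener` name and the use of `Ref`s to hold load-time objects are illustrative choices, not part of the original reproducer.

```julia
module PrecompileCrash

# Filled in at load time by __init__, so neither the channel nor the
# task exists while the package image is being serialized.
const channel = Ref{Channel{Nothing}}()
const listener = Ref{Task}()

function __init__()
    # Runs on every load of the module, never during serialization.
    ch = Channel{Nothing}(0)
    channel[] = ch
    listener[] = Base.Threads.@spawn take!(ch)
end

end # module PrecompileCrash
```

With this layout, precompilation only serializes two empty `Ref`s, and the live task exists only in the running session.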

@vtjnash
Member

vtjnash commented Dec 7, 2023

there should be an explicit error for that:

else if (jl_typetagis(v, jl_task_tag << 4)) {
    jl_error("Task cannot be serialized");
}

@vchuravy
Member Author

vchuravy commented Dec 7, 2023

We find the task through the serialization graph of queue, and we explicitly skip tasks at some point.
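The following snippet shows how a blocked task becomes reachable from the channel object, matching the `cond_take` wait queue whose `head` points back at the task in the output above. It pokes at Base internals (`Channel.cond_take`, `GenericCondition.waitq`, `IntrusiveLinkedList.head`), so treat it as an illustration that may break across Julia versions.

```julia
# Relies on Base internals that are not part of the public API.
channel = Channel{Nothing}(0)
t = Threads.@spawn take!(channel)

sleep(0.5)  # give the spawned task time to block inside take!

# The blocked task now sits at the head of the channel's take wait queue,
# so walking the fields of `channel` eventually reaches `t`.
reachable = channel.cond_take.waitq.head === t
println("task reachable from channel: ", reachable)

put!(channel, nothing)  # hand the task a value so it can finish
wait(t)
```

Without the `sleep`, the task may never have run, so the wait queue is still empty; this mirrors why the 1.9 reproducer only failed once a `sleep` was added.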

vtjnash added a commit that referenced this issue Dec 11, 2023
Add a `nrunning` counter which identifies (when zero) when there is
nothing running anymore, allowing us to gate all tasks on all threads on
reaching a quiescent state, not just thread 0. This should let us better
support running precompile with threads (since we will be ensured that
all of them are asleep in a consistent state before serialization tries
to inspect the process state). We could additionally stop them
afterwards to make sure there is no way for them to begin running, even
if we forgot about some other event source, but that seems unnecessary
paranoia for now.

Note it is quite hard to encounter currently, as most places where
precompile happens currently try to force the number of threads to 1.
But this should become more relevant in the future as more threads are
supported in more places. This also may help generally with being able
to ensure the IO loop is running on at least one thread (as that is
currently lacking in this PR and on master). It may also help with
deciding on a more advanced tree-wakeup strategy, as we start to
track how many threads are in various states of running and sleeping,
relative to the amount of work they find.

Fixes #52435
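The commit message describes the fix only in prose, and the actual change lives in the C runtime's scheduler. As a rough illustration of the idea of gating on a counter that reads zero only when nothing is running, here is a small Julia sketch. The `nrunning` name is borrowed from the commit message; `submit!`, `worker`, and `wait_quiescent` are hypothetical helpers, and counting jobs in flight (rather than awake threads) is a simplification of what the runtime does.

```julia
using Base.Threads

# Hypothetical sketch, not Julia's runtime code: count jobs that have been
# submitted but not yet finished. Zero means every worker is idle.
const nrunning = Atomic{Int}(0)

function submit!(queue::Channel{Any}, job)
    atomic_add!(nrunning, 1)  # count the job before any worker can take it
    put!(queue, job)
end

function worker(queue::Channel{Any})
    for job in queue          # an idle worker blocks ("sleeps") here
        try
            job()
        finally
            atomic_sub!(nrunning, 1)  # this job is finished
        end
    end
end

# Block until no job is queued or running, i.e. a quiescent state.
function wait_quiescent()
    while nrunning[] != 0
        sleep(0.001)
    end
end

# Usage: four workers and 100 jobs, then wait before inspecting shared state.
queue = Channel{Any}(Inf)
workers = [Threads.@spawn worker(queue) for _ in 1:4]
total = Atomic{Int}(0)
for i in 1:100
    submit!(queue, () -> atomic_add!(total, i))
end
wait_quiescent()
println("total = ", total[])  # safe to read: nothing is running
close(queue)
foreach(wait, workers)
```

Incrementing before `put!` is what keeps the check sound: a job can never be visible to a worker while the counter still reads zero, so observing zero really does mean that all workers are blocked with nothing left to do.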