Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow compilation and segfault with recursion on highly nested struct #30744

Open
tkf opened this issue Jan 17, 2019 · 7 comments
Open

Slow compilation and segfault with recursion on highly nested struct #30744

tkf opened this issue Jan 17, 2019 · 7 comments
Labels
performance Must go faster

Comments

@tkf
Copy link
Member

tkf commented Jan 17, 2019

Continuing from #30125 (comment)

@JeffBezanson Thanks a lot for looking into this. Actually, it turned out the script eventually segfaults if I wait long enough (3 to 5 minutes, at least in some machines; I'll try to find the difference in setup). This should reproduce the issue:

]add Transducers#long-compilation
using Transducers
@time include(joinpath(dirname(dirname(pathof(Transducers))), "examples", "tutorial_missings.jl"))

If I run julia with --compile=min or more precisely with julia --startup-file=no --compiled-modules=no --compile=min -e 'using Transducers; include(joinpath(dirname(dirname(pathof(Transducers))), "examples", "tutorial_missings.jl"))' the output is something like

 input: (1, 10)
output: 10
 input: (2, 12)
output: 24
 input: (3, 14)
output: 42
  0.095141 seconds (100.05 k allocations: 6.278 MiB)
ans = (2, 3.0)
  0.226999 seconds (128.36 k allocations: 8.364 MiB, 21.15% gc time)
ans = ((3, -1.0), (2, 3.0))
  0.131456 seconds (91.80 k allocations: 6.101 MiB)
ans = (3, 2)

but without --compile=min the script should hang after printing ans = (2, 3.0).

Only the last section of the script is needed to demonstrate the slow compilation. So a smaller example than tutorial_missings.jl would be:

using Transducers
argext_step(should_update) =
    ((oldindex, oldvalue), (index, value)) ->
        should_update(oldvalue, value) ? (index, value) : (oldindex, oldvalue)
init_helper(::typeof(>), ::Type{Tuple{F, S}}) where {F, S} = (zero(F), typemax(S))
init_helper(::typeof(<), ::Type{Tuple{F, S}}) where {F, S} = (zero(F), typemin(S))
argext_init(should_update) = Initializer(TT -> init_helper(should_update, TT))
xf_scanext(should_update) = Scan(argext_step(should_update),
                                 argext_init(should_update))
xf_fullextrema(xf_filter = NotA(Missing)) =
    Zip(Count(), xf_filter) |> Zip(xf_scanext(>), xf_scanext(<))
mapfoldl(xf_fullextrema(), right, [1.0, 3.0, -1.0, missing, 2.0])  # slow
mapfoldl(xf_fullextrema(), right, [1.0, 3.0, -1.0, 2.0])  # slower

When it segfaults, the output is something like:

julia> using Transducers

julia> @time include(joinpath(dirname(dirname(pathof(Transducers))), "examples", "tutorial_missings.jl"))
 input: (1, 10)
output: 10
 input: (2, 12)
output: 24
 input: (3, 14)
output: 42
  1.499609 seconds (3.35 M allocations: 152.190 MiB, 6.37% gc time)
ans = (2, 3.0)

signal (11): Segmentation fault
in expression starting at /home/takafumi/.julia/dev/Transducers/examples/tutorial_missings.jl:324
unknown function (ip: 0x7f99d197be50)
unknown function (ip: 0x7f99d1b6d5c2)
unknown function (ip: 0x7f99d1b6d73f)
unknown function (ip: 0x7f99d1b6ef1a)
unknown function (ip: 0x7f99d1b78cd1)
unknown function (ip: 0x7f99d1b7ad2e)
unknown function (ip: 0x7f99d1b7e67c)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/takafumi/repos/watch/julia/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/takafumi/repos/watch/julia/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/takafumi/repos/watch/julia/usr/bin/../lib/libLLVM-6.0.so (unknown line)
unknown function (ip: 0x7f99d1b4fc9d)
unknown function (ip: 0x7f99d1b53779)
unknown function (ip: 0x7f99d1b55b95)
unknown function (ip: 0x7f99d1af6248)
unknown function (ip: 0x7f99d1b48fe6)
jl_fptr_trampoline at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
jl_apply_generic at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
transduce at /home/takafumi/.julia/dev/Transducers/src/processes.jl:227 [inlined]
#transduce#45 at /home/takafumi/.julia/dev/Transducers/src/processes.jl:204
unknown function (ip: 0x7f99ac0ef739)
jl_fptr_trampoline at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
jl_apply_generic at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
#transduce at ./none:0 [inlined]
#mapfoldl#46 at /home/takafumi/.julia/dev/Transducers/src/processes.jl:233
unknown function (ip: 0x7f99ac0ef4e9)
jl_fptr_trampoline at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
jl_apply_generic at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
mapfoldl at /home/takafumi/.julia/dev/Transducers/src/processes.jl:233
unknown function (ip: 0x7f99ac0ef31d)
jl_fptr_trampoline at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
jl_apply_generic at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f99d1bea716)
unknown function (ip: 0x7f99d1bea460)
unknown function (ip: 0x7f99d1beb0de)
unknown function (ip: 0x7f99d1beb7b2)
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7f99c4761bdf)
unknown function (ip: 0x3)
unknown function (ip: 0x7f99d1bebc7b)
unknown function (ip: 0x7f99d1ab4e72)
unknown function (ip: 0x7f99d1a8e62d)
jl_load at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f99c69c6f30)
unknown function (ip: 0x7f99c69d25cb)
jl_apply_generic at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
include at ./client.jl:418
jl_apply_generic at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f99d1bea716)
unknown function (ip: 0x7f99d1bea460)
unknown function (ip: 0x7f99d1beb369)
unknown function (ip: 0x7f99d1beb7b2)
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7f99c315f93f)
unknown function (ip: 0x5)
unknown function (ip: 0x7f99d1bebc7b)
unknown function (ip: 0x7f99d1ab4e72)
unknown function (ip: 0x7f99d1ab518e)
jl_toplevel_eval_in at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f99c69d2a21)
jl_apply_generic at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f99c6b6c77f)
run_backend at /home/takafumi/.julia/packages/Revise/yp5KG/src/Revise.jl:769
#58 at ./task.jl:259
jl_fptr_trampoline at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
jl_apply_generic at /home/takafumi/repos/watch/julia/usr/bin/../lib/libjulia.so.1 (unknown line)
unknown function (ip: 0x7f99d1a9c510)
unknown function (ip: 0xffffffffffffffff)
Allocations: 306876904 (Pool: 306871095; Big: 5809); GC: 364
@tkf tkf changed the title Slow compilation with recursion on highly nested struct Slow compilation and segfault with recursion on highly nested struct Jan 17, 2019
@JeffBezanson
Copy link
Member

Wow, this seems to involve a type so large we basically can't (or shouldn't) generate specialized code for it. Is this supposed to be a pathological case, or is it fairly normal transducer code?

With --inline=no compilation is still a bit slow, but the code finishes in less than 2 minutes. Maybe we just need an inlining limit based on function size.

@tkf
Copy link
Member Author

tkf commented Jan 17, 2019

This is a useful example that I hope I can support. But indeed I'm a bit lazy in Transducers.jl internal not trying hard enough to make the struct shallower 😅 (I'm planning on doing that). I tried using a flat tuple instead of a "linked list" style nested struct but it seemed that inference gave up determining a concrete type well when I constantly re-creating tuples.

Actually, I have a question: Is it better to use "flat" style like Nodes{Tuple{Node1{T1}, Node2{T2}, Node3{T3}}} instead of "linked list" style like Node1{T1, Node2{T2, Node3{T3, Nothing}}}? Or what matter is simply the depth of recursion call (so that the representation of the data structure does not matter)? I'd like to make it compiler-friendly so that Transducers.jl can be inferred and inlined as much as possible.

@tkf
Copy link
Member Author

tkf commented Jan 17, 2019

Just to clarify a bit more, a call chain (top to bottom) of the flat style would be:

next(::Nodes{Tuple{Node1{T1}, Node2{T2}, Node3{T3}}}, ...)
next(::Nodes{Tuple{Node2{T2}, Node3{T3}}}, ...)
next(::Nodes{Tuple{Node3{T3}}}, ...)

i.e., each next has to pop off the first node, reconstruct the tuple, and then bundle them in a struct Nodes.

On the other hand, a call chain of the linked list style would be:

next(::Node1{T1, Node2{T2, Node3{T3, Nothing}}}, ...)
next(::Node2{T2, Node3{T3, Nothing}}, ...)
next(::Node3{T3, Nothing}, ...)

It's nicer since I don't have to re-construct the structs. But I worry if it is not compiler-friendly.

@JeffBezanson
Copy link
Member

The flat style is probably better. We have moved to that in Base.Iterators, for example, for Zip and Product.

@tkf
Copy link
Member Author

tkf commented Jan 18, 2019

Thanks. Hmm.... So I guess I have to try flat style again at some point.

But it looked like to me that the inference in linked list style was much better than the flat one. Or maybe rather the compiler was "forced" to continue inference because there is no limit? Or maybe simply my code had performance bugs...

Anyway, I suppose that the plan for this issue is that keeping it open until there is some inference/inline limit. (So I'm not touching the close button ATM.)

@JeffBezanson
Copy link
Member

It's quite possible there are things we can improve in inference to make the "flat" approach work better. As always, the answer is to file more issues :)

@tkf
Copy link
Member Author

tkf commented Jan 18, 2019

the answer is to file more issues

I can do that 👍

(But, to be honest, I personally prefer if julia handle recursive data structure as good as tuples.)

@brenhinkeller brenhinkeller added the performance Must go faster label Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

3 participants