Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack allocation of tuples and structs with isbits union fields #29289

Open
mbravenboer opened this issue Sep 20, 2018 · 2 comments
Open

Stack allocation of tuples and structs with isbits union fields #29289

mbravenboer opened this issue Sep 20, 2018 · 2 comments
Labels
performance Must go faster

Comments

@mbravenboer
Copy link

mbravenboer commented Sep 20, 2018

Because an isbits union is not isbits, a tuple or struct containing an isbits union is not isbits. This means that those tuples or structs will currently be heap allocated. This shouldn't be necessary though because there are no heap references involved.

In the original isbits union discussion there are comments on the problems with making isbits unions true isbits types (and at that point it was decided to introduce a separate category isbitsunion). I'm not sure what those complications are and if they particularly relate to stack allocation. Perhaps stack allocation is still possible with making isbits union true isbitstypes?

Examples illustrating the problem:

size = 10000000

# Array of isbits union. No allocation issue
fn_union(x::Int) = x > 10 ? 1 : 2.0
fn_union_loop() = [fn_union(e) for e in 1:size]

# Array of tuples with isbits union. Will heap allocate tuples
fn_union_tuple(x::Int) = (x, fn_union(x))
fn_union_tuple_loop() = [fn_union_tuple(e) for e in 1:size]

# Array of structs with manual tag. No allocations
struct Foo1
    tag :: Bool
    x :: Int64
    y :: Float64
    Foo1(x::Int64) = new(true, x, 0.0)
    Foo1(x::Float64) = new(false, 0, x)
end
fn_union_struct1(x::Int) = Foo1(if iseven(x) x else Float64(x) end)
fn_union_struct1_loop() = [fn_union_struct1(e) for e in 1:size]

# Array of structs with isbits union. Will heap allocate the structs.
struct Foo2
    x :: Union{Int64, Float64}
    y :: Int64
end
fn_union_struct2(x) = Foo2(if iseven(x) x else Float64(x) end, x)
fn_union_struct2_loop() = [fn_union_struct2(e) for e in 1:size]

println("Array of isbits union:" )
for i in 1:3
    @time fn_union_loop()
end

println("Array of tuples containing an isbits union:" )
for i in 1:3
    @time fn_union_tuple_loop()
end

println("Array of structs with manual tagging of a union:" )
for i in 1:3
    @time fn_union_struct1_loop()
end

println("Array of structs containing an isbits union:" )
for i in 1:3  
    @time fn_union_struct2_loop()
end

Output:

Array of isbits union:
  0.171180 seconds (128.47 k allocations: 158.968 MiB, 2.82% gc time)
  0.118740 seconds (18 allocations: 152.588 MiB, 42.04% gc time)
  0.107816 seconds (18 allocations: 152.588 MiB, 36.16% gc time)
Array of tuples containing an isbits union:
  2.295431 seconds (20.24 M allocations: 698.719 MiB, 20.41% gc time)
  1.987941 seconds (20.00 M allocations: 686.639 MiB, 14.47% gc time)
  1.988189 seconds (20.00 M allocations: 686.639 MiB, 14.47% gc time)
Array of structs with manual tagging of a union:
  0.108185 seconds (66.01 k allocations: 232.131 MiB, 1.50% gc time)
  0.101095 seconds (4 allocations: 228.882 MiB, 23.12% gc time)
  0.114195 seconds (4 allocations: 228.882 MiB, 34.39% gc time)
Array of structs containing an isbits union:
  0.486259 seconds (10.07 M allocations: 384.890 MiB, 52.27% gc time)
  0.332681 seconds (10.00 M allocations: 381.470 MiB, 38.94% gc time)
  0.522927 seconds (10.00 M allocations: 381.470 MiB, 61.60% gc time)

Issue is reproducible on both 0.7 and nightly from today:

Julia Version 1.1.0-DEV.299
Commit 6354405908 (2018-09-20 13:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

It is peculiar that the performance difference for tuples is significantly worse than for structs.

Possibly this is because of the isbits union field optimization not being applied to tuples? Possibly that should be a separate feature request?

@JeffBezanson JeffBezanson added the performance Must go faster label Sep 20, 2018
@JeffBezanson
Copy link
Member

Thanks. The array of struct-with-union layout issue should be fairly tractable. The tricky part is that code probably exists that assumes isbits is synonymous with being allocated inline and having a C-compatible layout. We will have to separate those concepts (remarked on briefly here: #23567 (comment)).

Also note that fn_union_loop returns a Vector{Real} and fn_union_tuple_loop returns a Vector{Tuple{Int,Real}}. We don't merge Int and Float64 into a Union type by default. So this depends a bit on what types you have.

Tuples are tricky because of covariance --- unlike a struct with a Union-typed field, Tuple{Union{A,B}} is an abstract type and so does not have a fixed layout. I'm not sure what to do about that. One solution could be to normalize such types to unions of tuples when used as element types.

@mbravenboer
Copy link
Author

Thanks Jeff. I wasn't familiar with the covariance issue, but #24614 helped with that. You gave a similar example there :) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

2 participants