Base.summarysize returns different memory usage values on different runs #53061

Tortar · 2024-01-26T01:21:12Z

MWE:

mutable struct B
    x::Union{Float64, Tuple{Float64, Float64}}
    y::Union{Float64, Tuple{Float64, Float64}}
end

ex = B[B(rand(), (rand(),rand())) for _ in 1:10^5];

Now, if you execute it repeatedly:

julia> Base.summarysize(ex)
7776776

julia> Base.summarysize(ex)
7820648

julia> Base.summarysize(ex)
7923000

julia> Base.summarysize(ex)
7911824

This can become very extreme in some situations: I saw a 2x difference between runs on a more complex vector of structs.

Tested on 1.10 and nightly, with the same behaviour.
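A minimal sketch for checking this kind of nondeterminism: measure the same object several times and count the distinct results. The helper name `summarysize_runs` is hypothetical and not part of Base.

```julia
# The MWE type from the report: each field holds either a Float64
# or a 2-tuple of Float64.
mutable struct B
    x::Union{Float64, Tuple{Float64, Float64}}
    y::Union{Float64, Tuple{Float64, Float64}}
end

# Hypothetical helper: call Base.summarysize `n` times on the same object.
summarysize_runs(obj; n::Int = 10) = [Base.summarysize(obj) for _ in 1:n]

ex = B[B(rand(), (rand(), rand())) for _ in 1:10^5]
sizes = summarysize_runs(ex)

# A deterministic summarysize yields exactly one distinct value.
println(sizes)
println("deterministic: ", length(unique(sizes)) == 1)
```

On an affected version the printed vector should contain several distinct values, as in the outputs above; on a fixed version all entries are expected to agree.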

vtjnash · 2024-01-26T23:36:01Z

The trouble comes from doing the wrong query here, in julia/base/summarysize.jl (line 60 at commit c42df60):

if !isbitstype(ft[i]) && isdefined(x, i)

Compare this with a more correct implementation of the same check in julia/base/compiler/utilities.jl (line 95 at commit c42df60), which uses isptr to avoid double-counting:

if !dtfd[i].isptr && datatype_pointerfree(typeof(f))
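To see why the `isbitstype` query misfires for the MWE, compare it with predicates that look at storage layout. This sketch uses `Base.isbitsunion` and `Base.allocatedinline`, which are internal, unexported helpers whose behaviour may change between Julia versions.

```julia
mutable struct B
    x::Union{Float64, Tuple{Float64, Float64}}
    y::Union{Float64, Tuple{Float64, Float64}}
end

T = fieldtype(B, :x)   # Union{Float64, Tuple{Float64, Float64}}

# A Union is not a DataType, so isbitstype reports false ...
@show isbitstype(T)

# ... even though every member is a bits type, so the field is stored
# inline in B (plus a selector byte) rather than behind a pointer.
@show Base.isbitsunion(T)
@show Base.allocatedinline(T)
```

Under the old check, `x` and `y` are therefore treated as separate objects, and the size of a freshly boxed value from `getfield` is added on top of bytes already included in `sizeof(B)`.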

ketgg · 2024-01-28T11:35:40Z

Hey @vtjnash, I would like to work on this issue; it would be great if you could guide me. Thanks.

inkydragon · 2024-01-28T14:50:53Z

It looks like this has been a problem since 1.0, so the fix may need to be backported.

PS T:\> julia +1.10 .\53061-summarysize.jl
[7915808, 7931712, 7954968, 7967504, 7987304, 8000040, 7990696, 7982432, 7967640, 7956784]
PS T:\> julia +1.9 .\53061-summarysize.jl
[7721968, 7877112, 7937216, 7974416, 7981720, 8000040, 7999040, 7984648, 7984824, 7971424]
PS T:\> julia +1.8 .\53061-summarysize.jl
[7626120, 7933096, 7917768, 7961184, 7926904, 7968272, 7980464, 7999608, 8000040, 7988592]
PS T:\> julia +1.6 .\53061-summarysize.jl
[7330136, 7925376, 7940840, 7980024, 7981240, 7999712, 8000040, 7934824, 7980592, 7964688]
PS T:\> julia +1.0 .\53061-summarysize.jl
[7312352, 7557528, 7501648, 7327928, 7351744, 7391160, 7510224, 7323608, 7288408, 7399816]

PS T:\> cat .\53061-summarysize.jl
mutable struct B
    x::Union{Float64, Tuple{Float64, Float64}}
    y::Union{Float64, Tuple{Float64, Float64}}
end

ex = B[B(rand(), (rand(),rand())) for _ in 1:10^5];

[ Base.summarysize(ex) for _ in 1:10 ] |> println

vtjnash · 2024-01-29T19:52:53Z

I don't think that could be the same bug, since the Union-storage optimization (which is what currently results in the unreliable over-estimation) wasn't implemented until later.

@re1san If you look at the difference between those two implementations, do you see which parts of base/compiler/utilities.jl could be ported to base/summarysize.jl so that the estimate from summarysize matches the accurate value computed by the compiler code?
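One way to picture the port is a per-field predicate with the same shape as the compiler check: count a field's value separately only if the field is stored as a pointer, or if the inline value itself contains pointers. This is a hypothetical sketch: `counts_field_separately` is not a Base function, and it approximates `dtfd[i].isptr` with `!Base.allocatedinline(fieldtype(...))` instead of using `Base.DataTypeFieldDesc`.

```julia
mutable struct B
    x::Union{Float64, Tuple{Float64, Float64}}
    y::Union{Float64, Tuple{Float64, Float64}}
end

mutable struct Boxed
    v::Any   # abstract field type: always stored as a pointer
end

# Hypothetical helper: should a summarysize-like walk count the value in
# field `i` of `x` as a separate allocation?
function counts_field_separately(x, i::Int)
    isdefined(x, i) || return false
    # Stand-in for `dtfd[i].isptr`: a field whose declared type is not
    # allocated inline is stored behind a pointer.
    isptr = !Base.allocatedinline(fieldtype(typeof(x), i))
    f = getfield(x, i)
    # An inline value still needs walking if it holds pointers of its own.
    return isptr || !Base.datatype_pointerfree(typeof(f))
end

b = B(1.0, (2.0, 3.0))
@show counts_field_separately(b, 1)   # Float64 payload stored inline
@show counts_field_separately(b, 2)   # Tuple payload stored inline
@show counts_field_separately(Boxed([1.0, 2.0]), 1)   # pointer field
```

The design choice is to decide from the declared field layout first and only then inspect the runtime value, which avoids treating inline union payloads as separate heap objects.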

Tortar · 2024-02-12T01:15:15Z

By the way, this can even invert the relative memory efficiency of structs and mutable structs:

struct Z end

struct A
    b::Union{Z, Tuple{Float64, Float64}}
    d::Union{Z, Int}
    c::Union{Z, Symbol}
end

mutable struct B
    b::Union{Z, Tuple{Float64, Float64}}
    d::Union{Z, Int}
    c::Union{Z, Symbol}
end

vec_notmut = A[A(Z(), 1, :s) for _ in 1:10^6];
vec_mut = B[B(Z(), 1, :s) for _ in 1:10^6];

which results in

julia> Base.summarysize(vec_notmut)
83717104

julia> Base.summarysize(vec_mut)
56000064

(I'm a bit unsure whether it is the same problem, because here summarysize is stable for vec_mut but not for vec_notmut.)

vtjnash · 2024-02-12T04:27:59Z

Somewhat unclear, but it is at least true that Array/Memory uses a different branch condition when deciding whether to double-count the memory usage of the elements, in julia/base/summarysize.jl (line 143 at commit c42df60):

if !isempty(obj) && T !== Symbol && (!Base.allocatedinline(T) || (T isa DataType && !Base.datatype_pointerfree(T)))
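A small sketch that evaluates the element-type part of that condition for the two element types from the previous comment. `elements_walked` is a hypothetical name; `Base.allocatedinline` and `Base.datatype_pointerfree` are internal helpers, used here exactly as in the quoted line.

```julia
struct Z end

struct A
    b::Union{Z, Tuple{Float64, Float64}}
    d::Union{Z, Int}
    c::Union{Z, Symbol}
end

mutable struct B
    b::Union{Z, Tuple{Float64, Float64}}
    d::Union{Z, Int}
    c::Union{Z, Symbol}
end

# Element-type part of the Array/Memory branch condition quoted above.
elements_walked(T) = !Base.allocatedinline(T) ||
                     (T isa DataType && !Base.datatype_pointerfree(T))

for T in (A, B)
    println(T, ": allocatedinline = ", Base.allocatedinline(T),
            ", pointerfree = ", Base.datatype_pointerfree(T),
            ", elements walked = ", elements_walked(T))
end
```

Both element types end up walked, but for different reasons: `B` because it is mutable and stored by reference, `A` because it is stored inline yet its `Union{Z, Symbol}` field is a pointer. That second path is where inline bytes and walked field values could plausibly overlap.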

xlxs4 · 2024-05-13T15:31:22Z

So, would

             nf = nfields(x)
-            ft = typeof(x).types
-            if !isbitstype(ft[i]) && isdefined(x, i)
-                val = getfield(x, i)
+            dt = typeof(x)
+            dtfd = Base.DataTypeFieldDesc(dt)
+            if isdefined(x, i)
+                f = getfield(x, i)
+                if dtfd[i].isptr || !Base.datatype_pointerfree(typeof(f))
+                    val = f
+                end
             end
         end

take care of #53061 (see the comment above)? With this change applied, the MWE gives:

julia> Base.summarysize(ex)
5600056

julia> Base.summarysize(ex)
5600056

julia> Base.summarysize(ex)
5600056

The second case (vec_notmut) still oscillates.
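As a rough sanity check on the stable value, it can be split into per-element pointer slots, per-object bytes, and a remainder. This assumes a 64-bit build; attributing the 56-byte remainder to the array's own header is an inference, not something measured in the thread.

```julia
mutable struct B
    x::Union{Float64, Tuple{Float64, Float64}}
    y::Union{Float64, Tuple{Float64, Float64}}
end

n = 10^5
slot_bytes = sizeof(Ptr{Cvoid})   # one pointer per element of Vector{B}
obj_bytes = sizeof(B)             # each field: 16-byte payload + 1 selector byte, aligned to 8
reported = 5600056                # stable value from the patched run above

overhead = reported - n * (slot_bytes + obj_bytes)
@show slot_bytes obj_bytes overhead
```

On a 64-bit machine this gives 8-byte slots, 48-byte objects and a 56-byte remainder, i.e. no extra bytes for the union payloads, which is what removing the double-counting should produce.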

Tortar · 2024-05-22T01:09:59Z

I checked out your implementation, @xlxs4. I think I understand the logic, and it seems fine to me.

Tortar · 2024-05-22T17:15:56Z

Could you open a PR with that, @xlxs4?

Tortar changed the title ~~Base.summarysize returns multiple memory usage values on different runs~~ Base.summarysize returns different memory usage values on different runs Jan 26, 2024

Tortar mentioned this issue Jan 26, 2024

SumType becomes less memory efficient with non-isbits fields MasonProtter/SumTypes.jl#65

Closed

vtjnash added the good first issue and observability labels Jan 26, 2024

xlxs4 mentioned this issue May 22, 2024

Avoid double-counting in Base.summarysize #54555

Closed

JeffBezanson added a commit that referenced this issue May 28, 2024

fix double-counting and non-deterministic results in summarysize

0ef55f7

fixes #53061 Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>

JeffBezanson mentioned this issue May 28, 2024

fix double-counting and non-deterministic results in summarysize #54606

Merged

JeffBezanson added a commit that referenced this issue May 29, 2024

fix double-counting and non-deterministic results in summarysize

c6d44a3

fixes #53061 Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>

JeffBezanson added a commit that referenced this issue May 29, 2024

fix double-counting and non-deterministic results in summarysize

c03319a

fixes #53061 Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>

JeffBezanson added a commit that referenced this issue Jun 3, 2024

fix double-counting and non-deterministic results in summarysize

eba7627

fixes #53061 Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>

JeffBezanson added a commit that referenced this issue Jun 5, 2024

fix double-counting and non-deterministic results in summarysize

ca25c29

fixes #53061 Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>

JeffBezanson closed this as completed in #54606 Jun 11, 2024

JeffBezanson added a commit that referenced this issue Jun 11, 2024

fix double-counting and non-deterministic results in summarysize (#…

68fe512

…54606) fixes #53061 Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>

KristofferC pushed a commit that referenced this issue Jun 13, 2024

fix double-counting and non-deterministic results in summarysize (#…

2dec97f

…54606) fixes #53061 Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com> (cherry picked from commit 68fe512)

IanButterworth mentioned this issue Jun 23, 2024

Double counting test broken on i686-linux #54895

Open
