Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base.summarysize returns different memory usage values on different runs #53061

Closed
Tortar opened this issue Jan 26, 2024 · 9 comments · Fixed by #54606
Closed

Base.summarysize returns different memory usage values on different runs #53061

Tortar opened this issue Jan 26, 2024 · 9 comments · Fixed by #54606
Labels
good first issue Indicates a good issue for first-time contributors to Julia observability metrics, timing, understandability, reflection, logging, ...

Comments

@Tortar
Copy link
Contributor

Tortar commented Jan 26, 2024

MWE:

mutable struct B
    x::Union{Float64, Tuple{Float64, Float64}}
    y::Union{Float64, Tuple{Float64, Float64}}
end

ex = B[B(rand(), (rand(),rand())) for _ in 1:10^5];

now if you execute:

julia> Base.summarysize(ex)
7776776

julia> Base.summarysize(ex)
7820648

julia> Base.summarysize(ex)
7923000

julia> Base.summarysize(ex)
7911824

This can become very extreme in some situation, I saw a 2x difference between runs on a more complex vector of structs.

Tested in 1.10 and nightly, same behaviour

@Tortar Tortar changed the title Base.summarysize returns multiple memory usage values on different runs Base.summarysize returns different memory usage values on different runs Jan 26, 2024
@vtjnash
Copy link
Member

vtjnash commented Jan 26, 2024

The trouble comes from doing the wrong query here:

if !isbitstype(ft[i]) && isdefined(x, i)

Versus a more correct implementation of the function, as seen here that uses isptr to avoid double-counting:

if !dtfd[i].isptr && datatype_pointerfree(typeof(f))

@vtjnash vtjnash added good first issue Indicates a good issue for first-time contributors to Julia observability metrics, timing, understandability, reflection, logging, ... labels Jan 26, 2024
@ketgg
Copy link
Contributor

ketgg commented Jan 28, 2024

Hey @vtjnash, I would like to work on this issue it would be great if you can guide me. Thanks.

@inkydragon
Copy link
Member

It looks like this has been a problem since 1.0, so it may need to be reverse ported.

PS T:\> julia +1.10 .\53061-summarysize.jl
[7915808, 7931712, 7954968, 7967504, 7987304, 8000040, 7990696, 7982432, 7967640, 7956784]
PS T:\> julia +1.9 .\53061-summarysize.jl
[7721968, 7877112, 7937216, 7974416, 7981720, 8000040, 7999040, 7984648, 7984824, 7971424]
PS T:\> julia +1.8 .\53061-summarysize.jl
[7626120, 7933096, 7917768, 7961184, 7926904, 7968272, 7980464, 7999608, 8000040, 7988592]
PS T:\> julia +1.6 .\53061-summarysize.jl
[7330136, 7925376, 7940840, 7980024, 7981240, 7999712, 8000040, 7934824, 7980592, 7964688]
PS T:\> julia +1.0 .\53061-summarysize.jl
[7312352, 7557528, 7501648, 7327928, 7351744, 7391160, 7510224, 7323608, 7288408, 7399816]

PS T:\> cat .\53061-summarysize.jl
mutable struct B
    x::Union{Float64, Tuple{Float64, Float64}}
    y::Union{Float64, Tuple{Float64, Float64}}
end

ex = B[B(rand(), (rand(),rand())) for _ in 1:10^5];

[ Base.summarysize(ex) for _ in 1:10 ] |> println

@vtjnash
Copy link
Member

vtjnash commented Jan 29, 2024

I don't think that could be the same, since the Union-storage optimization (that is currently resulting in unreliable over-estimation, wasn't implemented until later)

@re1san If you take a look at the difference between those two implementations, do you see what parts of the base/compiler/utilities.jl there could be ported to the base/summarysize.jl so that the estimate by summarysize is the same as the accurate value computed by the compiler code?

@Tortar
Copy link
Contributor Author

Tortar commented Feb 12, 2024

By the way this can even invert the memory efficiency of structs and mutable structs:

struct Z end

struct A
    b::Union{Z, Tuple{Float64, Float64}}
    d::Union{Z, Int}
    c::Union{Z, Symbol}
end

mutable struct B
    b::Union{Z, Tuple{Float64, Float64}}
    d::Union{Z, Int}
    c::Union{Z, Symbol}
end

vec_notmut = A[A(Z(), 1, :s) for _ in 1:10^6];
vec_mut = B[B(Z(), 1, :s) for _ in 1:10^6];

which results in

julia> Base.summarysize(vec_notmut)
83717104

julia> Base.summarysize(vec_mut)
56000064

(if it is the same problem, I'm a bit of unsure because here summarysize for vec_mut is stable while for vec_notmut is not)

@vtjnash
Copy link
Member

vtjnash commented Feb 12, 2024

Somewhat unclear, but it is at least true that Array/Memory does use a different branch condition when deciding whether to double-count the memory usage of the elements:

if !isempty(obj) && T !== Symbol && (!Base.allocatedinline(T) || (T isa DataType && !Base.datatype_pointerfree(T)))

@xlxs4
Copy link
Contributor

xlxs4 commented May 13, 2024

So, would

             nf = nfields(x)
-            ft = typeof(x).types
-            if !isbitstype(ft[i]) && isdefined(x, i)
-                val = getfield(x, i)
+            dt = typeof(x)
+            dtfd = Base.DataTypeFieldDesc(dt)
+            if isdefined(x, i)
+                f = getfield(x, i)
+                if dtfd[i].isptr || !Base.datatype_pointerfree(typeof(f))
+                    val = f
+                end
             end
         end

take care of #53061 (comment)?

julia> Base.summarysize(ex)
5600056

julia> Base.summarysize(ex)
5600056

julia> Base.summarysize(ex)
5600056

The second case (vec_notmut) still oscillates.

@Tortar
Copy link
Contributor Author

Tortar commented May 22, 2024

I checked it out your implementation @xlxs4, I think I understood the logic and it seems fine to me

@Tortar
Copy link
Contributor Author

Tortar commented May 22, 2024

Could you open up a PR with that @xlxs4?

JeffBezanson added a commit that referenced this issue May 28, 2024
fixes #53061

Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>
JeffBezanson added a commit that referenced this issue May 29, 2024
fixes #53061

Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>
JeffBezanson added a commit that referenced this issue May 29, 2024
fixes #53061

Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>
JeffBezanson added a commit that referenced this issue Jun 3, 2024
fixes #53061

Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>
JeffBezanson added a commit that referenced this issue Jun 5, 2024
fixes #53061

Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>
JeffBezanson added a commit that referenced this issue Jun 11, 2024
…54606)

fixes #53061

Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>
KristofferC pushed a commit that referenced this issue Jun 13, 2024
…54606)

fixes #53061

Co-authored-by: Orestis Ousoultzoglou <orousoultzoglou@gmail.com>
(cherry picked from commit 68fe512)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Indicates a good issue for first-time contributors to Julia observability metrics, timing, understandability, reflection, logging, ...
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants