Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The new marking loop has a regression when marking arrays of pointers #49205

Closed
gbaraldi opened this issue Mar 31, 2023 · 7 comments · Fixed by #49315
Closed

The new marking loop has a regression when marking arrays of pointers #49205

gbaraldi opened this issue Mar 31, 2023 · 7 comments · Fixed by #49315
Labels
GC Garbage collector regression Regression in behavior compared to a previous version

Comments

@gbaraldi
Copy link
Member

When looking into #49120, it seems the regression seen there was hiding a part of the new mark loop where it's slower when marking a very large array of pointers.

using Random: seed!
seed!(1)

abstract type Cell end

struct CellA<:Cell
    a::Int
end

struct CellB<:Cell
    b::String
end

function fillcells!(mc::Array{Cell})
    for ind in eachindex(mc)
        mc[ind] = ifelse(rand() > 0.5, CellA(ind), CellB(string(ind)))
    end
    return mc
end

mcells = Array{Cell}(undef, 5000, 5000 )
t1 = @elapsed fillcells!(mcells)
t2 = @elapsed fillcells!(mcells)

println("filling: $t1 s\nfilling again: $t2 s")

@time GC.gc()
@time GC.gc()

One of the GCs after will do a full mark of the array and take a long time to do it. This probably wasn't seen before because arrays of many pointers had really bad GC behaviour overall, but with #49185 it's quite clear.

@d-netto
Copy link
Member

d-netto commented Mar 31, 2023

Seems like #49185 is adding a GC bit to encode whether an object is in the image.

Why could that amplify a potential regression when marking large arrays of pointers?

@d-netto d-netto added regression Regression in behavior compared to a previous version GC Garbage collector labels Mar 31, 2023
@gbaraldi
Copy link
Member Author

gbaraldi commented Mar 31, 2023

That PR is doing two different GC fixes, I haven't updated the title. It also fixes a behaviour where we stopped increasing the interval size when encountering many pointers which led to the regression in #49120. It's just that it seems we were doing so many GCs that it hid whatever is going on.
#48935 has the fix but not the new mark loop and it's quite a bit faster if you want an easy test to compare.

@d-netto
Copy link
Member

d-netto commented Mar 31, 2023

I can reproduce the performance difference across backports-release-1.9 and #49185.

Most of the degradation seems to be coming from chunking (batching mechanism we use to mark large arrays of pointers):

Screenshot 2023-03-31 at 12 16 52 PM

@gbaraldi
Copy link
Member Author

I think the chunking itself is fine, if you look it's all in the hot loop of the function.

@d-netto
Copy link
Member

d-netto commented Mar 31, 2023

I'm inclined to say it's coming from chunking. After increasing the batch size MAX_REFS_AT_ONCE from 2^16 to 2^20, the pathological full mark goes down from 20s to 3s.

This is not necessarily a solution, but could suggest that our chunking algorithm may need some adjustments.

@gbaraldi
Copy link
Member Author

Oh, that's quite interesting.

@d-netto
Copy link
Member

d-netto commented Mar 31, 2023

CC: @vchuravy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC Garbage collector regression Regression in behavior compared to a previous version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants