After conversion to LLVM we should be able to delete the inferred source of the kernel. #520

Draft · wants to merge 7 commits into master
Conversation

vchuravy (Member)

@simonbyrne has shown me a heap snapshot where the inferred source took up well over 1 GB of RAM.
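
For context, the idea is roughly the following (a sketch, not the PR's actual code; drop_inferred! is a hypothetical helper):

# Sketch of the idea behind this PR: once a kernel has been lowered to LLVM
# IR and compiled, the inference result cached in `CodeInstance.inferred` is
# dead weight and can be dropped.
function drop_inferred!(ci::Core.CodeInstance)
    # `inferred` is an atomic field, so the store needs an ordering
    @atomic :release ci.inferred = nothing
    return ci
end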

codecov bot commented Sep 20, 2023

Codecov Report

Patch coverage: 88.88% and project coverage change: -7.74% ⚠️

Comparison is base (edfdc1a) 83.18% compared to head (919242d) 75.44%.
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #520      +/-   ##
==========================================
- Coverage   83.18%   75.44%   -7.74%     
==========================================
  Files          24       24              
  Lines        3300     3270      -30     
==========================================
- Hits         2745     2467     -278     
- Misses        555      803     +248     
Files Changed       Coverage           Δ
src/jlgen.jl        77.85% <85.71%>    (-2.07%) ⬇️
src/execution.jl    67.79% <100.00%>   (-32.21%) ⬇️

... and 13 files with indirect coverage changes


simonbyrne (Contributor)

This doesn't seem to fix my issue. I'm not sure exactly where the problem is, but I did notice:

julia> GPUCompiler.GLOBAL_CI_CACHES
Dict{CompilerConfig, GPUCompiler.CodeCache} with 2 entries:
  CompilerConfig for PTXCompilerTarget => CodeCache(IdDict{MethodInstance, Vector{CodeInstance}}(MethodInstance for >>(…
  CompilerConfig for PTXCompilerTarget => CodeCache(IdDict{MethodInstance, Vector{CodeInstance}}(MethodInstance for >>(…

julia> Base.summarysize(GPUCompiler.GLOBAL_CI_CACHES) / 10^6
1396.946174

julia> Base.summarysize(collect(values(GPUCompiler.GLOBAL_CI_CACHES))[1]) / 10^6
1393.855007

julia> Base.summarysize(collect(values(GPUCompiler.GLOBAL_CI_CACHES))[2]) / 10^6
3.090233

I tried manually calling empty! on this dict: it didn't seem to make any difference, so I suspect the data is being retained somewhere else as well.
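
The exact invocation isn't shown; presumably the attempt was along these lines (a reconstruction):

# Reconstruction of the attempt described above
empty!(GPUCompiler.GLOBAL_CI_CACHES)
GC.gc(true)  # full collection, so freed entries can actually be reclaimed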

simonbyrne (Contributor)

Also, what's odd is that the RES reported by top is 6.3 GB, but

julia> Sys.maxrss() / 10^9
17.232601088
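
One possible explanation: Sys.maxrss() reports the peak resident set size, while top's RES column shows the current one. A quick way to compare the two (a Linux-only sketch; current_rss is a hypothetical helper):

# Linux-only sketch: read the *current* RSS from /proc/self/statm
# (second field, in pages) to compare against Sys.maxrss(), which is a peak.
function current_rss()
    pages = parse(Int, split(read("/proc/self/statm", String))[2])
    return pages * ccall(:getpagesize, Cint, ())
end
current_rss() / 10^9   # GB; comparable to top's RES column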

maleadt (Member) commented Sep 21, 2023

Removed a call to jl_uncompress_ir, as IIRC it was only needed for the 1.6 overlay hack: #151 (comment)
Maybe that also helps?

simonbyrne (Contributor)

Unfortunately still no.

maleadt (Member) commented Sep 21, 2023

You could try taking a heap snapshot.
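
For reference, one way to do that (Profile.take_heap_snapshot has been available since Julia 1.9):

using Profile
# Writes a V8-format snapshot that can be opened in the Memory tab of
# Chrome DevTools (or in VS Code's heap snapshot viewer).
Profile.take_heap_snapshot("gpucompiler.heapsnapshot")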

simonbyrne (Contributor)

I did that: it looks like most of it is still the inferred objects:
[Screenshot: heap snapshot showing most retained memory still held by inferred objects]

I tried clearing them out manually:

# Walk every per-target cache and drop the retained inference results
for cache in values(GPUCompiler.GLOBAL_CI_CACHES)
    for insts in values(cache.dict)   # MethodInstance => Vector{CodeInstance}
        for inst in insts
            # `inferred` is an atomic field, so the store needs an ordering
            @atomic :release inst.inferred = nothing
        end
    end
end

that seemed to work:

[Screenshot: heap snapshot after clearing, the inferred memory is gone]

top is still reporting 4 GB of memory usage though, so I'm not sure what is going on.

vchuravy (Member, Author)

So I am only deleting the top-level kernel calls, since everything else is re-usable.
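
Roughly, the distinction is this (a sketch with a hypothetical helper; cache.dict maps MethodInstances to their CodeInstances, as shown in the REPL output above):

# Only the entry point's CodeInstance belongs to a single kernel, so only its
# inferred source is dropped; callees may be shared with other kernels and
# keep their inference results for reuse.
function drop_entry_inferred!(cache::GPUCompiler.CodeCache, entry::Core.MethodInstance)
    for ci in get(cache.dict, entry, Core.CodeInstance[])
        @atomic :release ci.inferred = nothing
    end
end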

vchuravy (Member, Author)

@maleadt are we tracking anywhere how big the modules we load onto the GPU are?

maleadt (Member) commented Sep 21, 2023

> @maleadt are we tracking anywhere how big the modules we load onto the GPU are?

No, and I don't know of a way to query the size of a CuModule or CuContext.
