Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a function to reset the compile cache #508

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Seelengrab
Copy link
Contributor

It can happen sometimes that a bad LLVM module makes it into the compile cache, which is really annoying to debug due to the very unhelpful LLVM error this causes (the try/catch here never fires, as LLVM seems to kill the process). The easiest fix is just to get a clean slate, by deleting the compile cache in the scratch space. This PR adds a function to do just that - it's already been useful in debugging a bad AVR runtime cache :)

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Sep 7, 2023

Huh, just noticed this:

GPUCompiler.jl/src/rtlib.jl

Lines 152 to 160 in d9b8f47

# remove the existing cache
# NOTE: call this function from global scope, so any change triggers recompilation.
function reset_runtime()
lock(runtime_lock) do
rm(compile_cache; recursive=true, force=true)
end
return
end

I suppose that's already the same thing, except it doesn't take care of the additional versioning scheme with 1.9+? Should I delete this or close this PR? I probably should get the runtime lock at least.

@codecov
Copy link

codecov bot commented Sep 7, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -7.33% ⚠️

Comparison is base (c0c9ab7) 82.28% compared to head (83d24cb) 74.96%.
Report is 23 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #508      +/-   ##
==========================================
- Coverage   82.28%   74.96%   -7.33%     
==========================================
  Files          23       24       +1     
  Lines        3082     3259     +177     
==========================================
- Hits         2536     2443      -93     
- Misses        546      816     +270     
Files Changed Coverage Δ
src/rtlib.jl 94.73% <ø> (-1.42%) ⬇️
src/GPUCompiler.jl 100.00% <100.00%> (ø)

... and 19 files with indirect coverage changes

☔ View full report in Codecov by Sentry.

📢 Have feedback on the report? Share it here.

@Seelengrab
Copy link
Contributor Author

The failure on julia master seems to be due to #506

@maleadt
Copy link
Member

maleadt commented Sep 8, 2023

due to the very unhelpful LLVM error this causes (the try/catch here never fires, as LLVM seems to kill the process)

Yeah, I'm not sure why. When trying in isolation, reading invalid bitcode does throw a proper error. I'm not sure why it suddenly exits there sometimes...

I suppose that's already the same thing, except it doesn't take care of the additional versioning scheme with 1.9+?

That may be a good thing to add.

Regarding the usefulness of this PR, I'm not sure it's worth the changes right now. Renaming reset_runtime is better, but seems like a unnecessary change given that we hope to remove this function at some point in the future (and support automatic invalidation, like Base does). You also shouldn't need to ever have to call create_compile_cache(), that should happen automatically (if it doesn't, that's a bug).

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Sep 8, 2023

Yeah, I'm not sure why. When trying in isolation, reading invalid bitcode does throw a proper error. I'm not sure why it suddenly exits there sometimes...

Well, I can pretty reliably reproduce this with AVR, would you be interested in a MWE for that?

You also shouldn't need to ever have to call create_compile_cache(), that should happen automatically (if it doesn't, that's a bug).

Right, that's why none of this is exported. It's just useful to have a function for this that can be called in both __init__ and the reset of the compile cache.

There are also various places where mkpath(compile_cache) is called explicitly in the codebase, such as here:

GPUCompiler.jl/src/rtlib.jl

Lines 136 to 139 in d9b8f47

if lib === nothing
@debug "Building the GPU runtime library at $path"
mkpath(compile_cache)
lib = build_runtime(job)

which may be replaced with a call to create_compile_cache (with a check & bail inside) instead.

@maleadt
Copy link
Member

maleadt commented Sep 8, 2023

Well, I can pretty reliably reproduce this with AVR, would you be interested in a MWE for that?

Yes, please!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants