
implement "engine" for managing inference/codegen #54816

Merged
merged 3 commits into from
Jun 23, 2024
Conversation

vtjnash
Member

@vtjnash vtjnash commented Jun 15, 2024

Continuing from previous PRs to make CodeInstance the primary means of tracking compilation, this introduces an "engine" which externally keeps track of whether a particular inference result is in progress and where. At present, it handles unexpected cycles by permitting both threads to work on the result. This is likely to be optimal most of the time for now, until we have the ability to do work-stealing of the results.

To assist with that, CodeInstance is now primarily allocated by jl_engine_reserve, which also tracks that this is being currently inferred. This creates a sort of per-(MI,owner) tuple lock mechanism, which can be used with the double-check pattern to see if inference was completed while waiting on that. The world value is not included since that is inferred later, so there is a possibility that a thread waits only to discover that the result was already invalid before it could use it (though this should be unlikely).

The process then can notify when it has finished and wants to release the reservation lock on that identity pair. When doing so, it may also provide source code, allowing the process to potentially begin a threadpool to compile that result while the main thread continues with the job of inference.

Includes fix for #53434, by ensuring SOURCE_MODE_ABI results in the item going into the global cache.

Fixes #53433, as inInference is computed by the engine and protected by a lock, which also fixes #53680.

(n.b. somehow the finalizers_inhibited counter appears to be getting corrupted by the worlds and staged tests, so maybe it will be necessary to shift exactly how that is done if we do want to prevent finalizers from occurring here; e.g. JULIA_CPU_THREADS=1 ../julia -p1 runtests.jl worlds int causes an assertion failure. I suspect this is because it is Task-local state, but if something fails and we forget to reset it, then it will be subtracted from ptls2 instead, which might not be the same.)

@vtjnash vtjnash requested a review from Keno June 15, 2024 01:39
@vtjnash vtjnash force-pushed the jn/engine branch 2 times, most recently from 2b86701 to 6bdf8a4 Compare June 15, 2024 20:11
@vtjnash vtjnash marked this pull request as ready for review June 15, 2024 20:11
@gbaraldi
Member

Do we want to add something to the devdocs like:
The life of a CodeInstance, all the way from someone requesting an MI to it being codegened?

@nsajko nsajko added compiler:codegen Generation of LLVM IR and native code compiler:inference Type inference bugfix This change fixes an existing bug labels Jun 17, 2024
@Keno
Member

Keno commented Jun 19, 2024

Overall, I think I like the direction. The only conceptual thing I flagged is that this starts assuming that owner is an egal-comparable cache key. In further development (#54373) I ended up finding it useful to separate the egal-comparable key data from the non-volatile data. That said, I'm happy to keep going with this and figure out how to make that distinction later; in the currently implemented code, it doesn't really matter.

@vtjnash
Member Author

vtjnash commented Jun 20, 2024

this starts assuming that owner is an egal-comparable cache key

I speculate that what will happen here (eventually, if we need more flexibility) is that you would use a particular egal-comparable key for the engine reservation, but if you later decide you want to refine that, the code would be allowed to do so, as long as it follows conceptually the same sort of narrowing and lifecycle management that applies to the other fields. It is potentially very similar to how we deal with Pkg precompile locks, for example, where the pidfile is simply the package name rather than a guarantee that it is precisely the final content hash, and we move it into place later.

@vtjnash
Member Author

vtjnash commented Jun 20, 2024

Oh, also, a secondary note is that this is permitted to be reimplemented in Julia for anything that won't go through the JIT. The only reason this needs to be in C for the stdlib is that it needs to manage the state of things in the JIT before Julia is compiled, which adds some fairly strict concurrency constraints. For anything that merely reuses the inference machinery from Julia (i.e. does not cause recursion through the JIT), it is permitted to reimplement this in Julia in any similar way, using any alternate key-comparator strategy.

@Keno
Member

Keno commented Jun 20, 2024

Yeah, let's try it. As I said, directionally correct, so let's get the implementation in and then we can figure out what tweaks may be required for the new features that we want.

vtjnash added 3 commits June 22, 2024 18:56
Neither a safepoint nor, equivalently, a safepoint region transition is permitted while holding a lock that might be needed by the GC, as that forms a cycle. We must ensure that this thread is permitted to keep running, so that it can reach the condition wait call and release the lock, keeping it synchronized with the notify_all in the sweep. Alternatively we could use a timed_wait and continuously poll the GC (to ensure we actually run GC periodically), but I would rather that be an internal GC policy than an external forcing factor here. This prevents the GC from recruiting this thread (via a signal) to help with GC even though it is observably sleeping, but we might later consider integrating a way for the GC to notify a set of condition variables upon starting, to wake them and recruit them temporarily via a subsequent safepoint, such as (in Julia-syntax pseudocode):

    gc_safe_enter()
    lock(condvar) do
        while !condmet
            wait(condvar)
            while gc_running
                unlock(condvar)
                gc_unsafe_enter()
                gc_safepoint()
                gc_unsafe_leave()
                lock(condvar)
            end
        end
    end
    gc_safe_leave()
This bug was fixed in C17, but we are using gcc 9.1, which caused misalignment here on 32-bit platforms. We necessarily have this behind a lock, so it could just be a boolean, but that would need some more code cleanup.
@vtjnash
Member Author

vtjnash commented Jun 22, 2024

Ended up just using ~/julia/julia --project=.. bughunt.jl 'https://buildkite.com/julialang/julia-master/builds/37504#019032a2-da34-49fd-8b24-295852b1c908', and that pretty quickly let me identify the compiler bug (technically a C11 standards bug) in the version of gcc being used that was causing this to fail.

@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label Jun 22, 2024
@vtjnash vtjnash merged commit 2bf4750 into master Jun 23, 2024
8 checks passed
@vtjnash vtjnash deleted the jn/engine branch June 23, 2024 00:28
@IanButterworth IanButterworth removed the merge me PR is reviewed. Merge when all tests are passing label Jun 23, 2024
simeonschaub added a commit to JuliaDebug/Cthulhu.jl that referenced this pull request Sep 1, 2024
I'm not sure whether this is the correct way to do it, but this fixes #586 locally for me.
maleadt added a commit to JuliaGPU/GPUCompiler.jl that referenced this pull request Sep 17, 2024
maleadt added a commit to JuliaGPU/GPUCompiler.jl that referenced this pull request Sep 17, 2024
maleadt added a commit to JuliaGPU/GPUCompiler.jl that referenced this pull request Sep 17, 2024
Labels
bugfix - This change fixes an existing bug
compiler:codegen - Generation of LLVM IR and native code
compiler:inference - Type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

llvmcall requires the compiler
regression in thread-safety with data-race on mi->inInference field
6 participants