[WIP] Alternative attempt at compile-on-demand layer integration #44886

Closed
wants to merge 11 commits into from


pchintalapudi
Member

pchintalapudi commented Apr 6, 2022

This PR is an alternative to #44575. Unlike that PR, this version works™ on LLVM 13 and with our current memory manager, because it does not split up modules (thus avoiding a recursive memory manager allocation). Where JITLink is available, we split up modules more aggressively, although the performance impact of this is unknown, since we already emit modules at fairly fine granularity.
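For readers new to the technique: a compile-on-demand layer accepts a module without compiling it, hands out stub entry points for its functions, and compiles the module only when one of those stubs is first called. The C++ sketch below models that behavior under simplifying assumptions. `LazyFunction`, `compile_fn`, and `times_two` are hypothetical names, a callback returning an existing function pointer stands in for code generation, and it is not the LLVM ORC `CompileOnDemandLayer` API.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model of a compile-on-demand stub. A real JIT would emit
// machine code where this sketch just calls compile_fn_.
using IntFn = int (*)(int);

class LazyFunction {
public:
    explicit LazyFunction(std::function<IntFn()> compile_fn)
        : compile_fn_(std::move(compile_fn)) {}

    // Every call goes through the stub; only the first one triggers compilation.
    // Not thread-safe: a real JIT must synchronize concurrent first calls.
    int operator()(int x) {
        if (impl_ == nullptr) {
            impl_ = compile_fn_();
            assert(impl_ != nullptr && "compilation must produce code");
            ++compile_count_;
        }
        return impl_(x);
    }

    int compile_count() const { return compile_count_; }

private:
    std::function<IntFn()> compile_fn_;
    IntFn impl_ = nullptr;
    int compile_count_ = 0;
};

// Stand-in for the "compiled" body of a JIT'd function.
int times_two(int x) { return 2 * x; }
```

Callers always go through the stub; only the first call pays the compilation cost, and a module whose stubs are never called is never compiled.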

Known failures include one in the ccall testset and one in InteractiveUtils.

Depends on #44605 for increased parallelism gains from multiple contexts. Depends on #44926 for various thread safety improvements.

pchintalapudi added compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM labels Apr 6, 2022
pchintalapudi force-pushed the pc/codlayer5 branch 2 times, most recently from 12f6bc2 to 8aa06ba April 12, 2022 02:33
@gbaraldi
Member

Deadlock on non-debug builds aside, this is about 5% faster than master on TTFP (time to first plot) in my tests. The same goes for time to first solve.

@pchintalapudi
Member Author

Conclusions so far: the ccall test failure should be fixed by #44967 once it is merged into master; the InteractiveUtils failure is a code_native failure that probably relates to compile-on-demand; and I still need to investigate the deadlocks and the Windows misc test failures.

@gbaraldi
Member

From what I've seen, the deadlock happens because allocateCodeSection runs twice on the same thread without finalizeMemory running in between. At that point allocating_thread == std::this_thread::get_id(), and since no other thread will ever run finalizeMemory, it just gets stuck.
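A minimal sketch of that failure mode, assuming a memory manager that admits one allocation session at a time, opened by `allocateCodeSection` and closed by `finalizeMemory`. `SessionMemoryManager` and its members are hypothetical, not Julia's actual memory manager, and treating each `allocateCodeSection` call as a new session is a simplification (a RuntimeDyld memory manager normally receives several section allocations per object before `finalizeMemory`). If a second session on the owning thread simply waited, it would wait forever, because only that same thread would ever close the first one; the sketch detects that case and throws instead of hanging.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical memory manager that serializes allocation sessions:
// allocateCodeSection() opens a session, finalizeMemory() closes it.
class SessionMemoryManager {
public:
    void allocateCodeSection() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (allocating_thread_ == std::this_thread::get_id())
            // Waiting here would deadlock: no other thread will ever call
            // finalizeMemory() for a session this thread opened.
            throw std::logic_error("allocateCodeSection re-entered before finalizeMemory");
        // Block while another thread's session is still open.
        cv_.wait(lock, [this] { return allocating_thread_ == std::thread::id(); });
        allocating_thread_ = std::this_thread::get_id();
    }

    void finalizeMemory() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(allocating_thread_ == std::this_thread::get_id());
            allocating_thread_ = std::thread::id();  // close the session
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::thread::id allocating_thread_;  // default id means "no open session"
};
```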

@pchintalapudi
Member Author

That is a problem, and unfortunately the partition function fix did not resolve it. I'll see whether we can simply ignore the context of that assert, and whether libunwind's support has improved in the last 6 years.

pchintalapudi force-pushed the pc/codlayer5 branch 2 times, most recently from 5a796f2 to 61a065f April 13, 2022 19:41
@gbaraldi
Member

It works for me: time to first solve is about 12% better and TTFP is about 6% better, so 🚀

@giordano
Contributor

What's "time to first solve" now? 😅

@gbaraldi
Member

How long it takes for the first solve to complete with OrdinaryDiffEq, e.g.:

```julia
using OrdinaryDiffEq
f(u, p, t) = 1.01 * u
u0 = 1/2
tspan = (0.0, 1.0)
prob = ODEProblem(f, u0, tspan)
sol = solve(prob, Tsit5(), reltol = 1e-8, abstol = 1e-8)
```

I'm open to other ideas for Time to First * benchmarks.

@gbaraldi
Member

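The InteractiveUtils failure happens because this check fails:

```cpp
if (fit != objmap.end() && fptr < fit->first + fit->second.SectionSize) {
    // ...
}
```

Output from a random run, with this print added:

```cpp
// Debug print. Note: it dereferences fit even when fit == objmap.end(),
// so the fitfir/sec values are meaningless whenever fitobj is 0.
std::cout << "fitobj: " << (fit != objmap.end()) << " fptr: " << fptr << " fitfir :" << fit->first << " sec: " << fit->second.SectionSize << std::endl;
```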

This PR:

```
julia> function linear_foo()
           return 5
       end
linear_foo (generic function with 1 method)

julia> @code_native dump_module=false linear_foo()
fitobj: 1 fptr: 140663119466160 fitfir :140660438813392 sec: 3593
WARNING: Unable to find function pointer
```

Master:

```
julia> @code_native dump_module=false linear_foo()
fitobj: 1 fptr: 139721520063056 fitfir :139721520063056 sec: 49
fitobj: 0 fptr: 5 fitfir :23 sec: 0
fitobj: 0 fptr: 0 fitfir :23 sec: 0
fitobj: 0 fptr: 1 fitfir :23 sec: 0
        .text
; ┌ @ REPL[1]:1 within `linear_foo`
        movl    $5, %eax
        retq
        nopw    %cs:(%rax,%rax)
; └
```

The section size seems suspiciously large.
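The check above amounts to finding the last section that starts at or before `fptr` and testing whether `fptr` falls within that section's size. The sketch below models this with a hypothetical `SectionMap` (start address to size) and a hypothetical `contains_function_pointer` helper instead of the real `objmap`/`ObjectInfo` types. Plugging in the logged values: on master, `fptr` equals the section start and lies inside the 49-byte section; in this PR, the section the lookup returned starts 2,680,652,768 bytes before `fptr`, so the lookup fails no matter how the 3593-byte size is interpreted, which suggests `fptr` does not point into any registered object at all.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the object map: section start address -> section size.
using SectionMap = std::map<std::uint64_t, std::uint64_t>;

// Returns true if fptr lies inside some registered section, mirroring the
// `fptr < fit->first + fit->second.SectionSize` check.
bool contains_function_pointer(const SectionMap& sections, std::uint64_t fptr) {
    // First section that starts strictly after fptr...
    auto it = sections.upper_bound(fptr);
    if (it == sections.begin())
        return false;  // fptr precedes every registered section
    --it;  // ...so the previous one is the last section starting at or before fptr
    assert(it->first <= fptr);
    return fptr - it->first < it->second;
}
```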

@pchintalapudi
Member Author

So I think the root cause of the code_native failures is somewhat different.

  1. The problem lies earlier in jl_dump_method_asm_impl, where we first attempt to fully compile the method, and if we succeed then we try to read the assembly directly from the compiled method and annotate it as we see fit.
  2. Unfortunately, with compile-on-demand, this does not work in the general case because our function pointer may point to a compilation thunk, which is for all intents and purposes complete garbage for the assembly reader.
  3. Instead we need to rely on the fallback implementation of dump_method_asm that recompiles the function and dumps the resultant IR's assembly.
  4. But this fallback method does not appear to respect dump_module=false, so the InteractiveUtils test will still fail (although any output is still better than returning an empty string).
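A minimal sketch of the control flow in steps 1-3, assuming the JIT can tell when an address is a compile-on-demand stub. `AsmSources`, `stub_addresses`, and `dump_method_asm_sketch` are hypothetical names, not Julia internals. Stub addresses and null pointers are routed to the recompile-and-dump fallback, while real compiled code is disassembled in place. The PR's stated plan is simpler still: bypass the direct disassembly path altogether.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical inputs for dumping a method's assembly.
struct AsmSources {
    // Addresses known to be compile-on-demand stubs rather than final code.
    std::unordered_set<std::uintptr_t> stub_addresses;
    // Reads and annotates assembly directly from compiled code at an address.
    std::function<std::string(std::uintptr_t)> disassemble_compiled;
    // Fallback: recompile the method and print the assembly of the fresh IR.
    std::function<std::string()> recompile_and_dump;
};

std::string dump_method_asm_sketch(std::uintptr_t fptr, const AsmSources& src) {
    assert(src.recompile_and_dump && "fallback must be available");
    // A null pointer or a stub address holds no meaningful machine code for
    // the method, so only real compiled code is disassembled in place.
    if (fptr != 0 && src.stub_addresses.count(fptr) == 0)
        return src.disassemble_compiled(fptr);
    return src.recompile_and_dump();
}
```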

I will patch up the PR to ignore our disassembly implementation, but there will likely be no discernible change in the number of tests passing.

@pchintalapudi
Member Author

Compile-on-demand summary statistics during the sysimg build:

Core.Compiler

4035 jitlayers - Number of modules added to the JIT
3956 jitlayers - Number of modules compiled by the JIT

sys.ji

5274 jitlayers - Number of modules added to the JIT
4562 jitlayers - Number of modules compiled by the JIT

sys-o.a (various points)

1440 jitlayers - Number of modules added to the JIT
860 jitlayers - Number of modules compiled by the JIT

1729 jitlayers - Number of modules added to the JIT
1205 jitlayers - Number of modules compiled by the JIT

2377 jitlayers - Number of modules added to the JIT
1851 jitlayers - Number of modules compiled by the JIT

These stats are from the actual system image build stage:

3 jitlayers - Number of modules added to the JIT
3 jitlayers - Number of modules compiled by the JIT

Total

14858 modules added to the JIT
12437 modules compiled by the JIT (i.e., that passed through the compile-on-demand layer)

83.7% of modules were actually compiled by the JIT over the whole Julia build
77.9% of modules were actually compiled by the JIT during the system image build (with optimizations enabled)
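The arithmetic behind the two percentages, as a small C++ check. Following the Total above, every listed snapshot is summed for the overall figure; the 77.9% figure matches the last sys-o.a point plus the final build stage, i.e. 1854 of 2380 modules. `StageStats`, `kStats`, `sum_all`, and `rounded_percent` are illustrative names.

```cpp
#include <cassert>
#include <cmath>

// {added, compiled} module counts, copied from the statistics above.
struct StageStats {
    long added;
    long compiled;
};

constexpr StageStats kStats[] = {
    {4035, 3956},  // Core.Compiler
    {5274, 4562},  // sys.ji
    {1440, 860},   // sys-o.a, first listed point
    {1729, 1205},  // sys-o.a, second listed point
    {2377, 1851},  // sys-o.a, third listed point
    {3, 3},        // system image build stage
};

// Sums every listed snapshot, as the Total above does.
StageStats sum_all() {
    StageStats total{0, 0};
    for (const StageStats& s : kStats) {
        total.added += s.added;
        total.compiled += s.compiled;
    }
    return total;
}

// Share of added modules that were actually compiled, in percent, rounded to 0.1.
double rounded_percent(long added, long compiled) {
    assert(added > 0);
    return std::round(1000.0 * compiled / added) / 10.0;
}
```

With these counts, `sum_all()` yields 14858 added and 12437 compiled, `rounded_percent(14858, 12437)` gives 83.7, and `rounded_percent(2377 + 3, 1851 + 3)` gives 77.9.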

pchintalapudi deleted the pc/codlayer5 branch May 21, 2023 03:21