-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve codegen for assignments to globals #44182
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, LLVM should be able to optimize the following code to a single store because it's an unordered store and load, right?
julia> x::Int = 1
1
julia> function f()
global x = 0
while x<10
x += 1
end
x
end
f (generic function with 1 method)
julia> @code_llvm f()
; @ REPL[2]:1 within `f`
define i64 @julia_f_1080() #0 {
top:
; @ REPL[2]:2 within `f`
store atomic {}* inttoptr (i64 140146660003872 to {}*), {}** inttoptr (i64 140144012463704 to {}**) unordered, align 8
call void @llvm.assume(i1 icmp ne ({}* inttoptr (i64 140146660003872 to {}*), {}* null))
; @ REPL[2]:3 within `f`
%0 = load atomic i64*, i64** inttoptr (i64 140144012463704 to i64**) unordered, align 8
; ┌ @ int.jl:83 within `<`
%1 = load i64, i64* %0, align 8
%2 = icmp sgt i64 %1, 9
; └
br i1 %2, label %L23, label %L12
L12: ; preds = %16, %top
%3 = phi i64 [ %18, %16 ], [ %1, %top ]
; @ REPL[2]:4 within `f`
; ┌ @ int.jl:87 within `+`
%4 = add nsw i64 %3, 1
; └
%5 = call nonnull {}* @ijl_box_int64(i64 signext %4)
store atomic {}* %5, {}** inttoptr (i64 140144012463704 to {}**) unordered, align 8
%6 = load atomic i64, i64* getelementptr inbounds (i64, i64* inttoptr (i64 140144012463696 to i64*), i64 -1) unordered, align 8
%7 = and i64 %6, 3
%8 = icmp eq i64 %7, 3
br i1 %8, label %9, label %16
9: ; preds = %L12
%10 = bitcast {}* %5 to i64*
%11 = getelementptr inbounds i64, i64* %10, i64 -1
%12 = load atomic i64, i64* %11 unordered, align 8
%13 = and i64 %12, 1
%14 = icmp eq i64 %13, 0
br i1 %14, label %15, label %16
15: ; preds = %9
call void @jl_gc_queue_binding({}* inttoptr (i64 140144012463696 to {}*))
br label %16
16: ; preds = %15, %9, %L12
; @ REPL[2]:3 within `f`
%17 = load atomic i64*, i64** inttoptr (i64 140144012463704 to i64**) unordered, align 8
; ┌ @ int.jl:83 within `<`
%18 = load i64, i64* %17, align 8
%19 = icmp sgt i64 %18, 9
; └
br i1 %19, label %L23, label %L12
L23: ; preds = %16, %top
; ┌ @ int.jl:83 within `<`
%.lcssa = phi i64 [ %1, %top ], [ %18, %16 ]
; └
; @ REPL[2]:6 within `f`
ret i64 %.lcssa
}
Any ideas what's still missing to make this work?
src/codegen.cpp
Outdated
StoreInst *v = ctx.builder.CreateAlignedStore(rval, bp, Align(sizeof(void*))); | ||
v->setOrdering(Order); | ||
tbaa_decorate(ctx.tbaa().tbaa_binding, v); | ||
ctx.builder.CreateAssumption(ctx.builder.CreateIsNotNull(rval)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping this would elide the undef var check in the following example, but that doesn't seem to work:
julia> global y::Int
julia> function g()
global y = 1
y
end
g (generic function with 1 method)
julia> @code_llvm g()
; @ REPL[2]:1 within `g`
define i64 @julia_g_1078() #0 {
top:
; @ REPL[2]:2 within `g`
store atomic {}* inttoptr (i64 140699562991712 to {}*), {}** inttoptr (i64 140696999452504 to {}**) unordered, align 8
call void @llvm.assume(i1 icmp ne ({}* inttoptr (i64 140699562991712 to {}*), {}* null))
; @ REPL[2]:3 within `g`
br i1 icmp ne ({}* inttoptr (i64 140699562991712 to {}*), {}* null), label %ok, label %err
err: ; preds = %top
call void @ijl_undefined_var_error({}* inttoptr (i64 140699564102024 to {}*))
unreachable
ok: ; preds = %top
%0 = load i64, i64* inttoptr (i64 140699562991712 to i64*), align 8
ret i64 %0
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, the llvm.assume
doesn't make much sense here, since it's a constant pointer so the null check should be trivial. Could this be a matter of us not running the right optimization passes? For reference, I can eliminate branches like this in godbolt by compiling with -O1
: https://godbolt.org/z/sE5hs13xe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I surmised to you offline, computePointerICmp
may not be willing to claim that 140699562991712
is not the representation of null at https://llvm.org/doxygen/ValueTracking_8cpp_source.html#l02443
Look at @code_llvm raw=true g()
for the full goryness that LLVM may see here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that doesn't seem to contain anything too interesting in that regard: Nvm, it does actually show the addrspaces
julia> @code_llvm raw=true dump_module=true g()
; ModuleID = 'g'
source_filename = "g"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; @ REPL[2]:1 within `g`
define i64 @julia_g_1267() #0 !dbg !5 {
top:
; @ REPL[2]:2 within `g`
store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139739127959648 to {}*) to {} addrspace(10)*), {} addrspace(10)** inttoptr (i64 139739075273112 to {} addrspace(10)**) unordered, align 8, !dbg !7, !tbaa !8
; @ REPL[2]:3 within `g`
br i1 icmp ne ({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139739127959648 to {}*) to {} addrspace(10)*), {} addrspace(10)* null), label %ok, label %err, !dbg !13
err: ; preds = %top
call void @ijl_undefined_var_error({} addrspace(12)* addrspacecast ({}* inttoptr (i64 139739129125664 to {}*) to {} addrspace(12)*)), !dbg !13
unreachable, !dbg !13
ok: ; preds = %top
%0 = load i64, i64 addrspace(10)* addrspacecast (i64* inttoptr (i64 139739127959648 to i64*) to i64 addrspace(10)*), align 8, !dbg !13, !tbaa !14
ret i64 %0, !dbg !13
}
define nonnull {} addrspace(10)* @jfptr_g_1268({} addrspace(10)* %0, {} addrspace(10)** %1, i32 %2) #1 {
top:
%3 = call i64 @julia_g_1267() #0
%4 = call nonnull {} addrspace(10)* @ijl_box_int64(i64 signext %3)
ret {} addrspace(10)* %4
}
declare nonnull {} addrspace(10)* @ijl_box_int64(i64 signext)
; Function Attrs: inaccessiblememonly norecurse nounwind
declare void @julia.write_barrier_binding({} addrspace(10)* readonly, ...) #2
; Function Attrs: noreturn
declare void @ijl_undefined_var_error({} addrspace(12)*) #3
; Function Attrs: inaccessiblemem_or_argmemonly
declare void @ijl_gc_queue_root({} addrspace(10)*) #4
; Function Attrs: inaccessiblemem_or_argmemonly
declare void @jl_gc_queue_binding({} addrspace(10)*) #4
; Function Attrs: allocsize(1)
declare noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc(i8*, i32, i32) #5
; Function Attrs: allocsize(1)
declare noalias nonnull {} addrspace(10)* @ijl_gc_big_alloc(i8*, i64) #5
attributes #0 = { "probe-stack"="inline-asm" }
attributes #1 = { "probe-stack"="inline-asm" "thunk" }
attributes #2 = { inaccessiblememonly norecurse nounwind }
attributes #3 = { noreturn }
attributes #4 = { inaccessiblemem_or_argmemonly }
attributes #5 = { allocsize(1) }
!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2}
!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: GNU)
!3 = !DIFile(filename: "REPL[2]", directory: ".")
!4 = !{}
!5 = distinct !DISubprogram(name: "g", linkageName: "julia_g_1267", scope: null, file: !3, line: 1, type: !6, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!6 = !DISubroutineType(types: !4)
!7 = !DILocation(line: 2, scope: !5)
!8 = !{!9, !9, i64 0}
!9 = !{!"jtbaa_binding", !10, i64 0}
!10 = !{!"jtbaa_data", !11, i64 0}
!11 = !{!"jtbaa", !12, i64 0}
!12 = !{!"jtbaa"}
!13 = !DILocation(line: 3, scope: !5)
!14 = !{!15, !15, i64 0}
!15 = !{!"jtbaa_value", !10, i64 0}
Or do I need a special build to see this in more detail?
fd7c457
to
d8b2dda
Compare
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
53aaad5
to
7c6938b
Compare
Ok, it seems like LLVM doesn't fully take advantage of this extra information yet, but I think this is still a worthwhile improvement as it stands. I have opened a discussion about the LLVM issues on their discourse forum: https://discourse.llvm.org/t/nonnull-assumption-on-literal-pointers-not-propagated-to-icmp-when-in-different-address-spaces/60975 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
This gets through bootstrap fine, but I get segfaults in the GC as soon
as I start the REPL. Is there anything else I need to do here to get the
GC to track the value of the binding correctly?