Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve codegen for assignments to globals #44182

Merged
merged 1 commit into from
Mar 21, 2022
Merged

Conversation

simeonschaub
Copy link
Member

This gets through bootstrap fine, but I get segfaults in the GC as soon
as I start the REPL. Is there anything else I need to do here to get the
GC to track the value of the binding correctly?

src/codegen.cpp Outdated Show resolved Hide resolved
src/codegen.cpp Outdated Show resolved Hide resolved
Copy link
Member Author

@simeonschaub simeonschaub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, LLVM should be able to optimize the following code to a single store because it's an unordered store and load, right?

julia> x::Int = 1
1

julia> function f()
           global x = 0
           while x<10
               x += 1
           end
           x
       end
f (generic function with 1 method)

julia> @code_llvm f()
;  @ REPL[2]:1 within `f`
define i64 @julia_f_1080() #0 {
top:
;  @ REPL[2]:2 within `f`
  store atomic {}* inttoptr (i64 140146660003872 to {}*), {}** inttoptr (i64 140144012463704 to {}**) unordered, align 8
  call void @llvm.assume(i1 icmp ne ({}* inttoptr (i64 140146660003872 to {}*), {}* null))
;  @ REPL[2]:3 within `f`
  %0 = load atomic i64*, i64** inttoptr (i64 140144012463704 to i64**) unordered, align 8
; ┌ @ int.jl:83 within `<`
   %1 = load i64, i64* %0, align 8
   %2 = icmp sgt i64 %1, 9
; └
  br i1 %2, label %L23, label %L12

L12:                                              ; preds = %16, %top
  %3 = phi i64 [ %18, %16 ], [ %1, %top ]
;  @ REPL[2]:4 within `f`
; ┌ @ int.jl:87 within `+`
   %4 = add nsw i64 %3, 1
; └
  %5 = call nonnull {}* @ijl_box_int64(i64 signext %4)
  store atomic {}* %5, {}** inttoptr (i64 140144012463704 to {}**) unordered, align 8
  %6 = load atomic i64, i64* getelementptr inbounds (i64, i64* inttoptr (i64 140144012463696 to i64*), i64 -1) unordered, align 8
  %7 = and i64 %6, 3
  %8 = icmp eq i64 %7, 3
  br i1 %8, label %9, label %16

9:                                                ; preds = %L12
  %10 = bitcast {}* %5 to i64*
  %11 = getelementptr inbounds i64, i64* %10, i64 -1
  %12 = load atomic i64, i64* %11 unordered, align 8
  %13 = and i64 %12, 1
  %14 = icmp eq i64 %13, 0
  br i1 %14, label %15, label %16

15:                                               ; preds = %9
  call void @jl_gc_queue_binding({}* inttoptr (i64 140144012463696 to {}*))
  br label %16

16:                                               ; preds = %15, %9, %L12
;  @ REPL[2]:3 within `f`
  %17 = load atomic i64*, i64** inttoptr (i64 140144012463704 to i64**) unordered, align 8
; ┌ @ int.jl:83 within `<`
   %18 = load i64, i64* %17, align 8
   %19 = icmp sgt i64 %18, 9
; └
  br i1 %19, label %L23, label %L12

L23:                                              ; preds = %16, %top
; ┌ @ int.jl:83 within `<`
   %.lcssa = phi i64 [ %1, %top ], [ %18, %16 ]
; └
;  @ REPL[2]:6 within `f`
  ret i64 %.lcssa
}

Any ideas what's still missing to make this work?

src/codegen.cpp Outdated
StoreInst *v = ctx.builder.CreateAlignedStore(rval, bp, Align(sizeof(void*)));
v->setOrdering(Order);
tbaa_decorate(ctx.tbaa().tbaa_binding, v);
ctx.builder.CreateAssumption(ctx.builder.CreateIsNotNull(rval));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping this would elide the undef var check in the following example, but that doesn't seem to work:

julia> global y::Int

julia> function g()
           global y = 1
           y
       end
g (generic function with 1 method)

julia> @code_llvm g()
;  @ REPL[2]:1 within `g`
define i64 @julia_g_1078() #0 {
top:
;  @ REPL[2]:2 within `g`
  store atomic {}* inttoptr (i64 140699562991712 to {}*), {}** inttoptr (i64 140696999452504 to {}**) unordered, align 8
  call void @llvm.assume(i1 icmp ne ({}* inttoptr (i64 140699562991712 to {}*), {}* null))
;  @ REPL[2]:3 within `g`
  br i1 icmp ne ({}* inttoptr (i64 140699562991712 to {}*), {}* null), label %ok, label %err

err:                                              ; preds = %top
  call void @ijl_undefined_var_error({}* inttoptr (i64 140699564102024 to {}*))
  unreachable

ok:                                               ; preds = %top
  %0 = load i64, i64* inttoptr (i64 140699562991712 to i64*), align 8
  ret i64 %0
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, the llvm.assume doesn't make much sense here, since it's a constant pointer so the null check should be trivial. Could this be a matter of us not running the right optimization passes? For reference, I can eliminate branches like this in godbolt by compiling with -O1: https://godbolt.org/z/sE5hs13xe

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I surmised to you offline, computePointerICmp may not be willing to claim that 140699562991712 is not the representation of null at https://llvm.org/doxygen/ValueTracking_8cpp_source.html#l02443

Look at @code_llvm raw=true g() for the full goryness that LLVM may see here

Copy link
Member Author

@simeonschaub simeonschaub Feb 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that doesn't seem to contain anything too interesting in that regard: Nvm, it does actually show the addrspaces

julia> @code_llvm raw=true dump_module=true g()
; ModuleID = 'g'
source_filename = "g"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

;  @ REPL[2]:1 within `g`
define i64 @julia_g_1267() #0 !dbg !5 {
top:
;  @ REPL[2]:2 within `g`
  store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139739127959648 to {}*) to {} addrspace(10)*), {} addrspace(10)** inttoptr (i64 139739075273112 to {} addrspace(10)**) unordered, align 8, !dbg !7, !tbaa !8
;  @ REPL[2]:3 within `g`
  br i1 icmp ne ({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139739127959648 to {}*) to {} addrspace(10)*), {} addrspace(10)* null), label %ok, label %err, !dbg !13

err:                                              ; preds = %top
  call void @ijl_undefined_var_error({} addrspace(12)* addrspacecast ({}* inttoptr (i64 139739129125664 to {}*) to {} addrspace(12)*)), !dbg !13
  unreachable, !dbg !13

ok:                                               ; preds = %top
  %0 = load i64, i64 addrspace(10)* addrspacecast (i64* inttoptr (i64 139739127959648 to i64*) to i64 addrspace(10)*), align 8, !dbg !13, !tbaa !14
  ret i64 %0, !dbg !13
}

define nonnull {} addrspace(10)* @jfptr_g_1268({} addrspace(10)* %0, {} addrspace(10)** %1, i32 %2) #1 {
top:
  %3 = call i64 @julia_g_1267() #0
  %4 = call nonnull {} addrspace(10)* @ijl_box_int64(i64 signext %3)
  ret {} addrspace(10)* %4
}

declare nonnull {} addrspace(10)* @ijl_box_int64(i64 signext)

; Function Attrs: inaccessiblememonly norecurse nounwind
declare void @julia.write_barrier_binding({} addrspace(10)* readonly, ...) #2

; Function Attrs: noreturn
declare void @ijl_undefined_var_error({} addrspace(12)*) #3

; Function Attrs: inaccessiblemem_or_argmemonly
declare void @ijl_gc_queue_root({} addrspace(10)*) #4

; Function Attrs: inaccessiblemem_or_argmemonly
declare void @jl_gc_queue_binding({} addrspace(10)*) #4

; Function Attrs: allocsize(1)
declare noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc(i8*, i32, i32) #5

; Function Attrs: allocsize(1)
declare noalias nonnull {} addrspace(10)* @ijl_gc_big_alloc(i8*, i64) #5

attributes #0 = { "probe-stack"="inline-asm" }
attributes #1 = { "probe-stack"="inline-asm" "thunk" }
attributes #2 = { inaccessiblememonly norecurse nounwind }
attributes #3 = { noreturn }
attributes #4 = { inaccessiblemem_or_argmemonly }
attributes #5 = { allocsize(1) }

!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, enums: !4, nameTableKind: GNU)
!3 = !DIFile(filename: "REPL[2]", directory: ".")
!4 = !{}
!5 = distinct !DISubprogram(name: "g", linkageName: "julia_g_1267", scope: null, file: !3, line: 1, type: !6, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !4)
!6 = !DISubroutineType(types: !4)
!7 = !DILocation(line: 2, scope: !5)
!8 = !{!9, !9, i64 0}
!9 = !{!"jtbaa_binding", !10, i64 0}
!10 = !{!"jtbaa_data", !11, i64 0}
!11 = !{!"jtbaa", !12, i64 0}
!12 = !{!"jtbaa"}
!13 = !DILocation(line: 3, scope: !5)
!14 = !{!15, !15, i64 0}
!15 = !{!"jtbaa_value", !10, i64 0}

Or do I need a special build to see this in more detail?

src/codegen.cpp Outdated Show resolved Hide resolved
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@simeonschaub simeonschaub changed the title WIP: improve codegen for assignments to globals improve codegen for assignments to globals Mar 17, 2022
@simeonschaub simeonschaub marked this pull request as ready for review March 17, 2022 19:50
@simeonschaub
Copy link
Member Author

Ok, it seems like LLVM doesn't fully take advantage of this extra information yet, but I think this is still a worthwhile improvement as it stands. I have opened a discussion about the LLVM issues on their discourse forum: https://discourse.llvm.org/t/nonnull-assumption-on-literal-pointers-not-propagated-to-icmp-when-in-different-address-spaces/60975

Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@simeonschaub simeonschaub merged commit c92ab5e into master Mar 21, 2022
@simeonschaub simeonschaub deleted the sds/codegen_globals branch March 21, 2022 13:09
vtjnash added a commit that referenced this pull request Nov 29, 2022
In looking at a TSAN report recently, I noticed that globals were
getting stored as atomic-unordered (since c92ab5e #44182), instead
of atomic-release as intended (since
46135df #45484).
vtjnash added a commit that referenced this pull request Nov 29, 2022
In looking at a TSAN report recently, I noticed that globals were
getting stored as atomic-unordered (since c92ab5e #44182), instead
of atomic-release as intended (since
46135df #45484).
vtjnash added a commit that referenced this pull request Nov 30, 2022
In looking at a TSAN report recently, I noticed that globals were
getting stored as atomic-unordered (since c92ab5e #44182), instead
of atomic-release as intended (since
46135df #45484).
KristofferC pushed a commit that referenced this pull request Dec 8, 2022
In looking at a TSAN report recently, I noticed that globals were
getting stored as atomic-unordered (since c92ab5e #44182), instead
of atomic-release as intended (since
46135df #45484).

(cherry picked from commit f4534d1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants