Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM 6.0] flt2dec / dec2flt cause assertion failed: Wrong topological sorting #101

Open
shepmaster opened this issue May 3, 2018 · 22 comments
Labels
A-libcore Affects compiling the core library A-llvm Affects the LLVM AVR backend has-local-patch A patch exists but has not been applied to upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem

Comments

@shepmaster
Copy link
Member

shepmaster commented May 3, 2018

This is with my recently-rebased branch based on upstream Rust 190a6c. I've added the patch from #99 locally.

Case 1

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-b213db3.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

; Function Attrs: uwtable
define align 1 void @core__num__flt2dec__strategy__dragon__mul_pow() unnamed_addr  {
start:
  %ret.i = alloca [40 x i32], align 1
  %0 = zext i32 undef to i64
  %1 = add nuw nsw i16 0, 11
  br label %bb19.i105.11

bb19.i105.11:                                     ; preds = %start
  %2 = getelementptr inbounds [40 x i32], [40 x i32]* %ret.i, i16 0, i16 %1
  %3 = load i32, i32* %2, align 1
  %4 = mul nuw i64 %0, 2788392729
  %5 = zext i32 %3 to i64
  %6 = add nuw nsw i64 0, %5
  %7 = add i64 %6, %4
  %8 = trunc i64 %7 to i32
  store i32 %8, i32* %2, align 1
  unreachable
}

Case 2

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-95356ba.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

%"num::bignum::Big32x40" = type { [0 x i8], i16, [0 x i8], [40 x i32], [0 x i8] }

; Function Attrs: uwtable
define void @"core::num::dec2flt::num::digits_to_big"() unnamed_addr {
start:
  %_7.sroa.0.0..sroa_idx.i = getelementptr inbounds %"num::bignum::Big32x40", %"num::bignum::Big32x40"* undef, i16 0, i32 3, i16 0
  switch i2 undef, label %bb5.i10 [
    i2 0, label %bb2.i
    i2 1, label %bb3.i
    i2 -2, label %bb4.i
  ]

bb2.i:                                            ; preds = %start
  ret void

bb3.i:                                            ; preds = %start
  %0 = add i8 0, -48
  %1 = zext i8 %0 to i32
  %2 = load i32, i32* %_7.sroa.0.0..sroa_idx.i, align 1
  %3 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %2, i32 %1)
  %4 = extractvalue { i32, i1 } %3, 0
  store i32 %4, i32* %_7.sroa.0.0..sroa_idx.i, align 1
  unreachable

bb4.i:                                            ; preds = %start
  unreachable

bb5.i10:                                          ; preds = %start
  unreachable
}

; Function Attrs: nounwind readnone speculatable
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)

Error

Assertion failed: (Node2Index[SU.NodeNum] > Node2Index[PD.getSUnit()->NodeNum] && "Wrong topological sorting"), function InitDAGTopologicalSorting, file /Users/shep/Projects/avr-rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp, line 518.
@shepmaster shepmaster added help wanted This is a good candidate issue to fix A-libcore Affects compiling the core library has-reduced-testcase A small LLVM IR file exists that demonstrates the problem labels May 3, 2018
@shepmaster
Copy link
Member Author

/cc @brainlag

@jhwgh1968
Copy link

jhwgh1968 commented May 3, 2018

Since I got my bugpoint setup working and fixed all my paths, I can now report a new cra-- oh. 😄

One piece of addition information: I did not apply #99 and I still got this crash in exactly the same place.

@shepmaster shepmaster changed the title [LLVM 6.0] flt2dec causes assertion failed: Wrong topological sorting [LLVM 6.0] flt2dec / dec2flt cause assertion failed: Wrong topological sorting May 3, 2018
@shepmaster shepmaster added A-libcore Affects compiling the core library A-llvm Affects the LLVM AVR backend and removed A-libcore Affects compiling the core library labels May 4, 2018
@brainlag brainlag mentioned this issue Aug 7, 2018
8 tasks
@TimNN
Copy link

TimNN commented Oct 8, 2018

I've been looking into this issue a bit (with a slightly different repro since I'm using a newer LLVM version). The issue appears to be that whatever data is passed to ScheduleDAGTopologicalSort, is not a DAG. The example where the assertion happens contains a cycle.

Note that this seems to be related to optimizations. Compiling with llc -O0 is fine, llc -O1 is not.

@TimNN
Copy link

TimNN commented Oct 8, 2018

In case that helps anyone, here is what I believe is the relevant part of the -debug output: https://gist.github.com/620098d70297404f3d02a8555363b916

The "LLVM ERROR: out of memory" seems easily explainable to me: LLVM apparently wants to compute the depth of the "DAG" for debug printing. Since the DAG contains a cycle, we get an infinite loop.

@brainlag
Copy link

brainlag commented Oct 8, 2018

If you compile llvm with asserts you will hit that and don't run into OOM error.
I debugged this in the past and IIRC the problem is because 2 nodes in the DAG are glued together which lead to the cycle. If these 2 nodes would not glued together it would be fine.

@TimNN
Copy link

TimNN commented Oct 8, 2018

the problem is because 2 nodes in the DAG are glued together

I'm not quite sure what you mean with "glued together". In my case there is a cycle between three nodes (2, 3 and 6 in the Graph below).

Graph

graph

@brainlag
Copy link

brainlag commented Oct 8, 2018

There are glue nodes which glues 2 (or more?) nodes together
While the IR code does not have a cycle, llvm wrongly fuses 2 instructions into 1 node which leads to a cycle in the DAG.

@TimNN
Copy link

TimNN commented Oct 9, 2018

Ah, I wasn't aware that "glue nodes" are an LLVM concept. Anyway, I think I understand now what you meant and see it for my example as well. So, IIUC, the next step would probably be figuring out why the two nodes are glued together.

In any case, I'm including the relevant graphs below, in case they are useful to anyone:

ISel Input

01-isel-input

Sched Input

02-sched-input

Scheduled

03-scheduled

@TimNN
Copy link

TimNN commented Oct 9, 2018

Looking at some more graphs, I believe that the issue occurs during the combine2 phase:

Before, we already have the glue node, but not yet a potential cycle. After combine2 we now have t40 and the potential cycle.

Before Combine 2

04-before-combine-2

After Combine 2

04-after-combine-2

@TimNN
Copy link

TimNN commented Oct 9, 2018

I think I know now what happens: In CombineToPostIndexedLoadStore, LLVM checks if t21 and t35 can be combined.

This may only be done if

Op must be independent of N, i.e. Op is neither a predecessor nor a successor of N. Otherwise, if Op is folded that would create a cycle.

The check for "!Op->isPredecessorOf(N) && !N->isPredecessorOf(Op)" then "correctly" identifies that they are independent, if one considers only the edges actually present in the graph.

The issue now is that the glue link between t28 and t29 will in effect be bidirectional (rather than unidirectional) once the two nodes are combined.

I can see three solutions to this:

  1. This could be considered a bug in CombineToPostIndexedLoadStore, thus CombineToPostIndexedLoadStore should be adapted to handle this case.
  2. Not generate the glue in the first place. (If it only exists for cosmetic / performance reasons, it could just be dropped. If it must be present, then I believe the glue would need to be replaced by a pseudo / virtual instruction).
  3. Ignore the glue in the schedule when it would lead to a cycle.

Intuitively, I don't think that (1) is feasible and don't like (3) as a solution. I'll try to figure out why / how / where the glue is inserted in the first place.

Edit: Any thoughts on the three proposed solutions?

@TimNN
Copy link

TimNN commented Oct 9, 2018

The glue nodes are generated by LLVM in DAGTypeLegalizer::ExpandIntRes_ADDSUB. Given that, I don't believe that solution (2) is easily feasible anymore, since any changes here would affect much more than just AVR.

@brainlag
Copy link

brainlag commented Oct 9, 2018

Imho, and i might be wrong, this is flaw in llvm. Either way I think you should discuss this with the llvm developers.

Also, when I looked up the code for CombineToPostIndexedLoadStore I found this.

@TimNN
Copy link

TimNN commented Oct 9, 2018

Yeah, I think you are correct that this can be considered an LLVM bug. I'll write to the mailing list or open a bug tomorrow.

Also, when I looked up the code for CombineToPostIndexedLoadStore I found this.

That commit should be a non-functional change. It just ensures that Op->isPredecessorOf(N) and N->isPredecessorOf(Op) don't do duplicate work.

@TimNN
Copy link

TimNN commented Oct 10, 2018

llvm-dev thread (or rather, initial message): http://lists.llvm.org/pipermail/llvm-dev/2018-October/126851.html

@TimNN
Copy link

TimNN commented Oct 10, 2018

Per the first reply on the ML, I've tried to disable the (add|sub)(c|e) instructions for AVR (patch) and that makes my reproduction compile.

I don't know yet if that breaks something else. Worstcase, I hope that this only makes things slower, which could be fixed by adding support for the (ADD|SUB)CARRY instructions.

Live-Edit: I've just ran the AVR CodeGen tests and took a closer look at add.ll. As far as I can tell, the generated assembly is much worse (for comparison, the assembly without the patch).

@shepmaster
Copy link
Member Author

And someone has a proposed fix for the glue

@shepmaster
Copy link
Member Author

We've recently upgraded to LLVM 8 🎉 I'm going to close any bug that is reported against an older version of LLVM. If you are still having this issue with the LLVM-8 based code, please ping me and I can reopen the issue!

@TimNN
Copy link

TimNN commented Nov 3, 2018

@shepmaster: This has a not-yet-upstreamed patch on the 8.0 branch, so I think we should keep the issue open?

@shepmaster shepmaster added has-local-patch A patch exists but has not been applied to upstream LLVM and removed help wanted This is a good candidate issue to fix labels Nov 3, 2018
@shepmaster shepmaster reopened this Nov 3, 2018
@crlf0710
Copy link

Built the official 8.0 release code with AVR target enabled, llced the code in OP, and it seems llc falls into a dead loop.

@dylanmckay
Copy link
Member

dylanmckay commented Jun 7, 2019

The way forward here is to convert the AVR backend from the old ADDE/ADDC/SUBE/SUBC DAG opcodes to the UADDO/ADDCARRY/USUBO/SUBCARRY DAG opcodes. The old ones have been deprecated for a while.

https://reviews.llvm.org/D53106#1443059

Summarizing offline discussion with jyknight for future reference:

This should only be necessary due to glued nodes that are not Copy{To,From}Reg and which have additional predecessors/successors than nodes on the glue.
This appear to only occur in the case of ADDC/ADDE nodes have been deprecated.

This patch was originally written for such an instance in the AVR backend.

@dylanmckay
Copy link
Member

Hm, I can't reproduce this on LLVM trunk anymore, on either testcase.

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-95356ba.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

%"num::bignum::Big32x40" = type { [0 x i8], i16, [0 x i8], [40 x i32], [0 x i8] }

; Function Attrs: uwtable
define void @"core::num::dec2flt::num::digits_to_big"() unnamed_addr {
start:
  %_7.sroa.0.0..sroa_idx.i = getelementptr inbounds %"num::bignum::Big32x40", %"num::bignum::Big32x40"* undef, i16 0, i32 3, i16 0
  switch i2 undef, label %bb5.i10 [
    i2 0, label %bb2.i
    i2 1, label %bb3.i
    i2 -2, label %bb4.i
  ]

bb2.i:                                            ; preds = %start
  ret void

bb3.i:                                            ; preds = %start
  %0 = add i8 0, -48
  %1 = zext i8 %0 to i32
  %2 = load i32, i32* %_7.sroa.0.0..sroa_idx.i, align 1
  %3 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %2, i32 %1)
  %4 = extractvalue { i32, i1 } %3, 0
  store i32 %4, i32* %_7.sroa.0.0..sroa_idx.i, align 1
  unreachable

bb4.i:                                            ; preds = %start
  unreachable

bb5.i10:                                          ; preds = %start
  unreachable
}

; Function Attrs: nounwind readnone speculatable
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)

Using avr-rust at avr-support-aa35e73b

./llc -march=avr -o - -filetype=asm ~/projects/builds/llvm-project/llvm/foo.ll

Gives

        .text                                                                                                                                                                                 
        .file   "bugpoint-output-95356ba.bc"                                                                                                                                                  
llc: /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp:507: void llvm::ScheduleDAGTopologicalSort::InitDAGTopologicalSorting(): Assertion `Node2Index[SU.NodeNum] > Node
2Index[PD.getSUnit()->NodeNum] && "Wrong topological sorting"' failed.                                                                                                                        
Stack dump:                                                                                                                                                                                   
0.      Program arguments: ./llc -march=avr -o - -filetype=asm /home/dylan/projects/builds/llvm-project/llvm/foo.ll                                                                           
1.      Running pass 'Function Pass Manager' on module '/home/dylan/projects/builds/llvm-project/llvm/foo.ll'.                                                                                
2.      Running pass 'AVR DAG->DAG Instruction Selection' on function '@"core::num::dec2flt::num::digits_to_big"'                                                                             
#0 0x00007f5841e1dd9a llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Unix/Signals.inc:499:3                                           
#1 0x00007f5841e1c234 llvm::sys::RunSignalHandlers() /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Signals.cpp:67:5                                                                 
#2 0x00007f5841e1c3b5 SignalHandler(int) /home/dylan/projects/avr-rust/rust/src/llvm/lib/Support/Unix/Signals.inc:348:31                                                                      
#3 0x00007f58414a44d0 __restore_rt (/usr/lib/libpthread.so.0+0x124d0)                                                                                                                         
#4 0x00007f584101482f __GI_raise (/usr/lib/libc.so.6+0x3782f)
#5 0x00007f5840fff672 __GI_abort (/usr/lib/libc.so.6+0x22672)
#6 0x00007f5840fff548 _nl_load_domain.cold.0 (/usr/lib/libc.so.6+0x22548)
#7 0x00007f584100cdb6 (/usr/lib/libc.so.6+0x2fdb6)
#8 0x00007f58422c3deb llvm::ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/ScheduleDAG.cpp:506:7
#9 0x00007f58424b1d75 (anonymous namespace)::ScheduleDAGRRList::Schedule() /home/dylan/projects/avr-rust/rust/src/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp:371:3

Using llvm trunk at 3a987cdf3 (r362773)

./bin/llc -march=avr -o - -filetype=asm  foo.ll 

Gives

        .text
        .file   "bugpoint-output-95356ba.bc"
        .globl  "core::num::dec2flt::num::digits_to_big" ; -- Begin function core::num::dec2flt::num::digits_to_big
        .p2align        1
        .type   "core::num::dec2flt::num::digits_to_big",@function
"core::num::dec2flt::num::digits_to_big": ; @"core::num::dec2flt::num::digits_to_big"
; %bb.0:                                ; %start
        ldi     r24, 1
        cpi     r24, 0
        breq    LBB0_2
; %bb.1:                                ; %bb2.i
        ret
LBB0_2:                                 ; %start
        ldi     r24, 0
        cpi     r24, 0
        brne    LBB0_4
; %bb.3:                                ; %bb4.i
LBB0_4:                                 ; %bb3.i
        ld      r24, Z
        ldd     r25, Z+1
        ldd     r18, Z+2
        ldd     r19, Z+3
        subi    r24, 48
        sbci    r25, 255
        sbci    r18, 255
        sbci    r19, 255
        st      Z, r24
        std     Z+1, r25
        std     Z+2, r18
        std     Z+3, r19
.Lfunc_end0:
        .size   "core::num::dec2flt::num::digits_to_big", .Lfunc_end0-"core::num::dec2flt::num::digits_to_big"
                                        ; -- End function

        ; Declaring this symbol tells the CRT that it should
        ;copy all variables from program memory to RAM on startup
        .globl  __do_copy_data
        ; Declaring this symbol tells the CRT that it should
        ;clear the zeroed data section on startup
        .globl  __do_clear_bss

So perhaps an LLVM upgrade will fix this?

@dylanmckay
Copy link
Member

I cherry-picked avr-rust/llvm-project@56d676b and avr-rust/llvm-project@a5b6478 into the avr-support-aa35e73b branch (#137), and this test no longer fails.

dylanmckay pushed a commit to avr-rust/llvm-project that referenced this issue Jun 8, 2019
Lazarus535 pushed a commit to Lazarus535/llvm-project that referenced this issue Oct 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-libcore Affects compiling the core library A-llvm Affects the LLVM AVR backend has-local-patch A patch exists but has not been applied to upstream LLVM has-reduced-testcase A small LLVM IR file exists that demonstrates the problem
Projects
None yet
Development

No branches or pull requests

6 participants