Error generating arm32 fp16 code #7976

steven-johnson · 2023-12-05T18:10:21Z

Some Google-specific code reports an error in some resampling code that targets arm32 with fp16 enabled. I don't have a repro case in Halide I can share yet, but I do have a .ll file (enclosed):

$ ~/llvm-18-install/bin/llc /tmp/Example.ll -o /tmp/foo
SplitVectorResult #0: t80: v4f16 = llvm.arm.neon.vpadd nnan ninf nsz contract afn reassoc TargetConstant:i32<3054>, t127, t126

LLVM ERROR: Do not know how to split the result of this operator!

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /usr/local/google/home/srj/llvm-18-install/bin/llc /tmp/Example.ll -o /tmp/foof
1.	Running pass 'Function Pass Manager' on module '/tmp/Example.ll'.
2.	Running pass 'ARM Instruction Selection' on function '@_Z7ExamplePKvP15halide_buffer_tS2_'
 #0 0x00005619909fd63b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2bef63b)
 #1 0x00005619909fa7eb llvm::sys::RunSignalHandlers() (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2bec7eb)
 #2 0x00005619909fa915 SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f35da77b510 (/lib/x86_64-linux-gnu/libc.so.6+0x3c510)
 #4 0x00007f35da7c90fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007f35da77b472 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007f35da7654b2 abort ./stdlib/abort.c:81:7
 #7 0x000056198e518ed7 (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x70aed7)
 #8 0x000056199094d758 (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2b3f758)
 #9 0x00005619908a62be llvm::DAGTypeLegalizer::SplitVectorResult(llvm::SDNode*, unsigned int) (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2a982be)
#10 0x000056199084c9db llvm::DAGTypeLegalizer::run() (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2a3e9db)
#11 0x000056199084dea8 llvm::SelectionDAG::LegalizeTypes() (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2a3fea8)
#12 0x00005619907bc1b4 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x29ae1b4)
#13 0x00005619907c1598 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x29b3598)
#14 0x00005619907c3251 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (.part.0) SelectionDAGISel.cpp:0:0
#15 0x000056198ebb8914 (anonymous namespace)::ARMDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) ARMISelDAGToDAG.cpp:0:0
#16 0x000056198fc9bf76 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#17 0x0000561990263e51 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2455e51)
#18 0x0000561990264261 llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2456261)
#19 0x0000561990264b22 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x2456b22)
#20 0x000056198e5da6d2 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x000056198e51e7b7 main (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x7107b7)
#22 0x00007f35da7666ca __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#23 0x00007f35da766785 call_init ./csu/../csu/libc-start.c:128:20
#24 0x00007f35da766785 __libc_start_main ./csu/../csu/libc-start.c:347:5
#25 0x000056198e5d0811 _start (/usr/local/google/home/srj/llvm-18-install/bin/llc+0x7c2811)

Example.ll.zip

The text was updated successfully, but these errors were encountered:

Fixes #7976

* Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if

* Half-plumbed * Revert "Half-plumbed" This reverts commit eb9dd02. * Interface for double buffer * Update Provides, Calls and Realizes for double buffering * Proper sync for double buffering * Use proper name for the semaphor and use correct initial value * Rename the class * Pass expression for index * Adds storage for double buffering index * Use a separate index to go through the double buffer * Failing test * Better handling of hoisted storage in all of the async-related passes * New test and clean-up the generated IR * More tests * Allow double buffering without async and add corresponding test * Filter out incorrect double_buffer schedules * Add tests to the cmake files * Clean up * Update the comment * Clean up * Clean up * Update serialization * complete_x86_target() should enable F16C and FMA when AVX2 is present (#7971) All known AVX2-enabled architectures definitely have these features. * Add two new tail strategies for update definitions (#7949) * Add two new tail strategies for update definitions * Stop printing asm * Update expected number of partitions for Partition::Always * Add a comment explaining why the blend safety check is per dimension * Add serialization support for the new tail strategies * trigger buildbots * Add comment --------- Co-authored-by: Steven Johnson <srj@google.com> * Add appropriate mattrs for arm-32 extensions (#7978) * Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if * Move canonical version numbers into source, not build system (#7980) (#7981) * Move canonical version numbers into source, not build system (#7980) * Fixes * Silence useless "Insufficient parallelism" autoscheduler warning (#7990) * Add a notebook with a visualization of the aprrox_* functions and their errors (#7974) * Add a notebook with a visualization of the aprrox_* functions and their errors * Fix spelling error * Make narrowing float->int casts on wasm go via wider ints (#7973) Fixes #7972 * Fix handling of assert statements whose conditions get vectorized (#7989) * Fix handling of assert statements whose conditions get vectorized * Fix test name * Fix all "unscheduled update()" warnings in our code (#7991) * Fix all "unscheduled update()" warnings in our code And also fix the Mullapudi scheduler to explicitly touch all update stages. This allows us to mark this warning as an error if we so choose. * fixes * fixes * Update recursive_box_filters.cpp * Silence useless 'Outer dim vectorization of var' warning in Mullapudi… (#7992) Silence useless 'Outer dim vectorization of var' warning in Mullapudi scheduler * Add a tutorial for async and double_buffer * Renamed double_buffer to ring_buffer * ring_buffer() now expects an extent Expr * Actually use extent for ring_buffer() * Address some of the comments * Provide an example of the code structure for producer-consumer async example * Comments updates * Fix clang-format and clang-tidy * Add Python binding for Func::ring_buffer() * Don't use a separate index for ring buffer + add a new test * Rename the tests * Clean up the old name * Add & * Move test to the right folder * Move expr * Add comments for InjectRingBuffering * Improve ring_buffer doc * Fix comments * Comments * A better error message * Mention that extent is expected to be a positive integer * Add another code structure and explain how the indices for ring buffer are computed * Expand test comments * Fix spelling --------- Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com>

* Add appropriate mattrs for arm-32 extensions Fixes halide#7976 * Pull clauses out of if

* Half-plumbed * Revert "Half-plumbed" This reverts commit eb9dd02. * Interface for double buffer * Update Provides, Calls and Realizes for double buffering * Proper sync for double buffering * Use proper name for the semaphor and use correct initial value * Rename the class * Pass expression for index * Adds storage for double buffering index * Use a separate index to go through the double buffer * Failing test * Better handling of hoisted storage in all of the async-related passes * New test and clean-up the generated IR * More tests * Allow double buffering without async and add corresponding test * Filter out incorrect double_buffer schedules * Add tests to the cmake files * Clean up * Update the comment * Clean up * Clean up * Update serialization * complete_x86_target() should enable F16C and FMA when AVX2 is present (halide#7971) All known AVX2-enabled architectures definitely have these features. * Add two new tail strategies for update definitions (halide#7949) * Add two new tail strategies for update definitions * Stop printing asm * Update expected number of partitions for Partition::Always * Add a comment explaining why the blend safety check is per dimension * Add serialization support for the new tail strategies * trigger buildbots * Add comment --------- Co-authored-by: Steven Johnson <srj@google.com> * Add appropriate mattrs for arm-32 extensions (halide#7978) * Add appropriate mattrs for arm-32 extensions Fixes halide#7976 * Pull clauses out of if * Move canonical version numbers into source, not build system (halide#7980) (halide#7981) * Move canonical version numbers into source, not build system (halide#7980) * Fixes * Silence useless "Insufficient parallelism" autoscheduler warning (halide#7990) * Add a notebook with a visualization of the aprrox_* functions and their errors (halide#7974) * Add a notebook with a visualization of the aprrox_* functions and their errors * Fix spelling error * Make narrowing float->int casts on wasm go via wider ints (halide#7973) Fixes halide#7972 * Fix handling of assert statements whose conditions get vectorized (halide#7989) * Fix handling of assert statements whose conditions get vectorized * Fix test name * Fix all "unscheduled update()" warnings in our code (halide#7991) * Fix all "unscheduled update()" warnings in our code And also fix the Mullapudi scheduler to explicitly touch all update stages. This allows us to mark this warning as an error if we so choose. * fixes * fixes * Update recursive_box_filters.cpp * Silence useless 'Outer dim vectorization of var' warning in Mullapudi… (halide#7992) Silence useless 'Outer dim vectorization of var' warning in Mullapudi scheduler * Add a tutorial for async and double_buffer * Renamed double_buffer to ring_buffer * ring_buffer() now expects an extent Expr * Actually use extent for ring_buffer() * Address some of the comments * Provide an example of the code structure for producer-consumer async example * Comments updates * Fix clang-format and clang-tidy * Add Python binding for Func::ring_buffer() * Don't use a separate index for ring buffer + add a new test * Rename the tests * Clean up the old name * Add & * Move test to the right folder * Move expr * Add comments for InjectRingBuffering * Improve ring_buffer doc * Fix comments * Comments * A better error message * Mention that extent is expected to be a positive integer * Add another code structure and explain how the indices for ring buffer are computed * Expand test comments * Fix spelling --------- Co-authored-by: Steven Johnson <srj@google.com> Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com>

steven-johnson assigned abadams Dec 5, 2023

abadams added a commit that referenced this issue Dec 5, 2023

Add appropriate mattrs for arm-32 extensions

06bf183

Fixes #7976

abadams mentioned this issue Dec 5, 2023

Add appropriate mattrs for arm-32 extensions #7978

Merged

steven-johnson closed this as completed in #7978 Dec 5, 2023

steven-johnson pushed a commit that referenced this issue Dec 5, 2023

Add appropriate mattrs for arm-32 extensions (#7978)

209ec02

* Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if

vksnk pushed a commit that referenced this issue Dec 7, 2023

Add appropriate mattrs for arm-32 extensions (#7978)

40ea449

* Add appropriate mattrs for arm-32 extensions Fixes #7976 * Pull clauses out of if

ardier pushed a commit to ardier/Halide-mutation that referenced this issue Mar 3, 2024

Add appropriate mattrs for arm-32 extensions (halide#7978)

a57a300

* Add appropriate mattrs for arm-32 extensions Fixes halide#7976 * Pull clauses out of if

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error generating arm32 fp16 code #7976

Error generating arm32 fp16 code #7976

steven-johnson commented Dec 5, 2023