Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to optimize .reverse() with more than one usage in the same file #85872

Closed
Mikadore opened this issue May 31, 2021 · 6 comments · Fixed by #101200
Closed

Failure to optimize .reverse() with more than one usage in the same file #85872

Mikadore opened this issue May 31, 2021 · 6 comments · Fixed by #101200
Assignees
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Mikadore
Copy link

I was experimenting with some code on godbolt, when encountering this:

pub fn u16_be_to_arch(mut data: [u8;2]) -> [u8;2] {
    data.reverse();
    data
}

with -C opt-level=3 produces the expected assembly:

example::u16_be_to_arch:
        mov     eax, edi
        rol     ax, 8
        ret

However, when adding (into the same file) this function:

pub fn u32_be_to_arch(mut data: [u8;4]) -> [u8;4] {
    data.reverse();
    data
}

The previous function's codegen changes to

example::u16_be_to_arch:
        sub     rsp, 2
        mov     eax, edi
        rol     ax, 8
        mov     word ptr [rsp], ax
        add     rsp, 2
        ret

https://godbolt.org/z/MWTar9xa3
https://godbolt.org/z/sKTznrKh9

The issue: it stores the result on the stack, and then immediately discards that. (which it didn't do previously)

The second potential issue I'm unsure about is, that here a function's codegen changes by just adding removing seemingly unrelated functions. I was told on discord this is because the two functions use the same reverse, and adding the second one forces reverse to be generalized to [u8;2] and [u8;4] as opposed to just [u8;2]. But I by far don't know enough to judge whether this is a problem or not, or what it's caused by, but I wanted to mention it.

@Mikadore Mikadore added the C-bug Category: This is a bug. label May 31, 2021
@inquisitivecrystal
Copy link
Contributor

@rustbot label +A-codegen +I-slow +T-compiler

@rustbot rustbot added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2021
@workingjubilee
Copy link
Member

workingjubilee commented Jul 6, 2022

What's really weird is that on -C opt-level=1 --emit=llvm-ir -Cdebuginfo=0, it looks like this:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%"unwind::libunwind::_Unwind_Exception" = type { i64, void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [6 x i64] }
%"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }

define i16 @_ZN7example14u16_be_to_arch17hed7e5b3bb5283cd2E(i16 %0) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
  %data.sroa.0.0.insert.insert = call i16 @llvm.bswap.i16(i16 %0)
  ret i16 %data.sroa.0.0.insert.insert
}

define i32 @_ZN7example14u32_be_to_arch17h34475b177da247b9E(i32 %0) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
  %data.sroa.0.0.insert.insert = call i32 @llvm.bswap.i32(i32 %0)
  ret i32 %data.sroa.0.0.insert.insert
}

declare noundef i32 @rust_eh_personality(i32, i32 noundef, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #0

declare i16 @llvm.bswap.i16(i16) #1

declare i32 @llvm.bswap.i32(i32) #1

attributes #0 = { nonlazybind uwtable "probe-stack"="__rust_probestack" "target-cpu"="x86-64" }
attributes #1 = { nofree nosync nounwind readnone speculatable willreturn }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}

But for -C opt-level=2 --emit=llvm-ir -Cdebuginfo=0, we get this:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%"unwind::libunwind::_Unwind_Exception" = type { i64, void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [6 x i64] }
%"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }

define i16 @_ZN7example14u16_be_to_arch17hed7e5b3bb5283cd2E(i16 %0) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
  %data = alloca i16, align 2
  %1 = getelementptr inbounds i16, i16* %data, i64 1
  %2 = bitcast i16* %1 to i8*
  tail call void @llvm.experimental.noalias.scope.decl(metadata !2) #4
  tail call void @llvm.experimental.noalias.scope.decl(metadata !5) #4
  %_34.i.i = bitcast i16* %data to i8*
  %_39.i.i = getelementptr inbounds i8, i8* %2, i64 -1
  tail call void @llvm.experimental.noalias.scope.decl(metadata !7) #4
  tail call void @llvm.experimental.noalias.scope.decl(metadata !10) #4
  tail call void @llvm.experimental.noalias.scope.decl(metadata !12) #4
  tail call void @llvm.experimental.noalias.scope.decl(metadata !15) #4
  %3 = trunc i16 %0 to i8
  %4 = lshr i16 %0, 8
  %5 = trunc i16 %4 to i8
  store i8 %5, i8* %_34.i.i, align 2, !alias.scope !17, !noalias !20
  store i8 %3, i8* %_39.i.i, align 1, !alias.scope !21, !noalias !22
  %.sroa.0.0.copyload = load i16, i16* %data, align 2
  ret i16 %.sroa.0.0.copyload
}

define i32 @_ZN7example14u32_be_to_arch17h34475b177da247b9E(i32 %0) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
  %data.sroa.0.0.insert.insert = call i32 @llvm.bswap.i32(i32 %0)
  ret i32 %data.sroa.0.0.insert.insert
}

declare noundef i32 @rust_eh_personality(i32, i32 noundef, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #1

declare void @llvm.experimental.noalias.scope.decl(metadata) #2

declare i32 @llvm.bswap.i32(i32) #3

attributes #0 = { nofree nosync nounwind nonlazybind uwtable "probe-stack"="__rust_probestack" "target-cpu"="x86-64" }
attributes #1 = { nonlazybind uwtable "probe-stack"="__rust_probestack" "target-cpu"="x86-64" }
attributes #2 = { inaccessiblememonly nofree nosync nounwind willreturn }
attributes #3 = { nofree nosync nounwind readnone speculatable willreturn }
attributes #4 = { nounwind }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 7, !"PIC Level", i32 2}
!1 = !{i32 2, !"RtLibUseGOT", i32 1}
!2 = !{!3}
!3 = distinct !{!3, !4, !"_ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$7reverse7revswap17heb06b5d2673f9130E: %a.0"}
!4 = distinct !{!4, !"_ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$7reverse7revswap17heb06b5d2673f9130E"}
!5 = !{!6}
!6 = distinct !{!6, !4, !"_ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$7reverse7revswap17heb06b5d2673f9130E: %b.0"}
!7 = !{!8}
!8 = distinct !{!8, !9, !"_ZN4core3mem4swap17h6e6b79a4a47ccb97E: %x"}
!9 = distinct !{!9, !"_ZN4core3mem4swap17h6e6b79a4a47ccb97E"}
!10 = !{!11}
!11 = distinct !{!11, !9, !"_ZN4core3mem4swap17h6e6b79a4a47ccb97E: %y"}
!12 = !{!13}
!13 = distinct !{!13, !14, !"_ZN4core3mem11swap_simple17h4fde363dcdbcde8aE: %x"}
!14 = distinct !{!14, !"_ZN4core3mem11swap_simple17h4fde363dcdbcde8aE"}
!15 = !{!16}
!16 = distinct !{!16, !14, !"_ZN4core3mem11swap_simple17h4fde363dcdbcde8aE: %y"}
!17 = !{!13, !8, !3, !18}
!18 = distinct !{!18, !19, !"_ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$7reverse17he43b2164665b0fd7E: %self.0"}
!19 = distinct !{!19, !"_ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$7reverse17he43b2164665b0fd7E"}
!20 = !{!16, !11, !6}
!21 = !{!16, !11, !6, !18}
!22 = !{!13, !8, !3}

@workingjubilee workingjubilee added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-heavy Issue: Problems and improvements with respect to binary size of generated code. labels Jul 6, 2022
@nikic
Copy link
Contributor

nikic commented Jul 6, 2022

The reason why the second call makes a difference is that IPSCCP is no longer able to propagate the slice length into various functions early.

@nikic
Copy link
Contributor

nikic commented Jul 6, 2022

I would probably blame this one a LoopDeletion/SROA interaction. LoopDeletion leaves behind this code:

nnamed_addr #1 personality ptr @rust_eh_personality {
start:
  %data = alloca [2 x i8], align 2
  store i16 %0, ptr %data, align 2
  %1 = getelementptr inbounds i8, ptr %data, i64 1
  tail call void @llvm.experimental.noalias.scope.decl(metadata !19) #6
  tail call void @llvm.experimental.noalias.scope.decl(metadata !22) #6
  br label %bb6.i.i

bb6.i.i:                                          ; preds = %start
  %iter.sroa.0.07.i.i = phi i64 [ 0, %start ]
  %_40.i.i = sub nsw i64 0, %iter.sroa.0.07.i.i
  %2 = add nuw nsw i64 %iter.sroa.0.07.i.i, 1
  %_34.i.i = getelementptr inbounds [0 x i8], ptr %data, i64 0, i64 %iter.sroa.0.07.i.i
  %_39.i.i = getelementptr inbounds [0 x i8], ptr %1, i64 0, i64 %_40.i.i
  tail call void @llvm.experimental.noalias.scope.decl(metadata !24) #6
  tail call void @llvm.experimental.noalias.scope.decl(metadata !27) #6
  tail call void @llvm.experimental.noalias.scope.decl(metadata !29) #6
  tail call void @llvm.experimental.noalias.scope.decl(metadata !32) #6
  %tmp.0.copyload.i.i.i.i = load i8, ptr %_34.i.i, align 1, !alias.scope !34, !noalias !37
  %tmp2.0.copyload.i.i.i.i = load i8, ptr %_39.i.i, align 1, !alias.scope !38, !noalias !39
  store i8 %tmp2.0.copyload.i.i.i.i, ptr %_34.i.i, align 1, !alias.scope !34, !noalias !37
  store i8 %tmp.0.copyload.i.i.i.i, ptr %_39.i.i, align 1, !alias.scope !38, !noalias !39
  br label %"_ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$7reverse17h78a536b0d0fa8d9eE.exit"

"_ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$7reverse17h78a536b0d0fa8d9eE.exit": ; preds = %bb6.i.i
  %.sroa.0.0.copyload = load i16, ptr %data, align 2
  ret i16 %.sroa.0.0.copyload
}

And SROA runs directly afterwards, before the phi has been folded, so we don't know that the GEPs actually access constant offsets.

Possibly the backedge-breaking in LoopDeletion should also perform some folding, or we need a phase ordering change.

@nikic
Copy link
Contributor

nikic commented Jul 6, 2022

Or possibly IndVars should already be replacing the PHI. Just remembered that we have the replaceLoopPHINodesWithPreheaderValues() transform, which seems like it should be taking care of this.

@nikic nikic self-assigned this Jul 6, 2022
@nikic
Copy link
Contributor

nikic commented Jul 7, 2022

https://reviews.llvm.org/D129214 and https://reviews.llvm.org/D129293 together should fix this.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Aug 30, 2022
Add test for issue rust-lang#85872

This has been fixed by the LLVM 15 upgrade, add a codegen test.

Fixes rust-lang#85872.
JohnTitor added a commit to JohnTitor/rust that referenced this issue Aug 30, 2022
Add test for issue rust-lang#85872

This has been fixed by the LLVM 15 upgrade, add a codegen test.

Fixes rust-lang#85872.
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 31, 2022
Rollup of 10 pull requests

Successful merges:

 - rust-lang#100804 (Fix search results color on hover for ayu theme)
 - rust-lang#100892 (Add `AsFd` implementations for stdio types on WASI.)
 - rust-lang#100927 (Adding new Fuchsia rustup docs... reworking walkthrough)
 - rust-lang#101088 (Set DebuginfoKind::Pdb in msvc_base)
 - rust-lang#101159 (add tracking issue number to const_slice_split_at_not_mut)
 - rust-lang#101192 (Remove path string)
 - rust-lang#101193 (Avoid zeroing large stack buffers in stdio on Windows)
 - rust-lang#101197 (:arrow_up: rust-analyzer)
 - rust-lang#101200 (Add test for issue rust-lang#85872)
 - rust-lang#101219 (Update books)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors closed this as completed in 5663bb3 Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants