[MachineOutliner] Sort by Benefit to Cost Ratio #90264
@llvm/pr-subscribers-backend-arm @llvm/pr-subscribers-backend-aarch64

Author: Xuan Zhang (xuanzh-meta)

Changes

This PR depends on #90260.

We changed the order in which functions are outlined in the Machine Outliner. Candidates are now sorted by priority, defined as the ratio of the not-outlined cost to the outlining cost, instead of by absolute benefit. The priority formula was found using a black-box Bayesian optimization toolbox. Sorting by this priority consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks from the LLVM test suite, and sorting by priority consistently reduced the text segment size.
This is part of an enhanced version of the Machine Outliner; see the RFC.

Patch is 29.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90264.diff

6 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineOutliner.cpp b/llvm/lib/CodeGen/MachineOutliner.cpp
index 626e577a30bf3..9e2e6316dc6d1 100644
--- a/llvm/lib/CodeGen/MachineOutliner.cpp
+++ b/llvm/lib/CodeGen/MachineOutliner.cpp
@@ -828,10 +828,12 @@ bool MachineOutliner::outline(Module &M,
<< "\n");
bool OutlinedSomething = false;
- // Sort by benefit. The most beneficial functions should be outlined first.
+ // Sort by priority where priority := getNotOutlinedCost / getOutliningCost.
+ // The function with highest priority should be outlined first.
stable_sort(FunctionList,
[](const OutlinedFunction &LHS, const OutlinedFunction &RHS) {
- return LHS.getBenefit() > RHS.getBenefit();
+ return LHS.getNotOutlinedCost() * RHS.getOutliningCost() >
+ RHS.getNotOutlinedCost() * LHS.getOutliningCost();
});
// Walk over each function, outlining them as we go along. Functions are
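The new comparator orders functions by NotOutlinedCost / OutliningCost without dividing: for positive costs, a/b > c/d holds exactly when a*d > c*b, so the ratio is compared in integer arithmetic with no rounding. Below is a minimal standalone sketch of the old and new orderings. `FnCost`, `byBenefit`, `byPriority` and `outlineOrder` are hypothetical stand-ins for `OutlinedFunction` and the lambda in the diff; the sketch assumes OutliningCost is never zero and uses 64-bit costs so the products cannot overflow for realistic sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for OutlinedFunction: only the cost fields matter here.
struct FnCost {
  unsigned Id;
  uint64_t NotOutlinedCost; // size if every occurrence stays inline
  uint64_t OutliningCost;   // call sites + outlined body + frame overhead
  // Saturating benefit, so a losing candidate reports 0 instead of wrapping.
  uint64_t benefit() const {
    return NotOutlinedCost > OutliningCost ? NotOutlinedCost - OutliningCost
                                           : 0;
  }
};

// Old order: larger absolute benefit first.
inline bool byBenefit(const FnCost &L, const FnCost &R) {
  return L.benefit() > R.benefit();
}

// New order: larger NotOutlinedCost / OutliningCost first, compared by
// cross-multiplication (valid because both denominators are positive).
inline bool byPriority(const FnCost &L, const FnCost &R) {
  return L.NotOutlinedCost * R.OutliningCost >
         R.NotOutlinedCost * L.OutliningCost;
}

// Returns function ids in the order they would be outlined under Cmp.
// stable_sort keeps the original relative order of equal-priority entries.
template <typename Cmp>
std::vector<unsigned> outlineOrder(std::vector<FnCost> Fns, Cmp C) {
  std::stable_sort(Fns.begin(), Fns.end(), C);
  std::vector<unsigned> Ids;
  for (const FnCost &F : Fns)
    Ids.push_back(F.Id);
  return Ids;
}
```

With the two functions from the new AArch64 test, `{{0, 120, 96}, {1, 48, 28}}`, `outlineOrder(Fns, byBenefit)` yields `{0, 1}` (benefits 24 and 20), while `outlineOrder(Fns, byPriority)` yields `{1, 0}` because 48 * 96 = 4608 exceeds 120 * 28 = 3360.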
diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.ll b/llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.ll
new file mode 100644
index 0000000000000..00efc3c6e71c8
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.ll
@@ -0,0 +1,96 @@
+; This tests the order in which functions are outlined in MachineOutliner.
+; There are TWO key OutlinedFunctions in FunctionList.
+;
+; ===================== First One =====================
+; ```
+; mov w0, #1
+; mov w1, #2
+; mov w2, #3
+; mov w3, #4
+; mov w4, #5
+; ```
+; It has:
+; - `SequenceSize=20` and `OccurrenceCount=6`
+; - each Candidate has `CallOverhead=12` and `FrameOverhead=4`
+; - `NotOutlinedCost=20*6=120` and `OutliningCost=12*6+20+4=96`
+; - `Benefit=120-96=24` and `Priority=120/96=1.25`
+;
+; ===================== Second One =====================
+; ```
+; mov w6, #6
+; mov w7, #7
+; b
+; ```
+; It has:
+; - `SequenceSize=12` and `OccurrenceCount=4`
+; - each Candidate has `CallOverhead=4` and `FrameOverhead=0`
+; - `NotOutlinedCost=12*4=48` and `OutliningCost=4*4+12+0=28`
+; - `Benefit=48-28=20` and `Priority=48/28=1.71`
+;
+; Note that the first one has higher benefit, but lower priority.
+; Hence, when outlining per priority, the second one will be outlined first.
+
+; RUN: llc %s -enable-machine-outliner=always -filetype=obj -o %t
+; RUN: llvm-objdump -d %t | FileCheck %s --check-prefix=CHECK-SORT-BY-PRIORITY
+
+; RUN: llc %s -enable-machine-outliner=always -outliner-benefit-threshold=22 -filetype=obj -o %t
+; RUN: llvm-objdump -d %t | FileCheck %s --check-prefix=CHECK-THRESHOLD
+
+
+target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
+target triple = "arm64-apple-macosx14.0.0"
+
+declare i32 @_Z3fooiiii(i32 noundef, i32 noundef, i32 noundef, i32 noundef, i32 noundef, i32 noundef, i32 noundef, i32 noundef)
+
+define i32 @_Z2f1v() minsize {
+ %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 11, i32 noundef 6, i32 noundef 7)
+ ret i32 %1
+}
+
+define i32 @_Z2f2v() minsize {
+ %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 12, i32 noundef 6, i32 noundef 7)
+ ret i32 %1
+}
+
+define i32 @_Z2f3v() minsize {
+ %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 13, i32 noundef 6, i32 noundef 7)
+ ret i32 %1
+}
+
+define i32 @_Z2f4v() minsize {
+ %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 14, i32 noundef 6, i32 noundef 7)
+ ret i32 %1
+}
+
+define i32 @_Z2f5v() minsize {
+ %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 15, i32 noundef 8, i32 noundef 9)
+ ret i32 %1
+}
+
+define i32 @_Z2f6v() minsize {
+ %1 = tail call i32 @_Z3fooiiii(i32 noundef 1, i32 noundef 2, i32 noundef 3, i32 noundef 4, i32 noundef 5, i32 noundef 16, i32 noundef 9, i32 noundef 8)
+ ret i32 %1
+}
+
+; CHECK-SORT-BY-PRIORITY: <_OUTLINED_FUNCTION_0>:
+; CHECK-SORT-BY-PRIORITY-NEXT: mov w6, #0x6
+; CHECK-SORT-BY-PRIORITY-NEXT: mov w7, #0x7
+; CHECK-SORT-BY-PRIORITY-NEXT: b
+
+; CHECK-SORT-BY-PRIORITY: <_OUTLINED_FUNCTION_1>:
+; CHECK-SORT-BY-PRIORITY-NEXT: mov w0, #0x1
+; CHECK-SORT-BY-PRIORITY-NEXT: mov w1, #0x2
+; CHECK-SORT-BY-PRIORITY-NEXT: mov w2, #0x3
+; CHECK-SORT-BY-PRIORITY-NEXT: mov w3, #0x4
+; CHECK-SORT-BY-PRIORITY-NEXT: mov w4, #0x5
+; CHECK-SORT-BY-PRIORITY-NEXT: ret
+
+; CHECK-THRESHOLD: <_OUTLINED_FUNCTION_0>:
+; CHECK-THRESHOLD-NEXT: mov w0, #0x1
+; CHECK-THRESHOLD-NEXT: mov w1, #0x2
+; CHECK-THRESHOLD-NEXT: mov w2, #0x3
+; CHECK-THRESHOLD-NEXT: mov w3, #0x4
+; CHECK-THRESHOLD-NEXT: mov w4, #0x5
+; CHECK-THRESHOLD-NEXT: ret
+
+; CHECK-THRESHOLD-NOT: <_OUTLINED_FUNCTION_1>:
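The cost figures in the test comments follow a simple model: NotOutlinedCost = SequenceSize * OccurrenceCount, and OutliningCost = CallOverhead * OccurrenceCount + SequenceSize + FrameOverhead. The minimal sketch below recomputes them. The `Sequence` struct and `filterByBenefit` helper are hypothetical, and the filter assumes, as the CHECK-THRESHOLD prefix suggests, that `-outliner-benefit-threshold` keeps a function only when its benefit reaches the threshold.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Parameters of one repeated sequence, as listed in the test comments.
struct Sequence {
  uint64_t SequenceSize;    // bytes in the repeated instruction sequence
  uint64_t OccurrenceCount; // number of candidates
  uint64_t CallOverhead;    // bytes per call site (identical for each candidate)
  uint64_t FrameOverhead;   // extra bytes in the outlined function's frame
};

// Cost of leaving every occurrence inline.
uint64_t notOutlinedCost(const Sequence &S) {
  return S.SequenceSize * S.OccurrenceCount;
}

// One call per occurrence, plus one outlined copy and its frame overhead.
uint64_t outliningCost(const Sequence &S) {
  return S.CallOverhead * S.OccurrenceCount + S.SequenceSize + S.FrameOverhead;
}

// Saturating difference: 0 when outlining would not save anything.
uint64_t benefit(const Sequence &S) {
  uint64_t N = notOutlinedCost(S), O = outliningCost(S);
  return N > O ? N - O : 0;
}

double priority(const Sequence &S) {
  return static_cast<double>(notOutlinedCost(S)) / outliningCost(S);
}

// Hypothetical helper: keep sequences whose benefit is at least Threshold.
std::vector<Sequence> filterByBenefit(const std::vector<Sequence> &Seqs,
                                      uint64_t Threshold) {
  std::vector<Sequence> Kept;
  for (const Sequence &S : Seqs)
    if (benefit(S) >= Threshold)
      Kept.push_back(S);
  return Kept;
}
```

For the first sequence, `{20, 6, 12, 4}`, this gives 120 and 96, a benefit of 24 and a priority of 1.25. For the second, `{12, 4, 4, 0}`, it gives 48 and 28, a benefit of 20 and a priority of about 1.71. With a threshold of 22 only the first survives, which matches the CHECK-THRESHOLD lines.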
diff --git a/llvm/test/CodeGen/ARM/machine-outliner-calls.mir b/llvm/test/CodeGen/ARM/machine-outliner-calls.mir
index a92c9dd28be5a..7634ecd6e863a 100644
--- a/llvm/test/CodeGen/ARM/machine-outliner-calls.mir
+++ b/llvm/test/CodeGen/ARM/machine-outliner-calls.mir
@@ -26,15 +26,15 @@ body: |
; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -8
- ; CHECK: BL @OUTLINED_FUNCTION_0
+ ; CHECK: BL @OUTLINED_FUNCTION_2
; CHECK: bb.1:
- ; CHECK: BL @OUTLINED_FUNCTION_0
+ ; CHECK: BL @OUTLINED_FUNCTION_2
; CHECK: bb.2:
- ; CHECK: BL @OUTLINED_FUNCTION_0
+ ; CHECK: BL @OUTLINED_FUNCTION_2
; CHECK: bb.3:
- ; CHECK: BL @OUTLINED_FUNCTION_0
+ ; CHECK: BL @OUTLINED_FUNCTION_2
; CHECK: bb.4:
- ; CHECK: BL @OUTLINED_FUNCTION_0
+ ; CHECK: BL @OUTLINED_FUNCTION_2
; CHECK: bb.5:
; CHECK: $sp = frame-destroy LDMIA_UPD $sp, 14 /* CC::al */, $noreg, def $r4, def $lr
; CHECK: BX_RET 14 /* CC::al */, $noreg
@@ -139,13 +139,13 @@ body: |
; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -8
- ; CHECK: BL @OUTLINED_FUNCTION_1
+ ; CHECK: BL @OUTLINED_FUNCTION_0
; CHECK: bb.1:
- ; CHECK: BL @OUTLINED_FUNCTION_1
+ ; CHECK: BL @OUTLINED_FUNCTION_0
; CHECK: bb.2:
- ; CHECK: BL @OUTLINED_FUNCTION_1
+ ; CHECK: BL @OUTLINED_FUNCTION_0
; CHECK: bb.3:
- ; CHECK: BL @OUTLINED_FUNCTION_1
+ ; CHECK: BL @OUTLINED_FUNCTION_0
; CHECK: bb.4:
; CHECK: $sp = frame-destroy LDMIA_UPD $sp, 14 /* CC::al */, $noreg, def $r4, def $lr
; CHECK: BX_RET 14 /* CC::al */, $noreg
@@ -245,19 +245,19 @@ body: |
; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -4
; CHECK: frame-setup CFI_INSTRUCTION offset $r4, -8
; CHECK: BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
- ; CHECK: BL @OUTLINED_FUNCTION_2
+ ; CHECK: BL @OUTLINED_FUNCTION_1
; CHECK: bb.1:
; CHECK: BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
- ; CHECK: BL @OUTLINED_FUNCTION_2
+ ; CHECK: BL @OUTLINED_FUNCTION_1
; CHECK: bb.2:
; CHECK: BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
- ; CHECK: BL @OUTLINED_FUNCTION_2
+ ; CHECK: BL @OUTLINED_FUNCTION_1
; CHECK: bb.3:
; CHECK: BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
- ; CHECK: BL @OUTLINED_FUNCTION_2
+ ; CHECK: BL @OUTLINED_FUNCTION_1
; CHECK: bb.4:
; CHECK: BL @"\01mcount", csr_aapcs, implicit-def dead $lr, implicit $sp
- ; CHECK: BL @OUTLINED_FUNCTION_2
+ ; CHECK: BL @OUTLINED_FUNCTION_1
; CHECK: bb.5:
; CHECK: $sp = frame-destroy LDMIA_UPD $sp, 14 /* CC::al */, $noreg, def $r4, def $lr
; CHECK: BX_RET 14 /* CC::al */, $noreg
@@ -307,38 +307,17 @@ body: |
bb.0:
BX_RET 14, $noreg
-
; CHECK-LABEL: name: OUTLINED_FUNCTION_0
; CHECK: bb.0:
- ; CHECK: liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8, $lr
- ; CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
- ; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -8
- ; CHECK: BL @bar, implicit-def dead $lr, implicit $sp
- ; CHECK: $r0 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $r1 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $r2 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $r3 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $r4 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
- ; CHECK: MOVPCLR 14 /* CC::al */, $noreg
-
- ; CHECK-LABEL: name: OUTLINED_FUNCTION_1
- ; CHECK: bb.0:
- ; CHECK: liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8, $lr
- ; CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
- ; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -8
- ; CHECK: BL @bar, implicit-def dead $lr, implicit $sp
+ ; CHECK: liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
; CHECK: $r0 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r1 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r2 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r3 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r4 = MOVi 2, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
; CHECK: TAILJMPd @bar, implicit $sp
- ; CHECK-LABEL: name: OUTLINED_FUNCTION_2
+ ; CHECK-LABEL: name: OUTLINED_FUNCTION_1
; CHECK: bb.0:
; CHECK: liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
; CHECK: $r0 = MOVi 3, 14 /* CC::al */, $noreg, $noreg
@@ -348,31 +327,28 @@ body: |
; CHECK: $r4 = MOVi 3, 14 /* CC::al */, $noreg, $noreg
; CHECK: MOVPCLR 14 /* CC::al */, $noreg
+ ; CHECK-LABEL: name: OUTLINED_FUNCTION_2
+ ; CHECK: bb.0:
+ ; CHECK: liveins: $r11, $r10, $r9, $r8, $r7, $r6, $r5, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
+ ; CHECK: $r0 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: $r1 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: $r2 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: $r3 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: $r4 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: MOVPCLR 14 /* CC::al */, $noreg
+
; CHECK-LABEL: name: OUTLINED_FUNCTION_3
; CHECK: bb.0:
- ; CHECK: liveins: $r11, $r10, $r9, $r8, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8, $lr
- ; CHECK: early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
- ; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -8
- ; CHECK: tBL 14 /* CC::al */, $noreg, @bar, implicit-def dead $lr, implicit $sp
+ ; CHECK: liveins: $r11, $r10, $r9, $r8, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
; CHECK: $r0 = t2MOVi 2, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r1 = t2MOVi 2, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r2 = t2MOVi 2, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
; CHECK: tTAILJMPdND @bar, 14 /* CC::al */, $noreg, implicit $sp
; CHECK-LABEL: name: OUTLINED_FUNCTION_4
; CHECK: bb.0:
- ; CHECK: liveins: $r11, $r10, $r9, $r8, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8, $lr
- ; CHECK: early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: frame-setup CFI_INSTRUCTION def_cfa_offset 8
- ; CHECK: frame-setup CFI_INSTRUCTION offset $lr, -8
- ; CHECK: tBL 14 /* CC::al */, $noreg, @bar, implicit-def dead $lr, implicit $sp
+ ; CHECK: liveins: $r11, $r10, $r9, $r8, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
; CHECK: $r0 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r1 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r2 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
; CHECK: tBX_RET 14 /* CC::al */, $noreg
-
-
-
diff --git a/llvm/test/CodeGen/ARM/machine-outliner-default.mir b/llvm/test/CodeGen/ARM/machine-outliner-default.mir
index 6d0218dbfe636..de2b8f5596976 100644
--- a/llvm/test/CodeGen/ARM/machine-outliner-default.mir
+++ b/llvm/test/CodeGen/ARM/machine-outliner-default.mir
@@ -19,17 +19,17 @@ body: |
; CHECK: bb.0:
; CHECK: liveins: $lr
; CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: BL @OUTLINED_FUNCTION_0
+ ; CHECK: BL @OUTLINED_FUNCTION_1
; CHECK: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
; CHECK: bb.1:
; CHECK: liveins: $lr, $r6, $r7, $r8, $r9, $r10, $r11
; CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: BL @OUTLINED_FUNCTION_0
+ ; CHECK: BL @OUTLINED_FUNCTION_1
; CHECK: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
; CHECK: bb.2:
; CHECK: liveins: $lr, $r6, $r7, $r8, $r9, $r10, $r11
; CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: BL @OUTLINED_FUNCTION_0
+ ; CHECK: BL @OUTLINED_FUNCTION_1
; CHECK: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
; CHECK: bb.3:
; CHECK: liveins: $lr, $r6, $r7, $r8, $r9, $r10, $r11
@@ -73,17 +73,17 @@ body: |
; CHECK: bb.0:
; CHECK: liveins: $lr
; CHECK: early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_1
+ ; CHECK: tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_0
; CHECK: $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
; CHECK: bb.1:
; CHECK: liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
; CHECK: early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_1
+ ; CHECK: tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_0
; CHECK: $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
; CHECK: bb.2:
; CHECK: liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
; CHECK: early-clobber $sp = frame-setup t2STR_PRE killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ; CHECK: tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_1
+ ; CHECK: tBL 14 /* CC::al */, $noreg, @OUTLINED_FUNCTION_0
; CHECK: $lr, $sp = frame-destroy t2LDR_POST $sp, 8, 14 /* CC::al */, $noreg
; CHECK: bb.3:
; CHECK: liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
@@ -114,6 +114,15 @@ body: |
; CHECK-LABEL: name: OUTLINED_FUNCTION_0
; CHECK: bb.0:
+ ; CHECK: liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
+ ; CHECK: $r0 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: $r1 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: $r2 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: $r3 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK: tBX_RET 14 /* CC::al */, $noreg
+
+ ; CHECK-LABEL: name: OUTLINED_FUNCTION_1
+ ; CHECK: bb.0:
; CHECK: liveins: $lr, $r6, $r7, $r8, $r9, $r10, $r11
; CHECK: $r0 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r1 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
@@ -122,15 +131,3 @@ body: |
; CHECK: $r4 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
; CHECK: $r5 = MOVi 1, 14 /* CC::al */, $noreg, $noreg
; CHECK: MOVPCLR 14 /* CC::al */, $noreg
-
- ; CHECK-LABEL: name: OUTLINED_FUNCTION_1
- ; CHECK: bb.0:
- ; CHECK: liveins: $lr, $r4, $r5, $r6, $r7, $r8, $r9, $r10, $r11
- ; CHECK: $r0 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $r1 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $r2 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: $r3 = t2MOVi 1, 14 /* CC::al */, $noreg, $noreg
- ; CHECK: tBX_RET 14 /* CC::al */, $noreg
-
-
-
diff --git a/llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir b/llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir
index ae5caa5b7c06d..e71edc8ceb3f6 100644
--- a/llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir
+++ b/llvm/test/CodeGen/ARM/machine-outliner-stack-fixup-arm.mir
@@ -18,6 +18,7 @@ body: |
liveins: $r0
; CHECK-LABEL: name: CheckAddrMode_i12
; CHECK: $r1 = MOVr killed $r0, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
; CHECK-NEXT: BL @OUTLINED_FUNCTION_[[I12:[0-9]+]]
; CHECK-NEXT: $r6 = LDRi12 $sp, 4088, 14 /* CC::al */, $noreg
$r1 = MOVr killed $r0, 14, $noreg, $noreg
@@ -47,6 +48,7 @@ body: |
liveins: $r1
; CHECK-LABEL: name: CheckAddrMode3
; CHECK: $r0 = MOVr killed $r1, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
; CHECK-NEXT: BL @OUTLINED_FUNCTION_[[I3:[0-9]+]]
; CHECK-NEXT: $r6 = LDRSH $sp, $noreg, 248, 14 /* CC::al */, $noreg
$r0 = MOVr killed $r1, 14, $noreg, $noreg
@@ -76,6 +78,7 @@ body: |
liveins: $r2
; CHECK-LABEL: name: CheckAddrMode5
; CHECK: $r0 = MOVr killed $r2, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
; CHECK-NEXT: BL @OUTLINED_FUNCTION_[[I5:[0-9]+]]
; CHECK-NEXT: $d5 = VLDRD $sp, 254, 14 /* CC::al */, $noreg
$r0 = MOVr killed $r2, 14, $noreg, $noreg
@@ -110,6 +113,7 @@ body: |
liveins: $r3
; CHECK-LABEL: name: CheckAddrMode5FP16
; CHECK: $r0 = MOVr killed $r3, 14 /* CC::al */, $noreg, $noreg
+ ; CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
; CHECK-NEXT: BL @OUTLINED_FUNCTION_[[I5FP16:[0-9]+]]
; CHECK-NEXT: $s6 = VLDRH $sp, 252, 14, $noreg
$r0 = MOVr killed $r3, 14, $noreg, $noreg
@@ -146,41 +150,29 @@ body: |
BX_RET 14, $noreg
;CHECK: name: OUTLINED_FUNCTION_[[I5]]
- ;CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ;CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 8
- ;CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $lr, -8
- ;CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
- ;CHECK-NEXT: $d0 = VLDRD $sp, 2, 14 /* CC::al */, $noreg
- ;CHECK-NEXT: $d1 = VLDRD $sp, 10, 14 /* CC::al */, $noreg
- ;CHECK-NEXT: $d4 = VLDRD $sp, 255, 14 /* CC::al */, $noreg
- ;CHECK-NEXT: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
+ ;CHECK: liveins: $r10, $r9, $r8, $r7, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
+ ;CHECK: $d0 = VLDRD $sp, 0, 14 /* CC::al */, $noreg
+ ;CHECK-NEXT: $d1 = VLDRD $sp, 8, 14 /* CC::al */, $noreg
+ ;CHECK-NEXT: $d4 = VLDRD $sp, 253, 14 /* CC::al */, $noreg
+ ;CHECK-NEXT: BX_RET 14 /* CC::al */, $noreg
;CHECK: name: OUTLINED_FUNCTION_[[I5FP16]]
- ;CHECK: early-clobber $sp = frame-setup STR_PRE_IMM killed $lr, $sp, -8, 14 /* CC::al */, $noreg
- ;CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 8
- ;CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $lr, -8
- ;CHECK-NEXT: BL @foo, implicit-def dead $lr, implicit $sp
- ;CHECK-NEXT: $s1 = VLDRH $sp, 4, 14, $noreg
- ;CHECK-NEXT: $s2 = VLDRH $sp, 12, 14, $noreg
- ;CHECK-NEXT: $s5 = VLDRH $sp, 244, 14, $noreg
- ;CHECK-NEXT: $lr, $sp = frame-destroy LDR_POST_IMM $sp, $noreg, 8, 14 /* CC::al */, $noreg
+ ;CHECK: liveins: $r10, $r9, $r8, $r7, $r6, $r5, $r4, $d15, $d14, $d13, $d12, $d11, $d10, $d9, $d8
+ ;CHECK: $s1 = VLDRH $sp, 0, 14, $noreg
+ ;CHECK-NEXT: $s2 = VLDRH $...
[truncated]
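In the stack-fixup test, the SP-relative immediates inside the outlined functions shrink. The old outlined bodies spilled LR with an 8-byte pre-indexed store (`STR_PRE_IMM killed $lr, $sp, -8`), so every SP-relative load in them was 8 bytes further from SP. In the new output `BL @foo` stays in the caller, and the outlined bodies no longer spill LR. The minimal sketch below checks that arithmetic. It assumes the ARM scaling rules: `VLDRD` (AddrMode5) immediates count 4-byte words and `VLDRH` (AddrMode5FP16) immediates count 2-byte halfwords. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per immediate unit (assumed ARM encodings).
constexpr int64_t kVLDRDScale = 4; // AddrMode5: imm8 counts 4-byte words
constexpr int64_t kVLDRHScale = 2; // AddrMode5FP16: imm8 counts halfwords

// Byte distance between an old and a new immediate for the same load.
constexpr int64_t byteDelta(int64_t OldImm, int64_t NewImm, int64_t Scale) {
  return (OldImm - NewImm) * Scale;
}

// Immediate needed after the body first moves SP down by SpillBytes
// (8 here, matching the removed "STR_PRE_IMM ... $sp, -8").
constexpr int64_t immAfterSpill(int64_t Imm, int64_t SpillBytes,
                                int64_t Scale) {
  return Imm + SpillBytes / Scale;
}
```

Each visible `VLDRD` pair in the diff, (2, 0), (10, 8) and (255, 253), differs by 2 words, which is 8 bytes. The first `VLDRH` pair, (4, 0), differs by 4 halfwords, which is also 8 bytes.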
Can you please add a MIR test to clearly demonstrate the actual instruction sequences? This would help ensure the implementation aligns with our expectations with regard to priority computation.
llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.ll
llvm/test/CodeGen/AArch64/machine-outliner-sort-per-priority.mir
lgtm, but please wait a bit for others' comments if any.
This PR depends on #90264.

In the current implementation, only the leaf children of each internal node in the suffix tree are included as outlining candidates. However, all leaf descendants are outlining candidates, and the new implementation includes them. This behavior is controlled by the flag `outliner-leaf-descendants`, which defaults to true. It is _enabled by a flag_ because the machine outliner is not the only pass that uses the suffix tree. It _defaults to true_ because including all leaf descendants shows a consistent size win.

* For Clang/LLD, it reduces the text segment size by around 3% compared to the baseline `-Oz` linker binary.
* For selected benchmark tests in the LLVM test suite:

| run (CTMark/) | only leaf children | all leaf descendants | reduction % |
|------------------|--------------------|----------------------|-------------|
| lencod | 349624 | 348564 | -0.2004% |
| SPASS | 219672 | 218440 | -0.4738% |
| kc | 271956 | 250068 | -0.4506% |
| sqlite3 | 223920 | 222484 | -0.5471% |
| 7zip-benchmark | 405364 | 401244 | -0.3428% |
| bullet | 139820 | 138340 | -0.8315% |
| consumer-typeset | 295684 | 286628 | -1.2295% |
| pairlocalalign | 72236 | 71936 | -0.2164% |
| tramp3d-v4 | 189572 | 183676 | -2.9668% |

This is part of an enhanced version of the machine outliner; see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
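A toy tree illustrates the difference between leaf children and leaf descendants. In a suffix tree, every leaf under an internal node is a suffix that begins with that node's string, so the start index of every leaf descendant is an occurrence of the repeated sequence. Collecting only leaf children looks just one level down. The minimal sketch below uses a hypothetical `Node` type (not LLVM's `SuffixTreeNode`) in which each leaf stores its suffix start index.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal suffix-tree node.
struct Node {
  int SuffixIdx = -1; // suffix start index for leaves, -1 for internal nodes
  std::vector<std::unique_ptr<Node>> Children;
  bool isLeaf() const { return Children.empty(); }
};

// Old behaviour: only leaves that hang directly off N.
std::vector<int> leafChildren(const Node &N) {
  std::vector<int> Starts;
  for (const auto &C : N.Children)
    if (C->isLeaf())
      Starts.push_back(C->SuffixIdx);
  return Starts;
}

// New behaviour: depth-first walk that records every leaf in N's subtree.
void collectLeafDescendants(const Node &N, std::vector<int> &Starts) {
  if (N.isLeaf()) {
    Starts.push_back(N.SuffixIdx);
    return;
  }
  for (const auto &C : N.Children)
    collectLeafDescendants(*C, Starts);
}

std::vector<int> leafDescendants(const Node &N) {
  std::vector<int> Starts;
  collectLeafDescendants(N, Starts);
  return Starts;
}
```

Consider a root whose children are leaf 0, an internal node with leaves 3 and 5, and leaf 7. For this tree, `leafChildren` returns `{0, 7}` and `leafDescendants` returns `{0, 3, 5, 7}`.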