-
Notifications
You must be signed in to change notification settings - Fork 63
Add stack overflow check for ARM Thumb instruction set. #4
Add stack overflow check for ARM Thumb instruction set. #4
Conversation
Besides the mechanincal changes between the ARM and Thumb functions, because of the different instruction sets, there is difference in how the stack limit is located. The ARM version uses hardware which isn't available on the lower-end Thumb processors (namely system co-processor and MMU) therefore the stack limit is placed at a predefined location in memory - STACK_LIMIT. It is the responsibility of the wrapping runtime to manage this location with the correct value. It can vary from a simple constant defined by the linker to actively managed variable by a RTOS implementation.
This even has tests, amazing! Out of curiosity, would you be interested in helping us upstream the ARM-related segmented stack patches? All we really need are tests, and otherwise I think that upstream will be receptive to our patches. I will merge this soon, thank you! |
This updates the LLVM submodule to the `rust-llvm-2014-02-19` tag which is the old one with rust-lang/llvm#4 cherry-picked on top. Awesome job by @neykov for this!
Opened a PR on the rust repo as rust-lang/rust#12407 and landed as b015ecd, thanks! |
Yes, I can prepare analogous patches for the ARM case. |
That would be awesome! If you need any help, feel free to ask me questions. |
This updates the LLVM submodule to the `rust-llvm-2014-02-19` tag which is the old one with rust-lang/llvm#4 cherry-picked on top. Awesome job by @neykov for this!
The above doesn't work for thumb-1 (ARMV6-M, Cortex-M0). I had to modify the asm to make two changes. Change MOV with MOVS (can only take an 8-bit constant). And align the STACK_LIMIT with .align 2. |
Patches welcome! |
@bharrisau The code is written specifically to support Thumb1 but I wouldn't be surprised if there are bugs in it. Could you give more details where you did the changes. You modified the assembly, generated by llc to compile it with GCC as? I haven't checked this workflow - will do in the following days.
Thanks for the feedback. |
I ran as emit=asm so I will need to check a disassembly of an object to see I'm trying to get some of this done at the end of a 12 hr day, so I may be
|
Pretty sure this will also surface if you compile with -C no-integrated-as |
Emit=obj works fine, the disassembly has the proper movs instructions. So |
Sorry to spam - I'll check this with a large stack frame tonight. The following suggests that the maximum movs value is 255 http://infocenter.arm.com/help/topic/com.arm.doc.dui0497a/BABHGAJI.html LLVM might automatically change it into a LDR instruction? |
Yes, if the value is larger than 255 it will have to be loaded analogously to the STACK_LIMIT address with LDR from the constant pool. At least the call to alignToARMConstant won't be necessary. |
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx more difficult. For example: Given %shr = lshr i64 %x, 4 %and = and i64 %shr, 15 %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and %0 = load i64* %arrayidx With current shift folding, it takes 3 instrs to compute base address: lsr x8, x0, #1 and x8, x8, #0x78 add x8, x9, x8 If using ubfx, it only needs 2 instrs: ubfx x8, x0, #4, #4 add x8, x9, x8, lsl #3 This fixes bug 19589 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207702 91177308-0d34-0410-b5e6-96231b3b80d8
…airs. Store instructions do not modify register values and therefore it's safe to form a store pair even if the source register has been read in between the two store instructions. Previously, the read of w1 (see below) prevented the formation of a stp. str w0, [x2] ldr w8, [x2, #8] add w0, w8, w1 str w1, [x2, #4] ret We now generate the following code. stp w0, w1, [x2] ldr w8, [x2, #8] add w0, w8, w1 ret All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass. Performance results for SPEC2K were within noise. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239432 91177308-0d34-0410-b5e6-96231b3b80d8
…ons. Increment/decrement vector by multiple of predicate constraint element count. The variants added by this patch are: - INCH, INCW, INC and (saturating): - SQINCH, SQINCW, SQINCD - UQINCH, UQINCW, UQINCW - SQDECH, SQINCW, SQINCD - UQDECH, UQINCW, UQINCW For example: incw z0.s, all, mul #4 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336090 91177308-0d34-0410-b5e6-96231b3b80d8
r335553 with the non-trivial unswitching of switches. The code correctly updated most aspects of the CFG and analyses, but missed some crucial aspects: 1) When multiple cases have the same successor, we unswitch that a single time and replace the switch with a direct branch. The CFG here is correct, but the target of this direct branch may have had a PHI node with multiple entries in it. 2) When we still have to clone a successor of the switch into an unswitched copy of the loop, we'll delete potentially multiple edges entering this successor, not just one. 3) We also have to delete multiple edges entering the successors in the original loop when they have to be retained. 4) When the "retained successor" *also* occurs as a case successor, we just assert failed everywhere. This doesn't happen very easily because its always valid to simply drop the case -- the retained successor for switches is always the default successor. However, it is likely possible through some contrivance of different loop passes, unrolling, and simplifying for this to occur in practice and certainly there is nothing "invalid" about the IR so this pass needs to handle it. 5) In the case of #4, we also will replace these multiple edges with a direct branch much like in #1 and need to collapse the entries in any PHI nodes to a single enrty. All of this stems from the delightful fact that the same successor can show up in multiple parts of the switch terminator, and each of these are considered a distinct edge for the purpose of PHI nodes (and iterating the successors and predecessors) but not for unswitching itself, the dominator tree, or many other things. For the record, I intensely dislike this "feature" of the IR in large part because of the complexity it causes in passes like this. We already have a ton of logic building sets and handling duplicates, and we just had to add a bunch more. I've added a complex test case that covers all five of the above failure modes. I've also added a variation on it where #4 and #5 occur in loop exit, adding fun where we have an LCSSA PHI node with "multiple entries" despite have dedicated exits. There were no additional issues found by this, but it seems a useful corner case to cover with testing. One thing that working on all of this code has made painfully clear for me as well is how amazingly inefficient our PHI node representation is (in terms of the in-memory data structures and the APIs used to update them). This code has truly marvelous complexity bounds because every time we remove an entry from a PHI node we do a linear scan to find it and then a linear update to the data structure to remove it. We could in theory batch all of the PHI node updates into a single linear walk of the operands making this much more efficient, but the APIs fight hard against this and the fact that we have to handle duplicates in the peculiar manner we do (removing all but one in some cases) makes even implementing that very tedious and annoying. Anyways, none of this is new here or specific to loop unswitching. All code in LLVM that updates PHI node operands suffers from these problems. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336536 91177308-0d34-0410-b5e6-96231b3b80d8
…d VPlan for tests." Memory leaks in tests. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/6289/steps/check-llvm%20asan/logs/stdio Direct leak of 192 byte(s) in 1 object(s) allocated from: #0 0x554ea8 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:106 #1 0x56cef1 in llvm::VPlanTestBase::doAnalysis(llvm::Function&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h:53:14 #2 0x56bec4 in llvm::VPlanTestBase::buildHCFG(llvm::BasicBlock*) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h:57:3 #3 0x571f1e in llvm::(anonymous namespace)::VPlanHCFGTest_testVPInstructionToVPRecipesInner_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp:119:15 #4 0xed2291 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #5 0xed44c8 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11 #6 0xed5890 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28 #7 0xef3634 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43 #8 0xef27e0 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #9 0xebbc23 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46 #10 0xebbc23 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:51 #11 0x7f65569592e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) and more. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336718 91177308-0d34-0410-b5e6-96231b3b80d8
…ering" This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337079 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: ``` extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } ``` Compiled for thumb, this gives: ``` $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @app mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... ``` r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: ``` repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ ``` Reviewers: eli.friedman, olista01, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49727 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339257 91177308-0d34-0410-b5e6-96231b3b80d8
…>> (32 - y) pattern" *Seems* to be breaking sanitizer-x86_64-linux-fast buildbot, the ELF/relocatable-versioned.s test: ==17758==MemorySanitizer CHECK failed: /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 "((kBlockMagic)) == ((((u64*)addr)[0]))" (0x6a6cb03abcebc041, 0x0) #0 0x59716b in MsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/msan/msan.cc:393 rust-lang#1 0x586635 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79 rust-lang#2 0x57d5ff in __sanitizer::InternalFree(void*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 rust-lang#3 0x7fc21b24193f (/lib/x86_64-linux-gnu/libc.so.6+0x3593f) rust-lang#4 0x7fc21b241999 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x35999) rust-lang#5 0x7fc21b22c2e7 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e7) rust-lang#6 0x57c039 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld+0x57c039) This reverts commit r345014. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345017 91177308-0d34-0410-b5e6-96231b3b80d8
Besides the mechanical changes between the ARM and Thumb functions,
because of the different instruction sets, there is difference in how the
stack limit is located. The ARM version uses hardware which isn't available
on the lower-end Thumb processors (namely system co-processor and MMU)
therefore the stack limit is placed at a predefined location in
memory - STACK_LIMIT. It is the responsibility of the wrapping runtime
to manage this location with the correct value. It can vary from a simple
constant defined by the linker to actively managed variable by a RTOS
implementation.