Add stack overflow check for ARM Thumb instruction set. #4

neykov · 2014-02-19T21:03:22Z

Besides the mechanical changes between the ARM and Thumb functions
required by the different instruction sets, there is a difference in how the
stack limit is located. The ARM version uses hardware that isn't available
on the lower-end Thumb processors (namely the system co-processor and MMU),
so the stack limit is instead placed at a predefined location in
memory, STACK_LIMIT. It is the responsibility of the wrapping runtime
to keep this location updated with the correct value. That can range from a
simple constant defined by the linker to a variable actively managed by an
RTOS implementation.
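A minimal C model of what the Thumb prologue check compares may make this contract concrete. It assumes a full descending stack and that the runtime stores the lowest usable stack address in a word named `STACK_LIMIT`. The helper name and the treatment of the exact boundary case are illustrative assumptions; the real check is emitted by LLVM as Thumb instructions, not written in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Word the wrapping runtime must keep correct (assumed layout): either a
 * linker-provided constant or a per-task value updated by an RTOS. */
uintptr_t STACK_LIMIT;

/* Model of the prologue check (hypothetical helper): a function needing
 * `frame_size` bytes may run on the current stack only if sp - frame_size
 * stays at or above STACK_LIMIT. When this returns true, the generated
 * prologue would call the segmented-stack routine (__morestack) instead.
 * Whether equality counts as overflow is an assumption of this sketch. */
static bool needs_more_stack(uintptr_t sp, uint32_t frame_size)
{
    /* Guard against wrap-around when the frame is larger than sp itself. */
    if (frame_size > sp)
        return true;
    return sp - frame_size < STACK_LIMIT;
}
```

An RTOS would update `STACK_LIMIT` on every context switch so that it always describes the running task's stack; a bare-metal program could set it once at startup from a linker-defined symbol.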

alexcrichton · 2014-02-19T22:36:19Z

This even has tests, amazing!

Out of curiosity, would you be interested in helping us upstream the ARM-related segmented stack patches? All we really need are tests, and otherwise I think that upstream will be receptive to our patches.

I will merge this soon, thank you!

@neykov

This updates the LLVM submodule to the `rust-llvm-2014-02-19` tag which is the old one with rust-lang/llvm#4 cherry-picked on top. Awesome job by @neykov for this!

alexcrichton · 2014-02-19T23:00:45Z

Opened a PR on the rust repo as rust-lang/rust#12407 and landed as b015ecd, thanks!

neykov · 2014-02-20T20:01:04Z

Yes, I can prepare analogous patches for the ARM case.

alexcrichton · 2014-02-20T20:23:29Z

That would be awesome! If you need any help, feel free to ask me questions.

bharrisau · 2014-02-25T12:25:08Z

The above doesn't work for Thumb-1 (ARMv6-M, Cortex-M0). I had to make two changes to the asm.

Replace MOV with MOVS (which can only take an 8-bit constant), and align STACK_LIMIT with `.align 2`.
Are the above constraints OK? I don't know whether the 8-bit limit is too low.
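The 8-bit limit comes straight from the instruction format. On ARMv6-M, `MOVS Rd, #imm8` is a 16-bit instruction (T1 encoding `00100 Rd imm8`) with a plain 8-bit immediate field and only the low registers r0 to r7 as destination. The hypothetical encoder below sketches that layout and rejects anything above 255:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode Thumb-1 "MOVS Rd, #imm8" (T1: bits 15-11 = 00100, Rd in bits 10-8,
 * imm8 in bits 7-0). Returns false when the immediate does not fit the
 * 8-bit field; such values have to be loaded some other way. */
static bool encode_thumb1_movs_imm(unsigned rd, uint32_t imm, uint16_t *out)
{
    assert(rd < 8);       /* only low registers r0-r7 are encodable */
    if (imm > 0xFF)
        return false;     /* no shift or rotation exists in this encoding */
    *out = (uint16_t)(0x2000u | (rd << 8) | imm);
    return true;
}
```

With this encoding, the largest stack size a single `MOVS` can place in a register is 255 bytes.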

alexcrichton · 2014-02-25T17:43:53Z

Patches welcome!

neykov · 2014-02-25T19:59:25Z

@bharrisau The code is written specifically to support Thumb1, but I wouldn't be surprised if there are bugs in it. Could you give more details on where you made the changes? Did you modify the assembly generated by llc so you could assemble it with GCC's as? I haven't checked this workflow; I will in the following days.

MOV -> MOVS change: LLVM will generate the correct instruction in machine code. I will check how it can be fixed in the assembly listing.

.align 2: LLVM handles the data sections and inserts alignment directives when necessary.

8-bit limit: I was under the impression that MOVS supports shifting the constant value (the constants are specifically selected to be powers of two for bigger values). Now that you point it out, this seems not to be the case. I will have to research this in more detail.

Thanks for the feedback.
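The shifting mentioned for the 8-bit limit does exist, but in ARM (A32) mode. There, a data-processing immediate is an 8-bit value rotated right by an even number of bits, and every power of two can be written that way. Thumb-1 `MOVS` has no rotation, so only 0 to 255 are encodable. A minimal sketch of the A32 encodability test, with a hypothetical function name, illustrates the difference:

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32), avoiding a shift by 32. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* True if v is an A32 "modified immediate": some 8-bit value rotated right
 * by an even amount (0, 2, ..., 30). Undoing a candidate rotation means
 * rotating left by the same amount and checking that the result fits in
 * 8 bits. */
static bool is_a32_modified_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(v, rot) <= 0xFF)
            return true;
    }
    return false;
}
```

Values such as 256 or 0x80000000 pass this test, which is why power-of-two sizes are safe in the ARM version, yet anything above 255 is out of range for Thumb-1 `MOVS`.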

bharrisau · 2014-02-25T20:56:36Z

I ran with emit=asm, so I will need to check a disassembly of an object file
to see whether this applies in that case. I'm still fighting with libstd, so I
just used the asm to see how it looked.
I built for thumbv6m-linux-eabi with CPU cortex-m0 if you want to try to
replicate it. The error I was getting was something about the suffix width
not being guaranteed, on the 'mov r, #x' instructions. I got the impression
that Thumb-1 had to use movs for immediate values.

I'm trying to get some of this done at the end of a 12-hour day, so I may be
slow in making any progress.

bharrisau · 2014-02-25T20:58:46Z

Pretty sure this will also surface if you compile with `-C no-integrated-as`.

bharrisau · 2014-02-25T22:05:08Z

emit=obj works fine; the disassembly has the proper movs instructions, so
the issue is with the asm printer or GCC.

bharrisau · 2014-02-26T00:32:02Z

Sorry to spam. I'll check this with a large stack frame tonight. The following suggests that the maximum MOVS immediate is 255: http://infocenter.arm.com/help/topic/com.arm.doc.dui0497a/BABHGAJI.html

LLVM might automatically change it into a LDR instruction?

neykov · 2014-02-26T06:28:25Z

Yes, if the value is larger than 255, it will have to be loaded from the constant pool with LDR, the same way the STACK_LIMIT address is. At least the call to alignToARMConstant won't be necessary then.
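For Thumb-1, the resulting rule for materializing the stack size can be sketched as follows: emit `movs` when the value fits in 8 bits, otherwise load it PC-relatively from a literal pool. The sketch prints GNU assembler syntax and uses the `ldr Rd, =value` pseudo-instruction, which makes the assembler create the pool entry; the function name and register choice are illustrative, not taken from LLVM's frame-lowering code.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the Thumb-1 assembly that loads `value` into register `reg`
 * (hypothetical helper). Values 0..255 fit the MOVS imm8 field; larger
 * values come from a literal pool via "ldr Rd, =imm". Returns what
 * snprintf returns: the length of the full string, or a negative error. */
static int thumb1_load_constant(char *buf, size_t len, const char *reg,
                                uint32_t value)
{
    if (value <= 0xFF)
        return snprintf(buf, len, "movs %s, #%u", reg, (unsigned)value);
    return snprintf(buf, len, "ldr %s, =%u", reg, (unsigned)value);
}
```

Keeping the 8-bit case as `movs` avoids a memory load for the common small-frame case, while the literal pool handles any 32-bit size.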

alexcrichton mentioned this pull request Feb 19, 2014

Upgrade LLVM for segmented stacks on thumb rust-lang/rust#12407

Merged

alexcrichton closed this Feb 19, 2014

bharrisau mentioned this pull request Mar 13, 2014

RFC: Add OsUnknown as a target operating system rust-lang/rfcs#7

Closed

bharrisau mentioned this pull request May 5, 2014

Submit llvm patches required to build zinc hackndev/zinc#10

Closed
