Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coroutines + templates + "-fsanitize=undefined" = error in backend: Cannot represent a difference across sections (clang++ 10, 11, 12, trunk) #49689

Closed
kbr- mannequin opened this issue May 14, 2021 · 45 comments
Labels
bugzilla Issues migrated from bugzilla clang:codegen compiler-rt:ubsan Undefined behavior sanitizer confirmed Verified by a second party

Comments

@kbr-
Copy link
Mannequin

kbr- mannequin commented May 14, 2021

Bugzilla Link 50345
Version unspecified
OS Linux
Blocks #51489
Attachments Code that crashes the compiler
CC @avikivity,@bhalevy,@lxfind,@pcc,@rjmccall,@tstellar,@yuanfang-chen

Extended Description

The attached file causes clang++ to crash given certain compilation flags.
also godbolt: https://godbolt.org/z/j14YMxj5M

$ clang++ --version
clang version 10.0.1 (Fedora 10.0.1-3.fc32)
Target: x86_64-unknown-linux-gnu
Thread model: posix
$ clang++ coro-crash.cpp -std=c++20 -stdlib=libc++ -fsanitize=undefined
fatal error: error in backend: Cannot represent a difference across sections
Stack dump:
0. Program arguments: /usr/bin/clang++ -std=c++20 -fsanitize=undefined -fcolor-diagnostics -stdlib=libc++ -c -o coro-crash.o coro-crash.cpp
1. parser at end of file
2. Code generation
/lib64/libLLVM-10.so(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x2e)[0x7f60b555e10e]
/lib64/libLLVM-10.so(_ZN4llvm3sys17RunSignalHandlersEv+0x34)[0x7f60b555c3e4]
/lib64/libLLVM-10.so(_ZN4llvm20CrashRecoveryContext10HandleExitEi+0x78)[0x7f60b549b768]
/lib64/libLLVM-10.so(_ZN4llvm3sys7Process4ExitEi+0x1b)[0x7f60b5556ffb]
/usr/bin/clang++(+0x16372)[0x55988c5f0372]
/lib64/libLLVM-10.so(_ZN4llvm18report_fatal_errorERKNS_5TwineEb+0x8b)[0x7f60b54a702b]
/lib64/libLLVM-10.so(+0x1bc7623)[0x7f60b66db623]
/lib64/libLLVM-10.so(+0x1ba83c6)[0x7f60b66bc3c6]
/lib64/libLLVM-10.so(_ZN4llvm11MCAssembler11handleFixupERKNS_11MCAsmLayoutERNS_10MCFragmentERKNS_7MCFixupE+0xf2)[0x7f60b66d2ed2]
/lib64/libLLVM-10.so(_ZN4llvm11MCAssembler6layoutERNS_11MCAsmLayoutE+0x281)[0x7f60b66d40b1]
/lib64/libLLVM-10.so(_ZN4llvm11MCAssembler6FinishEv+0x3d)[0x7f60b66d42ad]
/lib64/libLLVM-10.so(_ZN4llvm10AsmPrinter14doFinalizationERNS_6ModuleE+0x731)[0x7f60b5c64b11]
/lib64/libLLVM-10.so(_ZN4llvm13FPPassManager14doFinalizationERNS_6ModuleE+0x65)[0x7f60b5658895]
/lib64/libLLVM-10.so(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x460)[0x7f60b5664220]
/lib64/libclang-cpp.so.10(+0x16123e1)[0x7f60baea73e1]
/lib64/libclang-cpp.so.10(_ZN5clang17EmitBackendOutputERNS_17DiagnosticsEngineERKNS_19HeaderSearchOptionsERKNS_14CodeGenOptionsERKNS_13TargetOptionsERKNS_11LangOptionsERKN4llvm10DataLayoutEPNSE_6ModuleENS_13BackendActionESt10unique_ptrINSE_17raw_pwrite_streamESt14default_deleteISM_EE+0x2fe)[0x7f60baea805e]
/lib64/libclang-cpp.so.10(+0x190fb39)[0x7f60bb1a4b39]
/lib64/libclang-cpp.so.10(_ZN5clang8ParseASTERNS_4SemaEbb+0x499)[0x7f60ba208c09]
/lib64/libclang-cpp.so.10(_ZN5clang14FrontendAction7ExecuteEv+0xb9)[0x7f60bb83c139]
/lib64/libclang-cpp.so.10(_ZN5clang16CompilerInstance13ExecuteActionERNS_14FrontendActionE+0x1cd)[0x7f60bb7f827d]
/lib64/libclang-cpp.so.10(_ZN5clang25ExecuteCompilerInvocationEPNS_16CompilerInstanceE+0x95c)[0x7f60bb8b212c]
/usr/bin/clang++(_Z8cc1_mainN4llvm8ArrayRefIPKcEES2_Pv+0x6c0)[0x55988c5f1440]
/usr/bin/clang++(+0x15a94)[0x55988c5efa94]
/lib64/libclang-cpp.so.10(+0x1c8cec9)[0x7f60bb521ec9]
/lib64/libLLVM-10.so(_ZN4llvm20CrashRecoveryContext9RunSafelyENS_12function_refIFvvEEE+0x27)[0x7f60b549b5e7]
/lib64/libclang-cpp.so.10(_ZNK5clang6driver10CC1Command7ExecuteEN4llvm8ArrayRefINS2_8OptionalINS2_9StringRefEEEEEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPb+0x157)[0x7f60bb524537]
/lib64/libclang-cpp.so.10(ZNK5clang6driver11Compilation14ExecuteCommandERKNS0_7CommandERPS3+0x9d)[0x7f60bb4f8bed]
/lib64/libclang-cpp.so.10(_ZNK5clang6driver11Compilation11ExecuteJobsERKNS0_7JobListERN4llvm15SmallVectorImplISt4pairIiPKNS0_7CommandEEEE+0x86)[0x7f60bb4f9156]
/lib64/libclang-cpp.so.10(_ZN5clang6driver6Driver18ExecuteCompilationERNS0_11CompilationERN4llvm15SmallVectorImplISt4pairIiPKNS0_7CommandEEEE+0x93)[0x7f60bb501693]
/usr/bin/clang++(main+0x14a6)[0x55988c5ebb86]
/lib64/libc.so.6(__libc_start_main+0xf2)[0x7f60b4766042]
/usr/bin/clang++(_start+0x2e)[0x55988c5ecc1e]

also checked through godbolt: crashes in clang 11 and clang trunk, clang 12 doesn't crash but prints errors.

None of these commands crash:
clang++ coro-crash.cpp -std=c++20 -stdlib=libc++ -fsanitize=undefined -emit-llvm -Xclang -disable-llvm-passes -c
clang++ coro-crash.cpp -std=c++20 -stdlib=libc++ -fsanitize=undefined -emit-llvm -Xclang -c

by https://llvm.org/docs/HowToSubmitABug.html#crashing-bugs I deduced that it's a backend bug (as if it wasn't obvious by the error message and the "Code generation" part in the dump).

However, after generating coro_crash.bc using:
clang++ coro-crash.cpp -std=c++20 -stdlib=libc++ -fsanitize=undefined -emit-llvm -c -o coro_crash.bc
none of these commands crash either:
llc coro_crash.bc
llc coro_crash.bc -relocation-model=pic
llc coro_crash.bc -relocation-model=static
Hence https://llvm.org/docs/HowToSubmitABug.html#backend-code-generator-bugs claims that I should follow instructions for front-end bugs, but every other evidence points to a backend bug, so dunno. Maybe it's a mix of bugs, frontent/middle passing incorrect input to backend or whatever.

Anyway, bugpoint doesn't want to cooperate with me either:
bugpoint -run-llc coro_crash.bc
Read input file : 'coro_crash.bc'
*** All input ok
Initializing execution environment: Found llc: /usr/bin/llc
Sorry, I can't automatically select a safe interpreter!

Exiting.

So I guess that's all what I can give in this bug report.

@yuanfang-chen
Copy link
Collaborator

This is due to

  1. -fsanitize=function instrumentation at function entry by using function prologue (https://llvm.org/docs/LangRef.html#prologue-data)
  2. coro-splitting
  3. _Z4callIiE4taskv ramp function is in a COMDAT due to using template hence in a different section than .resume/.destroy/.cleanup. Using -ffunction-section instead of template would reproduce exactly the same error.

Before coro-splitting

define linkonce_odr dso_local void @&#8203;_Z4callIiE4taskv() #&#8203;1 comdat prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i6
    4 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv to i64)) to i32) }> {
...
}

After coro-splitting

define internal fastcc void @&#8203;_Z4callIiE4taskv.resume(%_Z4callIiE4taskv.Frame* noalias nonnull align 8 dereferenceable(24
    ) %FramePtr) #&#8203;1 prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i64 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv to i64)) to i32) }> {...}

; ditto for .destroy/.cleanup

So for .resume/.destroy/.cleanup, their prologue data are still referring _Z4callIiE4taskv after split. For the purpose of -fsanitize=function, they should refer to _Z4callIiE4taskv.resume/_Z4callIiE4taskv.destroy/_Z4callIiE4taskv.cleanup respectively.

About the fix, I think we could replace references to ramp function in coroutine functions prologue data with their own function symbol:

define internal fastcc void @&#8203;_Z4callIiE4taskv.resume(%_Z4callIiE4taskv.Frame* noalias nonnull align 8 dereferenceable(24
    ) %FramePtr) #&#8203;1 prologue <{ i32, i32 }> <{ i32 846595819, i32 trunc (i64 sub (i64 ptrtoint (i8** @&#8203;1 to i64), i64 ptrtoint (void ()* @&#8203;_Z4callIiE4taskv.resume to i64)) to i32) }> {...}

; ditto for .destroy/.cleanup

I'm not sure this covers all the use cases, but most likely a better default than the status quo.

@rjmccall
Copy link
Contributor

I've never seen this prologue attribute before. Why does it have a self-reference? That seems generically problematic for any sort of function duplication pass.

If such passes are all supposed to be updated to handle it, then sure, I guess coroutine splitting should be, too.

@rjmccall
Copy link
Contributor

Okay, reading up on this attribute, I have no idea what we can reasonably do. This is intentionally opaque data. How can we know in general that it's safe to replace references to the ramp function with references to the current sub-function here? How do other function-cloning optimizations deal with this?

@yuanfang-chen
Copy link
Collaborator

Okay, reading up on this attribute, I have no idea what we can reasonably
do. This is intentionally opaque data. How can we know in general that
it's safe to replace references to the ramp function with references to the
current sub-function here? How do other function-cloning optimizations deal
with this?

That's indeed problematic. I was proposing to enforce self-referencing before and after coro-split just for this particular bug. Yeah, it is definitely not a complete solution.

@bhalevy
Copy link

bhalevy commented Jun 25, 2021

*** Bug llvm/llvm-bugzilla-archive#50863 has been marked as a duplicate of this bug. ***

@kbr-
Copy link
Mannequin Author

kbr- mannequin commented Jun 25, 2021

I recently found that you don't even need templates, a static member function coroutine is enough:
https://godbolt.org/z/bjEj1hj8h

@avikivity
Copy link
Mannequin

avikivity mannequin commented Nov 16, 2021

We have this (form the last comment's output):

.Lfunc_begin1:
        .loc    3 14 0                          # example.cpp:14:0
        .cfi_startproc
        .long   846595819                       # 0x327606eb
        .long   .L__unnamed_2-_ZN1s4callEv
# %bb.0:
        push    rbp
        .cfi_def_cfa_offset 16

and later

        .type   .L__unnamed_2,@object           # @&#8203;1
        .section        .rodata,"a",@progbits
        .p2align        3
.L__unnamed_2:
        .quad   _ZTIF4taskvE
        .size   .L__unnamed_2, 8

Clearly .L__unnamed_2 is in .rodata and _ZN1s4callEv is in .text.something, and indeed the difference (.L__unnamed_2-_ZN1s4callEv) cannot be represented by ELF.

I'm guessing the sequence

        .long   846595819                       # 0x327606eb
        .long   .L__unnamed_2-_ZN1s4callEv

Is really

    jmp 1f  # 0xeb = JMP rel8 
    .byte 0x76
    .byte 0x32
    .long (.L__unnamed_2-_ZN1s4callEv)
  1f:       # this is 6 bytes in the future
    push    rbp        # real function start

so asan is hiding some magic numbers by adding a jump over them, but those magic numbers are not representible.

@avikivity
Copy link
Mannequin

avikivity mannequin commented Nov 16, 2021

This is

llvm::Constant *
getUBSanFunctionSignature(CodeGen::CodeGenModule &CGM) const override {
unsigned Sig = (0xeb << 0) | // jmp rel8
(0x06 << 8) | // .+0x08
('v' << 16) |
('2' << 24);
return llvm::ConstantInt::get(CGM.Int32Ty, Sig);
}

but looking at the first comments I see you already new that. Anyway it was fun reverse-engineering it.

@rjmccall
Copy link
Contributor

We should probably just disable this sanitizer, because this does not seem like a well-designed implementation. Clang should emit some sort of abstract attribute, and then a very late pass can cause that to be lowered into this prologue attribute.

@tstellar
Copy link
Collaborator

How should we fix this in the release/13.x branch?

@yuanfang-chen
Copy link
Collaborator

CC @​pcc in case he got insights about this.

https://reviews.llvm.org/D37597 introduced the faulty minus op in this bug.

For a temporary measure, I think we could use 846595819 (getUBSanFunctionSignature) to tell that fsanitize=undefined is using the prologue. In the frontend, only place the global value in the prologue (

auto *FuncAsInt = llvm::ConstantExpr::getPtrToInt(F, IntPtrTy);
). Then in the codegen phase, using the function address (and the global value) to form the minus op.

@yuanfang-chen
Copy link
Collaborator

Hmmm, not quite. The type information comes from RTTI, the method I described may compile successfully but gives runtime false positives since resume/destroy function types are different from the ramp function. I think the only sensible thing left is to discard the -fsanitize=function prologue for resume/destroy functions during corosplit.

@bhalevy
Copy link

bhalevy commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#50863

@tstellar
Copy link
Collaborator

mentioned in issue #51489

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@asl asl added this to the LLVM 13.0.1 release milestone Dec 12, 2021
@yuanfang-chen
Copy link
Collaborator

https://reviews.llvm.org/D115844

@avikivity
Copy link
Contributor

Ugh, spent some time bisecting only to notice now that there's a patch ready.

Anyway bisected to

commit 5de2d18
Author: Yuanfang Chen yuanfang.chen@sony.com
Date: Tue Feb 23 09:47:15 2021 -0800

[Diagnose] Unify MCContext and LLVMContext diagnosing

The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.

* Kill LLVMContext inlineasm diagnose handler and migrate it to use
  DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
  covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
  is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
  use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
  handler; if LLVMContext is not available, MCContext uses its own default
  diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D97449

@avikivity
Copy link
Contributor

I see the patch is rotting on the review board. Perhaps someone can review it?

@zhuhan0
Copy link
Member

zhuhan0 commented Jan 26, 2022

Hi, can someone with expertise in this area review the diff? Thanks.

@llvmbot llvmbot added the confirmed Verified by a second party label Jan 26, 2022
@niekbouman
Copy link

Could this issue please be added to the clang v14 milestone as a release blocker?

@EugeneZelenko EugeneZelenko added clang:codegen compiler-rt:ubsan Undefined behavior sanitizer labels Feb 6, 2022
@llvmbot
Copy link
Member

llvmbot commented Feb 6, 2022

@llvm/issue-subscribers-clang-codegen

@avikivity
Copy link
Contributor

It's now been 7 weeks since the patch was posted, and no review activity.

@tstellar perhaps you can enlist a reviewer?

@avikivity
Copy link
Contributor

Review stalled again :(

tchaikov added a commit to tchaikov/seastar that referenced this issue Jul 3, 2022
before this change, the design of generator (quoting Avi's comment):

> This effectively forces a ping-pong between the generator and its user.
> The generator will have to yield each time it produces a value.

after this change, we have two variants of generator

- one which always suspends itself so its caller can consume the
  produced value.
- one which buffers the yielded values until the buffer is full,
  by then, it suspends itself, and wait for the consumer to grab
  a value from it.

please note, due to llvm/llvm-project#49689,
the new generator does not compile with clang-15 + {debug,sanitize} mode.
but the old version compiles and runs fine with the same combinatino.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
tchaikov added a commit to tchaikov/seastar that referenced this issue Jul 3, 2022
before this change, the design of generator (quoting Avi's comment):

> This effectively forces a ping-pong between the generator and its user.
> The generator will have to yield each time it produces a value.

after this change, we have two variants of generator

- one which always suspends itself so its caller can consume the
  produced value.
- one which buffers the yielded values until the buffer is full,
  by then, it suspends itself, and wait for the consumer to grab
  a value from it.

please note, due to llvm/llvm-project#49689,
the new generator does not compile with clang-15 + {debug,sanitize} mode.
but the old version compiles and runs fine with the same combinatino.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
avikivity pushed a commit to avikivity/seastar that referenced this issue Jul 3, 2022
before this change, the design of generator (quoting Avi's comment):

> This effectively forces a ping-pong between the generator and its user.
> The generator will have to yield each time it produces a value.

after this change, we have two variants of generator

- one which always suspends itself so its caller can consume the
  produced value.
- one which buffers the yielded values until the buffer is full,
  by then, it suspends itself, and wait for the consumer to grab
  a value from it.

please note, due to llvm/llvm-project#49689,
the new generator does not compile with clang-15 + {debug,sanitize} mode.
but the old version compiles and runs fine with the same combinatino.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Message-Id: <20220703170214.43823-1-tchaikov@gmail.com>
tchaikov added a commit to tchaikov/seastar that referenced this issue Jul 3, 2022
before this change, the design of generator (quoting Avi's comment):

> This effectively forces a ping-pong between the generator and its user.
> The generator will have to yield each time it produces a value.

after this change, we have two variants of generator

- one which always suspends itself so its caller can consume the
  produced value.
- one which buffers the yielded values until the buffer is full,
  by then, it suspends itself, and wait for the consumer to grab
  a value from it.

please note, due to llvm/llvm-project#49689,
the new generator does not compile with clang-15 + {debug,sanitize} mode.
but the old version compiles and runs fine with the same combinatino.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
@nikic nikic moved this from Needs Fix to Done in LLVM Release Status Aug 3, 2022
@avikivity
Copy link
Contributor

@nikic any chance of getting this backported to 14.0.x? and getting a release cut?

@EugeneZelenko
Copy link
Contributor

@avikivity: 15 is only maintained release branch.

@avikivity
Copy link
Contributor

I see. It is unfortunate, 14 is broken, and 15 is unreleased, so there is no way to deliver a fix to users.

@kbr-
Copy link

kbr- commented Aug 19, 2022

And to think that this issue was once a release blocker for 13.0.1.

@yuanfang-chen
Copy link
Collaborator

/cherry-pick 6678f8e e2e9e70

@llvmbot
Copy link
Member

llvmbot commented Aug 19, 2022

Failed to cherry-pick: 6678f8e

https://github.com/llvm/llvm-project/actions/runs/2891032428

Please manually backport the fix and push it to your github fork. Once this is done, please add a comment like this:

/branch <user>/<repo>/<branch>

@kbr-
Copy link

kbr- commented Aug 19, 2022

@avikivity I recall you have a branch with this fix backported manually.
Maybe you could point to it or make a PR?

@yuanfang-chen
Copy link
Collaborator

I just checked the patch is already included in the 15.x release branch.

@kbr-
Copy link

kbr- commented Aug 19, 2022

Ah, I thought you were trying to backport it to 14.
Hope dies last :(

@avikivity
Copy link
Contributor

@avikivity I recall you have a branch with this fix backported manually. Maybe you could point to it or make a PR?

The problem is not with performing the backport. It is trivial and mechanical.

The problem is that 14.x is not getting updates, while 15.x is not released.

@EugeneZelenko EugeneZelenko removed this from the LLVM 15.0.0 Release milestone Aug 21, 2022
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
Information in the function `Prologue Data` is intentionally opaque.
When a function with `Prologue Data` is duplicated. The self (global
value) references inside `Prologue Data` is still pointing to the
original function. This may cause errors like `fatal error: error in backend: Cannot represent a difference across sections`.

This patch detaches the information from function `Prologue Data`
and attaches it to a function metadata node.

This and D116130 fix llvm/llvm-project#49689.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D115844
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
There is no proper RTTI for these split functions. So just delete the
metadata.

Fixes llvm/llvm-project#49689.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D116130
tchaikov added a commit to tchaikov/seastar that referenced this issue Mar 27, 2023
now that the minimum clang version we support is Clang 15,
and the fix of llvm/llvm-project#49689
was included its 15.x branch. so we can enable these test for Clang
now.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
tchaikov added a commit to tchaikov/seastar that referenced this issue Mar 27, 2023
now that the minimum clang version we support is Clang 15,
and the fix of llvm/llvm-project#49689
was included its 15.x branch. so we can enable these tests for Clang
now.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang:codegen compiler-rt:ubsan Undefined behavior sanitizer confirmed Verified by a second party
Projects
Archived in project
Development

No branches or pull requests