LLVM should stop mangling function symbols based on calling convention #54090

rnk · 2022-02-25T22:01:52Z

LLVM has an IR Mangler class, which has two responsibilities:

Adding underscore prefixes to symbols on some platforms (MachO & x86 Windows)
Mangling wacky Windows calling conventions (stdcall, vectorcall, fastcall)

I believe the first behavior is useful, because it makes simple LLVM IR more portable. It papers over a common portability issue for frontends. They can easily call the platform C library (main, malloc, free, printf, puts) with the same IR on all platforms. It makes writing generic IR tests for all platforms easier, since you don't have to change IR function names based on the target.

The second behavior, however, is very specific to Windows APIs, and doesn't help make target-neutral IR portable. I think it was a mistake to implement this behavior in LLVM. This should have always been a frontend responsibility. I think it became the IR mangler's responsibility to do it because, for fastcall, the leading Windows underscore changes to an @ symbol.

The calling convention mangling has a history of causing issues, the latest of which is D120382. For this reason, Clang implements the calling convention mangling in the frontend.

For reference, here are the three Windows calling convention manglings that get byte suffixes:

void foo(int);
stdcall -> _foo@4
fastcall -> @foo@4
vectorcall -> foo@@4

Assuming folks agree that this is a desirable direction, we should come up with a migration plan. I think the simplest thing to do is to add a static method to the Mangler class that accepts a calling convention, base name, and llvm::FunctionType and returns a string mangled name. Frontends can call this API directly if they need mangling when constructing LLVM IR. Clang handles this itself to handle corner cases involving incomplete types, which probably aren't a concern for non-C frontends.

The most straightforward way to prevent the IR mangler from prefixing these fastcall and vectorcall symbols with underscores would be to prefix them with the existing IR mangler escape, \01. This would presumably make IR from such a frontend incompatible with old IR for LTO purposes, but it matches clang's behavior today, so in some ways it is more compatible with the dominant existing IR producer.

Another way to handle this would be to extend the current mangler check for question mark prefixes to look for @ anywhere in the symbol. This is a behavior change for frontends that create symbols with @ in them, and any language that uses that character for its symbols will need to prefix such symbols with _ to avoid having an ABI break at the object file level. This would also break LTO with old bitcode since the IR symbols would change. However, we went through the same IR ABI break when we eliminated the \01 escape from all C++ symbols emitted by Clang, and I don't recall any user-reported issues, just challenges with updating IR test case expectations.

I think the first approach, \01 prefixing, is a better transition path. We can add the API (Mangler::mangleForCallingConvention(StringRef, DataLayout, CallingConv::ID, FunctionType)), announce the deprecation plan, migrate frontends, and remove the old mangler behavior one release later.

The text was updated successfully, but these errors were encountered:

efriedma-quic · 2022-02-26T00:10:07Z

calling convention, base name, and llvm::FunctionType

Not sure this is sufficient; we also need to deal with byval arguments, I think? Oh, and, sret pointers.

rnk · 2022-02-28T17:39:12Z

Yes, that's right. I think it would be sufficient to also pass the AttributeList for the function prototype, which frontends have to provide eventually anyway. The other option is to provide an API that takes a fully formed llvm::Function, but then the frontend has to rename functions after they are created.

github-actions bot added the new issue label Feb 25, 2022

rnk added llvm:core and removed new issue labels Feb 25, 2022

rnk mentioned this issue Feb 28, 2022

Backport fix for mangling of aliases #54103

Closed

Amanieu mentioned this issue Mar 2, 2022

Implement sym operands for global_asm! rust-lang/rust#94468

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLVM should stop mangling function symbols based on calling convention #54090

LLVM should stop mangling function symbols based on calling convention #54090

rnk commented Feb 25, 2022

efriedma-quic commented Feb 26, 2022

rnk commented Feb 28, 2022

LLVM should stop mangling function symbols based on calling convention #54090

LLVM should stop mangling function symbols based on calling convention #54090

Comments

rnk commented Feb 25, 2022

efriedma-quic commented Feb 26, 2022

rnk commented Feb 28, 2022