Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM should stop mangling function symbols based on calling convention #54090

Open
rnk opened this issue Feb 25, 2022 · 2 comments
Open

LLVM should stop mangling function symbols based on calling convention #54090

rnk opened this issue Feb 25, 2022 · 2 comments

Comments

@rnk
Copy link
Collaborator

rnk commented Feb 25, 2022

LLVM has an IR Mangler class, which has two responsibilities:

  1. Adding underscore prefixes to symbols on some platforms (MachO & x86 Windows)
  2. Mangling wacky Windows calling conventions (stdcall, vectorcall, fastcall)

I believe the first behavior is useful, because it makes simple LLVM IR more portable. It papers over a common portability issue for frontends. They can easily call the platform C library (main, malloc, free, printf, puts) with the same IR on all platforms. It makes writing generic IR tests for all platforms easier, since you don't have to change IR function names based on the target.

The second behavior, however, is very specific to Windows APIs, and doesn't help make target-neutral IR portable. I think it was a mistake to implement this behavior in LLVM. This should have always been a frontend responsibility. I think it became the IR mangler's responsibility to do it because, for fastcall, the leading Windows underscore changes to an @ symbol.

The calling convention mangling has a history of causing issues, the latest of which is D120382. For this reason, Clang implements the calling convention mangling in the frontend.

For reference, here are the three Windows calling convention manglings that get byte suffixes:

void foo(int);
stdcall -> _foo@4
fastcall -> @foo@4
vectorcall -> foo@@4

Assuming folks agree that this is a desirable direction, we should come up with a migration plan. I think the simplest thing to do is to add a static method to the Mangler class that accepts a calling convention, base name, and llvm::FunctionType and returns a string mangled name. Frontends can call this API directly if they need mangling when constructing LLVM IR. Clang handles this itself to handle corner cases involving incomplete types, which probably aren't a concern for non-C frontends.

The most straightforward way to prevent the IR mangler from prefixing these fastcall and vectorcall symbols with underscores would be to prefix them with the existing IR mangler escape, \01. This would presumably make IR from such a frontend incompatible with old IR for LTO purposes, but it matches clang's behavior today, so in some ways it is more compatible with the dominant existing IR producer.

Another way to handle this would be to extend the current mangler check for question mark prefixes to look for @ anywhere in the symbol. This is a behavior change for frontends that create symbols with @ in them, and any language that uses that character for its symbols will need to prefix such symbols with _ to avoid having an ABI break at the object file level. This would also break LTO with old bitcode since the IR symbols would change. However, we went through the same IR ABI break when we eliminated the \01 escape from all C++ symbols emitted by Clang, and I don't recall any user-reported issues, just challenges with updating IR test case expectations.

I think the first approach, \01 prefixing, is a better transition path. We can add the API (Mangler::mangleForCallingConvention(StringRef, DataLayout, CallingConv::ID, FunctionType)), announce the deprecation plan, migrate frontends, and remove the old mangler behavior one release later.

@efriedma-quic
Copy link
Collaborator

calling convention, base name, and llvm::FunctionType

Not sure this is sufficient; we also need to deal with byval arguments, I think? Oh, and, sret pointers.

@rnk
Copy link
Collaborator Author

rnk commented Feb 28, 2022

Yes, that's right. I think it would be sufficient to also pass the AttributeList for the function prototype, which frontends have to provide eventually anyway. The other option is to provide an API that takes a fully formed llvm::Function, but then the frontend has to rename functions after they are created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants