Constraints for ABI #38

Open
UebelAndre opened this issue Mar 8, 2022 · 22 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional)

Comments

@UebelAndre

Would it be possible to introduce @platforms//abi which contains constraints for common ABI definitions of platform triples?

In rules_rust we currently have issues with users wanting to target platforms that share a CPU and operating system but differ in ABI. They are unable to do so because there are no constraints that uniquely identify such platforms. Users are forced to define custom constraints and redefine toolchains to get around errors caused by this ambiguity. I think it would be very beneficial to introduce ABI constraints here so that rules_rust and other rules can share constraints, enabling rule maintainers to appropriately constrain toolchains so users only have to set up platform definitions.

@hlopko
Member

hlopko commented Mar 8, 2022

Some old but related doc on the topic of more constraint settings: docs.google.com/document/d/1CgU-GKocMAfsUSI3bbGZ0YRkOWczitIoKs29x3zR914/edit#heading=h.3fbh1otqm5sz

@bsilver8192

ARM ABIs are really complicated... I'm not sure how to fit all of this into constraints. I'm sure representing all of this complexity is a bad idea, but I don't have much input into where to draw the line.

I've been pretty happy defining my own constraint_setting that just enumerates my hardware platforms. The only place I find standard constraints helpful is using other people's BUILD files, which only needs granularity to the extent those BUILD files do different things. Most of this information is only needed by the toolchain itself, but I have seen different assembly files to choose between which do need more of this to work for any platform.
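A project-local setting that enumerates hardware platforms might look roughly like the sketch below. All target and file names (`hardware`, `cortex_m0plus`, `cortex_m4f`, `m4f_bare_metal`, the `.S` files) are hypothetical and chosen only for illustration.

```starlark
# BUILD.bazel in a hypothetical package of your own repository.

# One project-local setting that simply enumerates the boards being targeted.
constraint_setting(name = "hardware")

constraint_value(
    name = "cortex_m0plus",
    constraint_setting = ":hardware",
)

constraint_value(
    name = "cortex_m4f",
    constraint_setting = ":hardware",
)

# A platform picks exactly one value of the setting.
platform(
    name = "m4f_bare_metal",
    constraint_values = [
        "@platforms//os:none",
        ":cortex_m4f",
    ],
)

# BUILD files that need to do different things per board can select on it,
# for example to choose between assembly sources.
config_setting(
    name = "is_cortex_m4f",
    constraint_values = [":cortex_m4f"],
)

filegroup(
    name = "memcpy_asm",
    srcs = select({
        ":is_cortex_m4f": ["memcpy_m4f.S"],  # hypothetical file
        "//conditions:default": ["memcpy_generic.S"],  # hypothetical file
    }),
)
```

Because the setting lives in the project itself, third-party BUILD files cannot select on it, which is the limitation described above.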

Some platforms I've targeted with Bazel which I think are common:

  • Cortex-M0+ without FPU (small slow microcontroller) bare metal
  • Cortex-M4 with FPU (faster microcontroller) bare metal. Note that the FPU is optional with this core, but all the ones I've seen have an FPU.
  • Debian armhf (hf means FPU registers in the calling convention, Debian has some other variants too, but I haven't used them)
  • Raspbian / Raspberry Pi OS (Debian armhf, but with flags to build for older processors)
  • Debian armhf, plus all instructions supported by Cortex-A9
  • Debian armhf, plus all instructions supported by Cortex-A53. I've run code compiled this way on both Debian arm64 (via multiarch) and armhf userspaces.
  • Yocto-based Linux distribution on Cortex-A53 (I think the FPU is required on A53, mine has one and uses it in the calling convention)

Some less common platforms I've targeted with Bazel:

  • Cortex-R5F with FPU (kind of like Cortex-A, but no MMU) with FreeRTOS
  • Cortex-M4 with FPU and various RTOS
  • Yocto-based Linux distribution on Cortex-A9 with an FPU, but with soft-float calling conventions

Some notes on aarch32 (ARMv6/ARMv7, or ARMv8 in 32-bit mode) variants I've dealt with:

  • gnueabi and gnueabihf are the most common variants in the triples. gnueabihf corresponds to -mfloat-abi=hard for GCC; -mfloat-abi=soft and -mfloat-abi=softfp are the other choices, which both have the same ABI but differ in whether floating point instructions are used within functions. AAPCS and EABI are two different documents that define parts of it. Object files are ELF and C++ code follows the Itanium ABI for both of them.
  • -marm vs -mthumb compile functions with different instruction sets which are both aarch32. I think all EABI triples can freely call between functions compiled with each mode, which is called interworking (other ABIs can disable interworking). Most of the ways to call a function (but not all) in aarch32 will automatically switch between thumb and arm mode based on the low bit of the address, and the toolchain can set things up so that works everywhere with no additional cost beyond making sure to call functions with one of the instructions that does the switching.
  • There's also OABI (old ABI), but I've never used that. This might be old enough to not worry about it.
  • It's common to target a wide variety of ARM cores which have different instruction sets. There's both the choice of which core, and most of the cores have various optional instructions. This means it's pretty common to have different toolchains with the same triple that generate code compatible with different processors. Raspbian is a common example of this: it uses the same triple as Debian armhf but generates code compatible with some older processors.
  • Different enum sizes are even more common with ARM than x86-land. ARM has officially documented variants for both 4-byte enums and sized-to-fit. Those make for really subtle ABI differences...
  • https://bugs.llvm.org/show_bug.cgi?id=27455 is an even more subtle incompatibility I ran into where GCC and Clang are using different ABIs, despite trying to be compatible.
  • Some of these variants are encoded in special sections in the ELF files so linking together incompatible ones will fail, others aren't.
  • Some ARM toolchains come with a variety of precompiled libc/libgcc/etc for many of these variants.

@UebelAndre
Author

In hopes of moving the conversation along I've opened #39 to provide a concrete example of what I'm hoping for.

@aiuto
Contributor

aiuto commented Apr 11, 2022

IMO, there is no winning in trying to precisely define ARM platforms in this repository. Virtually every device is its own platform, since there can be so much customization of what is on the die.

@UebelAndre
Author

For what it's worth, the only 3 values I'm really hoping to represent are gnu, musl, and msvc which would help tremendously with producing docker images containing Rust or Go binaries. I'm not too familiar with all the existing ABI variants but are those considered base values (thinking of things like gnueabi)?
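As a rough sketch of those three values, a project could define them locally until a shared definition exists. The names `abi`, `gnu`, `musl`, `msvc`, and `linux_x86_64_musl` are assumptions for illustration, not targets that exist in @platforms.

```starlark
# Hypothetical BUILD.bazel; nothing here is defined by @platforms itself.
constraint_setting(name = "abi")

constraint_value(
    name = "gnu",
    constraint_setting = ":abi",
)

constraint_value(
    name = "musl",
    constraint_setting = ":abi",
)

constraint_value(
    name = "msvc",
    constraint_setting = ":abi",
)

# Same CPU and OS as a glibc-based Linux platform; only the ABI value differs.
platform(
    name = "linux_x86_64_musl",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        ":musl",
    ],
)
```

A build could then target it with `--platforms=//some/pkg:linux_x86_64_musl`, where `//some/pkg` stands for wherever the file above lives.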

@aiuto
Contributor

aiuto commented Apr 19, 2022

I would rather go small first, with very precise definitions of what each constant means.
So what would those three things mean? I take ABI to strictly mean the calling convention between separately compiled modules. So, gnu is probably better called itanium-C++. That should match clang too, so using the generic name is fair.

How is musl as an API different from gnu? Aren't we getting at features of the library, rather than the binary interface? Unless we are talking about different musls.
It is also not clear what msvc means as an ABI standard. The calling conventions have changed in different VS versions, and the windows APIs have changed as well.

Also, by being strict about calling conventions, I am excluding anything like linker format, and features of the standard library. Those should be an orthogonal space. It's not clear from the issue description if the need is to cover all of those topics or just the ABI.

@UebelAndre, it would help to see examples of what the rust teams are actually trying to do. Can you point to any of the platform definitions they are building? Or, maybe some definitive docs on rust/c++ interoperability.

@bsilver8192

> I would rather go small first, with very precise definitions of what each constant means.

I agree with these goals. I would like to add "no names that are prone to assuming an incorrect meaning" and "obvious nesting or mutual exclusion between all categories".

For example the current decision between //cpu:arm (all ARM? all aarch32? non-thumb aarch32?) vs //cpu:armv7 vs several sets of specific ARM cores denoted by their instruction set variants is really confusing. And is //cpu:arm64_32 aarch32-on-arm64 (code can use all the registers, but they're all caller-save and the ABI doesn't change) or is it the aarch64 version of x32? Even as somebody who knows what most of this means, I have no idea which of these are supposed to apply to my platforms.

I've got tons of specific things to bring up in the interest of avoiding something which confuses people thinking about just one of them. Also it'd be good to avoid precluding clean solutions to the rest of it in the future, but that might be hard.

> So what would those three things mean? I take ABI to strictly mean the calling convention between separately compiled modules. So, gnu is probably better called itanium-C++. That should match clang too, so using the generic name is fair.

Actually, coming from managing many C/C++ toolchains, gnu strikes me as the most problematic one on that list. For some triples, that means GNU OABI (which is ancient), vs gnueabi would be the GNU EABI that's used on all modern ARM. For other triples, on CPUs/platforms/etc which never used OABI, gnu means GNU EABI. Keep in mind that there might be some platforms with non-GNU EABI around too. Even if we document a consistent meaning for it within the BUILD file, people are going to misuse it because they assume it's obvious what it means (it's in my triple, that must be the one I want).

The Itanium C++ ABI defines how C++ is implemented on top of common ELF ABIs. The name is because it was first defined for Itanium, but it's since become the de facto standard on all the common platforms (aarch32, aarch64, x86, amd64 are the ones I work with). But C++ ABI compatibility is also affected by the following (and this is just on Linux with libstdc++/libc++; Microsoft basically just breaks compatibility periodically instead):

  • ELF symbol versioning (does your libstdc++ have it enabled? If so, what versions does it have?)
  • C++ name mangling versioning (there's some compiler extensions for this, with complex choices around what your libstdc++ supports and which set you build against with the same header files)
  • Different RTTIs which can make two modules that both work with the same C++ standard library not actually work when run together (because they end up with different std::string basically)

Separate from the mess of C++, there's also things like:

  • enum sizes (-fshort-enums)
  • relative function call offset ranges (-mlong-calls)
  • whether complex math compiler builtin functions use the normal hard-float ABI or a special one (https://bugs.llvm.org/show_bug.cgi?id=27455)
  • enabling exceptions and RTTI
  • whether SIMD registers are callee-save

They do fit under "separately compiled modules need to handle them in compatible ways", but I don't think it's manageable in a centralized list like this... I think managing any of this with bazel platforms is a low priority.

> How is musl as an API different from gnu? Aren't we getting at features of the library, rather than the binary interface? Unless we are talking about different musls. It is also not clear what msvc means as an ABI standard. The calling conventions have changed in different VS versions, and the windows APIs have changed as well.

Things like the interface to the dynamic linker and the implementation of errno depend on which libc. Rust also cares about the stack unwinding support. TLS implementation can also be different for some libc.

> @UebelAndre, it would help to see examples of what the rust teams are actually trying to do. Can you point to any of the platform definitions they are building? Or, maybe some definitive docs on rust/c++ interoperability.

I believe the goal is to enable a platform to select a unique target from all the ones Rust supports at https://doc.rust-lang.org/nightly/rustc/platform-support.html, so that rules_rust can generate toolchains for all of them and get the correct one picked based on the platform.

@UebelAndre
Author

> @UebelAndre, it would help to see examples of what the rust teams are actually trying to do. Can you point to any of the platform definitions they are building? Or, maybe some definitive docs on rust/c++ interoperability.

@aiuto I've created bazelbuild/rules_rust#1270

@UebelAndre
Author

UebelAndre commented Apr 19, 2022

@aiuto I've opened bazelbuild/rules_docker#2062 to also show how other rules may benefit from a common set of definitions. A change like that would allow the rules_rust changes to work with rules_docker changes without requiring user patching or a large amount of custom configuration.

Does this and #38 (comment) provide more helpful context?

@UebelAndre
Author

@aiuto @gregestren friendly ping 😅

@gregestren
Collaborator

Ping noted. Will try to respond soon (maybe next chance @aiuto and I get to talk).

@gregestren gregestren added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jun 29, 2022
@gregestren
Collaborator

I've scheduled this for my next chat with @aiuto - we at least owe a proper response / next step for this issue. We're both OOO the next two weeks but we'll sync mid-July.

@graywolf-at-work

Any update on this? I don't want to push too much but thought I would ask given the "mid-July" estimate.

@gregestren
Collaborator

Fair comment. Apologies for delays. Scheduling again for next week, and I'll make sure we discuss.

@gregestren
Collaborator

My summary of the above:

I don't want to say more yet since the relationship between ABI and OS and platform is subtle as clearly expressed above. Still on agenda for wider discussion this week.

@bsilver8192

Another tricky scenario to consider, courtesy of a @coffinmatician:

  1. C++ toolchain for armv7 soft-float ABI which doesn't emit floating point instructions
  2. C++ toolchain for armv7 soft-float ABI which does emit floating point instructions (GCC's -mfloat-abi=softfp)
  3. Rust toolchain for armv7 soft-float ABI which doesn't emit floating point instructions

(I suspect that in practice rustc can use -mfloat-abi=softfp or an equivalent, but this still applies if rules_rust doesn't provide a toolchain which uses that.)

All 3 of these toolchains can be fully ABI-compatible. Some processors can run code from any of them, other processors can only run code from 1 and 3. I don't think Bazel's current constraint system allows a good solution: if you build for a platform that requires soft-ABI-hard-instructions then there's no compatible Rust toolchain, but if you use soft-ABI-soft-instructions then you end up with a non-optimal C++ toolchain.
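To make the scenario concrete, here is a minimal sketch that models it with two orthogonal constraint settings. The setting and value names, the platform names, and the `*_impl` toolchain targets are all hypothetical, and the `@rules_rust//rust:toolchain_type` label is an assumption about how rules_rust exposes its toolchain type.

```starlark
# Hypothetical settings: calling convention vs. which instructions get emitted.
constraint_setting(name = "float_abi")

constraint_value(
    name = "soft_abi",
    constraint_setting = ":float_abi",
)

constraint_setting(name = "float_instructions")

constraint_value(
    name = "no_fp_instructions",
    constraint_setting = ":float_instructions",
)

constraint_value(
    name = "fp_instructions",
    constraint_setting = ":float_instructions",
)

# 1. C++ toolchain, soft-float ABI, no floating point instructions.
toolchain(
    name = "cc_soft_nofp",
    target_compatible_with = [":soft_abi", ":no_fp_instructions"],
    toolchain = ":cc_soft_nofp_impl",  # hypothetical cc_toolchain target
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)

# 2. C++ toolchain, soft-float ABI, floating point instructions (softfp).
toolchain(
    name = "cc_softfp",
    target_compatible_with = [":soft_abi", ":fp_instructions"],
    toolchain = ":cc_softfp_impl",  # hypothetical cc_toolchain target
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
)

# 3. Rust toolchain, soft-float ABI, no floating point instructions.
toolchain(
    name = "rust_soft_nofp",
    target_compatible_with = [":soft_abi", ":no_fp_instructions"],
    toolchain = ":rust_soft_nofp_impl",  # hypothetical rust_toolchain target
    toolchain_type = "@rules_rust//rust:toolchain_type",  # assumed label
)

# A target platform only matches toolchains whose target_compatible_with
# values are all present on the platform.
platform(
    name = "armv7_with_fpu",
    constraint_values = [
        "@platforms//cpu:armv7",
        ":soft_abi",
        ":fp_instructions",  # matches toolchain 2 only; no Rust toolchain
    ],
)

platform(
    name = "armv7_without_fpu",
    constraint_values = [
        "@platforms//cpu:armv7",
        ":soft_abi",
        ":no_fp_instructions",  # matches toolchains 1 and 3, never 2
    ],
)
```

Under this modelling neither platform gets both the softfp C++ toolchain and the Rust toolchain, even though all three toolchains produce ABI-compatible code.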

@coffinmatician thinks constraints should have a full SAT constraint optimizer to address this.

@aiuto aiuto self-assigned this Sep 28, 2022
@bsilver8192

I did some more thinking, and I have a proposal: put the list of Rust triples in a constraint_setting(name = "triple") in rules_rust, and do the same for other toolchain ecosystems.

For example, a project that creates Clang toolchains would have its own constraint_setting, and another project that does GCC toolchains would have its own, and a third project for bare-metal GCC and Clang toolchains would have a separate one.

I don't think any other solution is going to solve the problem cleanly. Even if @platforms gets some kind of "ABI" constraint, there are still going to be platforms that can't be distinguished. For example, looking through the Rust list, the first two tier 2 targets look problematic: aarch64-apple-ios vs aarch64-apple-ios-sim.

Also building C++ code with Clang sometimes uses subtly different triples, and GCC often uses different triples. Debian (and derivatives) also use different triples for multiarch (x86_64-linux-gnu vs x86_64-unknown-linux-gnu, or arm-linux-gnueabihf vs armv7a-unknown-linux-gnueabihf). Also with GCC and Clang it's fairly common to further modify the ABI beyond the triple with various flags, sometimes in ways that could be accomplished by changing the triple too (like ARMv6 vs ARMv7). In general I think that trying to canonicalize triples (or any parts of them) across ecosystems is not going to work, and is going to produce confusion when different ecosystems use different strings for the same meaning.

I think rules_rust can create selects that don't use this new triple constraint, unless multiple enabled Rust triples are only distinguishable that way. This means that for some set of "common" platforms enabled by default, the user doesn't need to add any of the triple values to their platform. But once a user does enable multiple Rust triples that can only be distinguished with this constraint, they will have to add the appropriate values to their platforms.
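A minimal sketch of what this proposal could look like follows. The package path `//rust/platform/triple` and the platform name are hypothetical; only the setting name `triple` and the two Rust triples come from the discussion above.

```starlark
# Hypothetical BUILD.bazel inside rules_rust (package path is an assumption).
constraint_setting(name = "triple")

constraint_value(
    name = "aarch64-apple-ios",
    constraint_setting = ":triple",
)

constraint_value(
    name = "aarch64-apple-ios-sim",
    constraint_setting = ":triple",
)

# In the user's own BUILD.bazel. The triple value is only needed once both
# iOS triples are enabled, since CPU and OS alone cannot tell them apart.
platform(
    name = "ios_simulator_arm64",
    constraint_values = [
        "@platforms//os:ios",
        "@platforms//cpu:aarch64",
        "@rules_rust//rust/platform/triple:aarch64-apple-ios-sim",  # hypothetical label
    ],
)
```

Users who only enable triples that CPU and OS already distinguish would leave the `triple` value off their platforms entirely.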

@graywolf-at-work

> * @graywolf-at-work what brought you to this issue? Are you affected by this?

I'm running bazel on alpine linux (so on musl), meaning basically any pre-built
binaries do not work. Currently I'm patching upstream projects and adding
constraint on glibc or musl (using [0]) so that toolchain selection works
properly, but that is obviously something I cannot contribute to the upstreams.
So I would like to see a standard solution that would allow proper toolchain
selection based on libc (so pre-built can be glibc, and built-from-source can
be musl for example) to work.

0: https://git.sr.ht/~graywolf/x_platforms/tree/master/item/abi/BUILD
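The kind of libc-based selection described here could be sketched as follows. This is an illustration under assumed names (`libc`, `glibc`, `musl`, `alpine_x86_64`, `tool_toolchain_type`, and the `*_impl` targets are all hypothetical), not a copy of the linked repository.

```starlark
# Hypothetical BUILD.bazel defining a local libc setting.
constraint_setting(name = "libc")

constraint_value(
    name = "glibc",
    constraint_setting = ":libc",
)

constraint_value(
    name = "musl",
    constraint_setting = ":libc",
)

# Describes an Alpine machine that runs build actions.
platform(
    name = "alpine_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        ":musl",
    ],
)

toolchain_type(name = "tool_toolchain_type")

# Prebuilt binary linked against glibc: only usable on glibc exec platforms.
toolchain(
    name = "tool_prebuilt",
    exec_compatible_with = ["@platforms//os:linux", ":glibc"],
    toolchain = ":tool_prebuilt_impl",  # hypothetical target
    toolchain_type = ":tool_toolchain_type",
)

# Built-from-source variant that runs on musl exec platforms.
toolchain(
    name = "tool_from_source",
    exec_compatible_with = ["@platforms//os:linux", ":musl"],
    toolchain = ":tool_from_source_impl",  # hypothetical target
    toolchain_type = ":tool_toolchain_type",
)
```

With `--host_platform` pointing at `alpine_x86_64`, toolchain resolution would skip `tool_prebuilt` and pick `tool_from_source`. Upstream rules can only add such `exec_compatible_with` entries if the `libc` setting lives somewhere they can all depend on.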

@UebelAndre
Author

> put the list of Rust triples in a constraint_setting(name = "triple") in rules_rust, and do the same for other toolchain ecosystems.

I don't think I understand why this would be acceptable. I think rules_rust correctly translates a triple into a collection of constraints and isn't trying to treat triples as anything other than an alias. The addition of ABI constraints isn't going to solve all problems, it's just one more set of constraints that can be used to make platforms more unique and solve musl vs gnu issues I've run into a lot.

@gregestren
Collaborator

We discussed this week. @aiuto has more input.

@mattyclarkson

We're setting up C/C++ toolchains within our company and hitting the same issue.

We hermetically download a compiler, binutils and sysroot. That combination needs to be described via constraints so toolchain selection can occur correctly.

Rather than ABI (which can be affected by many things), we were thinking more about libc constraints:

@platforms//libc/gnu:2.31
@platforms//libc/musl:1.21

It's important to know the version because they are often forwards but not backwards compatible. It likely makes sense to also have the non-versioned constraints, so that a select can occur on both the versioned and non-versioned constraint.

The other thing to know is the version of the system headers because deploying the binary can mean the libc uses system calls that are not available on older kernels.

So something like:

@platforms//os/linux:5.13
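One way to let a select match on either the libc family or a specific version is sketched below. A platform can carry only one value per constraint_setting, so this sketch assumes the family and the version live in two separate settings. All names here are hypothetical and differ from the label spelling suggested above.

```starlark
# Hypothetical BUILD.bazel: libc family and glibc version as separate settings.
constraint_setting(name = "libc")

constraint_value(
    name = "gnu",
    constraint_setting = ":libc",
)

constraint_value(
    name = "musl",
    constraint_setting = ":libc",
)

constraint_setting(name = "gnu_version")

constraint_value(
    name = "gnu_2.31",
    constraint_setting = ":gnu_version",
)

# The platform carries both the family and the exact version.
platform(
    name = "linux_glibc_2_31",
    constraint_values = [
        "@platforms//os:linux",
        ":gnu",
        ":gnu_2.31",
    ],
)

# Non-versioned condition: matches any platform that uses glibc.
config_setting(
    name = "any_gnu",
    constraint_values = [":gnu"],
)

# Versioned condition: matches only glibc 2.31.
config_setting(
    name = "exactly_gnu_2_31",
    constraint_values = [
        ":gnu",
        ":gnu_2.31",
    ],
)
```

This sketch does not express version ordering: a toolchain built against 2.31 is not automatically considered compatible with a platform that declares a newer version.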

@lberki
Contributor

lberki commented Oct 20, 2022

@mattyclarkson do the constraints need to be in the @platforms repository for your use case?

I don't think having a separate constraint for each Linux version is a great idea because it would (in principle) make each Bazel version incompatible with any Linux version released after it, but if you control your own platform and constraint definitions, you know which exact versions of Linux / libc / etc. you care about.

8 participants