Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Cannot select" LLVM failure due to unimplemented atomics on msp430 #441

Closed
cr1901 opened this issue Nov 22, 2021 · 20 comments
Closed

"Cannot select" LLVM failure due to unimplemented atomics on msp430 #441

cr1901 opened this issue Nov 22, 2021 · 20 comments

Comments

@cr1901
Copy link
Contributor

cr1901 commented Nov 22, 2021

Context

A few nights ago, CI started failing for some msp430 firmware I use to make sure that msp430 Rust code builds:

   Compiling rustc-std-workspace-core v1.99.0 (/usr/share/rust/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/rustc-std-workspace-core)
LLVM ERROR: Cannot select: 0x7fbca0892c98: i16,ch = AtomicLoad<(load unordered (s16) from %ir.49)> 0x7fbca08313e8, 0x7fbca0893340
  0x7fbca0893340: i16,ch = CopyFromReg 0x7fbca08313e8, Register:i16 %19
    0x7fbca08a1c30: i16 = Register %19
In function: memcpy
error: could not compile `compiler_builtins`

MSP430 doesn't have meaningful atomic support, so I'm not sure why code using atomics is being generated.

Bisecting rustc

Using cargo-bisect-rustc, I found the range of commits where compilation started failing.

searched nightlies: from nightly-2021-11-18 to nightly-2021-11-19
regressed nightly: nightly-2021-11-19
searched commits: from rust-lang/rust@c9c4b5d to rust-lang/rust@cc946fc
regressed commit: rust-lang/rust@b6f580a

bisected with cargo-bisect-rustc v0.6.0

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --preserve-target --preserve --start=2021-11-18 --end=2021-11-19 --with-src

I further bisected manually, the actual last good commit in rust is rust-lang/rust@a3b9405 and the regression is rust-lang/rust@88f1bf7. Commit b6591c2 bumps compiler_builtins to 0.1.52.

Bisecting compiler_builtins

Using a nightly from before the failure started (see "Other Notes"), the first compile failure I get is with commit ce86d41. The error is quite similar to the one in the CI failures:

william@xubuntu-dtrain:~/Projects/toolchains/compiler-builtins$ cargo build --t[14/661]p430-none-elf -Zbuild-std=core
   Compiling compiler_builtins v0.1.49 (/home/william/Projects/toolchains/compiler-builtins)
warning: an associated function with this name may be added to the standard library in the future
 --> src/float/pow.rs:8:19
  |
8 |     let mut pow = i32::abs_diff(b, 0);
  |                   ^^^^^^^^^^^^^
  |
  = note: `#[warn(unstable_name_collisions)]` on by default
  = warning: once this associated item is added to the standard library, the ambiguity may cause an error or change in behavior!
  = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
  = help: call with fully qualified syntax `Int::abs_diff(...)` to keep using the current method
  = help: add `#![feature(int_abs_diff)]` to the crate attributes to enable `num::<impl i32>::abs_diff`

LLVM ERROR: Cannot select: 0x7f6cd2888208: i16,ch = AtomicLoad<(load unordered (s16) from %ir.188)> 0x7f6cd287aa28:1, 0x7f6cd287aa28, src/mem/impls.rs:65:29 @[ src/mem/impls.rs:109:13 @[ src/mem/mod.rs:25:5 ] ]
  0x7f6cd287aa28: i16,ch = load<(dereferenceable load (s16) from %ir.28)> 0x7f6cd287b270, FrameIndex:i16<24>, undef:i16, src/mem/impls.rs:65:69 @[ src/mem/impls.rs:109:13 @[ src/mem/mod.rs:25:5 ] ]
    0x7f6cd287de38: i16 = FrameIndex<24>
    0x7f6cd28427b8: i16 = undef
In function: memcpy
warning: `compiler_builtins` (lib) generated 1 warning
error: could not compile `compiler_builtins`; 1 warning emitted

I believe fixing the failure in compiler_builtins will fix the error I'm seeing when compiling rustc, and I would need to test locally. That said, I'm not sure what is causing compiler_builtins to be built anyway when I specify -Zbuild-std=core. Doesn't core have no dependencies?

Solution

I'm not sure what should be done here. Can there be a gate for targets whose maximum atomic width is 0? Although it eventually needs to be done, implementing atomic support for msp430 is unlikely right now, as I don't have the bandwidth to do it. Additionally, AIUI, Rust does not actually mandate that targets implement atomics, and that they should become noops/not available on targets without them (not just msp430).

Other Notes

Environment

rustc

william@xubuntu-dtrain:~/Projects/toolchains/compiler-builtins$ rustc -Vv
rustc 1.58.0-nightly (ff0e14829 2021-10-31)
binary: rustc
commit-hash: ff0e14829e1806ca0d4226595f7fdf3e8658758f
commit-date: 2021-10-31
host: x86_64-unknown-linux-gnu
release: 1.58.0-nightly
LLVM version: 13.0.0

cargo

william@xubuntu-dtrain:~/Projects/toolchains/compiler-builtins$ cargo -Vv
cargo 1.58.0-nightly (6c1bc24b8 2021-10-24)
release: 1.58.0
commit-hash: 6c1bc24b8b49d4bc965f67d7037906dc199c72b7
commit-date: 2021-10-24
host: x86_64-unknown-linux-gnu
libgit2: 1.3.0 (sys:0.13.23 vendored)
libcurl: 7.79.1-DEV (sys:0.4.49+curl-7.79.1 vendored ssl:OpenSSL/1.1.1l)
os: Ubuntu 20.04 (focal) [64-bit]

Circular Dependency Problems

I have to compile on msp430 using -Zbuild-std=core, since it's not built as part of a Rust release right now. Even if I have an older version of compiler_builtins checked out, using a too-recent nightly/commit causes -Zbuild-std=core to compile compiler_builtins v0.1.52 anyway as a dependency of the older compiler_builtins in a fun circular dependency!

william@xubuntu-dtrain:~/Projects/toolchains/compiler-builtins$ LD_LIBRARY_PATH[20/450]william/Projects/toolchains/build-llvm-toolchain/build-llvm/lib:/home/william/Projects/toolchains/build-llvm-toolchain/build-rust/build/x86_64-unknown-linux-gnu/stage1/lib" cargo +msp430-fix build --target=msp430-none-elf -Zbuild-std=core
   Compiling compiler_builtins v0.1.52
   Compiling core v0.0.0 (/home/william/Projects/toolchains/build-llvm-toolchain/build-rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core)
   Compiling compiler_builtins v0.1.49 (/home/william/Projects/toolchains/compiler-builtins)
   Compiling rustc-std-workspace-core v1.99.0 (/home/william/Projects/toolchains/build-llvm-toolchain/build-rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/rustc-std-workspace-core)
warning: an associated function with this name may be added to the standard library in the future
 --> src/float/pow.rs:8:19
  |
8 |     let mut pow = i32::abs_diff(b, 0);
  |                   ^^^^^^^^^^^^^
  |
  = note: `#[warn(unstable_name_collisions)]` on by default
  = warning: once this associated item is added to the standard library, the ambiguity may cause an error or change in behavior!
  = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
  = help: call with fully qualified syntax `Int::abs_diff(...)` to keep using the current method
  = help: add `#![feature(int_abs_diff)]` to the crate attributes to enable `num::<impl i32>::abs_diff`

LLVM ERROR: Cannot select: t14: i16,ch = AtomicLoad<(load unordered (s16) from %ir.188)> t13:1, t13, /home/william/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/compiler_builtins-0.1.52/src/mem/impls.rs:65:29 @[ /home/william/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/compiler_builtins-0.1.52/src/mem/impls.rs:109:13 @[ /home/william/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/compiler_builtins-0.1.52/src/mem/mod.rs:25:5 ] ]
  t13: i16,ch = load<(dereferenceable load (s16) from %ir.28)> t12, FrameIndex:i16<24>, undef:i16, /home/william/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/compiler_builtins-0.1.52/src/mem/impls.rs:65:69 @[ /home/william/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/compiler_builtins-0.1.52/src/mem/impls.rs:109:13 @[ /home/william/.cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/compiler_builtins-0.1.52/src/mem/mod.rs:25:5 ] ]
    t11: i16 = FrameIndex<24>
    t5: i16 = undef
In function: memcpy
error: could not compile `compiler_builtins`
warning: build failed, waiting for other jobs to finish...
warning: `compiler_builtins` (lib) generated 1 warning
error: build failed
@Amanieu
Copy link
Member

Amanieu commented Nov 23, 2021

I think the easiest way to resolve this is to make a minor change to ce86d41b4f95e4d1917460ee7a05fcd7fe6831fc: use a volatile load (ptr::read_volatile) instead of atomic_load_unordered. This can be made conditional on cfg(target_has_atomic_load_store = "ptr") so that the change only applies to msp430.

@cr1901
Copy link
Contributor Author

cr1901 commented Nov 23, 2021

@Amanieu I'll prepare a patch for this/try it out and open a PR. Eventually, the need for this for msp430 will be removed; I've asked someone familiar w/ LLVM to help me implement fences for msp430 (because I am empathetically, err, not familiar with LLVM :)). But it's prob good to gate it for others.

@ducktec
Copy link

ducktec commented Nov 23, 2021

I believe I see the same issue in the nightly channel with the ESP32-C3 and the rv32imc ISA. This ISA does not support atomic load/store operations.

   = note: rust-lld: error: undefined symbol: __atomic_load_4
          >>> referenced by impls.rs:98 (/cargo/registry/src/gh.neting.cc-1ecc6299db9ec823/compiler_builtins-0.1.52/src/mem/impls.rs:98)
          >>>               compiler_builtins-f61960a02225fda8.compiler_builtins.f451d05e-cgu.3.rcgu.o:(memcpy) in archive /usr/share/rust/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/riscv32imc-unknown-none-elf/lib/libcompiler_builtins-f61960a02225fda8.rlib

So I guess that the alternative (non-atomic) load would have to be made conditional not only for the msp430, but at least also for the RISCV ISA without atomic support. And for those, the changes will have to stick around.

I tried replacing the atomic operation with core::ptr::read_volatile and with that change the issue was no longer observable.

@bjorn3
Copy link
Member

bjorn3 commented Nov 23, 2021

Doesn't riscv require loads and stores to be atomic even when the atomic extension isn't implemented? From section 2.6 of the riscv spec:

Furthermore, whereas naturally aligned loads and stores are guaranteed to execute atomically, misaligned loads and stores might not, and hence require additional synchronization to ensure atomicity.

(emphasis mine)

The loads should be aligned and thus it should be fine to depend on the atomicity of aligned loads. In addition the fence instruction is also part of the base spec. Not that it is needed for this specific case as the unordered memory ordering is used.

@Amanieu
Copy link
Member

Amanieu commented Nov 23, 2021

IIRC there were concerns about the compatibility between a direct instruction for load/store and libcalls for RMW operations. So LLVM just makes everything use libcalls on RISCV without A. In any case, the workaround I described (cfg(target_has_atomic_load_store = "ptr")) should also apply here.

@cr1901
Copy link
Contributor Author

cr1901 commented Nov 23, 2021

I'll make sure to add a comment describing the RISCV case as well. But just to clarify for my own understanding:

In any case, the workaround I described (cfg(target_has_atomic_load_store = "ptr")) should also apply here.

If RISCV without A is preferring a libcall, shouldn't the atomic load be converted into a libcall without bombing as above? Meaning either version of the load (cfg(target_has_atomic_load_store = "ptr") or cfg(not(target_has_atomic_load_store = "ptr"))) should work?

Not that it is needed for this specific case as the unordered memory ordering is used.

I don't fully understand the issue in msp430's case, but AIUI: msp430 the ISA doesn't support hardware fences, so attempts to lower atomic ops will choke when doing so. However, msp430 LLVM also doesn't support compiler fences (I guess on the C side, there wasn't use for a fence? Idk...). So singlethreaded: true to work around the lack of hardware fences at the ISA level will choke as well, since singlethreaded: true replaces hardware with software fences.

But why is an atomic load w/ unordered memory ordering requiring the backend to supply code for a fence at all, software or otherwise?

I have my own ideas:

  1. Right now singlethreaded: false for msp430, I assume the actual cause of the "cannot select AtomicLoad" in the singlethreaded: false case is the lack of a hardware fence, and since msp430 doesn't support hardware fences, the entire API was left unimplemented, even for the unordered case. cc: @asl can you provide clarification?

  2. The error message can vary between "cannot select AtomicLoad" and "cannot select AtomicFence" when singlethreaded: true. I'm guessing, assuming other things don't break, that if I set singlethreaded: true, a software fence won't be generated for an unordered load, and compiler_builtins will actually compile correctly.

  3. I feel like singlethreaded should be an enum for MulticoreAtomicCapable, SingleCoreAtomicCapable, NotAtomicCapable :P.

@bjorn3
Copy link
Member

bjorn3 commented Nov 23, 2021

Why is an atomic load w/ unordered memory ordering generating a fence at all, software or otherwise, which I assume is the actual cause of the "cannot select AtomicLoad"?

It doesn't try to generate a fence. It tries to generate an atomic load. Fences are only necessary to ensure a consistent view of the memory. Unordered atomic operations don't generate fences. They only need an instruction that atomically performs the operation, without needing any memory ordering restrictions. As the msp430 is single core only AFAICT, it should be possible to lower atomic loads and stores to regular loads and stores inside LLVM. It is probably that simply nobody implemented this.

@cr1901
Copy link
Contributor Author

cr1901 commented Nov 23, 2021

As the msp430 is single core only AFAICT, it should be possible to lower atomic loads and stores to regular loads and stores inside LLVM.

Upstream seems to want singlethreaded: true to be enabled, which would require compiler fence support (just checked- the singlethreaded bool is still present). This feels in contrast to singlethreaded: false and lowering atomic loads/stores to regular loads/stores behind the scenes?

Anyways, I'm fine with the cfg(target_has_atomic_load_store = "ptr") solution. I just don't want it added solely for my sake.

@asl
Copy link

asl commented Nov 23, 2021

MSP430 does not support anything "atomic" and it's up to the IR producer to ensure that the IR is compatible with the target hardware. While it might look reasonable to expect that backend would lower unsupported stuff to anything else (e.g. libcalls or just drop atomic from the load), in practice it is not: we cannot expect that e.g. PowerPC or RISC-V backend would emulate x86 vector intrinsics :)

@cr1901
Copy link
Contributor Author

cr1901 commented Nov 23, 2021

and it's up to the IR producer to ensure that the IR is compatible with the target hardware

@asl Rust seems to think it is the backend's responsibility to accept certain IR regardless of arch- and atomic support is one of them. singlethreaded: true is the fallback for producing "IR compatible with the target hardware" in the case of msp430, but even this requires compiler fences. . Rust maintainers in the linked issue want the msp430 backend to support compiler fences. Which would also be a solution to this issue here.

I'm not making a value judgment either way- I just want my msp430 code to work :D. Would you accept compiler fence support in the msp430 LLVM backend1? If not, I'll continue to submit issues and PRs to gate msp430 behavior like this one (and get clarification from Rust upstream re: "no, msp430 is not going to implement compiler fences").

  1. Of course I'll submit this through Phabricator. I'm just asking here while you're around :P.

@asl
Copy link

asl commented Nov 23, 2021

@asl Rust seems to think it is the backend's responsibility to accept certain IR regardless of arch- and atomic support is one of them. singlethreaded: true is the fallback for producing "IR compatible with the target hardware" in the case of msp430, but even this requires compiler fences. . Rust maintainers in the linked issue want the msp430 backend to support compiler fences. Which would also be a solution to this issue here.

Well let me sincerely disagree with this. And let me amplify the issue we're talking here about. Consider the frontend will generate, say, ppc vector intrinsic. Or i256 division, or something else, because why not. Should it be a backend responsibility to take care of this? The common (unwritten contract) is no. It's up to frontend to generate the IR that is compatible with the given target (similar to ABI bits, btw).

Certainly the key point here is where to draw the borderline. If this is only simple thing that just drops some bits and lowers atomic stuff to normal loads / stores, then probably it could be fine. I just do not want some generalizations and precedents :)

@bjorn3
Copy link
Member

bjorn3 commented Nov 23, 2021

Consider the frontend will generate, say, ppc vector intrinsic.

No, it shouldn't handle ppc vector intrinsics as those are powerpc specific, but it should handle the platform independent vector intrinsics implemented by every other backend. In fact we recently got core::simd which uses the platform independent vector intrinsics on all platforms, even those that don't natively support simd. For those platforms the backend is responsible for lowering it to scalar instructions.

@cr1901
Copy link
Contributor Author

cr1901 commented Nov 23, 2021

For those platforms the backend is responsible for lowering it to scalar instructions.

Almost certainly msp430 doesn't support these either, but you shouldn't be doing simd on an msp430 ;).

I still think it's on topic for this issue to discuss what to do re: msp430 here because there's a few conflicting things going on now:

  1. I want to eventually get msp430 support stable. In the interim, I want to play nice w/ upstream Rust, so they don't start thinking "msp430 has too many exceptions catering to it".
    1. For instance: for a while, msp430 was the only backend using the no_integrated_as Target option, until it got integrated LLVM assembler support. At that point no_integrated_as was removed, to possibly be readded in a more constrained form.
  2. To remove exceptions catering to msp430, Upstream Rust wants the missing msp430 support to be implemented one way.
  3. @asl, the backend designer/maintainer working on the C side of things (I know you wrote the backend, are you still CODE_OWNER?), wants to maintain the msp430 backend in a different way.

Adding cfg(target_has_atomic_load_store = "ptr"), though it's the easiest solution, is another exception catering to msp430. I personally don't mind, but I'd like to avoid giving upstream reasons to consider removing msp430 :D.

@asl
Copy link

asl commented Nov 23, 2021

Well, let me be clear: I'm not opposing adding few emulations here and there. But I do not see point to support every and each IR feature on msp430 :)

@ducktec
Copy link

ducktec commented Nov 23, 2021

Regarding the situation in rv32imc, I have to admit that I'm a bit out of my depth here. I got this error in my CI builds and had the intention to report it when seeing that it somewhat lines up with the MSP430 issue.

This is what I understand the situation to be (please correct me if I got something wrong):

  • In [RISCV] Use lld as the default linker; Enable C extension; Add riscv32imc-unknown-none-elf target rust#53822 (comment) it was found for rv32imc that:

    tl;dr can't emit fence-based inline sequences for some operations and libcalls for other operations on the same memory location because those two might not synchronize correctly with each other.

    So, libcalls it is, even though I understand that from an ISA specification perspective converting this specific case to a regular load (without fencing) would be sound.
    With the specification of max_atomic_width: Some(0) it seems not possible to further detail that some atomic operations are feasible without fencing and thus do not contradict the tl;dr quoted above. It's just either all atomic load/store operations (including the ordered ones requiring fencing) or none at all, correct?

  • And from [RISCV] Disable Atomics on all Non-A RISC-V targets rust#66548 I take that the libcall to __atomic_load_4 should also not take place at all since it would require linking against some additional implementation that provides this?

So, if core::intrinsics::atomic_load_unordered inevitably leads to a libcall to __atomic_load_4 which is considered not acceptable, I understand it is necessary to make these statements conditional as proposed. Positive side effect: introducing this could not be blamed on msp430 ;).

@workingjubilee
Copy link
Member

workingjubilee commented Nov 29, 2021

For those platforms the backend is responsible for lowering it to scalar instructions.

Almost certainly msp430 doesn't support these either, but you shouldn't be doing simd on an msp430 ;).

People are not just using core::simd to twist pixels on consumer PCs or run high-power data processing on big iron, they are using core::simd as a high level API for expressing digital signal processing algorithms that are appropriate for low-power microcontrollers as well. I am of the understanding Texas Instruments even offers their own software library for this purpose.

I hope to offer alternate lowerings as a fallback, to make it easier for backends and reduce the maintenance burden, but as the generic vector ops functionally just express operations over an array-like construct (with some qualifications so that it can be easily optimized if the backend knows how), I don't see why, in principle, such IR cannot be lowered to small devices.

@cr1901
Copy link
Contributor Author

cr1901 commented Nov 29, 2021

@workingjubilee I don't maintain the LLVM backend (nor am I terribly familiar w/ it, tbh), and while that's something I need to/been meaning to fix, I'm the only person right now making sure Rust on msp430 works. I simply do not have the bandwidth to learn how to add all the unimplemented IR constructs, only for my contributions to potentially get rejected by upstream.

I'm not opposing adding few emulations here and there.

Although, based on above, if the only missing IR constructs are compiler fences and SIMD, then I can probably get away w/ saying "this is required for stabilizing Rust support, and this is the only reason I'm implementing it."

msp430 is extremely size sensitive and my own experience is loop unrolling is too expensive. I think msp430 should lower SIMD constructs to a basic loop.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 30, 2021
Bump compiler_builtins to 0.1.55 to bring in fixes for targets lackin…

…g atomic support.

This fixes a "Cannot select" LLVM error when compiling `compiler_builtins` for targets lacking atomics, like MSP430. Se rust-lang/compiler-builtins#441 for more info. This PR is a more general version of rust-lang#91248.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 30, 2021
Bump compiler_builtins to 0.1.55 to bring in fixes for targets lackin…

…g atomic support.

This fixes a "Cannot select" LLVM error when compiling `compiler_builtins` for targets lacking atomics, like MSP430. Se rust-lang/compiler-builtins#441 for more info. This PR is a more general version of rust-lang#91248.
@cr1901 cr1901 closed this as completed Dec 1, 2021
@workingjubilee
Copy link
Member

Hmm. If getting it upstreamed is a concern, then we can probably figure something out, either adjusting how we generate the IR in the first place, or even having LLVM run a "desugaring" pass for vector LLVMIR using the pass manager.

@awygle
Copy link

awygle commented Dec 3, 2021

Well, let me be clear: I'm not opposing adding few emulations here and there. But I do not see point to support every and each IR feature on msp430 :)

Would you be open to supporting AtomicFence on MSP430 in the same way as the AVR backend supports it? This should allow us to use singlethread: true for MSP430 in Rust.

If that sounds acceptable, I'd be happy to help with implementing it.

@cr1901
Copy link
Contributor Author

cr1901 commented Dec 3, 2021

cc: @asl What are your thoughts re: SIMD and AtomicFence? Are you open to those emulations? I think these are the only emulations required so that I can get libcore for msp430 enabled for building in CI (which is a big step to getting msp430 stabilized).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants