Miscompilation of Bevy (and some wgpu) apps resulting in segfault on macOS. #117902

Closed
inodentry opened this issue Nov 14, 2023 · 46 comments · Fixed by #118936
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-macos Operating system: macOS P-critical Critical priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Comments

@inodentry

inodentry commented Nov 14, 2023

Recent rustc seems to miscompile something somewhere in the dependency tree of Bevy, causing apps to crash with a segfault.

The issue seems to only manifest when optimizations are enabled and codegen units are > 1. The default cargo release configuration triggers it. Unoptimized builds are fine. Setting codegen-units = 1 with opt-level = 3 also seems to be fine.

I am on macOS 14.1 Sonoma, on Apple M1 Pro hardware.

Code

You can reproduce with any Bevy application. It is a huge code base, I know, I'm sorry I can't find a more minimal example.

Clone the Bevy repo: https://github.com/bevyengine/bevy

All of these revisions have the issue:

  • commit bf4f4e42da3da07b0e84a4d6e40077f7398aea8b (I had that one lying around)
  • commit 749f3d74305d35b4a453a80e45b22f08d5189a48 (current latest)
  • tag v0.12.0
  • tag v0.11.3

Run some example in release mode:

cargo run --example bevymark --release

Version it worked on

The previous 1.74.0-beta.7 did not have the issue. I am not sure which nightly introduced the regression.

Version with regression

I first noticed the issue when I installed the 2023-11-11 nightly, though it might also be present in earlier nightlies. I just installed 1.75.0-beta.1 and updated my nightly, and can confirm the issue is present there, too.

rustc 1.75.0-beta.1 (782883f60 2023-11-12)
binary: rustc
commit-hash: 782883f609713fe9617ba64d90086742ec62d374
commit-date: 2023-11-12
host: aarch64-apple-darwin
release: 1.75.0-beta.1
LLVM version: 17.0.4
rustc 1.76.0-nightly (ba7c7a301 2023-11-13)
binary: rustc
commit-hash: ba7c7a301984967c8c13adb580ef9b86ba706a83
commit-date: 2023-11-13
host: aarch64-apple-darwin
release: 1.76.0-nightly
LLVM version: 17.0.4
@inodentry inodentry added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Nov 14, 2023
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 14, 2023
@nikic nikic added the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Nov 14, 2023
@Noratrieb Noratrieb added E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 14, 2023
@Noratrieb
Member

It would be useful if you could use cargo-bisect-rustc to bisect the regression to a single PR, which would make it easier to figure out what's happening and whether it really is a miscompilation or just UB that's now going wrong in bevy's dep tree.

@Noratrieb
Member

I was not able to reproduce this on Windows. Marking as O-macos, but I suspect that the problem is platform-independent and only manifests in platform-specific bevy code.

@Noratrieb Noratrieb added the O-macos Operating system: macOS label Nov 14, 2023
@inodentry
Author

inodentry commented Nov 14, 2023

OK. TIL about cargo-bisect-rustc. Installing it and reading the documentation now. Will try it in a moment.

Edit: I created a simple bevy project that should automatically exit after the first frame is rendered to the screen. This should be scriptable. It will exit successfully on older rustc, or segfault on new rustc. cargo bisect-rustc is running now, as I am typing this.

Here is the bevy code if anyone else wants to try it:

use bevy::prelude::*;
fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_systems(Update, exit)
        .run();
}
fn exit(mut evw: EventWriter<bevy::app::AppExit>) {
    evw.send(bevy::app::AppExit);
}

@leorott

leorott commented Nov 14, 2023

I get

2023-11-14T09:21:52.380500Z  INFO bevy_render::renderer: AdapterInfo { name: "Apple M1", vendor: 0, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Metal }
bevymark(12860,0x1df7b1300) malloc: *** error for object 0x16d0c8d98: pointer being freed was not allocated
bevymark(12860,0x1df7b1300) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    12860 abort      cargo run --example bevymark --release

On MacOS 14.0

@inodentry
Author

Bisected down to:
Regression in nightly-2023-10-19
Regression in 5d5edf0

@nikic nikic removed the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Nov 14, 2023
@Noratrieb
Member

Oh well, that is sadly not a very useful bisection. That PR enabled more inlining, causing more optimizations to be applied overall. cc @saethlin since you may be interested in this, but I can't imagine that your PR is at fault.
Can you try enabling LTO, which should give the same behavior on earlier versions, see whether that crashes as well, and then bisect that?
Additionally, it would be useful to use a debugger to figure out where the segfault occurred and maybe why it did. It may be possible to figure out what got compiled weirdly, which could help us figure out whether it's UB in the code.

@saethlin
Member

saethlin commented Nov 14, 2023

That makes me wonder if Bevy is doing this thing that Embassy is (was?) doing: #117047

i.e. if you are relying on the address of an empty function being the same every time it is inspected, it must be one of #[inline(never)] or #[no_mangle] and I'd advise the latter.
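A minimal sketch of the pattern being described, assuming a marker function whose address is used as an identity token. The names `marker`, `other` and `is_marker` are hypothetical and are not taken from Bevy or Embassy. The `#[inline(never)]` attribute is one of the two options mentioned above; it is meant to keep the function from being instantiated separately in several codegen units, which is what would make its address differ between inspections.

```rust
/// Hypothetical marker function: its address is used as an identity token.
/// `#[inline(never)]` asks the compiler not to inline it, so it is not
/// copied into each codegen unit that refers to it.
#[inline(never)]
pub fn marker() {}

/// Some unrelated function with the same signature.
#[inline(never)]
pub fn other() {
    println!("not the marker");
}

/// Returns true when `f` points at `marker`, compared by address.
pub fn is_marker(f: fn()) -> bool {
    // Cast through `fn()` first so both sides are plain function pointers.
    f as usize == (marker as fn()) as usize
}

fn main() {
    println!("marker is marker: {}", is_marker(marker));
    println!("other is marker:  {}", is_marker(other));
}
```

Comparing addresses of different functions remains fragile (identical bodies may be merged by the optimizer), so this sketch only illustrates the attribute placement, not a recommended design.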

@inodentry
Author

@Nilstrieb Just tested it. LTO does not seem to affect the results: earlier versions do not segfault with LTO, and new versions still do.

Also, like I said, setting codegen-units = 1 in the new rustc does not result in a segfault, even though it should enable the most inlining. It is quite interesting.

The issue occurs only with optimized builds with >1 codegen units, regardless of LTO.

@inodentry
Author

inodentry commented Nov 15, 2023

Ran in lldb:

* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x4b8c64f1019f8cf8)
    frame #0: 0x00000001013c70dc bevyregress`wgpu_types::TextureFormat::required_features::h5d814caf5b611efe + 20
bevyregress`wgpu_types::TextureFormat::required_features::h5d814caf5b611efe:
->  0x1013c70dc <+20>: ldrb   w11, [x9, x8]
    0x1013c70e0 <+24>: add    x10, x10, x11, lsl #2
    0x1013c70e4 <+28>: br     x10
    0x1013c70e8 <+32>: ret
Target 0: (bevyregress) stopped.
(lldb) bt
* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x4b8c64f1019f8cf8)
  * frame #0: 0x00000001013c70dc bevyregress`wgpu_types::TextureFormat::required_features::h5d814caf5b611efe + 20
    frame #1: 0x00000001013cdb70 bevyregress`wgpu_core::device::resource::Device$LT$A$GT$::create_texture::hbdd987e61dd6d7c5 + 212
    frame #2: 0x0000000101393f94 bevyregress`wgpu_core::device::global::_$LT$impl$u20$wgpu_core..global..Global$LT$G$GT$$GT$::device_create_texture::h73a7bf7e89af0308 + 348
    frame #3: 0x00000001013f8684 bevyregress`_$LT$wgpu..backend..direct..Context$u20$as$u20$wgpu..context..Context$GT$::device_create_texture::h81dbb4a560f1aead + 316
    frame #4: 0x00000001014013e4 bevyregress`_$LT$T$u20$as$u20$wgpu..context..DynContext$GT$::device_create_texture::h7fac5a1728c0e87e + 60
    frame #5: 0x00000001014a63a8 bevyregress`_$LT$wgpu..Device$u20$as$u20$wgpu..util..device..DeviceExt$GT$::create_texture_with_data::h91ea9c74edbd4d11 + 168
    frame #6: 0x0000000100866444 bevyregress`bevy_render::texture::fallback_image::fallback_image_new::h25cda84b669fe296 (.llvm.2045819193268999333) + 548
    frame #7: 0x0000000100866be4 bevyregress`_$LT$bevy_render..texture..fallback_image..FallbackImage$u20$as$u20$bevy_ecs..world..FromWorld$GT$::from_world::ha952e13675380366 + 1032
    frame #8: 0x0000000100981d44 bevyregress`_$LT$bevy_render..texture..ImagePlugin$u20$as$u20$bevy_app..plugin..Plugin$GT$::finish::h832044bf73ce2c5e + 1204
    frame #9: 0x00000001000e3490 bevyregress`bevy_app::app::App::finish::hba9fb047755ddcd5 + 104
    frame #10: 0x00000001000e31ec bevyregress`bevy_app::app::App::run::h39b5f7f41472f60b + 156
    frame #11: 0x0000000100fa00c8 bevyregress`bevyregress::main::hdc8715a225225966 + 128
    frame #12: 0x0000000100fa147c bevyregress`std::sys_common::backtrace::__rust_begin_short_backtrace::h55cb9fdb91c84ca5 + 12
    frame #13: 0x0000000100f9d89c bevyregress`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h9fc2206eb4cf6bb0 + 16
    frame #14: 0x00000001012e4fd0 bevyregress`std::rt::lang_start_internal::hd092a83c379773e8 + 636
    frame #15: 0x0000000100f9d880 bevyregress`std::rt::lang_start::hf53aafa686c05f03 + 44
    frame #16: 0x0000000100fa0154 bevyregress`main + 32
    frame #17: 0x000000018fb410e0 dyld`start + 2360
(lldb) disassemble
bevyregress`wgpu_types::TextureFormat::required_features::h5d814caf5b611efe:
    0x1013c70c8 <+0>:  mov    x8, x0
    0x1013c70cc <+4>:  mov    x0, #0x0
    0x1013c70d0 <+8>:  adrp   x9, 1585
    0x1013c70d4 <+12>: add    x9, x9, #0xce2            ; anon.3067676b3fe55e5f377b438d717236ea.7.llvm.8301794337721405882 + 157
    0x1013c70d8 <+16>: adr    x10, #0x10                ; <+32>
->  0x1013c70dc <+20>: ldrb   w11, [x9, x8]
    0x1013c70e0 <+24>: add    x10, x10, x11, lsl #2
    0x1013c70e4 <+28>: br     x10
    0x1013c70e8 <+32>: ret
    0x1013c70ec <+36>: mov    w0, #0x2000000
    0x1013c70f0 <+40>: ret
    0x1013c70f4 <+44>: mov    w0, #0x4000000
    0x1013c70f8 <+48>: ret
    0x1013c70fc <+52>: mov    w0, #0x20000000
    0x1013c7100 <+56>: ret
    0x1013c7104 <+60>: mov    w0, #0x1000000
    0x1013c7108 <+64>: ret
    0x1013c710c <+68>: cmp    w1, #0x2
    0x1013c7110 <+72>: mov    w8, #0x40000000
    0x1013c7114 <+76>: mov    w9, #0x8000000
    0x1013c7118 <+80>: csel   x0, x9, x8, lo
    0x1013c711c <+84>: ret

Seems to be an invalid memory access in wgpu_types::TextureFormat::required_features?

Included a backtrace, and disassembly of the function, if that helps.
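As a reading aid for the listing above, here is a small worked calculation. It assumes that lldb prints the `adrp` operand as a page count relative to the page of the current instruction, and that the faulting address in `EXC_BAD_ACCESS` is exactly `x9 + x8` from the `ldrb`. The helper names are made up for this sketch; the numeric inputs are copied from the lldb output.

```rust
const PAGE_SIZE: u64 = 0x1000;

/// Address formed by `adrp` followed by `add`: the page of the `adrp`
/// instruction, plus a number of pages, plus a byte offset.
fn table_base(adrp_pc: u64, page_delta: u64, byte_offset: u64) -> u64 {
    (adrp_pc & !(PAGE_SIZE - 1)) + page_delta * PAGE_SIZE + byte_offset
}

/// Value the index register must have held for `base + index` to fault at `fault`.
fn index_register(fault: u64, base: u64) -> u64 {
    fault.wrapping_sub(base)
}

fn main() {
    // adrp x9, 1585 at 0x1013c70d0, then add x9, x9, #0xce2
    let base = table_base(0x1013c70d0, 1585, 0xce2);
    // address reported by EXC_BAD_ACCESS
    let x8 = index_register(0x4b8c64f1019f8cf8, base);

    println!("table base (x9) = {base:#x}");
    println!("index (x8)      = {x8:#018x}");
    println!("low 32 bits     = {:#x}", x8 as u32);
    println!("high 32 bits    = {:#x}", (x8 >> 32) as u32);
}
```

Under those assumptions the low 32 bits of the index come out as a small number while the high 32 bits are non-zero, which would be consistent with a 32-bit value being used as a 64-bit index without its upper half being cleared. This is an interpretation of the dump, not something the debugger states.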

@SkiFire13
Contributor

That makes me wonder if Bevy is doing this thing that Embassy is (was?) doing: #117047

Bevy does something similar for its apply_deferred system, but it doesn't rely on function pointer equality, instead it erases the function item into a trait object and uses its TypeId for checking, which shouldn't suffer from the same problem.
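A rough sketch of the TypeId-based check described here, not Bevy's actual code. It relies on the fact that every Rust function item has its own unique zero-sized type, so two functions with the same signature still have different `TypeId`s. The names `apply_deferred_marker`, `other_system`, `type_id_of_val` and `is_marker` are hypothetical.

```rust
use std::any::{Any, TypeId};

/// Hypothetical marker system with an empty body.
fn apply_deferred_marker() {}

/// Another function with the same signature but a different function item type.
fn other_system() {}

/// Returns the `TypeId` of the concrete type of `_value`.
/// For a function item this is the unique type of that one function.
fn type_id_of_val<T: Any>(_value: &T) -> TypeId {
    TypeId::of::<T>()
}

/// Checks a type-erased value against the marker by `TypeId`,
/// without ever looking at a function address.
fn is_marker(system: &dyn Any) -> bool {
    Any::type_id(system) == type_id_of_val(&apply_deferred_marker)
}

fn main() {
    println!("marker: {}", is_marker(&apply_deferred_marker));
    println!("other:  {}", is_marker(&other_system));
}
```

Because the comparison is on types rather than addresses, it does not depend on how many copies of the function the compiler emits or where they are placed.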

@Patryk27
Contributor

It looks like disabling cross-crate inlining "solves" the issue, i.e.:

RUSTFLAGS='-Zcross-crate-inline-threshold=0' cargo run --release

@apiraino
Contributor

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 18, 2023
@lqd lqd removed the regression-untriaged Untriaged performance or correctness regression. label Nov 18, 2023
@saethlin
Member

required_features leads me to describe_format_features: putting #[inline(never)] on it prevents the segfaults. Above that is create_texture, which is huge and does a lot of things, and adding #[inline(never)] to that function has no effect. So I think the unexpected codegen happens while optimizing wgpu_core::Device::create_texture, which I suppose at least narrows things down from the whole crate graph.

@inodentry
Author

Oh ... great ... so this is a miscompilation bug that also exists in stable, which could have been triggered if some specific code was inlined. Now rustc just inlines it automatically. So it isn't a codegen regression, but an old/preexisting bug exposed by new rustc enabling more aggressive optimizations.

Damn. I'm so glad I reported this. Never could have imagined that it would go so deep...

@saethlin
Member

I am not yet convinced that this is a miscompilation as opposed to UB that is now exploited.

@lqd
Member

lqd commented Nov 29, 2023

required_features [...] wgpu_core::Device::create_texture

These 2 functions are the closest to the segfault in the backtrace, so the problem surely lies around there. We could likely extract them both and hopefully repro outside of wgpu.

@saethlin saethlin self-assigned this Nov 30, 2023
@saethlin
Member

saethlin commented Dec 2, 2023

I have not managed to separate the strangely-behaving code out from wgpu, but I have learned some things.

required_features is very strange. If I stick #[inline(never)] on it, LLVM generates this function signature:

define noundef i64 @_ZN10wgpu_types13TextureFormat17required_features17hd1a6387d4186056bE(ptr noalias nocapture noundef readonly align 4 dereferenceable(12) %self) unnamed_addr #6 !dbg !2239 {

Which is exactly what you'd expect. But if I remove the attribute and leave cross-crate-inlining enabled, or add #[inline] we get this signature:

define internal fastcc noundef i64 @_ZN10wgpu_types13TextureFormat17required_features17hd1a6387d4186056bE(i32 %self.0.val, i32 %self.8.val) unnamed_addr #1 !dbg !24733 {

So LLVM has done some kind of cross-function optimization here. The IR for the function bodies looks exactly the same to me, except that one does two GEPs (getelementptr) and the other just uses its arguments.

And of course they are lowered to slightly different assembly. Here is the #[inline(never)] one:

100225494: aa0003e8     mov     x8, x0
100225498: b9400009     ldr     w9, [x0]
10022549c: d2800000     mov     x0, #0
1002254a0: 9000058a     adrp    x10, 0x1002d5000
1002254a4: 911b414a     add     x10, x10, #1744
1002254a8: 1000008b     adr     x11, #16
1002254ac: 3869694c     ldrb    w12, [x10, x9]
1002254b0: 8b0c096b     add     x11, x11, x12, lsl #2
1002254b4: d61f0160     br      x11

And the #[inline] one:

100053860: aa0003e8     mov     x8, x0
100053864: d2800000     mov     x0, #0
100053868: 90001289     adrp    x9, 0x1002a3000
10005386c: 91032529     add     x9, x9, #201
100053870: 1000008a     adr     x10, #16
100053874: 3868692b     ldrb    w11, [x9, x8]
100053878: 8b0b094a     add     x10, x10, x11, lsl #2
10005387c: d61f0140     br      x10

Directly after both of these is a small jump table. But I think what's down there is irrelevant, because in the #[inline] one, when we br on it, x10 contains an address that is comfortably outside this function.

So I think whatever is going on here relies on required_features getting this SRoA codegen, but also not being inlined.


lldb has been supremely anti-helpful. It says breakpoints are set then blows past them (realizing this was happening took me a long time), and if I do too much instruction stepping it segfaults.

@Noratrieb
Member

I think what's happening with the signature is argument promotion, where it replaces an indirect argument with a direct one.
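To make the idea concrete, here is a hand-written Rust analogy of what argument promotion does to a signature. It is not what LLVM emits: the struct `Format`, its fields and the returned constants are invented for illustration, chosen only so that the argument is 12 bytes with the two values read at offsets 0 and 8, mirroring `dereferenceable(12)` and `%self.0.val` / `%self.8.val` in the IR signatures quoted earlier.

```rust
/// Hypothetical 12-byte stand-in for the by-reference argument.
#[repr(C)]
pub struct Format {
    pub tag: u32,     // offset 0
    pub block: u32,   // offset 4, never read by the callee
    pub channel: u32, // offset 8
}

/// Shared logic, so that both entry points compute the same thing.
fn features(tag: u32, channel: u32) -> u64 {
    match tag {
        0 => 0x0100_0000,
        1 => 0x0200_0000,
        _ => {
            if channel < 2 {
                0x0800_0000
            } else {
                0x4000_0000
            }
        }
    }
}

/// "Before": the callee receives a pointer and loads the fields itself.
pub fn required_features_by_ref(format: &Format) -> u64 {
    features(format.tag, format.channel)
}

/// "After": the caller loads the two fields and passes them as 32-bit values.
pub fn required_features_promoted(tag: u32, channel: u32) -> u64 {
    features(tag, channel)
}

fn main() {
    let format = Format { tag: 2, block: 0, channel: 1 };
    let by_ref = required_features_by_ref(&format);
    let promoted = required_features_promoted(format.tag, format.channel);
    println!("by ref: {by_ref:#x}, promoted: {promoted:#x}");
}
```

The two forms are equivalent at the source level. The difference that matters in this thread is that the promoted form hands the callee 32-bit values in registers instead of a pointer to memory.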

@lqd
Member

lqd commented Dec 2, 2023

I've removed 99% of wgpu in https://github.com/lqd/repro-117902, but it still currently clocks in at 1000 lines or so.

I have not managed to separate the strangely-behaving code out from wgpu

It's indeed way harder than one would hope. It's quite fiddly, maybe there's a magic value of the inlining threshold for different shapes, combined with various inline never/always attributes, but I didn't manage to easily merge more of the crates together without making the issue disappear; there are still 2 wgpu-* crates in addition to the main binary.

There are still a bunch of backends/API/etc wgpu abstractions in the repro so it can still be reduced more, but I'm not sure I'll have much more time to dedicate to this. At the very least, there are still two dependencies out of three that shouldn't be hard to remove (thiserror and bitflags).
But there are limits/roadblocks to how much we can minimize without having thresholds/inline attributes that make this reproduce all the time, as some of these sizes impact inlining decisions.

I still don't know if this is a miscompile or UB (maybe slightly leaning towards the former):

  • if it's the latter, it's maybe not in wgpu: there's no unsafe code in the repro per se, but there's surely some in parking_lot.
  • it's possible that parking_lot on mac exhibits an issue though. I think I removed all the osx/metal specific code, it could be cool to test on aarch64-linux (maybe on the dev-desktops) and on an intel mac, to see what happens there.
  • switching from parking_lot to the std mutex and rwlock also made the segfault disappear but it's not clear whether it could just be some different behavior or different codegen that would stop triggering the issue.
  • the number of variants in the TextureFormat enum seems to matter. There needs to be quite a lot or the problem disappears.

@saethlin
Member

saethlin commented Dec 2, 2023

I have some ideas on how to shrink this further and I'm starting to make progress...

@saethlin
Member

saethlin commented Dec 3, 2023

My latest progress brings the line count down to 221 lines, and there is precious little logic left, it's mostly bitflags and enum definitions. It's in a PR to the above repo. It got way more consistent when I found the path to remove the locks.

I'm sure the remaining bits of wgpu_types could be cleaned up. And I didn't try removing any of the crate boundaries, though I suspect those are required.

At this point I'm quite sure this is a miscompile. There's no unsafe code left in the repo, and Miri doesn't complain.

I've tried running this on aarch64-unknown-linux-gnu and x86_64-unknown-linux-gnu, and neither segfaults. I doubt this is an OS issue; perhaps there is something different between the default aarch64-apple-darwin target and the aarch64-unknown-linux-gnu target that would be enlightening.

@saethlin saethlin added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Dec 3, 2023
@saethlin
Member

saethlin commented Dec 3, 2023

Also somehow the MIR inliner is still required.

RUSTFLAGS="-Zmir-opt-level=0 -Zinline-mir -Zinline-mir-hint-threshold=12 -Zinline-mir-threshold=0" cargo r --release

Though it's not clear if that's because the bug is in the MIR inliner or if only post-inlining do we generate the exact LLVM IR pattern that gets miscompiled.

@lqd
Member

lqd commented Dec 3, 2023

The repro is now at around 150 lines, with 1 bin and 1 lib. It's minimal enough e.g. for a test or LLVM bug report.

Also somehow the MIR inliner is still required.

@saethlin Doesn't -Zinline-mir=no fully turn off the inliner? Because the segfault reproduces for me even with this flag.

In any case, the LLVM optimization pass that seems to start introducing the segfault is Machine Copy Propagation Pass on function (repro_117902::TextureFormat::required_features) from the lib — RUSTFLAGS="-Cllvm-args=-opt-bisect-limit=1255" to reproduce this on the latest revision (with -Zinline-mir=no or -Zinline-mir=no -Zmir-opt-level=0, the limit is 1404 on the lib).

cc @DianQK

@saethlin
Member

saethlin commented Dec 3, 2023

Yes that should turn off the MIR inliner. I was aiming for a reproducer that has -Zmir-opt-level=0 because that would turn off all MIR optimizations, and I found that when I added that flag I stopped getting segfaults unless I enabled MIR inlining (which is a pass, but it also has its own special flag).

@lqd
Member

lqd commented Dec 3, 2023

Maybe the smaller repro behaves slightly differently now, -Zinline-mir=no -Zmir-opt-level=0 still segfaults for me, interestingly enough.

@ActuallyHappening

It's nice to know I'm not the only one experiencing random segfaults in my bevy apps on nightly, and that it is being actively explored!
I have no rustc dev experience to offer, but I did notice another compiler bug (original) here, but I'm pretty sure they are unrelated.

@DianQK
Member

DianQK commented Dec 4, 2023

In any case, the LLVM optimization pass that seems to start introducing the segfault is Machine Copy Propagation Pass on function (repro_117902::TextureFormat::required_features) from the lib — RUSTFLAGS="-Cllvm-args=-opt-bisect-limit=1255" to reproduce this on the latest revision (with -Zinline-mir=no or -Zinline-mir=no -Zmir-opt-level=0, the limit is 1404 on the lib).

cc @DianQK

Nice reproduction. But I can't reproduce it on macOS x86_64. Purchasing M1.
I have written very little Machine IR related code. But I'll try to look into it when the M1 arrives.

@DianQK
Member

DianQK commented Dec 7, 2023

Upstream issue: llvm/llvm-project#74680.
I tried to submit a fix as well.

Based on https://github.com/lqd/repro-117902, I got

fn main() {
    call_from_main();
}

#[inline(never)]
#[no_mangle]
fn call_from_main() {
    let desc = TextureDescriptor(TextureFormat::Bc6hRgbUfloat);
    device_create_texture(desc);
}

#[inline(never)]
#[no_mangle]
pub fn device_create_texture(desc: TextureDescriptor) {
    let format = desc.0;
    format.required_features().used();
    format.used();
}

enum Feature {
    A,
    B,
}

impl Feature {
    #[inline(never)]
    fn used(self) {
        core::hint::black_box(self);
    }
}

#[repr(C)]
pub enum AstcBlock {
    B12x12,
}
pub enum AstcChannel {
    Unorm,
    Hdr,
}
#[repr(C)]
pub enum TextureFormat {
    Rgba8UnormSrgb,
    Bc6hRgbUfloat,
    Key,
    Astc {
        channel: AstcChannel,
        block: AstcBlock,
    },
}
impl TextureFormat {
    #[inline(never)]
    fn used(&self) {
        core::hint::black_box(self);
    }
    #[inline(never)]
    // #[no_mangle]
    fn required_features(&self) -> Feature {
        match self {
            Self::Rgba8UnormSrgb => Feature::A,
            Self::Bc6hRgbUfloat => Feature::B,
            Self::Key => Feature::B,
            Self::Astc { channel, .. } => match channel {
                AstcChannel::Hdr => Feature::A,
                AstcChannel::Unorm => Feature::B,
            },
        }
    }
}
pub struct TextureDescriptor(pub TextureFormat);

@DianQK
Member

DianQK commented Dec 8, 2023

Hmm, I can only use nightly to reproduce, and I can't use local stage1/stage2 to reproduce. This could be a difference in build configuration. So I think this minimal reproduction is unstable. If we want to use this as a test case, I would expect a more stable test case.

I added a more detailed explanation of this issue at llvm/llvm-project#74682 (comment). We should be able to create a must-be-dirty stack to reproduce.

@DianQK
Member

DianQK commented Dec 8, 2023

I got a "perfect" test case.

I can reproduce it with nightly, 1.74.1 and even 1.69.0.

fn main() {
    let desc = TextureDescriptor(TextureFormat::A, 1, 1);
    device_create_texture(desc);
}

#[inline(never)]
#[no_mangle]
pub fn device_create_texture(desc: TextureDescriptor) {
    let format = desc.0;
    assert_eq!(format.required_features(desc.1), Feature::A);
}

#[derive(PartialEq, Eq, Debug)]
enum Feature {
    A,
    B,
}

#[repr(i32)]
pub enum TextureFormat {
    A,
    B,
    C,
    D,
}

impl TextureFormat {
    #[inline(never)]
    fn required_features(&self, v: i8) -> Feature {
        match self {
            Self::A => Feature::A,
            Self::B => Feature::B,
            Self::C => Feature::B,
            Self::D => match v {
                0 => Feature::A,
                _ => Feature::B,
            },
        }
    }
}

pub struct TextureDescriptor(pub TextureFormat, i8, i16);

By examining the generated instructions, the following two instructions set x0 to 0x0001000100000000:

mov     x0, #4294967296
movk    x0, #1, lsl #48

godbolt: https://rust.godbolt.org/z/vnnvEszoP
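The value above can be checked with a small model of the two instructions. This is only a sketch: `movk` here is a hypothetical helper that imitates the AArch64 "move with keep" behavior (overwrite one 16-bit lane, keep the rest), and `materialize_x0` is a made-up name for the two-instruction sequence.

```rust
/// Model of AArch64 `movk xd, #imm16, lsl #shift`:
/// replace the 16-bit lane at `shift`, keep every other bit of `reg`.
fn movk(reg: u64, imm16: u16, shift: u32) -> u64 {
    let lane_mask = 0xffff_u64 << shift;
    (reg & !lane_mask) | ((imm16 as u64) << shift)
}

/// The two instructions from the comment, applied in order.
fn materialize_x0() -> u64 {
    let x0 = 4_294_967_296_u64; // mov  x0, #4294967296  (1 << 32)
    movk(x0, 1, 48) //             movk x0, #1, lsl #48
}

fn main() {
    let x0 = materialize_x0();
    println!("x0           = {x0:#018x}");
    println!("low 32 bits  = {:#x}", x0 as u32);
    println!("high 32 bits = {:#x}", (x0 >> 32) as u32);
}
```

The low 32 bits are zero, which matches the discriminant of `TextureFormat::A` under `#[repr(i32)]`, while the upper 32 bits are non-zero. One plausible reading, stated here as an assumption, is that the upper half carries the other two fields of `TextureDescriptor`, so a callee that treats the whole 64-bit register as the discriminant would see a value far outside the enum's range.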

@saethlin saethlin removed their assignment Dec 10, 2023
@bors bors closed this as completed in 604f185 Dec 15, 2023
@lqd
Member

lqd commented Dec 15, 2023

Reopening to track beta backport in #118994.

@lqd lqd reopened this Dec 15, 2023
@lqd
Member

lqd commented Dec 16, 2023

This can now be closed: this issue was fixed on nightly by the cherry-picked LLVM fix in #118936, and fixed on beta by backporting that PR in #118994, so it thankfully won't reach stable in a couple weeks. Thanks in particular to @saethlin and @DianQK.

@lqd lqd closed this as completed Dec 16, 2023
@apiraino apiraino removed the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Mar 28, 2024