
Aligning std::simd and Rust on Arm v7 Neon float behavior #439

Open
workingjubilee opened this issue Sep 12, 2024 · 15 comments
Labels
C-bug Category: Bug

Comments

@workingjubilee
Member

This is going to be a bit grisly: the Arm v7 Neon unit flushes subnormals to zero, and Rust has defined float semantics such that flushing subnormals is not valid behavior. If we want std::simd to align with scalar ops here, we will unfortunately have to kinda chuck the vector instructions for non-integer operations.
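To make the conflict concrete, here is a minimal Rust sketch that models flush-to-zero (FTZ) in software and compares it with the IEEE 754 results that scalar `f32` ops must produce. The helper names are hypothetical, and the model assumes Armv7 Neon treats both subnormal inputs and subnormal results as a zero of the same sign; it is not real Neon code.

```rust
/// Hypothetical software model of flush-to-zero: a subnormal value
/// becomes a zero of the same sign; every other value passes through.
fn flush_to_zero(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// Lane-wise multiply with the IEEE 754 semantics Rust guarantees for
/// scalar `f32` ops (what a scalarized fallback would compute).
fn mul_lanes_ieee(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| a[i] * b[i])
}

/// The same multiply under the assumed Neon model: subnormal inputs and
/// subnormal results are both flushed.
fn mul_lanes_ftz(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| flush_to_zero(flush_to_zero(a[i]) * flush_to_zero(b[i])))
}
```

With `a = [f32::MIN_POSITIVE, 1.0, 2.0, -f32::MIN_POSITIVE]` and `b = [0.5; 4]`, the IEEE version keeps the subnormals `±2^-127` in lanes 0 and 3, while the FTZ model returns `0.0` and `-0.0` there; lanes 1 and 2 agree.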

Meta

rustc --version --verbose:

rustc 1.83.0-nightly (0ee7cb5e3 2024-09-10)
binary: rustc
commit-hash: 0ee7cb5e3633502d9a90a85c3c367eccd59a0aba
commit-date: 2024-09-10
host: x86_64-unknown-linux-gnu
release: 1.83.0-nightly
LLVM version: 19.1.0
@workingjubilee workingjubilee added the C-bug Category: Bug label Sep 12, 2024
@RalfJung
Member

This seems like basically the same issue as rust-lang/rust#129880, but it might be worth tracking in this repo as well?

I guess stdarch is also affected, but arguably there it is okay to expose the underlying hardware behavior... that is, assuming we don't get unsoundness due to llvm/llvm-project#89885.

@workingjubilee
Member Author

@RalfJung Yes, it has particular implications for our API design.

@DemiMarie

I don’t think it makes sense to expect vector operations to have defined subnormal behavior. There is too much hardware where perfect IEEE conformance is either impossible or requires software support code. Making flushing subnormals to zero permissible behavior is the only approach that allows for predictable runtime performance and predictable lowering to target-specific assembly.

@RalfJung
Member

Unfortunately LLVM is unsound on hardware that flushes subnormals.

> predictable runtime performance and predictable lowering

And completely unpredictable runtime behavior. Great.

@workingjubilee
Member Author

@DemiMarie Easily done, all it needs is a small fix in LLVM IR and SelectionDAG: llvm/llvm-project#30633

@calebzulawski
Member

Is it unpredictable because of reordering? Other than allowing ftz, I don't see anything we could do that doesn't make std::simd useless on armv7 or ppc.

@RalfJung
Member

RalfJung commented Nov 10, 2024

It is unpredictable in the sense of giving different results on different targets, and (depending on what semantics LLVM implements once they properly support NEON on 32-bit ARM, which currently they do not) different optimization levels and different ways of writing the same code.

@calebzulawski
Copy link
Member

Considering these are old targets, I'm not expecting a huge push to fix the backends, but would simply disallowing certain optimizations be sufficient? We do note in the std::simd docs that ftz will happen on some targets. We could, e.g., expose a cfg value if necessary.
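A cfg along these lines could be approximated today with existing cfg keys. The sketch below is hypothetical: `vector_floats_may_flush` and `lane_matches` are illustrative names, not std::simd APIs, and it assumes that `target_arch = "arm"` combined with `target_feature = "neon"` identifies the affected builds (ppc is ignored here). It shows how test code might accept a documented flush while still demanding exact IEEE results everywhere else.

```rust
/// Hypothetical flag: true when building for 32-bit Arm with Neon enabled,
/// the case where std::simd float lanes are documented as possibly flushing.
/// Built only from existing cfg keys; not a real std::simd API.
pub const fn vector_floats_may_flush() -> bool {
    cfg!(all(target_arch = "arm", target_feature = "neon"))
}

/// Compares one lane of a vector result against its IEEE 754 reference.
/// An exact bit match always passes; a same-signed zero in place of a
/// subnormal reference passes only where flushing is permitted.
pub fn lane_matches(actual: f32, ieee_reference: f32) -> bool {
    if actual.to_bits() == ieee_reference.to_bits() {
        return true;
    }
    vector_floats_may_flush()
        && ieee_reference.is_subnormal()
        && actual == 0.0
        && actual.is_sign_negative() == ieee_reference.is_sign_negative()
}
```

On a non-Arm host, `lane_matches(0.0, f32::MIN_POSITIVE / 2.0)` is `false`, so any flush there is reported as a bug; on the flagged targets the same check passes.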

@RalfJung
Member

I mean we could try to disable the scalar evolution pass and hope that this suffices. But that's far from a robust solution, so it's not really aligned with Rust's values IMO.

@RalfJung
Copy link
Member

Anyway I think portable-simd has a lot of things to resolve before this becomes a pressing question. Right now, not even the core::arch operations are stable on ARM32.

@DemiMarie

> Unfortunately LLVM is unsound on hardware that flushes subnormals.
>
> > predictable runtime performance and predictable lowering
>
> And completely unpredictable runtime behavior. Great.

This can be worked around by implementing the relevant intrinsics using LLVM inline assembly instead.

@RalfJung
Member

That would not achieve the "predictable runtime performance" part of your goals, as the optimizer would have to treat this like a black box.

And behavior would still be unpredictable in the sense of differing across architectures. So IMO it would also be reasonable to say that portable-simd is simply not supported on 32-bit ARM, and only provide core::arch primitives where people are hopefully aware of the semantic pitfalls.

But anyway as I said, we're likely years away from this being a high-priority question. First all of the rest of the portable-simd API needs to be worked out...

@DemiMarie

> That would not achieve the "predictable runtime performance" part of your goals, as the optimizer would have to treat this like a black box.

Is the optimizer actually able to usefully reason about SIMD intrinsics anyway? The optimizer can (IIUC) be informed that the operations don’t access memory and can be elided if their result is not needed. My understanding is that SIMD programmers typically use the compiler as a glorified register allocator and so don’t particularly care about other optimizations. Is this accurate?

@RalfJung
Member

RalfJung commented Nov 10, 2024

The simd_* intrinsics, which are used for everything in portable-simd, are fully understood by LLVM and can be optimized like scalar operations. I don't know how much that matters in practice, but const-folding does seem like a useful optimization even for SIMD.
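A minimal sketch of why const-folding interacts badly with flushing: if the optimizer folds an expression at compile time with IEEE semantics but the hardware flushes at run time, the result depends on whether folding happened. `runtime_ftz_mul` is a hypothetical software stand-in for flushing hardware (assuming subnormal inputs and results become same-signed zeros), not an actual intrinsic.

```rust
/// Evaluated at compile time with IEEE 754 semantics:
/// 2^-126 * 0.5 = 2^-127, a subnormal that is preserved.
const FOLDED: f32 = f32::MIN_POSITIVE * 0.5;

/// Hypothetical stand-in for a flushing vector unit evaluating the
/// same multiply at run time.
fn runtime_ftz_mul(a: f32, b: f32) -> f32 {
    let flush = |x: f32| if x.is_subnormal() { 0.0f32.copysign(x) } else { x };
    flush(flush(a) * flush(b))
}

/// True when the constant-folded and the flushed run-time results differ.
fn folding_changes_result() -> bool {
    FOLDED.to_bits() != runtime_ftz_mul(f32::MIN_POSITIVE, 0.5).to_bits()
}
```

With normal operands, e.g. `runtime_ftz_mul(2.0, 0.5)`, both paths agree; only results in the subnormal range diverge.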

@DemiMarie

I think it would be better to have SIMD that cannot be constant-folded than to not have SIMD at all.

@rust-lang rust-lang locked as too heated and limited conversation to collaborators Nov 11, 2024

4 participants