Optimize u8x8::trailing_zeros for AArch64 #193
Labels: A-AArch64 (ARM 64-bit architecture), Blocked-LLVM (Bugs blocked on bugfixes in LLVM), Performance (Something isn't fast)
LLVM's `cttz.v8i8` intrinsic is broken on AArch64 machines: #191

Our current workaround just applies `u8::trailing_zeros` to each lane. With 8 lanes, that can be quite slow.

It could be optimized by adapting LLVM's algorithm to Rust's AArch64 SIMD intrinsics (some may be missing and we would have to implement those as well: rust-lang/stdarch#40).
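
For reference, here is a minimal portable sketch of the per-lane workaround next to two branch-free formulations that are commonly used for vector `cttz`. It models the vector as a plain `[u8; 8]`, and the `Lanes` alias and all function names are hypothetical, not the crate's API. Whether either formulation matches what LLVM actually emits for `cttz.v8i8` is an assumption; the point is only that both reduce to per-lane operations (bit reverse plus count-leading-zeros, or popcount) that could plausibly map onto AArch64 SIMD intrinsics.

```rust
/// Hypothetical stand-in for a `u8x8` vector: eight `u8` lanes.
type Lanes = [u8; 8];

/// The workaround described above: call `u8::trailing_zeros` on each lane.
fn trailing_zeros_per_lane(v: Lanes) -> Lanes {
    v.map(|x| x.trailing_zeros() as u8)
}

/// Formulation 1 (assumed, not confirmed to be LLVM's lowering):
/// reverse the bits of each lane, then count leading zeros.
/// A zero lane yields 8, matching `u8::trailing_zeros(0)`.
fn trailing_zeros_rbit_clz(v: Lanes) -> Lanes {
    v.map(|x| x.reverse_bits().leading_zeros() as u8)
}

/// Formulation 2 (assumed): isolate the lowest set bit with `x & -x`,
/// subtract one to get a mask of the trailing zeros, then popcount it.
/// For a zero lane the mask wraps to 0xFF, so the result is 8.
fn trailing_zeros_popcount(v: Lanes) -> Lanes {
    v.map(|x| (x & x.wrapping_neg()).wrapping_sub(1).count_ones() as u8)
}
```

All three functions agree on every `u8` value, including zero lanes, so either branch-free form could in principle replace the per-lane loop once the corresponding vector intrinsics are available.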