Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

neon vst1_* intrinsics copy many more bytes than they should. #1220

Closed
HEnquist opened this issue Sep 20, 2021 · 4 comments
Closed

neon vst1_* intrinsics copy many more bytes than they should. #1220

HEnquist opened this issue Sep 20, 2021 · 4 comments

Comments

@HEnquist
Copy link

This was introduces in this commit:
907cfb2

For example:

pub unsafe fn vst1q_f64(ptr: *mut f64, a: float64x2_t) {

pub unsafe fn vst1q_f64(ptr: *mut f64, a: float64x2_t) {
    copy_nonoverlapping(
        &a as *const float64x2_t as *const f64,
        ptr as *mut f64,
        size_of::<float64x2_t>(),
    )
}

The docs for copy_nonoverlapping says "Copies count * size_of::() bytes from src to dst"
The example above will get called with type T = f64 (with size 8 bytes), and count = size_of::<float64x2_t> = 16. So it will end up copying 8*16=128 bytes. That's quite a bit more than the 16 bytes it should have been.
The result is that it writes garbage to memory locations it should not touch.

https://doc.rust-lang.org/core/ptr/fn.copy_nonoverlapping.html

@hkratz
Copy link
Contributor

hkratz commented Sep 21, 2021

There is no instance of copy_nonoverlapping() remaining in the armv7/aarch64 intrinsics, so that all should be fixed in master, see e.g.

pub unsafe fn vst1q_f64(ptr: *mut f64, a: float64x2_t) {
write_unaligned(ptr.cast(), a);
}

But the stdarch submodule should be updated in the main Rust repo asap.

@HEnquist
Copy link
Author

Ah thanks, the submodule thing threw me off. Let's keep this open then until the changes have made it into the main Rust repo.

hkratz added a commit to rusticstuff/rust that referenced this issue Sep 21, 2021
This mainly fixes the critical issue of aarch64 store intrinsics
overwriting additional memory, see
rust-lang/stdarch#1220

Other changes:
* aarch64/armv7: additional vld1/vst1 intrinsics + perf fixes for existing ones
* armv7: Make FMA work with vfpv4
* Non-visible changes to the testing framework
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 27, 2021
Update stdarch submodule

This is mainly to fix the critical issue of aarch64 store intrinsics overwriting additional memory, see rust-lang/stdarch#1220

Changes:
* aarch64/armv7: additional vld1/vst1 intrinsics + perf fixes for existing ones
  * rust-lang/stdarch#1205
  * rust-lang/stdarch#1207
  * rust-lang/stdarch#1216
* armv7: Make FMA work with vfpv4 and optimize
  * rust-lang/stdarch#1219
* Non-visible changes to the testing framework
  * rust-lang/stdarch#1208
  * rust-lang/stdarch#1211
  * rust-lang/stdarch#1213
  * rust-lang/stdarch#1215
  * rust-lang/stdarch#1218
@pthariensflame
Copy link

Seems like rust-lang/rust#89145 is now merged!

@HEnquist
Copy link
Author

This works fine now in the latest nightly. Closing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants