s390x vector facilities support #130869

taiki-e · 2024-09-26T04:05:23Z

This tracks the status of s390x vector facilities support in rustc and standard libraries.

ABI support

support z13 vector ABI: pending in Support s390x z13 vector ABI #131586

rust/compiler/rustc_target/src/abi/call/s390x.rs

Lines 1 to 2 in 58420a0

    // FIXME: The assumes we're using the non-vector ABI, i.e., compiling
    // for a pre-z13 machine or using -mno-vx.

remove explicit disabling of the vector feature (blocked on the above FIXME): pending in Support s390x z13 vector ABI #131586

rust/compiler/rustc_target/src/spec/targets/s390x_unknown_linux_gnu.rs

Lines 9 to 11 in 58420a0

    // FIXME: The ABI implementation in abi/call/s390x.rs is for now hard-coded to assume the no-vector
    // ABI. Pass the -vector feature string to LLVM to respect this assumption.
    base.features = "-vector".into();

target_feature support
- support nightly-only cfg(target_feature = "vector") and unstable target_feature(enable = "vector") (under feature(s390x_target_feature)): done in rustc_target: add known safe s390x target features #127506
- support other vector facility-related target features: vector-enhancements-1, vector-enhancements-2, etc. (see also Missing support for target features on s390x #88937)
- stabilize target_feature = "vector" (and other vector-related target features if available)
  This is blocked at least until ABI support is done, and more may be needed given the precedent of postponed stabilization of vector features in RISC-V.
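
As a rough illustration of the first item above, a crate could branch on the feature at compile time. This is only a sketch: `vector_facility_enabled` is a hypothetical helper name, not something from rustc or the standard library, and on toolchains where the feature is still nightly-only the `cfg` is expected to evaluate to false.

```rust
/// Hypothetical helper (not part of rustc or std): reports whether this
/// crate was compiled for s390x with the `vector` target feature enabled.
pub fn vector_facility_enabled() -> bool {
    // `cfg!` is resolved at compile time, so this is a constant per build.
    cfg!(all(target_arch = "s390x", target_feature = "vector"))
}
```

On any non-s390x target this always returns false, since the `target_arch` condition fails first.
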
asm support
- support clobber-only vector registers: done in Support clobber_abi and vector/access registers (clobber-only) in s390x inline assembly #130630
- stabilize feature(asm_experimental_arch) on s390x: pending in Stabilize s390x inline assembly #131258
- support #[repr(simd)] types in input/output of asm (i.e., fully support vector registers): pending in Support #[repr(simd)] types in input/output of s390x inline assembly #131664
  Neither LLVM nor GCC documents the v constraint, but both appear to support it.
  Probably blocked until ABI support done.
core_arch support
- add unstable vector intrinsics to core::arch::s390x
  Probably blocked until ABI support done.
std_detect support
- support unstable is_s390x_feature_detected!("vector"):

@rustbot label +O-SystemZ


taiki-e · 2024-09-26T04:18:20Z

cc @uweigand

uweigand · 2024-09-30T15:26:45Z

Thanks for putting together this list! I can certainly look into providing proper vector register ABI support.

uweigand · 2024-10-11T15:50:33Z

I've looked into this a bit, but ran into a problem I'm not sure how to address. The main issue is that on s390x, we use a different ABI depending on whether or not the vector feature is active (i.e. the processor has vector registers). The difference affects only vector types (which I guess would correspond to #[repr(simd)] types in Rust), in the following manner:

- Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.
- Calling convention: If the vector ABI is not present, all vector types are passed and returned like aggregates. If the vector ABI is present, vector types up to 16 bytes in size are instead passed and returned in vector registers. For arguments (but not return values) the same applies also to aggregates containing only a single element of vector type.
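
The two rules above can be restated as a small model. This is a sketch for illustration only, not the rustc implementation: the names `PassMode`, `Position`, `Shape`, `vector_align`, and `pass_mode` are all hypothetical, sizes are in bytes and assumed to be powers of two, and "like aggregates" is kept abstract rather than spelling out how aggregates themselves are passed.

```rust
/// How a value is passed or returned in this simplified model.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PassMode {
    /// Passed or returned in a vector register.
    VectorRegister,
    /// Passed or returned the same way an aggregate would be.
    LikeAggregate,
}

/// Whether the value is a function argument or a return value.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Position {
    Argument,
    Return,
}

/// The kind of type being classified.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Shape {
    /// A bare vector type.
    Vector,
    /// An aggregate containing only a single element of vector type.
    SingleVectorAggregate,
}

/// ABI alignment (bytes) of a vector type of `size` bytes.
/// Non-vector ABI: natural alignment. Vector ABI: capped at 8 bytes.
pub fn vector_align(size: u64, vector_abi: bool) -> u64 {
    if vector_abi { size.min(8) } else { size }
}

/// Classifies a value according to the rules quoted above.
pub fn pass_mode(shape: Shape, size: u64, pos: Position, vector_abi: bool) -> PassMode {
    // Without the vector ABI, or above 16 bytes, nothing uses vector registers.
    if !vector_abi || size > 16 {
        return PassMode::LikeAggregate;
    }
    match (shape, pos) {
        (Shape::Vector, _) => PassMode::VectorRegister,
        // Single-element aggregates count as vectors only when passed as arguments.
        (Shape::SingleVectorAggregate, Position::Argument) => PassMode::VectorRegister,
        (Shape::SingleVectorAggregate, Position::Return) => PassMode::LikeAggregate,
    }
}
```

For example, `pass_mode(Shape::Vector, 16, Position::Return, true)` yields `PassMode::VectorRegister`, while the same call with `vector_abi` set to `false` yields `PassMode::LikeAggregate`.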

The calling convention should be implemented in rustc_target/src/abi/call/s390x.rs. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

taiki-e · 2024-10-11T18:05:17Z

@uweigand Thanks for the investigation!

> Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.

> In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).

> The calling convention should be implemented in rustc_target/src/abi/call/s390x.rs. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)

I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option.
I will try to see if I can implement that approach...

taiki-e · 2024-10-11T18:41:52Z

> I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option.
> I will try to see if I can implement that approach...

I haven't done much testing yet, but this approach appears to be working: master...taiki-e:rust:s390x-vector-abi

// no_vector: mvc 8(8,%r2), 8(%r3)
// no_vector-NEXT: mvc 0(8,%r2), 0(%r3)
// no_vector-NEXT: br %r14
// vector: vl %v24, 0(%r2), 3
// vector-NEXT: br %r14
#[no_mangle]
extern "C" fn vector(x: &i8x16) -> i8x16 {
    *x
}

uweigand · 2024-10-12T00:33:05Z

> @uweigand Thanks for the investigation!
>
> > Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.
>
> > In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.
>
> Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).

The point is that there's a difference between the LLVM IR vector types and the C/C++ language vector types. The LLVM IR vector types are indeed always 8-byte aligned now. But the C/C++ language vector types still use different ABI alignment rules depending on the vector feature. Clang handles this by mapping the C/C++ type not directly to an LLVM IR type (leaving LLVM to use its own alignment rules), but rather to an LLVM IR type with an explicit alignment override (which implements the appropriate ABI alignment rule).
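
The idea of attaching an explicit alignment to a type, rather than relying on a default, has a loose analogue in Rust's `align` representation modifier. The sketch below is only an analogy under stated assumptions: plain byte arrays stand in for 16-byte vector types, the type and function names are hypothetical, and this is not how Clang or rustc implement the s390x ABI.

```rust
/// Stand-in for a 16-byte vector type under the vector ABI:
/// 16 bytes in size, but only 8-byte aligned.
#[repr(C, align(8))]
pub struct Vec16Align8(pub [u8; 16]);

/// Stand-in for the same type under the non-vector ABI:
/// naturally aligned to its 16-byte size.
#[repr(C, align(16))]
pub struct Vec16Align16(pub [u8; 16]);

/// Returns `(size, alignment)` in bytes for both stand-ins.
pub fn layout_of_stand_ins() -> [(usize, usize); 2] {
    [
        (
            std::mem::size_of::<Vec16Align8>(),
            std::mem::align_of::<Vec16Align8>(),
        ),
        (
            std::mem::size_of::<Vec16Align16>(),
            std::mem::align_of::<Vec16Align16>(),
        ),
    ]
}
```

Both stand-ins have the same size, and only the alignment differs, which is the kind of difference the two ABI variants impose on language-level vector types.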

> > The calling convention should be implemented in rustc_target/src/abi/call/s390x.rs. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)
>
> I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option. I will try to see if I can implement that approach...

Ah, interesting! It seems this

    let unstable_target_features = codegen_backend.target_features(sess, true);

calls back into the LLVM back-end, which does already know which features are available by default depending on the target-cpu setting. I wasn't aware of that ... I'm wondering whether/how this works when using another codegen backend like GCC or Cranelift?

Otherwise, your patch looks good to me, except for

    if abi == Vector && size.bits() == 128 && contains_vector(cx, ret.layout) {

- We also pass vectors smaller than 128 bits in VRs if available, so I'm not sure this size check is correct. (Then again, your vector_small test case already seems to do the right thing, so I may be missing something here?)
- The logic should be different between the argument and return case: single-element vector aggregates are treated like vectors only as arguments, not as return types (that's a bit of an unfortunate ABI choice, but it matches what we historically did for single-element float aggregates). The wrapper tests all show the incorrect behavior for return values.

taiki-e · 2024-10-12T07:11:15Z

> > @uweigand Thanks for the investigation!
> >
> > > Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.
> >
> > > In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.
> >
> > Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).
>
> The point is that there's a difference between the LLVM IR vector types and the C/C++ language vector types. The LLVM IR vector types are indeed always 8-byte aligned now. But the C/C++ language vector types still use different ABI alignment rules depending on the vector feature. Clang handles this by mapping the C/C++ type not directly to an LLVM IR type (leaving LLVM to use its own alignment rules), but rather to an LLVM IR type with an explicit alignment override (which implements the appropriate ABI alignment rule).

Ah, thanks for the clarification. I'm not sure how to handle it either. However, the current Rust SIMD types cannot be used with C FFI in the first place, and even if they could be, the requirement would be that the target feature is enabled, so we may not actually have to worry much about the case where the vector feature is disabled here.

~~RFC 2574, which allows using SIMD types in C FFI, says:~~

> Architecture-specific vector types require #[target_feature]s to be FFI safe. That is, they are only safely usable as part of the signature of extern functions if the function has certain #[target_feature]s enabled.

EDIT: I think the approach in #130869 (comment) will work.

>     let unstable_target_features = codegen_backend.target_features(sess, true);
>
> calls back into the LLVM back-end, which does already know which features are available by default depending on the target-cpu setting. I wasn't aware of that ... I'm wondering whether/how this works when using another codegen backend like GCC or Cranelift?

rustc_codegen_gcc seems to have code to handle -C target-cpu and -C target-feature, but I have not tested if it works as expected here.

IIRC rustc_codegen_cranelift does not currently support target features at all (rust-lang/rustc_codegen_cranelift#1400), so I suspect that the vector ABI cannot be enabled in any way there.

> We also pass vectors smaller than 128 bits in VRs if available, so I'm not sure this size check is correct. (Then again, your vector_small test case already seems to do the right thing, so I may be missing something here?)

It looks like the vector_small case is processed in the branch before processing the 128-bit vector.

rust/compiler/rustc_target/src/abi/call/s390x.rs

Lines 35 to 38 in a805e3e

    if !ret.layout.is_aggregate() && size.bits() <= 64 {
        ret.extend_integer_width_to(64);
        return;
    }

However, I think we should handle it explicitly in the if abi == Vector && branch. I posted a fixed version in #131586.

> The logic should be different between the argument and return case: single-element vector aggregates are treated like vectors only as arguments, not as return types (that's a bit of an unfortunate ABI choice, but it matches what we historically did for single-element float aggregates). The wrapper tests all show the incorrect behavior for return values.

Thanks for pointing that out. I tried to fix this, but it appears that the ABI of Wrapper<Vector> is considered Vector, even without #[repr(transparent)] / #[repr(C)]...

taiki-e · 2024-10-12T08:27:58Z

Rust plans to disallow passing vector types to non-Rust ABIs if the required target feature is disabled (#127731), so I think the issue of ABI differences with C/C++ when the vector target feature is disabled could be resolved by extending the ABI checks used for that to handle s390x as well.

uweigand · 2024-10-12T12:44:02Z

> Thanks for pointing that out. I tried to fix this, but it appears that the ABI of Wrapper<Vector> is considered Vector, even without #[repr(transparent)] / #[repr(C)]...

Hmm. Your test case uses the default (Rust) representation for the Wrapper type - my understanding is that this does not actually guarantee interoperability with the native platform ABI. Only #[repr(C)] comes with that guarantee. Does the test work if you use this for the Wrapper type?

> Rust plans to disallow passing vector types to non-Rust ABI if the required target feature is disabled (#127731), so I think the issue of ABI differences with C/C++ when vector target feature is disabled could be resolved by extending the ABI checks used for that to handle s390x as well.

That looks like a reasonable approach to me as well.

taiki-e · 2024-10-12T13:07:45Z

> Hmm. Your test case uses the default (Rust) representation for the Wrapper type - my understanding is that this does not actually guarantee interoperability with the native platform ABI. Only #[repr(C)] comes with that guarantee. Does the test work if you use this for the Wrapper type?

Oh, you are right. Changing Wrapper to #[repr(C)] worked as expected. Thanks!
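
For reference, the declaration pattern being discussed might look roughly like the sketch below. It is an assumption-laden stand-in rather than the actual test from the PR: `#[repr(simd)]` is unstable, so a `#[repr(C)]` struct around `[i8; 16]` (hypothetically named `StandInI8x16`) replaces the real `i8x16`, and `wrapper_identity` is a made-up function name. It therefore shows only the `#[repr(C)]` layout guarantee, not the vector-register calling convention itself.

```rust
/// Stand-in for `i8x16`; the real test would use a `#[repr(simd)]` type.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct StandInI8x16(pub [i8; 16]);

/// Single-element wrapper. `#[repr(C)]` is what gives it a layout that is
/// specified for interoperability with the platform C ABI; the default Rust
/// representation makes no such promise.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Wrapper<T>(pub T);

/// Returns the wrapped value by copy through the C calling convention.
pub extern "C" fn wrapper_identity(x: &Wrapper<StandInI8x16>) -> Wrapper<StandInI8x16> {
    *x
}
```

With `#[repr(C)]`, a single-field struct has the same size and alignment as its field, which can be checked with `std::mem::size_of` and `std::mem::align_of`.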

rustbot added the needs-triage (This issue may need triage. Remove it if it has been sufficiently triaged.) and O-SystemZ (Target: SystemZ processors (s390x)) labels Sep 26, 2024

taiki-e mentioned this issue Sep 26, 2024

Support clobber_abi and vector/access registers (clobber-only) in s390x inline assembly #130630

Merged

taiki-e mentioned this issue Oct 4, 2024

Stabilize s390x inline assembly #131258

Open

tgross35 added the A-inline-assembly (Area: Inline assembly (`asm!(…)`)) and A-ABI (Area: Concerning the application binary interface (ABI)) labels Oct 4, 2024

taiki-e mentioned this issue Oct 12, 2024

Support s390x z13 vector ABI #131586

Open

taiki-e mentioned this issue Oct 13, 2024

Support #[repr(simd)] types in input/output of s390x inline assembly #131664

Draft

workingjubilee mentioned this issue Oct 17, 2024

Figure out which target features are required for which SIMD size #131800

Open


workingjubilee added the A-SIMD Area: SIMD (Single Instruction Multiple Data) label Nov 1, 2024
