Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s390x vector facilities support #130869

Open
2 of 10 tasks
taiki-e opened this issue Sep 26, 2024 · 10 comments
Open
2 of 10 tasks

s390x vector facilities support #130869

taiki-e opened this issue Sep 26, 2024 · 10 comments
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-inline-assembly Area: Inline assembly (`asm!(…)`) A-SIMD Area: SIMD (Single Instruction Multiple Data) O-SystemZ Target: SystemZ processors (s390x) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@taiki-e
Copy link
Member

taiki-e commented Sep 26, 2024

This tracks the status of s390x vector facilities support in rustc and standard libraries.

@rustbot label +O-SystemZ

@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. O-SystemZ Target: SystemZ processors (s390x) labels Sep 26, 2024
@taiki-e
Copy link
Member Author

taiki-e commented Sep 26, 2024

cc @uweigand

@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 26, 2024
@uweigand
Copy link
Contributor

Thanks for putting together this list! I can certainly look into providing proper vector register ABI support.

@tgross35 tgross35 added A-inline-assembly Area: Inline assembly (`asm!(…)`) A-ABI Area: Concerning the application binary interface (ABI) labels Oct 4, 2024
@uweigand
Copy link
Contributor

I've looked into this a bit, but ran into a problem I'm not sure how to address. The main issue is that on s390x, we use a different ABI depending on whether or not the vector feature is active (i.e. the processor has vector registers). The difference affects only vector types (which I guess would correspond to #repr(simd) types in Rust), in the following manner:

  • Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.
  • Calling convention: If the vector ABI is not present, all vector types are passed and returned like aggregates. If the vector ABI is present, vector types up to 16 bytes in size are instead passed and returned in vector registers. For arguments (but not return values) the same applies also to aggregates containing only a single element of vector type.

The calling convention should be implemented in the rustc_target/src/abi/call/s390x.c. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

@taiki-e
Copy link
Member Author

taiki-e commented Oct 11, 2024

@uweigand Thanks for the investigation!

  • Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).

The calling convention should be implemented in the rustc_target/src/abi/call/s390x.c. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)

I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option.
I will try to see if I can implement that approach...

@taiki-e
Copy link
Member Author

taiki-e commented Oct 11, 2024

I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option.
I will try to see if I can implement that approach...

I haven't done much testing yet, but this approach appears to be working: master...taiki-e:rust:s390x-vector-abi

// no_vector: mvc 8(8,%r2), 8(%r3)
// no_vector-NEXT: mvc 0(8,%r2), 0(%r3)
// no_vector-NEXT: br %r14
// vector: vl %v24, 0(%r2), 3
// vector-NEXT: br %r14
#[no_mangle]
extern "C" fn vector(x: &i8x16) -> i8x16 {
    *x
}

@uweigand
Copy link
Contributor

@uweigand Thanks for the investigation!

  • Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).

The point is that there's a difference between the LLVM IR vector types and the C/C++ language vector types. The LLVM IR vector types are indeed always 8-byte aligned now. But the C/C++ language vector types still use different ABI alignment rules depending on the vector feature. Clang handles this by mapping the C/C++ type not directly to an LLVM IR type (leaving LLVM to use its own alignment rules), but rather to an LLVM IR type with an explicit alignment override (which implements the appropriate ABI alignment rule).

The calling convention should be implemented in the rustc_target/src/abi/call/s390x.c. However, I'm not sure how to detect in this place whether or not the vector feature is active. This depends both on explicit feature flags, and also on the target processor (-march=) - processors from z13 onwards default to having the vector feature enabled, older processors have it disabled. Clang tracks this in the front-end so it can make ABI decisions based on it. I'm not sure how this is done in Rust, I'm not seeing any precedent in any of the other ABI files. (I believe there may be some similar issues e.g. on Intel for the larger vector registers like AVX-256 or AVX-512, but I wasn't able to find where this is handled.)

I think we can implement this by referring to wasm code (1, 2) that determines the ABI depending on the command line option. I will try to see if I can implement that approach...

Ah, interesting! It seems this

    let unstable_target_features = codegen_backend.target_features(sess, true);

calls back into the LLVM back-end, which does already know which features are available by default depending on the target-cpu setting. I wasn't aware of that ... I'm wondering whether/how this works when using another codegen backend like GCC or Cranelift?

Otherwise, your patch looks good to me, except for

    if abi == Vector && size.bits() == 128 && contains_vector(cx, ret.layout) {
  • We also pass vectors smaller than 128 bits in VRs if available, so I'm not sure this size check is correct. (Then again, your vector_small test case already seems to do the right thing, so I may be missing something here?)
  • The logic should be different between the argument and return case: single-element vector aggregates are treated like vectors only as arguments, not as return types (that's a bit of an unfortunate ABI choice, but it matches what we historically did for single-element float aggregates). The wrapper tests all show the incorrect behavior for return values.

@taiki-e
Copy link
Member Author

taiki-e commented Oct 12, 2024

@uweigand Thanks for the investigation!

  • Alignment: If the vector ABI is not present, all vector types are naturally aligned to their size. If the vector ABI is present, vector types larger than 8 bytes are still only 8-byte aligned.

In addition, I'm not sure where to handle the vector alignment. This currently seems to be derived solely from the datalayout string, which likely isn't correct even now - the datalayout string gives the alignment for the LLVM vector types, which is always bounded to 8 bytes, but Clang explicitly uses a different alignment for C/C++ level vector types if necessary.

Hmm, I thought it is always 8-byte aligned since LLVM 16 (2e7a964).

The point is that there's a difference between the LLVM IR vector types and the C/C++ language vector types. The LLVM IR vector types are indeed always 8-byte aligned now. But the C/C++ language vector types still use different ABI alignment rules depending on the vector feature. Clang handles this by mapping the C/C++ type not directly to an LLVM IR type (leaving LLVM to use its own alignment rules), but rather to an LLVM IR type with an explicit alignment override (which implements the appropriate ABI alignment rule).

Ah, thanks for the clarification. I'm not sure how to handle it either, but considering that the current Rust SIMD types cannot be used with C FFI in the first place, and that even if it could be used, the requirement would be that the target feature be enabled, we may actually not have to very worry about the case where the vector feature is disabled here.

RFC 2574 to allows using SIMD types in C FFI says:

> Architecture-specific vector types require #[target_feature]s to be FFI safe. That is, they are only safely usable as part of the signature of extern functions if the function has certain #[target_feature]s enabled.

EDIT: I think #130869 (comment) 's approach will work.

 let unstable_target_features = codegen_backend.target_features(sess, true);

calls back into the LLVM back-end, which does already know which features are available by default depending on the target-cpu setting. I wasn't aware of that ... I'm wondering whether/how this works when using another codegen backend like GCC or Cranelift?

rustc_codegen_gcc seems to have code to handle -C target-cpu and -C target-feature, but I have not tested if it works as expected here.

IIRC rustc_codegen_cranelift does not currently support the target feature at all (rust-lang/rustc_codegen_cranelift#1400), so I suspect that vector ABI cannot be enabled from any way.

  • We also pass vectors smaller than 128 bits in VRs if available, so I'm not sure this size check is correct. (Then again, your vector_small test case already seems to do the right thing, so I may be missing something here?)

It looks like the vector_small case is processed in the branch before processing the 128-bit vector.

if !ret.layout.is_aggregate() && size.bits() <= 64 {
ret.extend_integer_width_to(64);
return;
}

However, I think we should handle it explicitly in the if abi == Vector && branch. Posted fixed version in #131586.

  • The logic should be different between the argument and return case: single-element vector aggregates are treated like vectors only as arguments, not as return types (that's a bit of an unfortunate ABI choice, but it matches what we historically did for single-element float aggregates). The wrapper tests all show the incorrect behavior for return values.

Thanks for pointing that out. I tried to fix this, but it appears that the ABI of Wrapper<Vector> is considered Vector, even without #[repr(transparent)] / #[repr(C)]...

@taiki-e
Copy link
Member Author

taiki-e commented Oct 12, 2024

Rust plans to disallow passing vector types to non-Rust ABI if the required target feature is disabled (#127731), so I think the issue of ABI differences with C/C++ when vector target feature is disabled could be resolved by extending the ABI checks used for that to handle s390x as well.

@uweigand
Copy link
Contributor

Thanks for pointing that out. I tried to fix this, but it appears that the ABI of Wrapper is considered Vector, even without #[repr(transparent)] / #[repr(C)]...

Hmm. Your test case uses the default (Rust) representation for the Wrapper type - my understanding is that this does not actually guarantee interoperability with the native platform ABI. Only #[repr(C)] comes with that guarantee. Does the test work if you use this for the Wrapper type?

Rust plans to disallow passing vector types to non-Rust ABI if the required target feature is disabled (#127731), so I think the issue of ABI differences with C/C++ when vector target feature is disabled could be resolved by extending the ABI checks used for that to handle s390x as well.

That looks like a reasonable approach to me as well.

@taiki-e
Copy link
Member Author

taiki-e commented Oct 12, 2024

Hmm. Your test case uses the default (Rust) representation for the Wrapper type - my understanding is that this does not actually guarantee interoperability with the native platform ABI. Only #[repr(C)] comes with that guarantee. Does the test work if you use this for the Wrapper type?

Oh, you are right. Changing Wrapper to #[repr(C)] worked as expected. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-inline-assembly Area: Inline assembly (`asm!(…)`) A-SIMD Area: SIMD (Single Instruction Multiple Data) O-SystemZ Target: SystemZ processors (s390x) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants