Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement common swizzle operations. #335

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

reinerp
Copy link

@reinerp reinerp commented Mar 13, 2023

concat and slice address #277.

general_reverse is useful for horizontal reductions, bitonic sorting, and many similar cases.

/// ```
#[inline]
#[must_use = "method returns a new vector and does not mutate the original inputs"]
pub fn general_reverse<const SWAP_MASK: usize>(self) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho this should include something like lanewise in it's name, since I expect Rust to gain a bitwise grev at some point and we'd want the bitwise simd and scalar integer operations to have matching names (probably gen_rev or generalized_reverse or grev?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For something that effectively implements a very specific xor-striding pattern, general_reverse is a non-descript name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, grev is what that particular bitwise op has become named, thanks to RISC-V afaict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grev -- general bit reverse

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because an ISA has picked a terrible name doesn't mean we need to copy it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the precedent of u32::count_ones, it looks like the Rust standard library takes the convention of providing a more meaningful name (count_ones) while maintaining searchability for the common name popcnt using #[doc(alias = "popcnt")], https://github.com/rust-lang/rust/blob/f1b1ed7e18f1fbe5226a96626827c625985f8285/library/core/src/num/int_macros.rs#L104. I think such an approach could be warranted here too, keeping grev as an alias. Indeed, grev is a much less established term than popcnt, so the precedent from grev is weaker.

Other names I considered:

  • butterfly_shuffle. The idea here is that, in the common case of SWAP_MASK being a power of 2, it implements one stage of a butterfly network. https://en.wikipedia.org/wiki/Butterfly_network
  • swap_lanes_xor. The "swap lanes" part is pretty self-explanatory. The "xor" part is confusing, however: it suggests that the data bits are being xored, whereas it's actually the lane indices that are being xored.

Currently I lean towards swap_lanes_xor. Open to other suggestions!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swizzle_to_xor_indexes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse_pow2_lane_groups? Conceptually this operation performs the reversal of blocks of k lanes within n-lane groups, where k and n are both powers of two, k ≤ n ≤ LANES, and the choice of k and n is determined uniquely up to operation uniqueness by choosing where index 0 will be swizzled to (it’ll be exactly SWAP_MASK).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse_pow2_lane_groups?

afaict grev is actually more powerful than that, it can do any arbitrary combination of those k-n reversals for arbitrary k and n.

e.g. grev(v, 5) is equivalent to simd_swizzle!(v, [5, 4, 7, 6, 1, 0, 3, 2]) which is a combination of reversing adjacent pairs (blocks of 1) and swapping blocks of 4.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all for the suggestions. Of the proposals so far, my preference order is:

  1. swizzle_to_xor_indices
  2. butterfly_swizzle
  3. grev

I have gone with swizzle_to_xor_indices. Let me know what you think.

crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
/// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`.
#[inline]
#[must_use = "method returns a new vector and does not mutate the original inputs"]
pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concat_to? Why not just concat?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose concat_to because I imagine we'll want to eventually (when generic_const_exprs stabilizes) deprecate this function in favor of a function that uses generic_const_exprs. I wanted to reserve the better name concat for that new function, which would have signature:

fn concat(self, other: Self) -> Simd<T, {LANES * 2}>

The idea of the concat_to naming is that most call sites will look like x.concat_to::<8>(y) or similar, so it can be read as "concatenate x to length 8 using y".

Open to other suggestions. One option would be to call it concat now, and don't worry about introducing a breaking change in future once the generic_const_exprs approach works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hm.

crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
Comment on lines 421 to 429
/// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`.
#[inline]
#[must_use = "method returns a new vector and does not mutate the original inputs"]
pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES>
where
LaneCount<DOUBLE_LANES>: SupportedLaneCount,
{
const fn concat_index<const DOUBLE_LANES: usize>(lanes: usize) -> [Which; DOUBLE_LANES] {
assert!(lanes * 2 == DOUBLE_LANES);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking: Are you aware that when you put something in a const fn, and then the const fn is evaluated at runtime, that this means an assert! in it will not be evaluated at compilation time, but rather will be evaluated at runtime?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's only used at compile time, so that doesn't matter here.

Copy link
Member

@workingjubilee workingjubilee Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, I see now, it happens in the associated constant. Then that requires monomorphization which puts it at risk of depending on trait and const evaluation details. Hmm. I do want to allow us to use certain post-monomorphization errors but I currently don't fully understand the evaluation patterns that rustc will do to intuit in what cases this will/will not happen.

There are cases where, due to various inputs you can give to the compiler, "dead" code can potentially get resurrected and monomorphized anyways (starting, of course, with -Clink-dead-code), so I want to better know whether this will trigger on those cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guarantee: if you don't instantiate this generic at an invalid input/output type pair, then you won't get a post-monomorphization error. So this is like a type error, except with worse ergonomics because it's raised during monomorphization rather than type checking.

If we had generic_const_exprs we'd make this a genuine type error rather than monomorphization-time error, by giving this function the type concat_to(self, other: Self) -> Simd<T, {LANES * 2}>. I've created commit reinerp@1c2972b on a separate branch, to show how this would work. Unfortunately, the test doesn't compile due to what seems like a bug in generic_const_exprs.

More broadly, I would like to be able to use concat and split operations in my own code without requiring the generic_const_exprs language feature. Hence why I took my current approach for avoiding generic_const_exprs. From my perspective the generic_const_exprs feature is much less stable than portable_simd, and indeed portable_simd is the only unstable feature I allow in my own code.

Copy link
Author

@reinerp reinerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very helpful comments! I have switched from slice to a split API, and have made a few minor cleanups. Tests should pass now, I believe.

Comment on lines 421 to 429
/// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`.
#[inline]
#[must_use = "method returns a new vector and does not mutate the original inputs"]
pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES>
where
LaneCount<DOUBLE_LANES>: SupportedLaneCount,
{
const fn concat_index<const DOUBLE_LANES: usize>(lanes: usize) -> [Which; DOUBLE_LANES] {
assert!(lanes * 2 == DOUBLE_LANES);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guarantee: if you don't instantiate this generic at an invalid input/output type pair, then you won't get a post-monomorphization error. So this is like a type error, except with worse ergonomics because it's raised during monomorphization rather than type checking.

If we had generic_const_exprs we'd make this a genuine type error rather than monomorphization-time error, by giving this function the type concat_to(self, other: Self) -> Simd<T, {LANES * 2}>. I've created commit reinerp@1c2972b on a separate branch, to show how this would work. Unfortunately, the test doesn't compile due to what seems like a bug in generic_const_exprs.

More broadly, I would like to be able to use concat and split operations in my own code without requiring the generic_const_exprs language feature. Hence why I took my current approach for avoiding generic_const_exprs. From my perspective the generic_const_exprs feature is much less stable than portable_simd, and indeed portable_simd is the only unstable feature I allow in my own code.

/// Will be rejected at compile time if `LANES * 2 != DOUBLE_LANES`.
#[inline]
#[must_use = "method returns a new vector and does not mutate the original inputs"]
pub fn concat_to<const DOUBLE_LANES: usize>(self, other: Self) -> Simd<T, DOUBLE_LANES>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose concat_to because I imagine we'll want to eventually (when generic_const_exprs stabilizes) deprecate this function in favor of a function that uses generic_const_exprs. I wanted to reserve the better name concat for that new function, which would have signature:

fn concat(self, other: Self) -> Simd<T, {LANES * 2}>

The idea of the concat_to naming is that most call sites will look like x.concat_to::<8>(y) or similar, so it can be read as "concatenate x to length 8 using y".

Open to other suggestions. One option would be to call it concat now, and don't worry about introducing a breaking change in future once the generic_const_exprs approach works.

crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
/// ```
#[inline]
#[must_use = "method returns a new vector and does not mutate the original inputs"]
pub fn general_reverse<const SWAP_MASK: usize>(self) -> Self {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the precedent of u32::count_ones, it looks like the Rust standard library takes the convention of providing a more meaningful name (count_ones) while maintaining searchability for the common name popcnt using #[doc(alias = "popcnt")], https://github.com/rust-lang/rust/blob/f1b1ed7e18f1fbe5226a96626827c625985f8285/library/core/src/num/int_macros.rs#L104. I think such an approach could be warranted here too, keeping grev as an alias. Indeed, grev is a much less established term than popcnt, so the precedent from grev is weaker.

Other names I considered:

  • butterfly_shuffle. The idea here is that, in the common case of SWAP_MASK being a power of 2, it implements one stage of a butterfly network. https://en.wikipedia.org/wiki/Butterfly_network
  • swap_lanes_xor. The "swap lanes" part is pretty self-explanatory. The "xor" part is confusing, however: it suggests that the data bits are being xored, whereas it's actually the lane indices that are being xored.

Currently I lean towards swap_lanes_xor. Open to other suggestions!

crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
Co-authored-by: Jacob Lifshay <programmerjake@gmail.com>
Copy link
Author

@reinerp reinerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all comments are now addressed. Please take another look.

crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
/// ```
#[inline]
#[must_use = "method returns a new vector and does not mutate the original inputs"]
pub fn general_reverse<const SWAP_MASK: usize>(self) -> Self {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all for the suggestions. Of the proposals so far, my preference order is:

  1. swizzle_to_xor_indices
  2. butterfly_swizzle
  3. grev

I have gone with swizzle_to_xor_indices. Let me know what you think.

@reinerp
Copy link
Author

reinerp commented Apr 29, 2023

Hi folks, I believe I have responded to all review comments so far.

My sense is there are some areas where opinions differ between people on this thread on the best path forward, namely:

  1. Naming of swizzle_to_xor_indices.
  2. Whether, and in what form, to include concat/slice, given the limited type inference support due to lack of generic_const_exprs.

My sense is that this PR is stuck in review, pending consensus on these decisions. I'm not sure what process your project uses to make decisions in such cases. Could you please advise on the process? I am very excited about the portable-simd work in general, and this PR in particular, and am happy to help if I can.

@calebzulawski
Copy link
Member

I've definitely been neglecting this PR. Some thoughts:

  • Regarding concat and split, there's a generic_const_exprs cargo feature that you can put these functions behind. While this prevents them from making it to the nightly compiler, I think I'd prefer implementing it this way. It makes it clearer which functions we're waiting on and allows us to test the intended implementation.
  • I find the grev function confusing and I'm not sure if this is something general enough to implement in the standard library. Maybe it is, I'm just not sure.
  • Something with swap_lanes is probably best, but swap_lanes_xor is still a little confusing (since it's the index that's being xor'd, not the lane)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants