Rangeless Shifts #428

Closed
chorman0773 opened this issue Aug 12, 2024 · 14 comments
Labels: ACP-accepted (API Change Proposal is accepted, seconded with no objections), api-change-proposal (a proposal to add or alter unstable APIs in the standard libraries), T-libs-api

Comments

@chorman0773

chorman0773 commented Aug 12, 2024

Proposal

Problem statement

Rust presently has a few options for shifts:

  • val << bits / val >> bits shifts by bits, with the normal overflow-checked behaviour for out-of-range shifts (a panic in debug builds),
  • val.wrapping_shl(bits) / val.wrapping_shr(bits) likewise shifts by bits, but wraps bits to the width of the type.

(There are also rotates, which likewise wrap at the width boundary.)

This is a problem because the wrapping behaviour here is not the intuitive one (it wraps the shift quantity, not the shift result). Code may need to shift by a quantity that is determined at runtime and may equal or exceed the width of the type, e.g. to produce a bitmask of possible values.
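
As a small illustration of the surprise (using the existing standard-library methods): wrapping_shl wraps the shift amount, not the result, so an over-width shift does not clear the value.

assert_eq!(1u32.wrapping_shl(32), 1); // amount wraps to 0, value unchanged
assert_eq!(1u32.wrapping_shl(33), 2); // amount wraps to 1
// 1u32 << 32 panics in debug builds and wraps the shift amount in release builds.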

Motivating examples or use cases

Producing an n-bit mask in an integer type, for any n up to the type's width:

pub const fn mk_mask64(bits: u32) -> u64 {
    // A mask of `bits` low bits; the shift itself is out of range when `bits >= 64`.
    (1 << bits).wrapping_sub(1)
}

Determining which bit to check for the carry-out of a wide operation:

pub fn test_carry_bit(val: u64, bits: u32) -> u64 {
    // `1 << bits` is out of range whenever `bits >= 64`.
    val & (1 << bits)
}

Both of these functions are ones I'm currently using in an architecture emulator that performs native integer operations of varying sizes up to 64 bits.

Solution sketch

The names of the methods are subject to bikeshedding; I don't have a good name for them.

impl Int { // `Int` is any signed or unsigned integer type, including `u128`/`i128` and `usize`/`isize`
    pub const fn rangeless_shl(self, bits: u32) -> Self;
    pub const fn rangeless_shr(self, bits: u32) -> Self;
}

Formally, the behaviour of both of these methods is that of evaluating the shift on a type of infinite width (concretely, a width of u32::MAX + 1 bits would suffice), then wrapping the result modulo 2^Self::BITS.

Practically this means that:

  • For a.rangeless_shl(bits), if bits < Self::BITS, the result is exactly a << bits. Otherwise the result is 0.
  • For a.rangeless_shr(bits), if bits < Self::BITS, the result is exactly a >> bits. Otherwise the result depends on the signedness:
    • For unsigned types, the result is 0 (the same as for rangeless_shl), which corresponds to a logical right shift of the value by the full width of the type,
    • For signed types, the result is either 0 or -1 depending on whether the value is non-negative or negative, which corresponds to an arithmetic right shift of the value by the full width of the type.

The result does not differ when the shift quantity is exactly Self::BITS or is any value that exceeds it.
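
A minimal sketch of the formal description above, using u8 as the target type and u128 as a stand-in for the "infinite width" intermediate (the function name is just for illustration):

const fn rangeless_shl_u8_via_wide(val: u8, bits: u32) -> u8 {
    if bits >= 128 {
        // Even the wide stand-in type would shift every bit out.
        0
    } else {
        // Shift in the wide type, then wrap the result modulo 2^8 by truncating.
        ((val as u128) << bits) as u8
    }
}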

Alternatives

There are a few ways to implement the rangeless shifts concretely:

  • self.checked_{shl,shr}(bits).unwrap_or(exceeds_width_result), or, more verbosely:
  • if bits < Self::BITS { self (op) bits } else { exceeds_width_result }

(where exceeds_width_result is the appropriate value for the type and op)

Both versions are verbose and somewhat error prone to write inline, particularly for signed rangeless_shr (see the sketch below).
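
For illustration, a sketch of what the signed case looks like when written by hand with the checked_shr form above (the function name is hypothetical):

fn rangeless_shr_i64(val: i64, bits: u32) -> i64 {
    // checked_shr returns None when bits >= 64; an out-of-range arithmetic
    // right shift collapses to the sign: 0 for non-negative, -1 for negative.
    val.checked_shr(bits)
        .unwrap_or(if val >= 0 { 0 } else { -1 })
}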

This could live in an external crate, but due to current language features it would have one of two major limitations:

  1. It must be a free function, or
  2. It cannot be made a const fn.

The ergonomics of a method call, and the ability to use it in const contexts, make me think this should be an inherent method of the integer types in the standard library. A standard library function could also potentially benefit from optimizations on architectures where this behaviour is the native behaviour of an out-of-range shift. I do not know whether the versions written above will do this; on x86, however, the unsigned versions will optimize to a cmov.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
chorman0773 added the api-change-proposal and T-libs-api labels Aug 12, 2024
@cuviper
Member

cuviper commented Aug 12, 2024

I think this is similar to #230.

(edit: I said "the same as" first, but you want different semantics for oversized shifts)

@chorman0773
Author

Yes, in particular I want the over-width left shift to be 0 (rather than !0). I commonly use this operation to, among other things, compute a bitmask for a dynamically provided width, doing a .wrapping_sub(1) after the shift.
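
A sketch of that mask idiom, written with the if/else alternative from the ACP (the helper name is just for illustration); note that an over-width result of !0 would break the width == 64 case:

fn mask_u64(width: u32) -> u64 {
    // Over-width left shift yields 0, so width == 64 falls through to
    // 0u64.wrapping_sub(1) == u64::MAX, i.e. the full-width mask.
    let shifted = if width < 64 { 1u64 << width } else { 0 };
    shifted.wrapping_sub(1)
}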

@m-ou-se
Member

m-ou-se commented Aug 13, 2024

Why not change the behaviour of the << and >> operators themselves? (Just have 1u32 << 100 evaluate to 0 rather than the overflow-panic/wrapping behaviour.)

@joshtriplett
Member

Leaving aside whether Rust would have designed these operators like that pre-1.0 (which I don't think Rust should have), I don't think we should change these operators at this point, because that would make them not match the behavior of any existing CPUs, and require multiple instructions rather than one. (In some cases we could optimize away the additional instructions, but I don't think our default operator semantics should have a mismatch with CPUs.)

@kennytm
Member

kennytm commented Aug 13, 2024

Not sure if you consider nVidia GPUs an "existing CPU" but the PTX instruction set for shl/shr did specify

> Shift amounts greater than the register width N are clamped to N.

The wrapping behavior is true for basically every other target though (x86, arm, wasm, sparc, riscv, powerpc, mips, m68k, loongarch64, hexagon, bpf)

BTW I think NVPTX's shf terminology (.wrap, .clamp) is more familiar than the term "rangeless".

impl u64 {
    pub fn wrapping_shl(self, rhs: u32) -> Self {
        self << (rhs & 63)
    }

    pub fn clamping_shl(self, rhs: u32) -> Self { // or `clamped_shl`
        if rhs < 64 { self << rhs } else { 0 }
    }
}

@dtolnay
Member

dtolnay commented Aug 13, 2024

We discussed this ACP in today's standard library API meeting. The team members present agreed there is a compelling motivation for the standard library to provide the proposed rangeless shift behavior. Before we accept the new methods, we'd like to ask for some other naming options to be brainstormed.

We were split on whether changing <</>> to perform rangeless shift would be reasonable, but either way this one is a decision for the language team, not the standard library team. If someone is interested in pursuing changing the operators, we would encourage submitting a lang RFC, and we would make sure to consider the state of that RFC as part of the process of stabilizing named methods for rangeless shift.

Commentary opposed to changing the operators:

  • On all platforms we know of, there is no efficient (single machine instruction) way to implement the new behavior
  • In practice, the vast majority of shifts are known by the programmer to be in-range, so applying a performance overhead to all shifts is not a good cost/benefit tradeoff
  • Arithmetic operators should only correspond to things that common architectures natively support
op: shift left
  Rust expression:
      n << amount
  x86 asm:
      shlx    rax, rdi, rsi
  AArch64 asm:
      lsl     x0, x0, x1

op: rangeless shift left
  Rust expression:
      n.checked_shl(amount).unwrap_or(0)
  x86 asm:
      xor     eax, eax
      cmp     esi, 64
      shlx    rcx, rdi, rsi
      cmovb   rax, rcx
  AArch64 asm:
      lsl     x8, x0, x1
      cmp     w1, #64
      csel    x0, x8, xzr, lo

op: shift right
  Rust expression:
      n >> amount
  x86 asm:
      shrx    rax, rdi, rsi
  AArch64 asm:
      lsr     x0, x0, x1

op: rangeless shift right (unsigned)
  Rust expression:
      n.checked_shr(amount).unwrap_or(0)
  x86 asm:
      xor     eax, eax
      cmp     esi, 64
      shrx    rcx, rdi, rsi
      cmovb   rax, rcx
  AArch64 asm:
      lsr     x8, x0, x1
      cmp     w1, #64
      csel    x0, x8, xzr, lo

op: rangeless shift right (signed)
  Rust expression:
      n.checked_shr(amount).unwrap_or(if n >= 0 { 0 } else { -1 })
  x86 asm:
      cmp     esi, 63
      mov     eax, 63
      cmovb   eax, esi
      sarx    rax, rdi, rax
  AArch64 asm:
      mov     w8, #63
      cmp     w1, #63
      csel    w8, w1, w8, lo
      asr     x0, x0, x8

Commentary in favor of changing the operators:

  • Arithmetic operators should implement whichever variation of the operation is most widely applicable, i.e. use the short symbolic name (as opposed to more verbose named method syntax) for the behavior that is to occur the most frequently in real-world programs
  • Compilers are very competent at telling when a shift amount is definitely in bounds, and turn rangeless shift into a single machine instruction
  • The vast majority of shifts that people write are not performance sensitive at the level that a cmov would matter whatsoever, and of the remainder, the vast majority are expected to be proven to be in-bounds by the compiler; the 0.1% left can use a named method for the legacy shift behavior

@programmerjake
Member

Changing the shift operators is not backwards compatible; it changes the release-mode behavior of 1u8 << 8.
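
For example (a small sketch of the hazard): with overflow checks disabled, the shift amount of 1u8 << 8 currently wraps to the type width, so the expression evaluates to 1 today, whereas the proposed operator semantics would give 0.

// wrapping_shl mirrors today's release-mode behaviour of `<<`:
assert_eq!(1u8.wrapping_shl(8), 1); // amount wraps to 8 % 8 == 0
// Under the proposed semantics, 1u8 << 8 would instead be 0.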

@cuviper
Member

cuviper commented Aug 13, 2024

If we changed the operator, we would also need to decide how to handle negative shift values. e.g. I don't think we want rangeless x << -2 to result in 0, so should it now act like x >> 2? Or keep treating that like overflow?

(This isn't a problem for explicit methods with rhs: u32.)

> Before we accept the new methods, we'd like to ask for some other naming options to be brainstormed.

How about "unbounded"?

@kennytm
Member

kennytm commented Aug 13, 2024

If you are changing the behavior of << you need to make sure portable_simd (std::simd::Simd) is using the same definition.

Interestingly the x86 SSE2 PSLL[W,D,Q] and AVX VPSLLV[W,D,Q] instructions do clamping natively.

> If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all 0s.

OTOH ARM's vector-variant USHL instruction is still wrapping, consistent with the scalar behavior.

> shift = SInt(Elem[operand2, e, esize]<7:0>);

@chorman0773
Author

chorman0773 commented Aug 13, 2024

x86-64 will also offer this behaviour (specifically for left shift and unsigned right shift) when the shift amount is known to be < 64, as you can just do the shift on a 64-bit register and then use the lower 32-bit, 16-bit, or 8-bit overlap (though right shift requires first zero-extending, which is either a movzx for 16-bit and 8-bit, or a plain mov for 32-bit).

@scottmcm
Member

scottmcm commented Aug 13, 2024

Given a time machine, I'd absolutely change what Rust 1.0 did here. Good integer behaviour follows the rule from rust-lang/rust#100422 (comment) -- as though it was done in infinite precision, and you just got the low bits.

That's why uN::ilog2 is better, in my opinion, than uN::leading_zeros. Similarly why uN::count_ones is better than uN::count_zeros.

I think it's bad that

let x = 3;
foo(x);
dbg!(x << 16);

does something so drastically different depending on what the argument type of foo happens to be.

Needing a conditional instruction -- notably one that doesn't add any dependencies -- is 100% not an issue. Rust isn't an assembly language; the defaults should be the things that work properly even if it's another instruction, because the fast-but-wrong ones are better as functions, not operators.

x >>= n and for _ in 0..n { x >>= 1 } should be the same, just like x += n and for _ in 0..n { x += 1 } are the same. Anything else is bad.
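
A small sketch of that inconsistency with today's semantics, using u8 for brevity:

let mut a: u8 = 0b1000_0000;
for _ in 0..8 {
    a >>= 1; // shifting one bit at a time reaches 0
}
assert_eq!(a, 0);

let b: u8 = 0b1000_0000;
// `b >> 8` is not the same today: it panics in a debug build
// ("attempt to shift right with overflow") and wraps the shift
// amount to 0 in a release build, leaving the value unchanged.
let _ = b;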


But I agree that it's probably too late to change it now. (At least, it'd need a very long transition period, probably over multiple editions.)

Spitballing names:

  • proper_sh[lr]
  • accurate_sh[lr]
  • correct_sh[lr]
  • consistent_sh[lr]

@pitaj

pitaj commented Aug 14, 2024

Some more options

  • truncating_
  • infinite_
  • winding_
  • modular_

@joshtriplett
Member

We discussed this in today's @rust-lang/libs-api meeting. We ended up settling on @cuviper's suggestion of unbounded_shl/unbounded_shr as the least confusing option. The semantic is that the base shl/shr have a bounded shift count, and the unbounded_ variants have no bound on the shift count.
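
A sketch of how the accepted methods are expected to behave, assuming they follow the semantics laid out in the ACP:

assert_eq!(1u64.unbounded_shl(3), 8);      // in-range shifts behave as usual
assert_eq!(1u64.unbounded_shl(64), 0);     // the whole value is shifted out
assert_eq!(u64::MAX.unbounded_shr(64), 0); // logical shift by the full width
assert_eq!((-1i64).unbounded_shr(64), -1); // arithmetic shift fills with the sign bit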

joshtriplett added the ACP-accepted label Aug 20, 2024
@joshtriplett
Member

Marking this as accepted, because we're accepting the functionality proposed by the ACP, with the names changed to unbounded_*.
