widening_mul #2417
I think we could come up with a better name for the method, but otherwise this is something I'd very much like to see added. Given that on some architectures (x86 to name one), multiplication always produces a result twice the width of the operands, it seems rather inefficient not to provide a way to use the higher half. |
@le-jzr We can also choose an efficient-as-possible implementation on architectures where it doesn't and save users the bother. Some name ideas:
|
For reference, here is the generic definition from my u2size crate:
#[inline(always)]
fn carrying_mul(x: usize, y: usize) -> (usize, usize) {
    // Split each operand into a high and a low half of n bits each.
    let n = 0usize.count_zeros() >> 1;
    let halves = |x: usize| (x >> n, x & (!0 >> n));
    let ((x1, x0), (y1, y0)) = (halves(x), halves(y));
    let (z2, z0) = (x1 * y1, x0 * y0);
    // Karatsuba-style middle term; note that it can exceed the word width for large operands.
    let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
    let (w, c) = usize::overflowing_add(z0, z1 << n);
    // `+` binds tighter than `>>`, so the shift needs explicit parentheses.
    (z2 + (z1 >> n) + c as usize, w)
}
The emitted machine code for this is quite heinous. |
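As a point of comparison, here is a minimal sketch of a portable fallback that uses four half-width partial products (schoolbook multiplication) instead of the Karatsuba-style middle term `(x1 + x0) * (y1 + y0)`, which can itself exceed the word width when the operands are large. The name `widening_mul_portable` is hypothetical and does not come from the u2size crate or the standard library; the `(high, low)` return order follows the definition above.

```rust
/// Full-width product of two `usize` values, returned as `(high, low)`.
/// Uses four half-width partial products so that no intermediate step overflows.
fn widening_mul_portable(x: usize, y: usize) -> (usize, usize) {
    let n = usize::BITS / 2; // bits per half-word
    let mask = (1usize << n) - 1; // selects the low half

    let (x1, x0) = (x >> n, x & mask);
    let (y1, y0) = (y >> n, y & mask);

    // Each partial product is at most (2^n - 1)^2, which fits in a usize.
    let ll = x0 * y0;
    let lh = x0 * y1;
    let hl = x1 * y0;
    let hh = x1 * y1;

    // Sum of the middle column; at most 3 * (2^n - 1), so it cannot overflow.
    let mid = (ll >> n) + (lh & mask) + (hl & mask);

    // The left shift discards the bits of `mid` that belong to the high word.
    let low = (mid << n) | (ll & mask);
    let high = hh + (lh >> n) + (hl >> n) + (mid >> n);
    (high, low)
}
```

For example, `widening_mul_portable(usize::MAX, 2)` yields a high word of `1` and a low word of `usize::MAX - 1`. Whether a compiler turns this into a single widening multiply instruction is not guaranteed, which is part of the motivation for a dedicated method.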
Having never heard of this operation before, but immediately understanding why it would be useful, I think the name needs to have some form of the word "wide" in it.
|
After a bit of thought, I see the logic behind |
@strake I know it's not really the point, but couldn't you do something like this?
#[inline(always)]
fn carrying_mul(x: usize, y: usize) -> (usize, usize) {
    let usize_bits = 0usize.count_zeros();
    if usize_bits <= 32 {
        // The product of two words of at most 32 bits fits in a u64.
        let z = (x as u64) * (y as u64);
        ((z >> usize_bits) as usize, z as usize)
    } else if usize_bits <= 64 {
        // The product of two words of at most 64 bits fits in a u128.
        let z = (x as u128) * (y as u128);
        ((z >> usize_bits) as usize, z as usize)
    } else {
        // fallback
        let n = usize_bits >> 1;
        let halves = |x: usize| (x >> n, x & (!0 >> n));
        let ((x1, x0), (y1, y0)) = (halves(x), halves(y));
        let (z2, z0) = (x1 * y1, x0 * y0);
        let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
        let (w, c) = usize::overflowing_add(z0, z1 << n);
        // `+` binds tighter than `>>`, so the shift needs explicit parentheses.
        (z2 + (z1 >> n) + c as usize, w)
    }
}
|
Maybe instead of returning a tuple, it would make sense to return a struct? So e.g. |
I have only seen this called "widening multiply", but mostly in low-level (compiler/ISA) contexts. |
I do wonder why this is a |
The broader rust project already has this in compiler-builtins (https://github.com/rust-lang-nursery/compiler-builtins/blob/master/src/int/mul.rs#L7), so I'd personally be happy to have it in core instead. A quick playground experiment shows the ASM is simpler for returning But I wonder if this really wants to be ternary instead, like fn mul_with_carry(self, multiplier: Self, carry: Self) -> (Self, Self) |
Despite what the drawbacks section says, `a as u128 * b as u128` results in
ideal code for widening u64 multiplication (with or without optimisations,
provided overflow checks are disabled.) Very similar results are obtained
if result is split into two parts afterwards.
|
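A minimal sketch of the cast-based approach described in the comment above, with the result split into two halves afterwards. The function name `widening_mul_u64` and the `(low, high)` return order are assumptions made for illustration; the thread had not settled on either at this point.

```rust
/// Widening multiply of two `u64` values by casting to `u128`.
/// Returns the `(low, high)` halves of the 128-bit product.
fn widening_mul_u64(a: u64, b: u64) -> (u64, u64) {
    // The product of two 64-bit values always fits in 128 bits,
    // so this multiplication cannot overflow.
    let wide = (a as u128) * (b as u128);
    (wide as u64, (wide >> 64) as u64)
}
```

This only works while a primitive type twice as wide exists, which is the concern raised later in the thread for `u128` and for `usize` on hypothetical 128-bit targets.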
Regarding output tuple order, I think it could be made more obvious using the method name. For example |
To me this operation falls in the same category as leading/trailing zeros, rotates, bit reverse, saturation, etc. in that it
For these operations I'm in favor of providing them in the Rust standard library, both to offer a canonical way to write out these important primitives and to make it easier for the compiler to generate good code reliably (i.e., no need to pattern match a particular way to implement the operation). |
@joshtriplett the @nagisa yes, it does for me too, but i as a crate author worry whether it will remain so on other targets — will it generate such code on a 32-bit machine? a 128-bit machine? is Rust willing to preclude ever supporting 128-bit targets (e.g. RV128I)? I would prefer to have a method which means efficient double-wide multiplication rather than rely on the compiler to figure out what i mean and myself to not mess it up every time. @rkruppe thanks for verbalizing these points — yes, the notion is to have a canonical form so neither the compiler nor a future reader needs to guess what is going on.
|
Personally I think that the return type should be the next size up of an integer type. I wouldn't mind creating |
@clarcharr I'd love to have |
That the order of the high and low halves of the result tuple is not obvious suggests to me that a tuple is probably not the best return type. The obvious solution would be a struct with two fields named "high" and "low", but I'm wondering if simply returning a wider integer type would be simpler (though that means i128/u128 will unfortunately lack the method). I.e. add the following functions (and the equivalent for unsigned integers) to the standard library.
|
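To make the two alternatives under discussion concrete, here is a hypothetical sketch of each for `u32`. The names `WideProduct`, `widening_mul_struct` and `widening_mul_wider` are invented for illustration and are not taken from the thread or from the standard library.

```rust
/// Hypothetical struct return type with named halves.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct WideProduct<T> {
    high: T,
    low: T,
}

/// Option 1: return a struct, so the order of the halves is never ambiguous.
fn widening_mul_struct(a: u32, b: u32) -> WideProduct<u32> {
    let wide = (a as u64) * (b as u64);
    WideProduct {
        high: (wide >> 32) as u32,
        low: wide as u32,
    }
}

/// Option 2: return the next wider integer type directly.
/// There is no counterpart for `u128`, which has no wider primitive type.
fn widening_mul_wider(a: u32, b: u32) -> u64 {
    (a as u64) * (b as u64)
}
```

The struct form scales to every integer width, including `u128` and `usize`; the wider-type form is shorter to use but, as noted in the replies, amounts to a cast followed by an ordinary multiplication.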
@JonSeverinsson Not sure there is much of a point in doing that because you can simply cast to the wider type and perform the multiplication. |
Returning twice as wide type instead of a tuple has the issue that it wouldn't work on |
I'd be fine with returning either a tuple or a struct. As @le-jzr noted, returning twice-as-wide type is impossible for |
Does it make sense to define fn widening_mul(self, other: isize) -> (isize, usize); |
@kennytm absolutely imho. |
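A minimal sketch of what a signed variant with a signed high half and an unsigned low half could look like. It uses `i64` rather than `isize` so that the wider type is known to exist; the function name and the `(high, low)` order are assumptions for illustration only.

```rust
/// Signed widening multiply for `i64`, returned as `(high, low)`.
/// The high half carries the sign; the low half is plain bits, hence unsigned.
fn widening_mul_i64(a: i64, b: i64) -> (i64, u64) {
    // The product of two 64-bit signed values always fits in an i128.
    let wide = (a as i128) * (b as i128);
    // `>>` on a signed integer is an arithmetic shift, so the sign is preserved.
    ((wide >> 64) as i64, wide as u64)
}
```

With this split, `-1 * 1` comes out as a high half of `-1` and a low half of `u64::MAX`, which is the two's-complement representation of `-1` across 128 bits.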
I took a stab at the simplest version of that using the proposed method, and I think it ends up being something like this:
fn scalar_mul_assign(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for i in big.iter_mut() {
        let (low, high) = i.wide_mul(scalar);
        let (low, overflowed) = low.overflowing_add(carry);
        *i = low;
        carry = high + if overflowed { 1 } else { 0 };
    }
    if carry > 0 {
        big.push(carry);
    }
}
That's not great, and I doubt LLVM will do a great job with it. So I think it'd be good to have the carry built-in, so it can be something like
pub fn scalar_mul_assign2(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for i in big.iter_mut() {
        let (x, c) = i.mul_with_carry(scalar, carry);
        *i = x;
        carry = c;
    }
    if carry > 0 {
        big.push(carry);
    }
}
Since one can always pass 0 as the carry, which LLVM will fold away easily. |
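The carry-in variant can be tried out today with a stand-in helper. The sketch below assumes a free function `mul_with_carry` implemented by casting to `u16`; it is a hypothetical substitute for the proposed method, not the method itself, and it returns `(low, high)` as in the comment above.

```rust
/// Hypothetical stand-in for the proposed method: computes `x * y + carry`
/// and returns the `(low, high)` bytes of the result.
fn mul_with_carry(x: u8, y: u8, carry: u8) -> (u8, u8) {
    // 255 * 255 + 255 = 65280, which fits in a u16, so nothing overflows.
    let wide = (x as u16) * (y as u16) + (carry as u16);
    (wide as u8, (wide >> 8) as u8)
}

/// Multiplies a little-endian base-256 big integer by a single byte, in place.
fn scalar_mul_assign(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for digit in big.iter_mut() {
        let (low, high) = mul_with_carry(*digit, scalar, carry);
        *digit = low;
        carry = high;
    }
    // A leftover carry becomes a new most significant digit.
    if carry > 0 {
        big.push(carry);
    }
}
```

Multiplying `[0xFF, 0xFF]` (the value 0xFFFF) by `0xFF` leaves `[0x01, 0xFF, 0xFE]`, i.e. 0xFEFF01. The key property is that `x * y + carry` can never exceed two digits, so the high half needs no separate overflow handling.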
+1, would make writing certain cryptographic code very simple |
One interesting use case for this might be a more efficient version of the
I mean, |
ping have we reached consensus on the name |
I'd like to point out that the apfloat component of the official librustc has had a |
The |
@sm-Fifteen It is no coincidence I wrote that function as a non-allocating bigint multiplication helper when I ported APFloat from C++ and #2417 (comment) - so don't count my "vote" for |
Consensus for name seems to be |
What's the rationale for having |
@fstirlitz One use case would be intermediate results that may overflow, e.g. rationals. While the end result of x*y/z might fit in a |
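A small sketch of the overflowing-intermediate case mentioned above: computing `x * y / z` where the product may not fit in the operand type even though the quotient does. The function name `mul_div` and the choice of `u64` operands are assumptions for illustration.

```rust
/// Computes `x * y / z` using a 128-bit intermediate product.
/// Returns `None` if `z` is zero or the quotient does not fit in a `u64`.
fn mul_div(x: u64, y: u64, z: u64) -> Option<u64> {
    if z == 0 {
        return None;
    }
    // The full product always fits in a u128.
    let wide = (x as u128) * (y as u128);
    let quotient = wide / (z as u128);
    if quotient > u64::MAX as u128 {
        None
    } else {
        Some(quotient as u64)
    }
}
```

Here `mul_div(u64::MAX, 10, 20)` succeeds even though `u64::MAX * 10` overflows a `u64`, because the overflow only exists in the intermediate value.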
@fstirlitz I had 2 motivating use cases for
|
Would be good to have the variants that return the wider type directly, since it has direct uses for a number of common dsp operations. |
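One DSP operation where the wider result is used directly is fixed-point multiplication. The sketch below assumes the Q15 format (`i16` samples with 15 fractional bits) as a representative example; the comment above does not name a specific operation, and `q15_mul` is a hypothetical name.

```rust
/// Multiplies two Q15 fixed-point samples (`i16` with 15 fractional bits).
/// The full product is formed in `i32`, then scaled back and saturated.
fn q15_mul(a: i16, b: i16) -> i16 {
    // Q30 product; two 16-bit values cannot overflow an i32.
    let wide = (a as i32) * (b as i32);
    // Back to Q15; the arithmetic shift rounds toward negative infinity.
    let scaled = wide >> 15;
    // Only -1.0 * -1.0 falls outside the i16 range, so clamp the result.
    scaled.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}
```

For instance, `q15_mul(16384, 16384)` (0.5 times 0.5) gives `8192` (0.25). The intermediate `i32` is exactly the "wider type" variant of a widening multiply.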
@lu-zero |
I think it's sad that this petered out. This is a small, useful API addition which apparently isn't too controversial or prone to vigorous bikeshedding (except the name, but that's now resolved). Maybe we should just skip ahead to implementing it? Evidently the RFC process is not working well for keeping this RFC PR moving. |
That's usually a good idea for small libs features. |
So I found myself needing this again today, and it looks like the overall decision was to just implement this? I would be willing to help do that, but I honestly have no idea how -- I'm not sure what LLVM intrinsics are used to do wide-multiplication, and I couldn't find the docs for it. If we at least add some intrinsics for this it would make implementing a working solution easier, and we can figure out the exact API later. |
@clarfonthey there's no intrinsic for it; it represents it with Demo, with beautiful assembly: https://rust.godbolt.org/z/86of1z184 (I do think that people writing Rust shouldn't need to know that, though, and I could imagine the best way to implement it in cranelift could be different, which would be another reason to have an exposed specialized method for it.) |
…ou-se Add carrying_add, borrowing_sub, widening_mul, carrying_mul methods to integers
This comes in part from my own attempts to make (crude) big integer implementations, and also due to the stalled discussion in [RFC 2417](rust-lang/rfcs#2417). My understanding is that changes like these are best offered directly as code and then an RFC can be opened if there needs to be more discussion before stabilisation. Since all of these methods are unstable from the start, I figured I might as well offer them now.
I tried looking into intrinsics, messed around with a few different implementations, and ultimately concluded that these are "good enough" implementations for now to at least put up some code and maybe start bikeshedding on a proper API for these.
For the `carrying_add` and `borrowing_sub`, I tried looking into potential architecture-specific code and realised that even using the LLVM intrinsics for `addcarry` and `subborrow` on x86 specifically, I was getting exactly the same assembly as the naive implementation using `overflowing_add` and `overflowing_sub`, although the LLVM IR did differ because of the architecture-specific code. Longer-term I think that they would be best suited to specific intrinsics, as that would make optimisations easier: instructions like add-carry tend to use implicit flags and can only be optimised if they're done one after another, so it would make the most sense to have compact intrinsics that can be merged together easily.
For `widening_mul` and `carrying_mul`, I simply cast to the larger type and perform arithmetic that way, since we currently have no intrinsic that would work better for 128-bit integers. In the future, I also think that some form of intrinsic would work best to cover that case, but I think that they're "good enough" for now.
The main reasoning for offering these directly to the standard library even though they're relatively niche optimisations is to help ensure that the code generated for them is optimal. Plus, these operations alone aren't enough to create big integer implementations, although they could help simplify the code required to do so and make it a bit more accessible for the average implementor. That said, I 100% understand if any or all of these methods are not desired simply because of how niche they are. Up to you. 🤷🏻
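The "naive implementation using `overflowing_add`" mentioned above can be sketched roughly as follows. This is an illustrative free function on `u64`, not the code from the pull request; `add_assign_words` is a hypothetical helper showing how the carry chains across words in a big integer addition.

```rust
/// Adds `a + b + carry_in`, returning the wrapped sum and the outgoing carry.
fn carrying_add(a: u64, b: u64, carry_in: bool) -> (u64, bool) {
    let (sum, c1) = a.overflowing_add(b);
    let (sum, c2) = sum.overflowing_add(carry_in as u64);
    // At most one of the two additions can overflow.
    (sum, c1 || c2)
}

/// Adds `rhs` into `acc`, both little-endian multi-word integers of equal
/// length, and returns the final carry out.
fn add_assign_words(acc: &mut [u64], rhs: &[u64]) -> bool {
    assert_eq!(acc.len(), rhs.len());
    let mut carry = false;
    for (a, &b) in acc.iter_mut().zip(rhs) {
        let (sum, c) = carrying_add(*a, b, carry);
        *a = sum;
        carry = c;
    }
    carry
}
```

Adding `[1, 0]` to `[u64::MAX, 0]` leaves `[0, 1]` with no carry out: the carry produced by the low word is consumed by the next one, which is the pattern an add-with-carry instruction implements in hardware.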
This was accepted for nightly without the RFC, tracking issue rust-lang/rust#85532 https://doc.rust-lang.org/nightly/std/primitive.u32.html#method.widening_mul As such, I'm going to close this on the assumption that it's no longer needed. Feel free to reopen if so desired. |