widening_mul #2417

strake · 2018-04-24T17:30:32Z

le-jzr · 2018-04-24T18:12:46Z

I think we could come up with a better name for the method, but otherwise this is something I'd very much like to see added. Given that on some architectures (x86 to name one), multiplication always produces result twice the width of the operands, it seems rather inefficient to not provide a way to use the higher half.

strake · 2018-04-24T18:17:40Z

@le-jzr We can also choose an efficient-as-possible implementation on architectures where it doesn't and save users the bother.

Some name ideas:

carrying_mul
double_mul
double_wide_mul

strake · 2018-04-24T18:29:46Z

For reference, here is the generic definition from my u2size crate:

#[inline(always)]
fn carrying_mul(x: usize, y: usize) -> (usize, usize) {
    let n = 0usize.count_zeros() >> 1;
    let halves = |x| (x >> n, x & (!0 >> n));
    let ((x1, x0), (y1, y0)) = (halves(x), halves(y));
    let (z2, z0) = (x1 * y1, x0 * y0);
    let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
    let (w, c) = usize::overflowing_add(z0, z1 << n);
    (z2 + z1 >> n + c as u32, w)
}

The emitted machine code for this is quite heinous.

Ixrec · 2018-04-24T20:05:16Z

Having never heard of this operation before, but immediately understanding why it would be useful, I think the name needs to have some form of the word "wide" in it.

carrying_mul sounds as if regular multiplication doesn't "carry the 1", which would just be inaccurate. double_mul sounds like multiplying twice, or multiplying by two, which are totally different operations. double_wide_mul I'd be fine with, but is probably overkill unless there's a quadruple_wide_mul we need to contrast it with. Finally, std already has wrapping_mul and overflowing_mul methods on these types, so for naming consistency I would call it widening_mul.

le-jzr · 2018-04-24T20:27:31Z

After a bit of though, I see the logic behind widening_mul. It could work.

tspiteri · 2018-04-24T23:03:23Z

@strake I know it's not really the point, but couldn't you do something like this?

#[inline(always)]
fn carrying_mul(x: usize, y: usize) -> (usize, usize) {
    let usize_bits = 0usize.count_zeros();
    if usize_bits <= 32 {
        let z = (x as u64) * (y as u64);
        ((z >> usize_bits) as usize, z as usize)
    } else if usize_bits <= 64 {
        let z = (x as u128) * (y as u128);
        ((z >> usize_bits) as usize, z as usize)
    } else {
        // fallback
        let n = usize_bits >> 1;
        let halves = |x| (x >> n, x & (!0 >> n));
        let ((x1, x0), (y1, y0)) = (halves(x), halves(y));
        let (z2, z0) = (x1 * y1, x0 * y0);
        let z1 = (x1 + x0) * (y1 + y0) - z2 - z0;
        let (w, c) = usize::overflowing_add(z0, z1 << n);
        (z2 + z1 >> n + c as u32, w)
    }
}

devonhollowood · 2018-04-25T00:02:27Z

Maybe instead of returning a tuple, it would make sense to return a struct? So e.g. u32::carrying_mul() would return a carried_product<u32>. This struct would have high and low fields. This makes it hard to confuse the fields, which would otherwise be an error-prone and hard to lint against. (Feel free to bikeshed on all the names here.)

eddyb · 2018-04-25T00:09:01Z

I have only seen this called "widening multiply", but mostly in low-level (compiler/ISA) contexts.

joshtriplett · 2018-04-25T00:40:06Z

I do wonder why this is a Mul impl on u2size rather than a mul of two usize values that returns a u2size (or equivalent such as a 2-tuple of usize, which I'd prefer over a new type here).

scottmcm · 2018-04-25T08:19:52Z

The broader rust project already has this in compiler-builtins (https://github.com/rust-lang-nursery/compiler-builtins/blob/master/src/int/mul.rs#L7), so I'd personally be happy to have it in core instead.

A quick playground experiment shows the ASM is simpler for returning (low, high) with u64. (For smaller types it didn't matter.) Conveniently that's consistent with overflowing_mul which also returns the low part of the result as the .0 part of the tuple.

But I wonder if this really wants to be ternary instead, like

fn mul_with_carry(self, multiplier: Self, carry: Self) -> (Self, Self)

nagisa · 2018-04-25T09:03:03Z

Despite what the drawbacks section says, `a as u128 * b as u128` results in ideal code for widening u64 multiplication (with or without optimisations, provided overflow checks are disabled.) Very similar results are obtained if result is split into two parts afterwards.

tspiteri · 2018-04-25T09:15:35Z

Regarding output tuple order, I think it could be made more obvious using the method name. For example fn mul_low_high(self, multiplier: Self) -> (Self, Self) makes it more obvious that ret.0 is the low word and ret.1 is the high word, and let (high, low) = a.mul_low_high(b); sets off alarm bells.

hanna-kruppe · 2018-04-25T09:18:30Z

To me this operation falls in the same category as leading/trailing zeros, rotates, bit reverse, saturation, etc. in that it

is relatively niche, but important for some crucial routines and sometimes provided by hardware
is not provided directly by C and friends, and thus generally implemented with intrinsics or in terms of other operators (e.g., rotate ~= bit shifts and bitwise or) that then have to be pattern matched against to recognize the operation
(on some targets, at least) requires special casing by instruction selection to achieve good codegen, so the aforementioned pattern matching is important

For these operations I'm in favor of providing them in the Rust standard library, both to offer a canonical way to write out these important primitives and to make it easier for the compiler to generate good code reliably (i.e., no need to pattern match a particular way to implement the operation).

strake · 2018-04-26T04:13:44Z

@joshtriplett the Mul impl is merely an example of use; the actual method is fn(usize, usize) -> (usize, usize).

@nagisa yes, it does for me too, but i as a crate author worry whether it will remain so on other targets — will it generate such code on a 32-bit machine? a 128-bit machine? is Rust willing to preclude ever supporting 128-bit targets (e.g. RV128I)? I would prefer to have a method which means efficient double-wide multiplication rather than rely on the compiler to figure out what i mean and myself to not mess it up every time.

@rkruppe thanks for verbalizing these points — yes, the notion is to have a canonical form so neither the compiler nor a future reader needs to guess what is going on.

widening_mul LGTM, will modify RFC when we reach a consensus.

clarfonthey · 2018-04-26T18:28:58Z

Personally I think that the return type should be the next size up of an integer type. I wouldn't mind creating u256 and i256 types that only allow bit operations and casting, but we can also just not define this for 128-bit integers.

joshtriplett · 2018-04-26T18:52:20Z

@clarcharr I'd love to have u256 (and maybe even u512) available, but not with just a limited set of operations. But that aside, I think we can universally define these operations as a function from T, T to (T, T), even for the largest type we have. Having versions that turn into the next larger type, for every type other than the largest, seems useful as well, but I'd hope that the LLVM optimizer makes it unnecessary to have two different versions.

JonSeverinsson · 2018-04-26T18:58:27Z

As the order of the high and low half of the result tuple is not obvious suggests to me that a tuple is probably not the best return type. The obvious solution would be a struct with two fields named "high" and "low", but I'm wondering if simply returning a wider integer type is not simpler (though that means i128/u128 will unfortunately lack the method).

I.e. add the following functions (and the equivalent for unsigned integers) to the standard library.

impl i8 {
    pub fn widening_mul(self, rhs: Self) -> i16 { ... }
}
impl i16 {
    pub fn widening_mul(self, rhs: Self) -> i32 { ... }
}
impl i32 {
    pub fn widening_mul(self, rhs: Self) -> i64 { ... }
}
impl i64 {
    pub fn widening_mul(self, rhs: Self) -> i128 { ... }
}
impl isize {
    #[cfg(target_pointer_width = "16")]
    pub fn widening_mul(self, rhs: Self) -> i32 { ... }
    #[cfg(target_pointer_width = "32")]
    pub fn widening_mul(self, rhs: Self) -> i64 { ... }
    #[cfg(target_pointer_width = "64")]
    pub fn widening_mul(self, rhs: Self) -> i128 { ... }
}

eddyb · 2018-04-26T19:31:45Z

@JonSeverinsson Not sure there is much of a point in doing that because you can simply cast to the wider type and perform the multiplication.

le-jzr · 2018-04-26T20:18:59Z

Returning twice as wide type instead of a tuple has the issue that it wouldn't work on usize, which is the canonical use case for this kind of operation (at least until someone adds ireg/ureg).

strake · 2018-04-26T20:58:08Z

I'd be fine with returning either a tuple or a struct. As @le-jzr noted, returning twice-as-wide type is impossible for usize (unless we define u2size in core, which i consider far out of scope of this RFC).

kennytm · 2018-04-26T21:27:21Z

Does it make sense to define widening_mul on signed integers?

fn widening_mul(self, other: isize) -> (isize, usize);

clarfonthey · 2018-04-30T23:28:33Z

@kennytm absolutely imho.

scottmcm · 2018-05-04T08:33:48Z

Double-wide multiplication is a prerequisite of arbitrary-precision multiplication

I took a stab at the simplest version of that using the proposed method, and I think it ends up being something like this:

fn scalar_mul_assign(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for i in big.iter_mut() {
        let (low, high) = i.wide_mul(scalar);
        let (low, overflowed) = low.overflowing_add(carry);
        *i = low;
        carry = high + if overflowed { 1 } else { 0 };
    }
    if carry > 0 {
        big.push(carry);
    }
}

That's not great, and I doubt LLVM will do a great job with it.

So I think it'd be good to have the carry built-in, so it can be something like

pub fn scalar_mul_assign2(big: &mut Vec<u8>, scalar: u8) {
    let mut carry = 0;
    for i in big.iter_mut() {
        let (x, c) = i.mul_with_carry(scalar, carry);
        *i = x;
        carry = c;
    }
    if carry > 0 {
        big.push(carry);
    }
}

Since one can always pass 0 as the carry, which LLVM will fold away easily.

Pratyush · 2018-05-06T23:10:44Z

+1, would make writing certain cryptographic code very simple

clarfonthey · 2018-05-10T16:44:18Z

One interesting use case for this might be a more efficient version of the multiply_and_divide function that num-integer uses internally:

/// Calculate r * a / b, avoiding overflows and fractions.
///
/// Assumes that b divides r * a evenly.
fn multiply_and_divide<T: Integer + Clone>(r: T, a: T, b: T) -> T {
    // See http://blog.plover.com/math/choose-2.html for the idea.
    let g = gcd(r.clone(), b.clone());
    r/g.clone() * (a / (b/g))
}

I mean, num-integer specifically almost certainly won't use this as it's not backwards compatible, but it is a nice use case.

strake · 2018-07-03T22:26:05Z

ping

have we reached consensus on the name widening_mul?

sm-Fifteen · 2018-11-30T19:07:15Z

I'd like to point out that the apfloat component of the official librustc has had a widening_mul function for well over a year, so there's some precedent for this in official rust projects, just not in the language itself.

TheIronBorn · 2018-11-30T19:20:39Z

The rand library uses wmul

eddyb · 2018-12-01T08:15:09Z

@sm-Fifteen It is no coincidence I wrote that function as a non-allocating bigint multiplication helper when I ported APFloat from C++ and #2417 (comment) - so don't count my "vote" for widening_mul twice, heh.

strake · 2018-12-05T08:11:44Z

Consensus for name seems to be widening_mul; modified RFC.

fstirlitz · 2019-01-19T15:38:07Z

What's the rationale for having u2size at all? Given that usize is supposed to be used for pointers, offsets and memory sizes, if multiplication involving it overflows, it means the result represents an amount of memory you cannot possibly allocate or refer to. You shouldn't even care what the value is: all that really matters is that it's too large to handle.

clarfonthey · 2019-01-19T16:32:27Z

@fstirlitz One use case would be intermediate results that may overflow, e.g. rationals. While the end result of x*y/z might fit in a usize, the product x*y may not.

strake · 2019-01-21T19:49:48Z

@fstirlitz I had 2 motivating use cases for u2size:

Intermediate values in arbitrary-precision multiplication
The which-trees-present value in smoothsort, which is word_size/log₂((1+√5)/2) bits

lu-zero · 2019-01-29T16:13:18Z

Would be good to have the variants that return the wider type directly, since it has direct uses for a number of common dsp operations.

strake · 2019-01-29T17:51:10Z

@lu-zero u2size is not in core/std; i merely cite it as an example use of this operation. For fixed-width types, you can merely cast to the wider type before multiplying, and the emitted code will use the machine instruction for widening multiplication.

hanna-kruppe · 2019-06-21T19:04:02Z

I think it's sad that this petered out. This is a small, useful API addition which apparently isn't too controversial or prone to vigorous bikeshedding (except the name, but that's now resolved). Maybe we should just skip ahead to implementing it? Evidently the RFC process is not working well for keeping this RFC PR moving.

Centril · 2019-06-21T20:39:41Z

Maybe we should just skip ahead to implementing it?

That's usually a good idea for small libs features.

clarfonthey · 2021-05-06T05:15:05Z

So I found myself needing this again today, and it looks like the overall decision was to just implement this? I would be willing to help do that, but I honestly have no idea how -- I'm not sure what LLVM intrinsics are used to do wide-multiplication, and I couldn't find the docs for it. If we at least add some intrinsics for this it would make implementing a working solution easier, and we can figure out the exact API later.

scottmcm · 2021-05-06T06:21:52Z

@clarfonthey there's no intrinsic for it; it represents it with n[su]w multiplication on the wider integer type.

Demo, with beautiful assembly: https://rust.godbolt.org/z/86of1z184

(I do think that people writing Rust shouldn't need to know that, though, and I could imagine the best way to implement it in cranelift could be different, which would be another reason to have an exposed specialized method for it.)

…ou-se Add carrying_add, borrowing_sub, widening_mul, carrying_mul methods to integers This comes in part from my own attempts to make (crude) big integer implementations, and also due to the stalled discussion in [RFC 2417](rust-lang/rfcs#2417). My understanding is that changes like these are best offered directly as code and then an RFC can be opened if there needs to be more discussion before stabilisation. Since all of these methods are unstable from the start, I figured I might as well offer them now. I tried looking into intrinsics, messed around with a few different implementations, and ultimately concluded that these are "good enough" implementations for now to at least put up some code and maybe start bikeshedding on a proper API for these. For the `carrying_add` and `borrowing_sub`, I tried looking into potential architecture-specific code and realised that even using the LLVM intrinsics for `addcarry` and `subborrow` on x86 specifically, I was getting exactly the same assembly as the naive implementation using `overflowing_add` and `overflowing_sub`, although the LLVM IR did differ because of the architecture-specific code. Longer-term I think that they would be best suited to specific intrinsics as that would make optimisations easier (instructions like add-carry tend to use implicit flags, and thus can only be optimised if they're done one-after-another, and thus it would make the most sense to have compact intrinsics that can be merged together easily). For `widening_mul` and `carrying_mul`, for now at least, I simply cast to the larger type and perform arithmetic that way, since we currently have no intrinsic that would work better for 128-bit integers. In the future, I also think that some form of intrinsic would work best to cover that case, but for now at least, I think that they're "good enough" for now. The main reasoning for offering these directly to the standard library even though they're relatively niche optimisations is to help ensure that the code generated for them is optimal. Plus, these operations alone aren't enough to create big integer implementations, although they could help simplify the code required to do so and make it a bit more accessible for the average implementor. That said, I 100% understand if any or all of these methods are not desired simply because of how niche they are. Up to you. 🤷🏻

scottmcm · 2022-09-18T19:19:50Z

This was accepted for nightly without the RFC, tracking issue rust-lang/rust#85532

https://doc.rust-lang.org/nightly/std/primitive.u32.html#method.widening_mul
https://doc.rust-lang.org/nightly/std/primitive.u32.html#method.carrying_mul

As such, I'm going to close this on the assumption that it's no longer needed. Feel free to reopen if so desired.

strake added 3 commits April 24, 2018 09:31

carrying_mul

28e836f

delete duplicate reference-level explanation

74d0010

actually commit guide-level explanation

9870ae9

note another alternative

fb029b4

joshtriplett added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Apr 25, 2018

scottmcm mentioned this pull request May 17, 2018

Add compensated_add for floats rust-lang/rust#50774

Closed

Centril added A-arithmetic Arithmetic related proposals & ideas A-inherent-impl Proposals related to inherent implementations A-primitive Primitive types related proposals & ideas labels Nov 22, 2018

strake changed the title ~~carrying_mul~~ widening_mul Dec 5, 2018

rename as widening_mul

329e58b

strake force-pushed the carrying_mul branch from c420699 to 329e58b Compare December 5, 2018 08:11

AaronKutch mentioned this pull request Dec 19, 2019

Greatly improve division performance for u128 and other cases rust-lang/compiler-builtins#332

Merged

KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jul 29, 2020

AaronKutch mentioned this pull request Nov 25, 2020

Overhaul split integer handling and fix numerous issues rust-lang/compiler-builtins#384

Merged

clarfonthey mentioned this pull request May 7, 2021

Add carrying_add, borrowing_sub, widening_mul, carrying_mul methods to integers rust-lang/rust#85017

Merged

clarfonthey mentioned this pull request May 21, 2021

Tracking Issue for bigint helper methods rust-lang/rust#85532

Open

8 tasks

scottmcm closed this Sep 18, 2022

widening_mul #2417

widening_mul #2417

Conversation

strake commented Apr 24, 2018 • edited Loading

le-jzr commented Apr 24, 2018

strake commented Apr 24, 2018 • edited Loading

strake commented Apr 24, 2018

Ixrec commented Apr 24, 2018

le-jzr commented Apr 24, 2018

tspiteri commented Apr 24, 2018 • edited Loading

devonhollowood commented Apr 25, 2018

eddyb commented Apr 25, 2018

joshtriplett commented Apr 25, 2018 • edited Loading

scottmcm commented Apr 25, 2018

nagisa commented Apr 25, 2018 via email

tspiteri commented Apr 25, 2018

hanna-kruppe commented Apr 25, 2018

strake commented Apr 26, 2018 • edited Loading

clarfonthey commented Apr 26, 2018 • edited Loading

joshtriplett commented Apr 26, 2018

JonSeverinsson commented Apr 26, 2018

eddyb commented Apr 26, 2018

le-jzr commented Apr 26, 2018

strake commented Apr 26, 2018 • edited Loading

kennytm commented Apr 26, 2018

clarfonthey commented Apr 30, 2018

scottmcm commented May 4, 2018

Pratyush commented May 6, 2018

clarfonthey commented May 10, 2018

strake commented Jul 3, 2018

sm-Fifteen commented Nov 30, 2018

TheIronBorn commented Nov 30, 2018

eddyb commented Dec 1, 2018

strake commented Dec 5, 2018

fstirlitz commented Jan 19, 2019

clarfonthey commented Jan 19, 2019

strake commented Jan 21, 2019

lu-zero commented Jan 29, 2019

strake commented Jan 29, 2019

hanna-kruppe commented Jun 21, 2019 • edited Loading

Centril commented Jun 21, 2019

clarfonthey commented May 6, 2021

scottmcm commented May 6, 2021 • edited Loading

scottmcm commented Sep 18, 2022

strake commented Apr 24, 2018 •

edited

Loading

strake commented Apr 24, 2018 •

edited

Loading

tspiteri commented Apr 24, 2018 •

edited

Loading

joshtriplett commented Apr 25, 2018 •

edited

Loading

strake commented Apr 26, 2018 •

edited

Loading

clarfonthey commented Apr 26, 2018 •

edited

Loading

strake commented Apr 26, 2018 •

edited

Loading

hanna-kruppe commented Jun 21, 2019 •

edited

Loading

scottmcm commented May 6, 2021 •

edited

Loading