Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline double word integers #13

Closed
cmpute opened this issue Jun 24, 2022 · 12 comments
Closed

Inline double word integers #13

cmpute opened this issue Jun 24, 2022 · 12 comments

Comments

@cmpute
Copy link
Contributor

cmpute commented Jun 24, 2022

As mentioned in #6 (comment), inline integers larger than a word are feasible without increasing the size of UBig. Specifically, at most 3 words can be inlined in the struct. However, it doesn't benefit much to inline a 3 word (most algorithms don't have specialization for 3-word integers). However, inlining a double word integer is pretty easy and beneficial. I'm thinking of creating a PR to add double word into the UBig struct, resulting in something like:

pub struct UBig(Repr);
pub(crate) enum Repr {
    Small(Word), // change the name to "Single" sounds better in this case
    Double(DoubleWord),
    Large(Buffer),
}

How's your opinion on this please? 😃

@cmpute cmpute changed the title Inline u128 Inline double word integers Jun 24, 2022
@tczajka
Copy link
Owner

tczajka commented Jun 24, 2022

I think I'd rather have this done more generally, with up to N words inlined (where N is a constant, perhaps different on different platforms), rather than with a special case for 2.

The benefit wouldn't really be that there are better algorithms for 2 words (current representation doesn't prevent using those), but rather avoiding heap allocations.

@cmpute
Copy link
Contributor Author

cmpute commented Jun 24, 2022

I think I'd rather have this done more generally, with up to N words inlined (where N is a constant, perhaps different on different platforms), rather than with a special case for 2.

The benefit wouldn't really be that there are better algorithms for 2 words (current representation doesn't prevent using those), but rather avoiding heap allocations.

But I think the limit of inlined words (without increasing the size of UBig) will be 3 regardless of platform right? Correct me if I'm wrong. If arbitrary size is desired, then a very simple way is to replace Vec<Word> to something like smallvec

@tczajka
Copy link
Owner

tczajka commented Jun 24, 2022

But I think the limit of inlined words (without increasing the size of UBig) will be 3 regardless of platform right? Correct me if I'm wrong.

Usually, not always. Word isn't always same size as usize.

If arbitrary size is desired, then a very simple way is to replace Vec<Word> to something like smallvec.

Yes. That would be good I think. Also then there is no real need for the Large / Small distinction, because that's redundant with the smallvec length. So that would additionally save space for the enum discriminant.

However I have tried this a while ago and saw significant performance deterioration for some reason. I may have badly interacted with some optimizations. There is probably a good way to do this.

@cmpute
Copy link
Contributor Author

cmpute commented Jun 24, 2022

I guess it might be better to implement a customized barebone smallvec. The following two structs somehow have the same memory footprint:

use smallvec::SmallVec;

enum UBig {
    Small(usize),
    Large(Vec<usize>)
}

struct UBig2(SmallVec<[usize; 2]>); // it should be able to inline [usize; 3]

fn main() {
    dbg!(std::mem::size_of::<UBig>());
    dbg!(std::mem::size_of::<UBig2>());
}

And one performance problem for inlining integers with larger size is that, you have to keep track of the position of highest non-zero word. The overhead of this can be large for basic arithmetic operations. This is also why I think double word is a sweet spot for inlining. 3-word is also feasible if we have inlined double word since we don't have to track the zero state of the highest word, but it will lead to explosion of number of matching arms.

@cmpute
Copy link
Contributor Author

cmpute commented Jun 24, 2022

Besides, a quick and easy improvement in terms of memory usage is to let Buffer and Memory allocate on stack if the oprands are not large enough.

@tczajka
Copy link
Owner

tczajka commented Jun 24, 2022

And one performance problem for inlining integers with larger size is that, you have to keep track of the position of highest non-zero word.

The same would be true for Double(DoubleWord). After adding or subtracting two Doubles, it will have to check whether the result has 1, 2 or 3 words.

@cmpute
Copy link
Contributor Author

cmpute commented Jun 25, 2022

And one performance problem for inlining integers with larger size is that, you have to keep track of the position of highest non-zero word.

The same would be true for Double(DoubleWord). After adding or subtracting two Doubles, it will have to check whether the result has 1, 2 or 3 words.

hmm, that's correct

@cmpute
Copy link
Contributor Author

cmpute commented Jun 26, 2022

Usually, not always. Word isn't always same size as usize.

I actually just looked round for this, GMP says in their documentation:

A limb means the part of a multi-precision number that fits in a single machine word. (We chose this word because a limb of the human body is analogous to a digit, only larger, and containing several digits.) Normally a limb is 32 or 64 bits. The C data type for a limb is mp_limb_t.

And in their header files, they have the option macro __GMP_SHORT_LIMB to set the limb size to half word. But they only use this on the m68xx chipset, which is already 40 years old. In another part of their changelog, they said

* gmp-h.in, mp-h.in (__GMP_SHORT_LIMB): Renamed from _SHORT_LIMB, to keep in our namespace. (Not actually used anywhere currently.) Reported by Patrick Pelissier.

So I can confidently say that for GMP, a word is almost always usize. Do you have any actual situation where word being not usize is beneficial? I think it's better to tie the word to usize (and therefore the internal buffer could be exposed).

@tczajka
Copy link
Owner

tczajka commented Jun 26, 2022

Do you have any actual situation where word being not usize is beneficial?

On x32 architecture (x86_64-unknown-linux-gnux32), usize is 32 bits, Word is 64 bits because the CPU has 64-bit registers.

@cmpute
Copy link
Contributor Author

cmpute commented Jun 26, 2022

Ok I thought you were referring to some cases where the processor doesn't provide enough arithmetic support for native bit size. But in this case, it makes more sense to me to use 32bit word and inline 64bit double word, which still utilize the full CPU capability.

@tczajka
Copy link
Owner

tczajka commented Jun 26, 2022

But in this case, it makes more sense to me to use 32bit word

Definitely not -- Word is the native machine word, which in this case is 64 bits. Using smaller words would slow down all algorithms.

@tczajka
Copy link
Owner

tczajka commented Jun 27, 2022

Closing. Inlining a few words on the stack might be beneficial, but specializing for DoubleWord specifically doesn't seem worth it -- operations on DoubleWords are not native anyway, and this would add a lot of special cases (Double * Double, Double * Single, Double * Large etc) for a relatively small benefit.

@tczajka tczajka closed this as not planned Won't fix, can't repro, duplicate, stale Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants