-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add arbitrary-precision integer to std #1081
Conversation
A few notes on the implementation: - Any unsigned power of two integer type less than 64 bits in size is supported as a Limb type. - The algorithms used are kept simple for the moment. More complicated algorithms are generally only more useful as integer sizes increase a lot and I don't expect our current usage to be used for this purpose just yet. - All branches (practically) have been covered by tests. See https://github.com/tiehuis/zig-bn/tree/986a2b3243d0454b8430a6adf4ad48611850c1b8/bench for rough performance comparison numbers. Closes #364.
@@ -464,6 +464,7 @@ set(ZIG_STD_FILES | |||
"math/atan.zig" | |||
"math/atan2.zig" | |||
"math/atanh.zig" | |||
"math/big/int.zig" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just following a standard structure, same as Go.
// Converts primitive integer values onto a stack-based big integer, or passes through existing | ||
// Int types with no modifications. This can fail at runtime if using a very large dynamic | ||
// integer but it is very unlikely and is considered a user error. | ||
fn wrapInt(allocator: *Allocator, bn: var) *const Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I have this to be unconditional (not return an error) is that something like cmp
should work without error. This does mean there is an upper limit on comptime integers passed as arguments. Fine in practice, a user can simply call Bn.set
first in this case.
The wrapper_buffer_size
is larger than it needs to be as well. Technically 512
gives us space for 512 * 8 = 4096
bit comptime arguments, but the maximum size is actually smaller 512
. Probably just overlooked something or the allocator is over-allocating (shouldn't be).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense. We should be able to get a compile error (not runtime error) when the value cannot fit, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it starts to get too complicated, we can always cast the wrapInt
value to u64
/i64
(compile error if it does not fit) and then document that one should use Bn.set
for values outside this range.
return !r.isOdd(); | ||
} | ||
|
||
fn bitcount(self: *const Int) usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loosely, but did we want to expose the LLVM ctpop
intrinsic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe that is #767
self.len += 1; | ||
|
||
// TODO: shift == 64 at compile-time fails. Fails on u128 limbs. | ||
w_value >>= Limb.bit_count / 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few deficiencies in the stage1 compiler's big integer library. I'll make a pass through at some point and fix up the edge cases as these issues typically result in silent comptime errors which is the worst kind.
Great timing! I think I'll be needing this within the next couple weeks for the self hosted compiler. |
// rma = a * b | ||
// | ||
// For greatest efficiency, ensure rma does not alias a or b. | ||
pub fn mul(rma: *Int, av: var, bv: var) !void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have actually implemented Karatsuba (and Toom-3) variants for this, but they were a bit unclean to implement well (on low-level limbs with temporary allocations) so I haven't pushed them anywhere just yet. These are only useful for balanced operand sizes as well, so I expect the main use case right now wouldn't get any benefits. Unbalanced multiplication requires some other algorithms which I haven't explored in detail.
One additional note: Did we want a rational big number type at this point for anything? This is pretty easy to implement atop this. |
} | ||
} | ||
|
||
pub const Int = struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some ideas that might be worth exploring, but maybe you already considered them, and have some thoughts?
Also I don't consider any of these merge blockers.
allocator in function params instead of Int struct
Downsides: introduces a new kind of footgun where you can use one allocator for initializing and then a different one for addition or something and that would be undefined behavior. This could be mitigated in several ways:
- Int could include allocator as a field, but only in debug mode and assert that passed allocator matches.
- Generally, allocators can include safety in debug mode that make note of the
*Allocator
address as metadata along side allocated data, and can assert that a free/realloc uses the same address. - Even more generally, the
Allocator
interface itself, could do this on top of every allocator implementation.
Another downside: less convenient from an API perspective
Benefits:
- less memory usage
- better cache performance?
union for limbs to handle the case where the int is <= @maxValue(Limb)
This is for higher performance and to avoid using memory allocation when possible. In general we would like to use big ints when the value could possibly be larger than one of the builtin int sizes, but even in these cases it is likely that a usize
could store it after all.
try to use less memory
One example source file is std/special/compiler_rt/udivmodti4_test.zig
. It's 65300 lines and each line has 4 bigints. At 40 bytes per big int, this comes out to 10MB. If big ints were 16 bytes instead, this comes out to 4MB, which is a pretty significant difference.
Related: #471
Here are some ideas to conserve memory:
comptime {
// status quo
// 40 bytes on x86_64
// 20 bytes on i386
@compileLog(@sizeOf(struct {
allocator: *u8,
positive: bool,
limbs: []i32,
len: usize,
}));
// no allocator field
// 32 bytes on x86_64
// 16 bytes on i386
@compileLog(@sizeOf(struct {
positive: bool,
limbs: []i32,
len: usize,
}));
// Use len to manage heap allocation
// 24 bytes on x86_64
// 12 bytes on i386
@compileLog(@sizeOf(struct {
limbs: packed union {
Heap: [*]u8,
Single: usize,
},
len: usize,
positive: bool,
}));
// u32 for len
// 16 bytes on x86_64
// 12 bytes on i386
@compileLog(@sizeOf(struct {
limbs: packed union {
Heap: [*]u8,
Single: usize,
},
len: u32,
positive: bool,
}));
// use the most significant bit of len for positive
// 16 bytes on x86_64
// 8 bytes on i386
@compileLog(@sizeOf(packed struct {
limbs: packed union {
Heap: [*]u8,
Single: usize,
},
positive: bool,
len: @IntType(false, usize.bit_count - 1),
}));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this all comes down to the difference in usage of arbitrary precision integers. For mathematical/cryptographic purposes you generally don't have a large number of objects and they are typically very large so you don't need to specialize small sizes. The compiler on the other hand uses it as its default integer type and allocates many of them at once.
Let me think about this a bit and I'll get back to you. I'm still inclined to prefer the allocator field still stored with the type as is since it simplifies the usage and avoids the possible use errors as suggested. I'll see how much the implementation changes special-casing small limbs and/or changing the length representation. At the least, packing the pointer in the length is a worthwhile change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think about this a bit and I'll get back to you.
Sounds good.
Another argument in favor of how you did it here is that the compiler is the one which has weird memory requirements, and thus can be the one to do the tricks. For example, in the compiler the type can look something like this:
const Value = union(enum) {
Pointer: *PointerInfo,
Bool: bool,
SmallSignedInt: i64,
SmallUnsignedInt: u64,
Int: *std.math.big.Int,
};
So here we've managed to keep the base Value
type down to @sizeOf(usize)
, at the cost of a little complexity initializing and dealing with some of the extra info that requires more bytes to store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I think for the moment I'll use the type:
struct {
allocator: *u8,
limbs: []u8,
metadata: usize,
}
Where the top-bit of metadata
is the sign bit, with the remaining being the length of the limbs. This is 32 bytes so a small 8 byte saving. We can drop this down to 24 bytes in the future by using a u31/u32
for the cap and length if we want a bit more space (limiting the max Int
size to 16gb, however).
Also note that general purpose memory allocators may not allocate certain small bin sizes (jemalloc for instance only does 16, 32, 48...) so certain byte sizes may not be as useful. This isn't always the case though, ptmalloc2 (libc) for instance looks like it does 8-byte strides so it doesn't matter there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is an internal change/optimization and I don't expect the API to change, I'll merge now and get around to doing this when it is used/later.
I think it would make sense to use that for the |
This is required for the self-hosted implementation so here it is. The main thing remaining that would be useful (besides general performance improvements) would be randomized testing against another library (GMP). All branches have been covered however and since we don't employ any complex algorithms this isn't a huge concern just yet.
A few notes on the implementation:
as a Limb type.
algorithms are generally only more useful as integer sizes increase a
lot and I don't expect our current usage to be used for this purpose
just yet.
See https://github.com/tiehuis/zig-bn/tree/986a2b3243d0454b8430a6adf4ad48611850c1b8/bench
for rough performance comparison numbers.
Closes #364.