`Integer` as enum type and optimized constrained and variable-sized integer encoding #289
Conversation
I think in terms of correctness, yes, because this is now fixed size, whereas an integer with constraints is not. In fairness, it's such a large integer that it likely exceeds any fixed-size optimisation. But I'm generally unsure about not matching the behaviour of a variable-sized integer in terms of API, because with this implementation all libraries would have to re-export the integer generic to allow it to be configurable by applications, and I don't think that we should do that.
Would it be better if we default to …?
Do you mean cases where they want to use `Integer` defined in standards differently than originally? Yeah, that does not work with this approach, unless we feature-gate the default type. Maybe that would be enough for most. But I don't think this will get any better in terms of performance, since now there is no branching in encoding, and it's quite similar to the previously added decoding approach.

Code with the enum approach gets quite complex, but it is possible, at least if we just use a single primitive type internally.

Edit: The enum could be used with an internal generic for the primitive type, but then there is again one default type which cannot be changed when reusing standards, for example. Alternatively, I tried variants for every primitive integer type, but that added a lot of complexity and branching. The easiest option is just using one primitive type (e.g. `i128`), which makes it possible to avoid allocations (what I initially did). With the current state of the library, there are no benefits for smaller integer types in terms of performance, unless we go for very low-end devices. There were also some challenges in benefiting from the added trait in the case of the enum, which might make it confusing. But maybe it does not matter.
3eeed71 to b1656bf
I do think the enum approach is the way to go forward, because I think it offers the best tradeoffs. I would design the enum like the following:

```rust
enum Integer {
    Primitive(isize),
    Variable(Box<num_bigint::BigInt>),
}
```

I also think, to make the implementation easier, let's start by adding the type to the crate without replacing the existing integer type, and do the replacement afterwards.
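A minimal, std-only sketch of the two-variant idea above, using `i128` as a stand-in for `num_bigint::BigInt` so the example compiles without dependencies (the `from_i128` constructor is hypothetical, added only to illustrate the small-value fast path):

```rust
// Sketch of the proposed two-variant integer. i128 stands in for
// num_bigint::BigInt so the example stays dependency-free.
#[derive(Debug, Clone, PartialEq)]
enum Integer {
    // Fast path: value fits the native word, no heap allocation.
    Primitive(isize),
    // Slow path: boxed, so the enum stays small on the stack.
    Variable(Box<i128>),
}

impl Integer {
    // Pick the cheapest variant that can hold the value.
    fn from_i128(value: i128) -> Self {
        match isize::try_from(value) {
            Ok(v) => Integer::Primitive(v),
            Err(_) => Integer::Variable(Box::new(value)),
        }
    }
}

fn main() {
    // Small values stay on the stack …
    assert_eq!(Integer::from_i128(42), Integer::Primitive(42));
    // … while values beyond isize fall back to the boxed variant.
    assert!(matches!(Integer::from_i128(i128::MAX), Integer::Variable(_)));
    println!("ok");
}
```

The `Box` keeps the rare big-number case from inflating the size of every `Integer` value, which is the tradeoff the comment above is weighing.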
I see. What would you think if I try to do it as:

```rust
pub enum Integer<P: IntegerType + PrimInt = isize> {
    Primitive(P),
    Variable(Box<BigInt>),
}
```

That would make it easier to use internally, e.g. if we use primitive types.
I think there's no getting around converting to …
I think we are thinking of different kinds of performance. The main issue: if we always use the variable-sized type, we lose the one big benefit of using Rust, and even Java might be faster with the current approach.

Some areas have high requirements for encoding/decoding speed, but they suffer the security issues of C/C++ when relying on them, and we could achieve the same performance by using Rust, while mitigating all the memory issues and also moving many other bugs to compile time.

The main reason I have been working on this crate is to improve the above, but the C/C++ replacement is not competitive enough if we don't use the stack more. For non-constrained types it is feasible to add the automatic conversion, but the main benefit would come from the stack-based constrained types not making any system calls.
I'm not sure I understand, because:
Maybe the title was a bit misleading in that sense. This PR attempted to solve all cases of using integers, particularly the issue with constraints which are defined by using primitive integers (which I counted as integers too).

My primary reason for suggesting the generic parameter for the enum was roughly this, because all the primitive types are converted to the general type.

In case there would be a separate implementation also for encoding primitives directly on top of it …

Or maybe the easier option, now that I think about it, is to use a plain generic as the parameter.

Maybe I can explore it a bit, but that would be the fourth implementation I have tried.
I believe that is what I originally stated we should do for encoding here: #254 (comment)
I would have appreciated a bit more information, as this PR also currently uses the trait as your comment suggests, just for the variants of the enum instead of the direct type.
Sorry for not being clear. I think for encoding it should essentially match the existing approach.

Separately, I think it's worth moving this into its own module.

If there's anything unclear, please feel free to open a discussion, and we can discuss it more in depth.
I think it is quite clear now. There was a misunderstanding with the constrained and variable-sized integers. I thought that with …
57423e9 to 9bbe220
Changed the title from "`Integer` and optimized integer encoding" to "`Integer` as enum type and optimized constrained and variable-sized integer encoding"
7abefe0 to 40a3f78
40a3f78 to 11cf3a0
There is one, maybe minor, behavior left which should be decided (just about how equality works):

```rust
#[derive(Debug, Clone, Ord, Hash, Eq, PartialEq, PartialOrd)]
pub enum Integer {
    Primitive(isize),
    Variable(Box<BigInt>),
}
```

Currently builds fail on 32-bit platforms, because when decoding constrained `Integer` types (at least on OER), values carrying leading sign bytes may not fit the primitive variant. Some logic could be implemented in the codecs to attempt to identify that and drop the leading bytes, or the easy solution is to decode it as variable-sized. However, this will result in different types when encoding the type versus decoding it, and when comparing the results.

The encoding part is currently aware of leading bytes (because that is necessary for fixed-size integers); it does not convert to a bigger type just because of the required leading bytes, and always uses the primitive variant where it fits.

Should we try to optimize at the codec level so that they drop leading bytes and more values can use the primitive variant? I think that checking those leading bytes is not worth the effort currently, as it mainly impacts 32-bit platforms, and I will change the equality just to compare numeric values.
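Comparing numeric values instead of variants could look like the following sketch, again with `i128` standing in for `BigInt`; the point is that a derived `PartialEq` would call `Primitive(1)` and `Variable(1)` unequal, while a hand-written one compares the numbers:

```rust
// Sketch: value-based equality across variants. i128 stands in
// for BigInt so the example stays dependency-free.
#[derive(Debug, Clone)]
enum Integer {
    Primitive(isize),
    Variable(Box<i128>),
}

impl Integer {
    // Widen either variant to a common numeric type for comparison.
    fn as_i128(&self) -> i128 {
        match self {
            Integer::Primitive(v) => *v as i128,
            Integer::Variable(v) => **v,
        }
    }
}

impl PartialEq for Integer {
    fn eq(&self, other: &Self) -> bool {
        // Compare the numbers, not the enum discriminants.
        self.as_i128() == other.as_i128()
    }
}

fn main() {
    assert_eq!(Integer::Primitive(1), Integer::Variable(Box::new(1)));
    assert_ne!(Integer::Primitive(1), Integer::Variable(Box::new(2)));
    println!("ok");
}
```

With this, a value that round-trips through encode/decode and comes back in the other variant still compares equal, which sidesteps the 32-bit decoding mismatch described above.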
04e8c95 to f72a7a0
I believe the other codecs do remove leading bytes first; if that's correct, it would probably be good to match. Make an issue for it and it can be pursued and tested separately. It's not needed for the initial implementation.
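For reference, stripping redundant leading bytes from a big-endian two's-complement encoding could be sketched as below (this is an illustration of the general technique, not the crate's actual codec code): a leading `0x00` or `0xFF` byte is droppable only when the following byte has the same sign bit, otherwise removing it would flip the sign.

```rust
// Drop redundant sign-extension bytes from a big-endian
// two's-complement byte string, so a value padded out to 8 bytes
// can still fit a 4-byte isize on a 32-bit target.
fn strip_leading(bytes: &[u8]) -> &[u8] {
    // Negative numbers are padded with 0xFF, non-negative with 0x00.
    let pad = if bytes.first().map_or(false, |b| b & 0x80 != 0) {
        0xFF
    } else {
        0x00
    };
    let mut start = 0;
    // Strip while the byte equals the pad AND the next byte keeps
    // the same sign bit (so the numeric value is unchanged).
    while start + 1 < bytes.len()
        && bytes[start] == pad
        && (bytes[start + 1] & 0x80) == (pad & 0x80)
    {
        start += 1;
    }
    &bytes[start..]
}

fn main() {
    // 1 padded to 8 bytes shrinks to a single byte.
    assert_eq!(strip_leading(&[0, 0, 0, 0, 0, 0, 0, 1]), &[1u8][..]);
    // -1 padded to 4 bytes shrinks to one 0xFF byte.
    assert_eq!(strip_leading(&[0xFF, 0xFF, 0xFF, 0xFF]), &[0xFFu8][..]);
    // 0x80 needs its leading 0x00 to stay positive, so it is kept.
    assert_eq!(strip_leading(&[0x00, 0x80]), &[0x00u8, 0x80][..]);
    println!("ok");
}
```

The third case is exactly the "required leading byte" the encoder is already aware of: it is part of the value's sign, not padding.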
I believe this can be reviewed at this point. Everything integer-related is now moved into its own module.

Some initial bench results below. UPER does not get as significant improvements because there are many other allocations. I was planning to remove as many allocations as possible next, starting from OER.
Have one question but doesn't need to be addressed in this PR.
Thank you for your PR! And thank you for working on these integer improvements over the past few months!
```rust
isize,
u8,
u16,
u32,
u64,
// TODO cannot support u128 as it is a constrained type by default and current constraints use i128 for bounds
// u128,
```
Can we implement this with its signed variant being BigInt?
Could you clarify this a bit more?

Casting u128 to signed bytes indeed currently changes the meaning. We could use BigInt there. This particular line was more related to how constraints currently work, which use `i128` for the bounds, here:
Lines 270 to 296 in 0aa23e3

```rust
macro_rules! asn_integer_type {
    ($($int:ty),+ $(,)?) => {
        $(
            impl AsnType for $int {
                const TAG: Tag = Tag::INTEGER;
                const CONSTRAINTS: Constraints<'static> = Constraints::new(&[
                    constraints::Constraint::Value(Extensible::new(constraints::Value::new(constraints::Bounded::const_new(<$int>::MIN as i128, <$int>::MAX as i128)))),
                ]);
            }
        )+
    }
}
asn_integer_type! {
    i8,
    i16,
    i32,
    i64,
    i128,
    isize,
    u8,
    u16,
    u32,
    u64,
    u128, // TODO upper constraint truncated
    usize,
}
```
Maybe we could change the upper constraint bound to the `u128` type. That makes checking the bounds more challenging, as more type conversions are involved, but it should be possible.
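The truncation the TODO refers to is a plain consequence of Rust's `as` cast semantics: widening `<$int>::MAX as i128` is lossless for every listed type except `u128`, whose maximum wraps. A minimal demonstration:

```rust
// Why u128 cannot share the i128-based bound: the `as` cast wraps.
fn main() {
    // u128::MAX reinterpreted as i128 becomes -1, so an i128 upper
    // bound built this way would be silently wrong.
    assert_eq!(u128::MAX as i128, -1);

    // Every smaller unsigned type widens losslessly.
    assert_eq!(u64::MAX as i128, 18_446_744_073_709_551_615i128);

    // A checked conversion makes the failure explicit instead.
    assert!(i128::try_from(u128::MAX).is_err());
    println!("ok");
}
```

This is why the macro excerpt above carries the "upper constraint truncated" comment on its `u128` line.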
Actually, I tried to change the constraint behavior, and it does not seem to be easy. E.g. if we add more types for the bounds:
```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Bounded<S, E> {
    #[default]
    None,
    Single(S),
    Range {
        start: Option<S>,
        end: Option<E>,
    },
}
```
We would need to re-adjust the constraint logic a lot, because many functions currently return a single type and expect that the type is the same for the `Single` and `Range` variants, for example.
The enum approach seemed to become more complex than it should with the `IntegerType` trait, so I just tried a wrapping struct with a generic, which seems to work fine in the end. It also easily allows adding any new integer type that implements the trait.
At least encoding speed for OER is more than doubled now for integers.
One concern is that there are many standards which are defined just with the `Integer` type, and those now internally default to the `i128` type. Is that a problem?