`Integer` as enum type and optimized constrained and variable-sized integer encoding #289
Conversation
I think in terms of correctness, yes, because this is now fixed size, whereas an integer with constraints is not. In fairness, it's such a large integer that it likely exceeds any fixed-size optimisation. But I'm generally unsure about not matching the behaviour of a variable-sized integer in terms of API, because with this implementation all libraries would have to re-export the integer generic to allow it to be configurable by applications, and I don't think that we should do that.
Would it be better if we default to …?
Do you mean cases where they want to use `Integer` defined in standards differently than originally? Yeah, that does not work with this approach, unless we feature-gate the default type. Maybe that would be enough for most. But I don't think this will get any better in terms of performance, since now there is no branching in encoding, and it's quite similar to the previously added decoding approach.

Code with the enum approach gets quite complex, but it is possible, at least if we just use a single primitive type internally.

Edit: The enum could be used with an internal generic for the primitive type, but then there is again one default type which cannot be changed when reusing standards, for example. Alternatively, I tried variants for every primitive integer type, but that added a lot of complexity and branching. The easiest option is just using one primitive type (e.g. `i128`), which makes it possible to avoid allocations (what I initially did). With the current state of the library, there are no benefits for smaller integer types in terms of performance, unless we go for very low-end devices. There were also some challenges in benefiting from the added trait in the case of the enum, which might make it confusing. But maybe it does not matter.
3eeed71 to b1656bf
I do think the enum approach is the way to go forward, because I think it offers the best tradeoffs. I would design the enum like the following:

```rust
enum Integer {
    Primitive(isize),
    Variable(Box<num_bigint::BigInt>),
}
```

I also think, to make the implementation easier, let's start by adding the type to the crate without replacing the existing integer type, and do the replacement afterwards.
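A minimal, std-only sketch of the two-variant idea above, using `i128` as a stand-in for `num_bigint::BigInt` so the example compiles without dependencies (the `from_i128` constructor is hypothetical, added only to illustrate the small-value fast path):

```rust
// Sketch of the proposed two-variant integer. i128 stands in for
// num_bigint::BigInt so the example stays dependency-free.
#[derive(Debug, Clone, PartialEq)]
enum Integer {
    // Fast path: value fits the native word, no heap allocation.
    Primitive(isize),
    // Slow path: boxed, so the enum stays small on the stack.
    Variable(Box<i128>),
}

impl Integer {
    // Pick the cheapest variant that can hold the value.
    fn from_i128(value: i128) -> Self {
        match isize::try_from(value) {
            Ok(v) => Integer::Primitive(v),
            Err(_) => Integer::Variable(Box::new(value)),
        }
    }
}

fn main() {
    // Small values stay on the stack …
    assert_eq!(Integer::from_i128(42), Integer::Primitive(42));
    // … while values beyond isize fall back to the boxed variant.
    assert!(matches!(Integer::from_i128(i128::MAX), Integer::Variable(_)));
    println!("ok");
}
```

The `Box` keeps the rare big-number case from inflating the size of every `Integer` value, which is the tradeoff the comment above is weighing.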
I see. What would you think if I try to do it as:

```rust
pub enum Integer<P: IntegerType + PrimInt = isize> {
    Primitive(P),
    Variable(Box<BigInt>),
}
```

That would make it easier to use internally, e.g. if we use primitive types.
I think there's no getting around converting to …
I think we are thinking of different kinds of performance. The main issue: if we always use the variable-sized type, we lose the one big benefit of using Rust, and even Java might be faster with the current approach.

Some areas have high requirements for encoding/decoding speed, but they suffer the security issues of C/C++ when relying on them, and we could achieve the same performance by using Rust, while mitigating all the memory issues and also moving many other bugs to compile time.

The main reason I have been working on this crate is to improve the above, but the C/C++ replacement is not competitive enough if we don't use the stack more. For non-constrained types it is feasible to add the automatic conversion, but the main benefit would come from the stack-based constrained types not making any system calls.
I'm not sure I understand, because:
Maybe the title was a bit misleading in that sense. This PR attempted to solve all cases of using integers, particularly the issue with constraints which are defined by using primitive integers (which I counted as integers too).

My primary reason for suggesting the generic parameter for the enum was roughly this, because all the primitive types are converted to the general type.

In case there would be a separate implementation also for encoding primitives directly on top of it …

Or maybe the easier option, now that I think about it, is to use a plain generic as the parameter.

Maybe I can explore it a bit, but that would be the fourth implementation I have tried.
I believe that is what I originally stated we should do for encoding here: #254 (comment)
I would have appreciated a bit more information, as this PR also currently uses the trait as your comment suggests, just for the variants of the enum instead of the direct type.
Sorry for not being clear. I think for encoding it should essentially match the existing approach.

Separately, I think it's worth moving this into its own module.

If there's anything unclear, please feel free to open a discussion, and we can discuss it more in depth.
I think it is quite clear now. There was a misunderstanding with the constrained and variable-sized integers. I thought that with …
57423e9 to 9bbe220
Changed the title from "`Integer` and optimized integer encoding" to "`Integer` as enum type and optimized constrained and variable-sized integer encoding"
7abefe0 to 40a3f78
40a3f78 to 11cf3a0
There is one, maybe minor, behavior left which should be decided (just about how equality works):

```rust
#[derive(Debug, Clone, Ord, Hash, Eq, PartialEq, PartialOrd)]
pub enum Integer {
    Primitive(isize),
    Variable(Box<BigInt>),
}
```

Currently builds fail on 32-bit platforms, because when decoding constrained `Integer` types (at least on OER), values carrying leading sign bytes may not fit the primitive variant. Some logic could be implemented in the codecs to attempt to identify that and drop the leading bytes, or the easy solution is to decode it as variable-sized. However, this will result in different types when encoding the type versus decoding it, and when comparing the results.

The encoding part is currently aware of leading bytes (because that is necessary for fixed-size integers); it does not convert to a bigger type just because of the required leading bytes, and always uses the primitive variant where it fits.

Should we try to optimize at the codec level so that they drop leading bytes and more values can use the primitive variant? I think that checking those leading bytes is not worth the effort currently, as it mainly impacts 32-bit platforms, and I will change the equality just to compare numeric values.
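Comparing numeric values instead of variants could look like the following sketch, again with `i128` standing in for `BigInt`; the point is that a derived `PartialEq` would call `Primitive(1)` and `Variable(1)` unequal, while a hand-written one compares the numbers:

```rust
// Sketch: value-based equality across variants. i128 stands in
// for BigInt so the example stays dependency-free.
#[derive(Debug, Clone)]
enum Integer {
    Primitive(isize),
    Variable(Box<i128>),
}

impl Integer {
    // Widen either variant to a common numeric type for comparison.
    fn as_i128(&self) -> i128 {
        match self {
            Integer::Primitive(v) => *v as i128,
            Integer::Variable(v) => **v,
        }
    }
}

impl PartialEq for Integer {
    fn eq(&self, other: &Self) -> bool {
        // Compare the numbers, not the enum discriminants.
        self.as_i128() == other.as_i128()
    }
}

fn main() {
    assert_eq!(Integer::Primitive(1), Integer::Variable(Box::new(1)));
    assert_ne!(Integer::Primitive(1), Integer::Variable(Box::new(2)));
    println!("ok");
}
```

With this, a value that round-trips through encode/decode and comes back in the other variant still compares equal, which sidesteps the 32-bit decoding mismatch described above.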
04e8c95 to f72a7a0
I believe the other codecs do remove leading bytes first; if that's correct, it would probably be good to match. Make an issue for it and it can be pursued and tested separately. It's not needed for the initial implementation.
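For reference, stripping redundant leading bytes from a big-endian two's-complement encoding could be sketched as below (this is an illustration of the general technique, not the crate's actual codec code): a leading `0x00` or `0xFF` byte is droppable only when the following byte has the same sign bit, otherwise removing it would flip the sign.

```rust
// Drop redundant sign-extension bytes from a big-endian
// two's-complement byte string, so a value padded out to 8 bytes
// can still fit a 4-byte isize on a 32-bit target.
fn strip_leading(bytes: &[u8]) -> &[u8] {
    // Negative numbers are padded with 0xFF, non-negative with 0x00.
    let pad = if bytes.first().map_or(false, |b| b & 0x80 != 0) {
        0xFF
    } else {
        0x00
    };
    let mut start = 0;
    // Strip while the byte equals the pad AND the next byte keeps
    // the same sign bit (so the numeric value is unchanged).
    while start + 1 < bytes.len()
        && bytes[start] == pad
        && (bytes[start + 1] & 0x80) == (pad & 0x80)
    {
        start += 1;
    }
    &bytes[start..]
}

fn main() {
    // 1 padded to 8 bytes shrinks to a single byte.
    assert_eq!(strip_leading(&[0, 0, 0, 0, 0, 0, 0, 1]), &[1u8][..]);
    // -1 padded to 4 bytes shrinks to one 0xFF byte.
    assert_eq!(strip_leading(&[0xFF, 0xFF, 0xFF, 0xFF]), &[0xFFu8][..]);
    // 0x80 needs its leading 0x00 to stay positive, so it is kept.
    assert_eq!(strip_leading(&[0x00, 0x80]), &[0x00u8, 0x80][..]);
    println!("ok");
}
```

The third case is exactly the "required leading byte" the encoder is already aware of: it is part of the value's sign, not padding.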
I believe this can be reviewed at this point. Everything integer-related is now moved into its own module.

Some initial bench results below. UPER does not get as significant improvements because there are many other allocations. I was planning to remove as many allocations as possible next, starting from OER.
Have one question but doesn't need to be addressed in this PR.
Thank you for your PR! And thank you for working on these integer improvements over the past few months!
```rust
isize,
u8,
u16,
u32,
u64,
// TODO cannot support u128 as it is a constrained type by default and current constraints use i128 for bounds
// u128,
```
Can we implement this with its signed variant being BigInt?
Could you clarify this a bit more?

Casting u128 to signed bytes indeed currently changes the meaning. We could use BigInt there. This particular line was more related to how constraints currently work, which use `i128` for the bounds, here:
Lines 270 to 296 in 0aa23e3

```rust
macro_rules! asn_integer_type {
    ($($int:ty),+ $(,)?) => {
        $(
            impl AsnType for $int {
                const TAG: Tag = Tag::INTEGER;
                const CONSTRAINTS: Constraints<'static> = Constraints::new(&[
                    constraints::Constraint::Value(Extensible::new(constraints::Value::new(constraints::Bounded::const_new(<$int>::MIN as i128, <$int>::MAX as i128)))),
                ]);
            }
        )+
    }
}
asn_integer_type! {
    i8,
    i16,
    i32,
    i64,
    i128,
    isize,
    u8,
    u16,
    u32,
    u64,
    u128, // TODO upper constraint truncated
    usize,
}
```
Maybe we could change the upper constraint bound to the `u128` type. That makes checking the bounds more challenging, as more type conversions are involved, but it should be possible.
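The truncation the TODO refers to is a plain consequence of Rust's `as` cast semantics: widening `<$int>::MAX as i128` is lossless for every listed type except `u128`, whose maximum wraps. A minimal demonstration:

```rust
// Why u128 cannot share the i128-based bound: the `as` cast wraps.
fn main() {
    // u128::MAX reinterpreted as i128 becomes -1, so an i128 upper
    // bound built this way would be silently wrong.
    assert_eq!(u128::MAX as i128, -1);

    // Every smaller unsigned type widens losslessly.
    assert_eq!(u64::MAX as i128, 18_446_744_073_709_551_615i128);

    // A checked conversion makes the failure explicit instead.
    assert!(i128::try_from(u128::MAX).is_err());
    println!("ok");
}
```

This is why the macro excerpt above carries the "upper constraint truncated" comment on its `u128` line.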
Actually, I tried to change the constraint behavior, and it does not seem to be easy. E.g. if we add more types for the bounds:
```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Bounded<S, E> {
    #[default]
    None,
    Single(S),
    Range {
        start: Option<S>,
        end: Option<E>,
    },
}
```
We would need to re-adjust the constraint logic a lot, because many functions currently return a single type and expect that the type is the same for the `Single` and `Range` variants, for example.
The enum approach seemed to become more complex than it should with the `IntegerType` trait, so I just tried a wrapping struct with a generic, which seems to work fine in the end. It also easily allows adding any new integer type that implements the trait.
At least encoding speed for OER is more than doubled now for integers.
One concern is that there are many standards which are defined just with the `Integer` type, and those now internally default to the `i128` type. Is that a problem?