Feat: improve payload parsing #53
Directly return a parsed packet from `Payload` iterator to avoid string allocations
So the solution I would like to implement consists of implementing the […]. It would allow us to buffer the input stream until a packet separator is found and then yield a new packet each time. Thanks to that, we could limit the input size directly, before buffering anything.
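A minimal sketch of the buffering idea described above, not the PR's actual code. `PacketSplitter`, `SplitError`, and `max_payload` are hypothetical names chosen for illustration. It assumes the Engine.IO v4 record separator (`0x1e`) between packets and works on synchronous byte chunks rather than an async body stream. The size check runs before any byte of the chunk is buffered.

```rust
/// Record separator placed between packets in an Engine.IO v4 polling payload.
const SEPARATOR: u8 = 0x1e;

#[derive(Debug, PartialEq)]
enum SplitError {
    /// The total input exceeded the configured limit.
    PayloadTooLarge,
}

/// Hypothetical splitter: accumulates chunks and yields complete packets.
struct PacketSplitter {
    buf: Vec<u8>,       // bytes of the packet currently being assembled
    consumed: usize,    // total bytes received so far
    max_payload: usize, // upper bound on the total input size
}

impl PacketSplitter {
    fn new(max_payload: usize) -> Self {
        Self { buf: Vec::new(), consumed: 0, max_payload }
    }

    /// Feeds one chunk and returns every packet completed by it.
    fn push(&mut self, chunk: &[u8]) -> Result<Vec<Vec<u8>>, SplitError> {
        // Enforce the limit before buffering anything from this chunk.
        self.consumed += chunk.len();
        if self.consumed > self.max_payload {
            return Err(SplitError::PayloadTooLarge);
        }
        let mut packets = Vec::new();
        for &byte in chunk {
            if byte == SEPARATOR {
                // A separator closes the current packet.
                packets.push(std::mem::take(&mut self.buf));
            } else {
                self.buf.push(byte);
            }
        }
        Ok(packets)
    }

    /// Called at end of stream: any remaining bytes form the last packet.
    fn finish(self) -> Option<Vec<u8>> {
        if self.buf.is_empty() { None } else { Some(self.buf) }
    }
}
```

Feeding `b"4hel"` and then `b"lo\x1e4wor"` yields the packet `4hello` on the second call, and `finish` returns the trailing `4wor`, which had no separator after it.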
Sounds really cool, I like the solution.
So, I started to implement the stream wrapper for the request body in the polling request version. However, for V3 it is hard to optimize and to do as little UTF-8 decoding and buffering as possible, because of the char-count-based encoding. Some issues remain for V3 and can be found with the […]. If anyone wants to deep dive into this horror, don't hesitate 😄. Also, everything deserves a refactor because it is currently really unreadable.
Upgraded the packet reader to correctly calculate the packet length when handling Unicode characters. This change was necessary because the previous design would incorrectly size Unicode characters as single bytes. An additional dependency, `unicode-segmentation`, was introduced to use the Unicode segmentation algorithm for correct grapheme counting. Updated tests to verify the new implementation and made modifications in Cargo.toml and Cargo.lock accordingly. Features 'v4' and 'v3' were updated to depend on 'unicode-segmentation'. This change improves the system's ability to correctly handle Unicode.
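To illustrate why byte length and character count diverge, here is a rough sketch of splitting a V3-style `<length>:<data>` string payload. It is an illustration only: it uses the standard library and counts Unicode scalar values (`char`s), whereas the commit above counts graphemes with `unicode-segmentation`. The function name `split_first_packet` is hypothetical.

```rust
/// Splits the first packet off a payload of the form `<char count>:<data>`.
/// Returns `(packet, rest)`, or `None` if the payload is malformed or
/// shorter than the announced length.
/// Assumption: the length prefix counts Unicode scalar values, not bytes.
fn split_first_packet(payload: &str) -> Option<(&str, &str)> {
    let (len_str, rest) = payload.split_once(':')?;
    let char_count: usize = len_str.parse().ok()?;

    // Find the byte offset at which the packet ends.
    let end = match rest.char_indices().nth(char_count) {
        // Another char follows the packet: it starts where the packet ends.
        Some((idx, _)) => idx,
        // The packet runs exactly to the end of the input.
        None if rest.chars().count() == char_count => rest.len(),
        // Fewer chars available than announced.
        None => return None,
    };
    Some((&rest[..end], &rest[end..]))
}
```

With the payload `3:4€!1:2`, the first packet is `4€!`: three characters but five bytes. Treating the length as a byte count would cut the slice in the middle of `€`, which is not a valid `str` boundary.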
In the Cargo.toml file, 'unicode-segmentation' has been removed from the v4 feature. Previously, both v3 and v4 features had 'unicode-segmentation', which was unnecessary duplication. Now, 'unicode-segmentation' is only included in the v3 feature, ensuring a cleaner and more efficient configuration.
I created a PR fixing parsing.
Modified byte reading code to consume all available data if no separator was identified. Also, updated the test data in 'test_payload_stream_v3' to adjust for the changed reading of bytes. This ensures the payload stream processes all data and the tests reflect the appropriate changes.
Ft improve payload parsing
This allows testing all kinds of payload input forms (split into chunks sized between 1 and the total size of the payload)
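The testing approach above can be sketched as follows. This is an assumption-laden illustration, not the PR's test: `parse_chunked` is a hypothetical stand-in for the real payload parser, and it assumes the `0x1e` record separator between packets. The payload is re-split with every chunk size from 1 up to its full length, and each split must produce the same packets.

```rust
/// Record separator placed between packets (Engine.IO v4 polling payload).
const SEPARATOR: u8 = 0x1e;

/// Hypothetical stand-in for the payload parser: reassembles packets from
/// input that arrives in arbitrary chunks.
fn parse_chunked<'a>(chunks: impl Iterator<Item = &'a [u8]>) -> Vec<Vec<u8>> {
    let mut packets = Vec::new();
    let mut current = Vec::new();
    for chunk in chunks {
        for &byte in chunk {
            if byte == SEPARATOR {
                packets.push(std::mem::take(&mut current));
            } else {
                current.push(byte);
            }
        }
    }
    // Data left without a trailing separator is the last packet.
    if !current.is_empty() {
        packets.push(current);
    }
    packets
}

/// Returns true if every chunk size from 1 to the payload length
/// yields exactly the expected packets.
fn check_all_chunk_sizes(payload: &[u8], expected: &[Vec<u8>]) -> bool {
    (1..=payload.len())
        .all(|size| parse_chunked(payload.chunks(size)).as_slice() == expected)
}
```

Chunk size 1 exercises a separator arriving alone in its own chunk, while the largest size delivers the whole payload at once.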
…etween packet yielding: the `end_of_stream` was reinitialized at each yield, so the input stream was always polled even though it had already been emptied
If the stream was empty, but with remaining buffered chunks and no separator in the current one, it was previously considered an exhausted stream.
The unchecked string conversion was made on `data` and not `packet_buf`, leading to out-of-bounds exceptions.
…or graph counting
The payload size is now limited by the given config (100KB by default). Also, the body polling is extracted to a separate common fn for both protocol versions
This PR includes:
- `Payload` parsing by directly returning a parsed packet to avoid any string allocation
- `hyper::aggregate` fn to avoid server crash with malicious input