Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: improve payload parsing #53

Merged
merged 28 commits into from
Jul 23, 2023
Merged

Feat: improve payload parsing #53

merged 28 commits into from
Jul 23, 2023

Conversation

Totodore
Copy link
Owner

@Totodore Totodore commented Jul 3, 2023

This PR includes :

  • better Payload parsing by directly returning a parsed packet to avoid any string allocation
  • fixing malicious length input from the user in v3
  • applying max_payload size to the hyper::aggregate fn to avoid server crash with malicious input
  • supporting utf8 packet parsing in v3

@Totodore Totodore added enhancement New feature or request vulnerability This reference a vulnerability found on socketioxide or engineioxide labels Jul 3, 2023
@Totodore Totodore marked this pull request as draft July 3, 2023 16:03
@Totodore
Copy link
Owner Author

Totodore commented Jul 4, 2023

So the solution I would like to implement consist of implementing the Stream trait for the Payload rather than Iterator and directly take the http_body::Body stream as input.

It would allow to buffer the input stream until a packet separator is found and then yield a new packet each time.

Thanks to that we could limit the input size directly before buffering anything.

@biryukovmaxim
Copy link
Contributor

So the solution I would like to implement consist of implementing the Stream trait for the Payload rather than Iterator and directly take the http_body::Body stream as input.

It would allow to buffer the input stream until a packet separator is found and then yield a new packet each time.

Sounds really cool, like the solution

engineioxide/src/payload/mod.rs Fixed Show fixed Hide fixed
engineioxide/src/payload/mod.rs Fixed Show fixed Hide fixed
engineioxide/src/payload/mod.rs Fixed Show fixed Hide fixed
engineioxide/src/payload/mod.rs Fixed Show fixed Hide fixed
engineioxide/src/payload/mod.rs Fixed Show fixed Hide fixed
engineioxide/src/payload/mod.rs Fixed Show fixed Hide fixed
engineioxide/src/payload/mod.rs Fixed Show fixed Hide fixed
@Totodore
Copy link
Owner Author

Totodore commented Jul 5, 2023

So, I started to implement the streams wrapper for the request body in polling request version.
For V4 it is 100% ok.

However for V3 it is hard to optimize and do the less utf8 decoding and buffering as possible because of the char count based encoding.

Some issues remains for V3 and can be found with the test_payload_stream_v3 test. It is a problem of a some bytes after the first packet being buffered because we check the packet_len in the chunk without taking in account the existing char length. However the existing char length decoded from the current packet buffer maybe bad because of some complex chars being splitted into different chunks and therefore having at the current iteration non utf8 data into the packet_buffer.

If anyone wan't to deep dive in this horror don't hesitate 😄.

Also everything deserve a refacto because it is currently really unreadable

Upgraded Packet Reader to correctly calculate the packet length when handling Unicode characters. This change was necessary because the previous design would incorrectly size Unicode characters as single bytes. Additional dependency `unicode-segmentation` was introduced to use Unicode segmentation algorithm for correct grapheme counting.

Updated tests to verify the new implementation and made modifications in Cargo.toml and Cargo.lock accordingly. Featured 'v4' and 'v3' were updated to depend on 'unicode-segmentation.'

This change improves system's ability to correctly handle Unicode.
In the Cargo.toml file, 'unicode-segmentation' has been removed from the v4 feature. Previously, both v3 and v4 features had 'unicode-segmentation', which was unnecessary duplication. Now, 'unicode-segmentation' is only included in the v3 feature, ensuring a cleaner and more efficient configuration.
@biryukovmaxim
Copy link
Contributor

If anyone wan't to deep dive in this horror don't hesitate smile.
Sure, that's about me

I created pr fixing parsing
it still doesn't look good, but functionally it's okay

biryukovmaxim and others added 9 commits July 6, 2023 22:22
Modified byte reading code to consume all available data if no separator was identified. Also, updated the test data in 'test_payload_stream_v3' to adjust for the changed reading of bytes. This ensures the payload stream processes all data and the tests reflect the appropriate changes.
This allow to test all kind of payload input forms (splitted between 1 and the total size of the payload)
…etween packet yielding

the `end_of_stream` was reinitialized at each yielding, therefore the input stream was always polled whereas it was already emptied
If the stream was empty, but with remaining bufferered chunks and no separator in the current one, it was before considered as an exausted stream.
the unchecked string conversion was made on `data` and not `packet_buf` leading to out of bound exceptions.
@Totodore Totodore marked this pull request as ready for review July 20, 2023 15:39
@Totodore Totodore merged commit 3638eb9 into main Jul 23, 2023
5 of 6 checks passed
@Totodore Totodore deleted the ft-improve-payload-parsing branch July 23, 2023 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request vulnerability This reference a vulnerability found on socketioxide or engineioxide
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants