Feat: improve payload parsing #53

Totodore · 2023-07-03T15:49:31Z

This PR includes:

- Better `Payload` parsing, directly returning a parsed packet to avoid any string allocation
- A fix for malicious length input from the user in v3
- The `max_payload` size applied to the `hyper::aggregate` fn to avoid a server crash on malicious input
- Support for UTF-8 packet parsing in v3


Totodore · 2023-07-04T16:24:01Z

So the solution I would like to implement consists of implementing the `Stream` trait for `Payload` rather than `Iterator`, and directly taking the `http_body::Body` stream as input.

This would allow buffering the input stream until a packet separator is found, then yielding a new packet each time.

Thanks to that, we could limit the input size before buffering anything.
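A minimal synchronous sketch of that idea, assuming Engine.IO v4's record separator (`0x1e`) between packets. To stay std-only it models the body as an `Iterator` of byte chunks instead of implementing `futures::Stream` over an `http_body::Body`; `ChunkedPayload`, `PayloadError` and `max_payload` are hypothetical names, not engineioxide's API. The size check runs before each chunk is appended to the buffer.

```rust
/// Engine.IO v4 separates packets in a polling payload with the ASCII
/// record separator (0x1e).
const SEPARATOR: u8 = 0x1e;

#[derive(Debug, PartialEq)]
enum PayloadError {
    TooLarge,
    InvalidUtf8,
}

/// Hypothetical decoder: pulls body chunks lazily, buffers them until a
/// separator shows up, and yields one packet per separator.
struct ChunkedPayload<I> {
    chunks: I,
    buf: Vec<u8>,
    total: usize,
    max_payload: usize,
    // Kept in the struct (not recomputed per call) so the decoder remembers
    // that the body is exhausted between two yielded packets.
    done: bool,
}

impl<I: Iterator<Item = Vec<u8>>> ChunkedPayload<I> {
    fn new(chunks: I, max_payload: usize) -> Self {
        Self { chunks, buf: Vec::new(), total: 0, max_payload, done: false }
    }
}

fn decode_packet(bytes: &[u8]) -> Result<String, PayloadError> {
    String::from_utf8(bytes.to_vec()).map_err(|_| PayloadError::InvalidUtf8)
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for ChunkedPayload<I> {
    type Item = Result<String, PayloadError>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // A complete packet is already buffered: split it off and yield it.
            if let Some(pos) = self.buf.iter().position(|&b| b == SEPARATOR) {
                let packet: Vec<u8> = self.buf.drain(..=pos).collect();
                return Some(decode_packet(&packet[..pos]));
            }
            if self.done {
                // Body exhausted: any remaining bytes form the last packet.
                if self.buf.is_empty() {
                    return None;
                }
                let packet = std::mem::take(&mut self.buf);
                return Some(decode_packet(&packet));
            }
            match self.chunks.next() {
                Some(chunk) => {
                    // Enforce the limit before the chunk is buffered.
                    self.total += chunk.len();
                    if self.total > self.max_payload {
                        self.done = true;
                        self.buf.clear();
                        return Some(Err(PayloadError::TooLarge));
                    }
                    self.buf.extend_from_slice(&chunk);
                }
                None => self.done = true,
            }
        }
    }
}
```

In an async version, the same loop would roughly live in `poll_next`, with `self.chunks.next()` replaced by polling the body for its next data chunk.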

biryukovmaxim · 2023-07-04T16:32:39Z

> So the solution I would like to implement consist of implementing the Stream trait for the Payload rather than Iterator and directly take the http_body::Body stream as input.
>
> It would allow to buffer the input stream until a packet separator is found and then yield a new packet each time.

Sounds really cool, I like the solution.

engineioxide/src/payload/mod.rs

Totodore · 2023-07-05T23:29:58Z

So, I started implementing the stream wrapper for the request body in the polling transport.
For V4, it is 100% OK.

However, for V3 it is hard to optimize and to do as little UTF-8 decoding and buffering as possible, because of the character-count-based length encoding.
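For context, a v3 polling payload prefixes each packet with its length in characters (the `<length>:<packet>` form, e.g. `6:4hello`), so a decoder has to walk characters, not bytes, to find where a packet ends. A minimal std-only sketch over an already complete `&str`; `parse_v3_payload` and `V3Error` are hypothetical names, and it counts Rust `char`s (Unicode scalar values), which may differ from how a given client counts characters.

```rust
#[derive(Debug, PartialEq)]
enum V3Error {
    MissingLength,
    InvalidLength,
    Truncated,
}

/// Hypothetical helper: splits a complete v3 string payload of the form
/// `<len>:<packet><len>:<packet>...`, where `<len>` counts characters.
fn parse_v3_payload(mut input: &str) -> Result<Vec<&str>, V3Error> {
    let mut packets = Vec::new();
    while !input.is_empty() {
        // The length prefix is everything up to the first ':'.
        let colon = input.find(':').ok_or(V3Error::MissingLength)?;
        // Non-numeric or overflowing lengths are rejected here.
        let len = input[..colon]
            .parse::<usize>()
            .map_err(|_| V3Error::InvalidLength)?;
        let rest = &input[colon + 1..];
        // Walk `len` chars to find the byte offset where the packet ends.
        // A length larger than the remaining input is an error instead of
        // an out-of-bounds slice.
        let end = match rest.char_indices().nth(len) {
            Some((byte_idx, _)) => byte_idx,
            None if rest.chars().count() == len => rest.len(),
            None => return Err(V3Error::Truncated),
        };
        packets.push(&rest[..end]);
        input = &rest[end..];
    }
    Ok(packets)
}
```

Here `é` is one character but two bytes, which is exactly why the byte offsets cannot be derived from the length prefix without decoding.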

Some issues remain for V3; they can be reproduced with the `test_payload_stream_v3` test. Some bytes after the first packet get buffered because we check `packet_len` against the chunk without taking into account the character length already buffered. However, the character length decoded from the current packet buffer may be wrong, because complex (multi-byte) characters can be split across chunks, so at a given iteration the packet buffer may contain non-UTF-8 data.
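One std-only way to tell a character split across chunks apart from genuinely invalid data is `std::str::from_utf8` combined with `Utf8Error::valid_up_to` and `Utf8Error::error_len`: when `error_len()` is `None`, the buffer simply ends in the middle of a character. A sketch with a hypothetical helper, `decode_complete_chars`, which is not the approach taken in this PR:

```rust
/// Hypothetical helper: returns the longest valid UTF-8 prefix of `buf` and
/// the number of trailing bytes that belong to a character split across
/// chunks. Returns `None` if `buf` contains genuinely invalid UTF-8.
fn decode_complete_chars(buf: &[u8]) -> Option<(&str, usize)> {
    match std::str::from_utf8(buf) {
        Ok(s) => Some((s, 0)),
        // `error_len() == None`: the input ended mid-sequence, so the
        // trailing bytes are an incomplete char, not invalid data.
        Err(e) if e.error_len().is_none() => {
            let valid = e.valid_up_to();
            // `valid_up_to` guarantees this prefix is valid UTF-8.
            let s = std::str::from_utf8(&buf[..valid]).ok()?;
            Some((s, buf.len() - valid))
        }
        Err(_) => None,
    }
}
```

The decoded prefix can then be char-counted against `packet_len`, while the leftover bytes stay in the packet buffer until the next chunk completes the character.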

If anyone wants to deep dive into this horror, don't hesitate 😄.

Also, everything deserves a refactor because it is currently really unreadable.

Upgraded the packet reader to correctly calculate the packet length when handling Unicode characters. This change was necessary because the previous design incorrectly sized Unicode characters as single bytes. An additional dependency, `unicode-segmentation`, was introduced to use the Unicode segmentation algorithm for correct grapheme counting. Updated the tests to verify the new implementation and modified Cargo.toml and Cargo.lock accordingly. The `v4` and `v3` features were updated to depend on `unicode-segmentation`. This improves the system's ability to handle Unicode correctly.

In Cargo.toml, `unicode-segmentation` has been removed from the `v4` feature. Previously, both the `v3` and `v4` features included `unicode-segmentation`, which was unnecessary duplication. Now it is only included in the `v3` feature, giving a cleaner and more efficient configuration.

biryukovmaxim · 2023-07-06T18:14:34Z

> If anyone wan't to deep dive in this horror don't hesitate 😄.

Sure, that's for me.

I created a PR fixing the parsing.
It still doesn't look good, but functionally it's okay.

Modified the byte-reading code to consume all available data if no separator is found. Also updated the test data in `test_payload_stream_v3` to match the new byte-reading behavior. This ensures the payload stream processes all data and that the tests reflect the change.


Totodore added 4 commits July 2, 2023 20:02

feature: add CONTENT_LENGTH header to responses

33f000b

feat: improve payload parsing:

d048ebe

Directly return a parsed packet from `Payload` iterator to avoid string allocations

Merge remote-tracking branch 'origin/main' into feat-improve-payload-…

1c4f083

…parsing

feat: add utf8 check to v4

bfb6737

Totodore added the enhancement (New feature or request) and vulnerability (This references a vulnerability found in socketioxide or engineioxide) labels Jul 3, 2023

Totodore marked this pull request as draft July 3, 2023 16:03

Totodore added 3 commits July 4, 2023 15:09

feat: adding utf8 char parsing in Payload v3

e5c395a

fix(clippy): unused BufReadCharsExt

ec32ccc

Merge branch 'main' into ft-improve-payload-parsing

c010eaa

Totodore added 2 commits July 5, 2023 23:05

Feat(engineioxide/payload): body stream implementation

80e924a

Feat(engineioxide/payload): body stream implementation for V3

8db3bd4

github-advanced-security bot found potential problems Jul 5, 2023


biryukovmaxim added 4 commits July 6, 2023 10:27

refactor logic returning stream based on features

739268d

add comments describing the flow of parsing packet

1a91d30

biryukovmaxim and others added 9 commits July 6, 2023 22:22

fix iterator_test

b492b54

fix wrong test

00e4222

Merge pull request #55 from biryukovmaxim/ft-improve-payload-parsing

37b5b66

Ft improve payload parsing

Merge branch 'main' into ft-improve-payload-parsing

07f6c3e

test(engineioxide/payload): split payload into different chunk sizes

73088d3

This allows testing all kinds of payload input forms (split into chunks of any size between 1 and the total size of the payload).
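A self-contained sketch of that kind of test: the payload is cut into chunks of every size from 1 to its full length with `slice::chunks`, and every split must decode to the same packets. The decoder here is a deliberately simple, hypothetical v4 one (split on `0x1e`, decode UTF-8 only once a packet is complete), so multi-byte characters like `é` straddling a chunk boundary are exercised.

```rust
/// Hypothetical v4 decoder used only for this illustration: buffers raw
/// bytes, splits on the record separator (0x1e), and decodes a packet as
/// UTF-8 only once it is complete.
fn decode_v4(chunks: &[&[u8]]) -> Vec<String> {
    let mut packets = Vec::new();
    let mut buf: Vec<u8> = Vec::new();
    for chunk in chunks {
        for &byte in chunk.iter() {
            if byte == 0x1e {
                let packet = std::mem::take(&mut buf);
                packets.push(String::from_utf8(packet).expect("valid UTF-8 packet"));
            } else {
                buf.push(byte);
            }
        }
    }
    // The last packet has no trailing separator.
    if !buf.is_empty() {
        packets.push(String::from_utf8(buf).expect("valid UTF-8 packet"));
    }
    packets
}

/// Feeds `payload` split into chunks of every size from 1 to its full
/// length and asserts that the decoded packets never change.
fn check_all_chunk_sizes(payload: &[u8], expected: &[&str]) {
    for size in 1..=payload.len() {
        let chunks: Vec<&[u8]> = payload.chunks(size).collect();
        let packets = decode_v4(&chunks);
        assert_eq!(packets, expected, "chunk size {size}");
    }
}
```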

fix(engineioxide/payload): extract the end_of_stream var to keep it b…

37f1154

…etween packet yielding. The `end_of_stream` flag was reinitialized at each yield, so the input stream was always polled even though it had already been exhausted.

fix(engineioxide/payload):

528c3ae

Previously, if the stream was empty but chunks were still buffered and the current one had no separator, the stream was considered exhausted.

fix(engineioxide/payload):

fc2a859

The unchecked string conversion was made on `data` instead of `packet_buf`, leading to out-of-bounds errors.

Totodore added 5 commits July 20, 2023 16:31

refactor(engineioxide/payload): removing dbg + improving some logic

5028ff0

fix(engineioxide/payload): switching from char indices to graphemes f…

e31a55f

…or graph counting

chore(deps): making v3 dependencies optional

3c66b62

refactor(engineioxide/payload): local import for featured deps

b4e917e

feat(engineioxide/payload): check the input payload size

d07f2aa

The payload size is now limited by the given config (100KB by default). Also, the body polling is extracted to a separate common fn for both protocol versions.
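A sketch of such a shared, size-limited body reader, again modelling the body as an iterator of byte chunks to stay std-only. `LimitedBody` and `PayloadTooLarge` are hypothetical names; in practice the limit would come from the config. A v3 or a v4 decoder could consume its output, and it stops at the first chunk that pushes the running total over the limit.

```rust
/// Hypothetical error yielded once the body exceeds the configured limit.
#[derive(Debug, PartialEq)]
struct PayloadTooLarge;

/// Hypothetical adapter shared by both protocol versions: forwards body
/// chunks while keeping a running byte count, yields one error once
/// `max_payload` is exceeded, then ends.
struct LimitedBody<I> {
    inner: I,
    seen: usize,
    max_payload: usize,
    failed: bool,
}

impl<I: Iterator<Item = Vec<u8>>> LimitedBody<I> {
    fn new(inner: I, max_payload: usize) -> Self {
        Self { inner, seen: 0, max_payload, failed: false }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for LimitedBody<I> {
    type Item = Result<Vec<u8>, PayloadTooLarge>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.failed {
            return None;
        }
        let chunk = self.inner.next()?;
        self.seen += chunk.len();
        if self.seen > self.max_payload {
            // The oversized chunk is never handed to a decoder.
            self.failed = true;
            return Some(Err(PayloadTooLarge));
        }
        Some(Ok(chunk))
    }
}
```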

Totodore marked this pull request as ready for review July 20, 2023 15:39

Merge branch 'main' into ft-improve-payload-parsing

2af84a2

Totodore merged commit 3638eb9 into main Jul 23, 2023
5 of 6 checks passed

Totodore deleted the ft-improve-payload-parsing branch July 23, 2023 15:54
