Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use bytes::Bytes instead of Vec<u8> to represent binary payloads #285

Merged
merged 7 commits into from
Apr 14, 2024

Conversation

kelnos
Copy link
Contributor

@kelnos kelnos commented Mar 17, 2024

As requested, moved to a separate PR :)

@kelnos kelnos changed the title Use bytes instead of vecu8 Use bytes::Bytes instead of Vec<u8> to represent binary payloads Mar 17, 2024
@kelnos kelnos marked this pull request as ready for review March 17, 2024 20:21
@kelnos kelnos force-pushed the use-bytes-instead-of-vecu8 branch from b7e5e24 to 71a7922 Compare March 17, 2024 20:50
Copy link
Owner

@Totodore Totodore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make a bigger review tomorrow. I only have a question for the moment.
Did you consider to change the packet_buf in the v3_binary_encoder from Vec to Bytes and also did you think about the optimization we could get by using the Bytes struct (I think about fast cloning and also maybe no continuous array so that is it possible to have faster encoding/decoding for polling).
That's why I wanted to move to Bytes at the origin but I didn't check this deep enough for the moment. If you want, feel free to do it in this PR, if you prefer to work on other things I'll do this in another PR (if I find optimisation that could be done with Bytes).

engineioxide/src/transport/polling/payload/encoder.rs Outdated Show resolved Hide resolved
@kelnos
Copy link
Contributor Author

kelnos commented Mar 18, 2024

Did you consider to change the packet_buf in the v3_binary_encoder from Vec to Bytes

I did do that. Or are you mean the decoder, not the encoder? If so, I did look at that, but they both use std::io::BufRead.read_until(), which specifically takes a Vec<u8> to write into. Maybe there's a way around that with a larger change or refactor, but I wasn't ready to tackle looking into that yet.

@kelnos kelnos force-pushed the use-bytes-instead-of-vecu8 branch from 71a7922 to 4f95065 Compare March 18, 2024 01:01
@Totodore
Copy link
Owner

I added a little commit to refactor the v3_bin_packet_encoder fn when calculating the length of the packet.

For further optimisations, here we could switch from &str/String to Bytes and therefore avoid useless re-allocations because we can cheaply slice the Bytes struct:
https://github.com/Totodore/socketioxide/blob/39d700a3bef9e5d5b97a147bc6795c1dcb53c193/engineioxide/src/packet.rs#L159C1-L159C61
This will be the subject of other PRs though.

I only have one remaining question maybe your opinion will help me to decide. What do you think between having public interface requiring impl Into<Bytes> rather than Bytes directly to avoid API breakage and also make the use of Bytes transparent (the user don't have to explicitly specify Bytes::from_*).

@kelnos
Copy link
Contributor Author

kelnos commented Mar 18, 2024

What do you think between having public interface requiring impl Into<Bytes> rather than Bytes directly to avoid API breakage and also make the use of Bytes transparent (the user don't have to explicitly specify Bytes::from_*).

Yeah, that seems like a good idea to me. Are there any downsides to that? I guess larger code size for each monomorphized function if the user uses several different things that implement Into<Bytes>, but that's something the user can control themselves if it's a problem.

One thing that will be an API break is changing the argument type in EngineIoHandler::on_binary(), I don't think there's a way around that.

@Totodore
Copy link
Owner

One thing that will be an API break is changing the argument type in EngineIoHandler::on_binary(), I don't think there's a way around that.

It is mainly the socketioxide API which is important. It is not a problem to change the engineioxide API.

@Totodore
Copy link
Owner

Yeah, that seems like a good idea to me. Are there any downsides to that? I guess larger code size for each monomorphized function if the user uses several different things that implement Into, but that's something the user can control themselves if it's a problem.

@kelnos do you plan to implement this anytime soon ? Once it is done we can merge this. If you don't have enough free time for this I can work on it and merge the PR if your ok with this.

@kelnos
Copy link
Contributor Author

kelnos commented Apr 13, 2024

Ah, sorry, I completely forgot there was more to do here & was going to ping you about getting it merged. Sure, I can take care of that.

@kelnos
Copy link
Contributor Author

kelnos commented Apr 13, 2024

Hm, I'm unable to run the tests now; it looks like you changed how the test harness works, but when I:

export RUSTFLAGS="--cfg socketioxide-test"

And then run cargo test, I get:

error: failed to run `rustc` to learn about target-specific information

Caused by:
  process didn't exit successfully: `/home/brian/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/bin/rustc - --crate-name ___ --print=file-names --cfg test=socketioxide-test --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=split-debuginfo --print=crate-name --print=cfg` (exit status: 1)
  --- stderr
  error: invalid `--cfg` argument: `test=socketioxide-test` (expected `key` or `key="value"`, ensure escaping is appropriate for your shell, try 'key="value"' or key=\"value\")

Not really sure what's expected here.

Also in CONTRIBUTING you have a section that links to "common scripts" for running the test, but that link seems to not go anywhere now.

@kelnos
Copy link
Contributor Author

kelnos commented Apr 13, 2024

Ah, looks like you put socketioxide-test in the doc, but it's actually socketioxide_test -- underscore, not dash. Weird that the dash caused such an obscure error message though.

@kelnos kelnos force-pushed the use-bytes-instead-of-vecu8 branch from 615409b to 642a9b6 Compare April 13, 2024 23:57
@kelnos
Copy link
Contributor Author

kelnos commented Apr 13, 2024

I only have one remaining question maybe your opinion will help me to decide. What do you think between having public interface requiring impl Into<Bytes> rather than Bytes directly to avoid API breakage and also make the use of Bytes transparent (the user don't have to explicitly specify Bytes::from_*).

Ok started working on this after finishing the rebase and fixing conflicts. I'm a little concerned about extra overhead here, though.

For example:

impl ConfOperators {
    pub fn bin<B: Into<Bytes>>(mut self, binary: Vec<B>) -> Self {
        self.binary = binary.into_iter().map(Into::into).collect();
        self
    }
}

Even in the case where a Vec<Bytes> is passed, we have to re-wrap it in a new Vec, which is a bit of a waste. What do you think?

@kelnos kelnos force-pushed the use-bytes-instead-of-vecu8 branch 2 times, most recently from 1109c80 to eec1c47 Compare April 14, 2024 00:06
@Totodore
Copy link
Owner

Totodore commented Apr 14, 2024

Even in the case where a Vec is passed, we have to re-wrap it in a new Vec, which is a bit of a waste. What do you think?

The doc on the collect in vec specify that in certain case it will do the mutation "in place" without allocating for a new Vec. Moreover if the user directly specify a Bytes struct, the Into will have no effect and therefore there won't be any modification (I think?).

Maybe we could try to benchmark this but I don't think it is a limiting factor.

According to the implementation, our use case matches the optimization:

@kelnos
Copy link
Contributor Author

kelnos commented Apr 14, 2024

Ah, neat. And while that's only an implementation detail, that seems like something that's unlikely to change.

@kelnos
Copy link
Contributor Author

kelnos commented Apr 14, 2024

Also I just realized I force-pushed without first pulling your additional commit that refactored the v3_packet_encoder. Hopefully you still have that locally and can push it again?

@kelnos kelnos force-pushed the use-bytes-instead-of-vecu8 branch from eec1c47 to adf5f66 Compare April 14, 2024 05:58
@kelnos
Copy link
Contributor Author

kelnos commented Apr 14, 2024

At any rate, I think I made the changes you asked for.

@Totodore
Copy link
Owner

Totodore commented Apr 14, 2024

At any rate, I think I made the changes you asked for.

Thanks, I'll review this this week! :)

@Totodore Totodore enabled auto-merge (squash) April 14, 2024 22:29
@Totodore Totodore merged commit 2aa1a3b into Totodore:main Apr 14, 2024
13 checks passed
@Totodore
Copy link
Owner

Totodore commented Apr 14, 2024

I added some other changes:

  • I removed some generic Into<Bytes> from the less used/specialized API
  • I changed the bound from Vec<Into<Bytes>> to IntoIterator<Into<Bytes>>. The underlying code doesn't change but it will be more flexible for the user :).

Sorry I should have been more clear on what API bits should be flexible and what can stay fixed.

@kelnos kelnos deleted the use-bytes-instead-of-vecu8 branch April 14, 2024 23:18
@kelnos
Copy link
Contributor Author

kelnos commented Apr 14, 2024

No worries and thank you for taking care of it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants