[Question] Performance issue in multi-thread runtime #5

Open
zonyitoo opened this issue Jan 18, 2022 · 9 comments
Labels
question Further information is requested

Comments

@zonyitoo

zonyitoo commented Jan 18, 2022

I couldn't find any other project on GitHub that uses smoltcp with tokio except yours.

I am working on a project that also uses smoltcp as a user-space network stack and provides wrappers that interoperate with the existing Tokio I/O structs (TcpStream, ...), but I ran into some problems:

  1. Since Interface::poll needs to be called frequently at very short intervals, it has to run in a separate task (the Reactor in this project).
  2. In the latest version of smoltcp, the SocketSet is now managed by the Interface, so calling Interface::poll, as well as AsyncRead::poll_read and AsyncWrite::poll_write on a TcpStream (a wrapper of TcpSocket), all require taking a lock on the Interface (which is the same situation as the SocketAllocator in this project); see the sketch below.
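
A minimal sketch of the structure described in point 2, assuming a shared async Mutex; Iface, poll_all, and recv_slice are placeholders rather than the real smoltcp or tokio-smoltcp API. The point is only that the reactor task and every read/write contend on the same lock:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex;

// Placeholder for smoltcp's Interface (which owns the SocketSet in 0.8+) plus its device.
struct Iface;
impl Iface {
    fn poll_all(&mut self) {}                                 // drives the whole stack
    fn recv_slice(&mut self, _buf: &mut [u8]) -> usize { 0 }  // reads one socket
}

struct Net {
    iface: Arc<Mutex<Iface>>,
}

impl Net {
    // Reactor task: must run very frequently, and holds the lock while it runs.
    fn spawn_reactor(&self) {
        let iface = self.iface.clone();
        tokio::spawn(async move {
            loop {
                iface.lock().await.poll_all();
                tokio::time::sleep(Duration::from_millis(1)).await;
            }
        });
    }

    // Every read (and write) on a TcpStream wrapper contends on the same lock.
    async fn read(&self, buf: &mut [u8]) -> usize {
        self.iface.lock().await.recv_slice(buf)
    }
}
```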

I don't know if you have any benchmark data for the current design of tokio-smoltcp. I ran a simple test with iperf3, which showed that this interface is noticeably slower than the system's network stack.

I'm opening this issue hoping you can share any information you have about this.

@spacemeowx2
Owner

Hi, thank you for your interest in this project.

Unfortunately I haven't done any benchmarking, and it doesn't surprise me that this is slower than the system's network stack.

  • In Interface::poll, smoltcp loops over the sockets to find the right one instead of doing a HashMap/BTreeMap lookup. smoltcp is designed for embedded platforms, not for raw performance.
  • We are in user space, so every packet still has to pass through kernel space; we don't get to take any shortcuts.

That said, I believe the current approach in tokio-smoltcp is not the best possible one, and there is still room for improvement.
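
To illustrate the first bullet, a toy comparison of the two lookup strategies; Socket, find_linear, and find_keyed are made-up stand-ins, not smoltcp's actual types or internals:

```rust
use std::collections::BTreeMap;

// Toy stand-in for a socket identified by a handle.
struct Socket {
    handle: usize,
}

// Linear scan on every dispatch: O(n) over all sockets.
fn find_linear(sockets: &[Socket], handle: usize) -> Option<&Socket> {
    sockets.iter().find(|s| s.handle == handle)
}

// Keyed lookup instead: O(log n) per search.
fn find_keyed(sockets: &BTreeMap<usize, Socket>, handle: usize) -> Option<&Socket> {
    sockets.get(&handle)
}
```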

@zonyitoo
Author

zonyitoo commented Jan 19, 2022

There are some projects out there:

  • embassy-net, which simply forbids multi-threaded usage, so everything runs on a single thread (the basic scheduler).

  • onetun, which builds a global Bus (a broadcast channel) for dispatching events such as packets (send/recv) and socket creation/destruction.

onetun's approach could help decouple TcpStream from the poller (SocketAllocator), so reads and writes wouldn't need the global lock. I did a quick experiment, but performance dropped even further (CPU usage went higher).
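
For reference, a rough sketch of that bus pattern using tokio's broadcast channel; the Event enum and its variants are invented for illustration and are not onetun's actual types:

```rust
use tokio::sync::broadcast;

// Hypothetical event type; onetun's real Bus carries its own event enum.
#[derive(Clone, Debug)]
enum Event {
    InboundPacket(Vec<u8>),
    OutboundPacket(Vec<u8>),
    SocketCreated(usize),
    SocketClosed(usize),
}

#[tokio::main]
async fn main() {
    let (tx, _rx) = broadcast::channel::<Event>(1024);

    // The poller task owns the network stack exclusively and only exchanges
    // events with the stream wrappers, so reads/writes need no global lock.
    let mut poller_rx = tx.subscribe();
    tokio::spawn(async move {
        while let Ok(event) = poller_rx.recv().await {
            // Feed OutboundPacket into the interface, publish InboundPacket, etc.
            let _ = event;
        }
    });

    // A TcpStream wrapper publishes its writes as events instead of locking.
    tx.send(Event::OutboundPacket(b"hello".to_vec())).unwrap();
}
```

Note that with a broadcast channel every subscriber sees every event, so each wrapper has to filter for its own socket, which adds per-packet work.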

@spacemeowx2
Owner

spacemeowx2 commented Jan 19, 2022

When I tried to upgrade to smoltcp 0.8, I found that the interface changes force the lock to become bigger, so efficiency may be reduced.

@zonyitoo
Author

It shouldn't. You have to take the SocketSet lock when calling read, write, and poll anyway, so there should be no difference.

@spacemeowx2
Owner

In the previous version I could send packets:

```rust
interf
    .device_mut()
    .send_queue()
    .await
    .expect("Failed to send queue");
```

and still get data in poll_read/poll_write at the same time, but now they are behind the same lock.

@zonyitoo
Author

I was using an alternative approach: https://github.com/shadowsocks/shadowsocks-rust/blob/4d30371bdf7b9c6eec2ab54ce4a042cfc82eac17/crates/shadowsocks-service/src/local/tun/tcp.rs#L225-L353

Every TcpConnection has 2 ring buffers, and the main poll loop copies data from/to the buffers of every socket. That way we avoid one huge lock over the whole system.
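
A stripped-down sketch of that shape; the names (TcpConnection, poll_once) are illustrative and a VecDeque stands in for the real ring buffer, so this is not the code in the linked file:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Per-connection buffers: only these small locks are shared with the
// TcpStream wrapper, never the whole Interface.
struct TcpConnection {
    recv_buf: Arc<Mutex<VecDeque<u8>>>, // filled by the poll loop, drained by poll_read
    send_buf: Arc<Mutex<VecDeque<u8>>>, // filled by poll_write, drained by the poll loop
}

// The single poll loop holds the Interface exclusively and just copies bytes
// between smoltcp's sockets and each connection's ring buffers.
fn poll_once(conns: &[TcpConnection]) {
    for conn in conns {
        // socket -> recv_buf: copy whatever the smoltcp socket has received.
        let mut recv = conn.recv_buf.lock().unwrap();
        let _ = &mut *recv; // (copy elided in this sketch)

        // send_buf -> socket: copy queued bytes into the smoltcp socket.
        let mut send = conn.send_buf.lock().unwrap();
        let _ = &mut *send; // (copy elided in this sketch)
    }
}
```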

@spacemeowx2
Owner

Looks reasonable. It's a space-for-time trade-off.

@zonyitoo
Author

zonyitoo commented Jan 20, 2022

But still, the CPU usage is very high. I don't think this is the ultimate approach.

@zonyitoo zonyitoo reopened this Jan 20, 2022
@spacemeowx2
Owner

Just published a performance fix.

Previously, after smoltcp's TCP receive buffer was drained by a read, the Reactor was not notified to send a packet telling the other side, so the other side would keep assuming that smoltcp's TCP buffer had not been consumed.

It has been fixed in 35245cc
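
The commit is the authoritative fix; as a rough illustration of the idea only (TcpStreamInner, reactor_notify, and after_read are made-up names), a read that frees space in the receive buffer wakes the reactor so the next poll can inform the peer right away instead of waiting for the next timer tick:

```rust
use std::sync::Arc;
use tokio::sync::Notify;

// Hypothetical stream state; `reactor_notify` is shared with the reactor task,
// which waits on it (or a timer) between polls of the interface.
struct TcpStreamInner {
    reactor_notify: Arc<Notify>,
}

impl TcpStreamInner {
    fn after_read(&self, bytes_read: usize) {
        if bytes_read > 0 {
            // We just freed space in smoltcp's receive buffer; poke the reactor so
            // it polls the interface now and emits a packet updating the peer,
            // rather than letting the peer think the buffer is still full.
            self.reactor_notify.notify_one();
        }
    }
}
```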

@spacemeowx2 spacemeowx2 added the question Further information is requested label Mar 7, 2022