Fix packet loss in heavy traffic #151

Closed
wants to merge 1 commit into from

@rockuw rockuw commented Apr 16, 2015

In a test with ~15MB/s of network traffic, gor frequently reports "malformed http request" (output http mode).
It turns out that gor misses some packets of a relatively large HTTP POST request. Examining the recv-Q with netstat -an --raw shows that packets are being lost.

This pull request fixes two issues:

  1. Use a goroutine in the raw socket listener to speed up receiving packets
  2. A unique TCPMessage ID should be source address + source port + ack
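
To illustrate the second point, here is a minimal sketch of such a key. The function name `messageID` and the string layout are assumptions made for illustration, not gor's actual code. The idea is that segments of one request from the same client normally carry the same ack number, so adding the source address keeps two clients that happen to share a port and ack from being merged.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// messageID is a hypothetical helper (not taken from gor) that builds a key
// from the source address, the source port, and the TCP ack number.
func messageID(srcAddr string, srcPort uint16, ack uint32) string {
	endpoint := net.JoinHostPort(srcAddr, strconv.Itoa(int(srcPort)))
	return endpoint + "#" + strconv.FormatUint(uint64(ack), 10)
}

func main() {
	// Same port and ack, different source address: the keys must differ.
	a := messageID("10.0.0.1", 40001, 1000)
	b := messageID("10.0.0.2", 40001, 1000)
	fmt.Println(a, b, a != b)
}
```

A key built from port + ack alone would treat the two packets in `main` as parts of the same message.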

@buger
Owner

buger commented Apr 17, 2015

Thank you! So after you applied this patch, how did it affect packet loss?

@rockuw
Author

rockuw commented Apr 18, 2015

The rate of failed POST requests (they fail due to packet loss; the server performs an integrity check) drops from ~50% to ~5%, and the recv-Q rarely piles up.

But the 5% still should not happen, since 15MB/s does not count as heavy traffic. I don't know whether anything more can be done to help.

@buger
Owner

buger commented Apr 18, 2015

@rockuw try to run gor with GOMAXPROCX = 2*num_of_cores for better cpu utilization

@rockuw
Author

rockuw commented Apr 18, 2015

@buger Tried that. It doesn't help. With PROC=20, the overall cpu utilization of gor is about 50%. There must be some performance problems.

@buger
Owner

buger commented Apr 18, 2015

@rockuw I made a typo in my previous comment: the env variable name should be GOMAXPROCS, not GOMAXPROCS with an X at the end (GOMAXPROCX).
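
For reference, the same setting can also be applied from inside a Go program instead of through the environment. This is only a sketch of the standard `runtime` calls; the helper name `setMaxProcs` is made up for illustration and nothing here is taken from gor.

```go
package main

import (
	"fmt"
	"runtime"
)

// setMaxProcs is a hypothetical helper: it asks the runtime for
// factor * (number of logical CPUs) and reports both the requested value
// and the value the runtime holds afterwards.
func setMaxProcs(factor int) (requested, effective int) {
	requested = factor * runtime.NumCPU()
	runtime.GOMAXPROCS(requested)     // returns the previous setting, ignored here
	effective = runtime.GOMAXPROCS(0) // an argument of 0 only queries the value
	return requested, effective
}

func main() {
	requested, effective := setMaxProcs(2)
	fmt.Println("requested:", requested, "effective:", effective)
}
```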

@buger
Owner

buger commented Apr 18, 2015

I think the 5% loss happens because of the way I capture traffic via raw sockets. A raw socket does not guarantee delivery: you process packets as fast as you can, and the rest are dropped.

@rockuw
Author

rockuw commented Apr 18, 2015

@buger

  1. Processing a packet doesn't block receiving. Is the raw socket library not efficient enough?
  2. Is there a way to capture packets other than raw sockets?

@buger
Owner

buger commented Apr 18, 2015

@rockuw

  1. Launching a goroutine is still not free, and Go has a garbage collector, so it can certainly affect traffic collection.
  2. There is an alternative to raw sockets called libpcap, which is used by tcpdump. I tried it a year or more ago, and the existing binding was very buggy. Maybe the situation has changed by now. Theoretically it can give better performance, but it may require configuring the Linux kernel as well...

http://stackoverflow.com/questions/21200009/linux-pcap-vs-raw-socket-for-capture-performance
http://stackoverflow.com/questions/7856509/does-libpcap-use-raw-sockets-underneath-them

@buger
Owner

buger commented Apr 18, 2015

There is also http://www.ntop.org/products/pf_ring/, but I have no experience with it.

@rockuw
Author

rockuw commented Apr 18, 2015

@buger Glad to know. Thanks very much.

@buger
Owner

buger commented Apr 29, 2015

@rockuw I think you can try one more thing: increase the connection buffer size. It should be set in this function: https://github.com/buger/gor/blob/master/raw_socket_listener/listener.go#L63

    // Size is in bytes; play with this value, but give it at least 4MB.
    if err := tcpConn.SetReadBuffer(4 * 1024 * 1024); err != nil {
        panic(err)
    }

Also make sure that you are not limited by the OS. Check these values, and if rmem_max is less than the size you set above, increase it as well.

/proc/sys/net/core/rmem_default
/proc/sys/net/core/rmem_max
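
A small sketch of that check in Go, assuming a Linux host where both files exist and hold a single integer. The helper names (`parseSysctlInt`, `readSysctlInt`, `bufferFits`) are made up for illustration, and the 4MB figure is the one suggested above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSysctlInt parses the contents of a single-value /proc/sys file,
// which is a decimal number followed by a newline.
func parseSysctlInt(contents string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(contents))
}

// readSysctlInt reads and parses one /proc/sys file.
func readSysctlInt(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return parseSysctlInt(string(data))
}

// bufferFits reports whether the requested read buffer size is within rmemMax.
func bufferFits(requested, rmemMax int) bool {
	return requested <= rmemMax
}

func main() {
	const wanted = 4 * 1024 * 1024 // 4MB
	paths := []string{
		"/proc/sys/net/core/rmem_default",
		"/proc/sys/net/core/rmem_max",
	}
	for _, p := range paths {
		v, err := readSysctlInt(p)
		if err != nil {
			fmt.Println("cannot read", p+":", err)
			continue
		}
		fmt.Printf("%s = %d (4MB fits: %v)\n", p, v, bufferFits(wanted, v))
	}
}
```

On a non-Linux machine the program only prints the read errors.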

@buger
Owner

buger commented Apr 29, 2015

I also found this library, https://github.com/google/gopacket, which supports libpcap. It makes sense to evaluate it as well.

@rockuw
Author

rockuw commented Apr 29, 2015

@buger The conn returned by "conn, e := net.ListenPacket("ip4:tcp", t.addr)" is a PacketConn, and the PacketConn interface does not have SetReadBuffer
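
One possible workaround, sketched here as an assumption rather than as what gor does: the `net.PacketConn` interface has no `SetReadBuffer`, but the concrete types the standard library returns (`*net.IPConn` for IP networks, `*net.UDPConn` for UDP) do, so a type assertion can reach it. The names `readBufferSetter`, `setReadBuffer` and `openLoopback` are hypothetical. The example listens on loopback UDP because opening "ip4:tcp" needs root privileges.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// readBufferSetter matches any connection type that offers SetReadBuffer,
// such as *net.IPConn and *net.UDPConn.
type readBufferSetter interface {
	SetReadBuffer(bytes int) error
}

// setReadBuffer tries to set the socket receive buffer on a PacketConn
// by asserting that the underlying concrete type supports it.
func setReadBuffer(conn net.PacketConn, size int) error {
	s, ok := conn.(readBufferSetter)
	if !ok {
		return errors.New("connection type does not offer SetReadBuffer")
	}
	return s.SetReadBuffer(size)
}

// openLoopback opens an unprivileged UDP socket as a stand-in for the
// raw "ip4:tcp" socket used in the thread.
func openLoopback() (net.PacketConn, error) {
	return net.ListenPacket("udp", "127.0.0.1:0")
}

func main() {
	conn, err := openLoopback()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer conn.Close()
	if err := setReadBuffer(conn, 4*1024*1024); err != nil {
		fmt.Println("SetReadBuffer failed:", err)
		return
	}
	fmt.Println("read buffer size requested")
}
```

The OS may still cap the size actually applied, which is why the rmem values mentioned above matter.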

@buger
Owner

buger commented Apr 29, 2015

@rockuw then try increasing /proc/sys/net/core/rmem_default

@rytutis

rytutis commented May 17, 2015

This change:

    -        t.parsePacket(addr, buf[:n])
    +        go t.parsePacket(addr, buf[:n])

The read loop will overwrite buf while parsePacket is still running in its goroutine, so I think a copy is needed here?

@rockuw how are you running your test? I'd like to try it out as well.
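
A minimal sketch of the copy suggested above, under the assumption that the listener reuses one buffer for every read. The names `copyPacket` and `collect` are hypothetical, and the loop only simulates the read; it is not gor's listener code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// copyPacket returns a slice with its own backing array, so later writes
// to buf cannot change what the goroutine sees.
func copyPacket(buf []byte) []byte {
	packet := make([]byte, len(buf))
	copy(packet, buf)
	return packet
}

// collect simulates a receive loop that reuses one buffer and hands each
// packet to its own goroutine. It returns the payloads the goroutines saw,
// sorted so the result does not depend on scheduling.
func collect(reads [][]byte) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen []string
	)
	buf := make([]byte, 64) // shared buffer, overwritten on every iteration
	for _, r := range reads {
		n := copy(buf, r) // stands in for reading from the socket into buf
		packet := copyPacket(buf[:n])
		wg.Add(1)
		go func(p []byte) {
			defer wg.Done()
			mu.Lock()
			seen = append(seen, string(p))
			mu.Unlock()
		}(packet)
	}
	wg.Wait()
	sort.Strings(seen)
	return seen
}

func main() {
	reads := [][]byte{[]byte("GET /a"), []byte("POST /b"), []byte("GET /c")}
	fmt.Println(collect(reads))
}
```

Passing `buf[:n]` straight to the goroutine instead of `packet` would make it share memory with the next read, which is the data race described above. The copy costs one allocation per packet.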

@rockuw
Author

rockuw commented Jun 6, 2015

@rytutis Sorry, I cannot reproduce my test results myself now.
But it occurs to me that gor's mechanism of acting as the HTTP client of the target server may not suit heavy traffic. On the source server, the traffic is generated by hundreds of clients. To replay that traffic to the target server, gor must do as much work as those hundreds of clients combined. This may be impossible on a real production server.

Correct me if I'm wrong, @buger

I turned to another solution, tcpcopy: https://github.com/session-replay-tools/tcpcopy
It uses raw sockets to send the replayed traffic and doesn't suffer from the problem described above.

@buger
Owner

buger commented Jul 11, 2015

Some of your changes, with some improvements, were merged in #170. Thanks!
