Latency benchmarking cohttp shows queueing #328
I haven't had a chance to look in detail yet as I'm travelling this week, but I've provisioned some servers to try this out when back home. This should be good to use with our ongoing multicore work as well.
It would be really useful to have a Cohttp_lwt version of this test, since then we can run it against MirageOS in Xen mode as well. How about an Lwt one that just serves the file from memory, to avoid touching the disk?
Hi, I just translated it as closely as possible to Lwt; here's the code. NOTE: You need to have libev installed. It's probably in your package manager, or get it here. I added a line to force Lwt to use it, otherwise it'll use select() and the performance will be as horrible as Async with select().

```ocaml
(* This file is in the public domain *)
open Core.Std
open Lwt
open Cohttp_lwt_unix

(* given filename: hello_world.ml, compile with:
   $ corebuild -package lwt,cohttp.lwt hello_world.native
*)

let handler _ req _ =
  let uri = Cohttp.Request.uri req in
  match Uri.path uri with
  | "/" ->
    Server.respond_string ~status:`OK
      ~body:"CHAPTER I. Down the Rabbit-Hole Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, <and what is the use of a book,> thought Alice <without pictures or conversations?> So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, <Oh dear! Oh dear! I shall be late!> (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. In another moment down went Alice after it, never once considering how in the world she was to get out again. The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well. Either the well was very deep, or she fell very slowly, for she had plenty of time as she went down to look about her and to wonder what was going to happen next. First, she tried to look down and make out what she was coming to, but it was too dark to see anything; then she looked at the sides of the well, and noticed that they were filled with cupboards......" ()
  | _ -> Server.respond_string ~status:`Not_found ~body:"Route not found" ()

let start_server port () =
  eprintf "Listening for HTTP on port %d\n" port;
  eprintf "Try 'curl http://localhost:%d/'\n%!" port;
  Server.create
    ~ctx:(Cohttp_lwt_unix_net.init ())
    ~mode:(`TCP (`Port port))
    (Server.make ~callback:handler ())

let () = Lwt_engine.set ~transfer:true ~destroy:true (new Lwt_engine.libev)

let () =
  Command.basic
    ~summary:"Start a hello world Lwt server"
    Command.Spec.(empty +>
      flag "-p" (optional_with_default 8080 int)
        ~doc:"int Source port to listen on"
    )
    (fun port () -> start_server port () |> Lwt_unix.run)
  |> Command.run
```
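Then run it and poke it with curl; a quick usage sketch (the binary name comes from the build line above, and port 8080 is the flag's default):

```
$ ./hello_world.native -p 8080
$ curl http://localhost:8080/
```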
Awesome! Here is the run with cohttp-lwt:
It has a better request rate, but more timed-out connections. Here is the latency diagram for the same thing: it is the same problem. We sit at 100% CPU load and can't keep up with the request rate. Comparisons to Go and Haskell/Wai aren't "fair" in the sense that they use multiple cores, and the test machine has 8 of them. Go hovers around 200% CPU load, for instance, so it spreads the work over multiple cores. But the graph screams queueing all over, and the queue just grows and grows over the course of the test. Even the median is fairly bad, at 4 seconds. Here is the Nim run for comparison:
@jlouis Thank you very much for doing this. This is hugely appreciated.
Having thought somewhat on these numbers, there are a couple of things which are peculiar. We have a test which runs for one minute, and some of the latencies are close to that limit, at around 55+ seconds. This means there are connections in there which are stalled for the entirety of the test case. Had the system been a queue in strict FIFO order, this wouldn't happen; but the same latency chart shows up if you have stalling behavior, or if you process in LIFO stack order. This suggests there are some connections which are not even visited by the system before more work is piled on top. A typical simple functional solution would be to take the new work and run it first, which would exhibit exactly this problem (see the sketch below). There are also things happening in the upper percentiles which suggest something is off: "Puma" is a Ruby framework which also exhibits queueing and can't keep up, but its latency curve is far smoother than what we see in cohttp/OCaml. And note that the slowest response time for Puma is around 15 seconds, nowhere near the 60-second mark.
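For illustration, here is a minimal sketch (not cohttp's actual code) of that failure mode: fresh work is consed onto the front of the pending list and the head is always served first, so an old job can wait out the whole test. `accept_ready` and `handle` are hypothetical stand-ins passed in as parameters.

```ocaml
(* LIFO work handling: newest arrivals are served first, old jobs starve.
   accept_ready : unit -> 'a list   (hypothetical: freshly arrived work)
   handle       : 'a -> unit        (hypothetical: serve one request)   *)
let rec serve ~accept_ready ~handle pending =
  match accept_ready () @ pending with   (* new work lands at the head *)
  | [] -> serve ~accept_ready ~handle []
  | job :: rest ->
      handle job;                        (* most recent arrival goes first *)
      serve ~accept_ready ~handle rest   (* older jobs sink toward the tail *)
```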
I'm just wondering if there are any updates or plans?
I'm planning to do a Cohttp bug sweep later this week as part of the next Mirage release. Apologies for the delay... it's been a busy term time here in Cambridge :)
This has to be a pathology somewhere. The difference from other frameworks is simply too large for it not to matter. However, I'd suggest starting out by establishing whether you can reproduce the above test case; it's been a while. If you do, note that Will Glozer released wrk 4.0.1 in the meantime and Gil has been pulling changes from it into wrk2, so it is worth checking whether the tests behave any differently before embarking on fixing the errors here.
This issue still seems to exist. Has any progress been made / is there anything I can try to do to help?
@seliopou has a parser combinator library that addresses much of the GC latency. As soon as that's released, we should be able to port the parser to use it.
Just a heads up that while testing Angstrom's HTTP request parsing I got a 75% throughput increase (~20MB/s to ~35MB/s) for Lwt by increasing the input channel buffer size to 64K. The default buffer size is 4K, and I don't think it grows without you asking. This is on a FreeBSD VM without any kernel tweaking, so YMMV.
You may want to add the following line in the Lwt test server, before it starts accepting requests: `Lwt_io.set_default_buffer_size 0x10000;`
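For concreteness, in the `start_server` from the Lwt example above that would slot in like this (a sketch; `0x10000` is the 64K value mentioned earlier):

```ocaml
let start_server port () =
  (* Grow Lwt_io's default channel buffers from 4K to 64K before accepting. *)
  Lwt_io.set_default_buffer_size 0x10000;
  Server.create
    ~ctx:(Cohttp_lwt_unix_net.init ())
    ~mode:(`TCP (`Port port))
    (Server.make ~callback:handler ())
```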
If these libraries are using
@jlouis Can you point us to the implementations of the other web servers you used for this benchmark? In particular, I'd like to see the Nim implementation, as that's the most apples-to-apples comparison.
Of course, see https://gist.github.com/jlouis/e3db53339bf4c404d6197a3b541c3c93. There are two modes for the Nim GC, mark-and-sweep and reference counting, the latter being the default about a year ago when I did the benchmark. I also have the wai and go solutions hiding somewhere, though knowing the Haskell world, wai might have diverged by now.
@jlouis one more request for you. Could you please provide the raw data you used to plot the histograms for all the non-OCaml implementations? I'd like to do some comparisons to some of my performance improvements.
Yes, I'll dig them up somewhere. I think I still have them, but we should redo the project. I expect things have improved all over the place.
Managed to find them; they should be attached here. The naming could be better, as some of them mention a given language where others mention a web server in that language, so you will have to search around a bit to figure out what each one is.
out.cohttp-async.txt
Perhaps people who have subscribed to this thread are interested in this PR: #819 |
And this comment: #821 (comment) |
Hi,
So I'm in the process of doing latency benchmarks for a couple of different web server frameworks. One of the frameworks is cohttp. I have the following code for the test:
and it is built with:
There are two servers, one load-generating, and one running the cohttp server, connected over a 1-gigabit line (iperf shows 953 Mbit of throughput) with no latency to speak of (there is a switch in between, but that is all).
The servers run Linux:
Linux lady-of-pain 3.19.3-3-ARCH #1 SMP PREEMPT Wed Apr 8 14:10:00 CEST 2015 x86_64 GNU/Linux
Fairly recent Core i7s; no virtualization, since it just slows you down.
Some sysctl.conf tuning was necessary:
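The exact values aren't preserved here, but tuning for a 10k-connection test typically touches lines like these (illustrative values, not the original ones):

```
fs.file-max = 1000000
net.core.somaxconn = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
```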
Async is the recent point release, and I'm forcing epoll support because the test just fails when it runs in a select(2) loop.
The load generator is wrk2 (github.com/giltene/wrk2), which avoids coordinated omission. Most load generators coordinate: if a connection stalls towards the system under test (SUT), no further requests are issued on that connection until the first one completes, so you get only one "bad" latency number. In wrk2 the rate is kept stable: new requests are still scheduled on the stalling connection at the planned interval, so one gets more realistic latencies out of stalled connections.
We run the following:
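The original command isn't preserved above, but given the parameters described next, the wrk2 invocation would have looked roughly like this (thread count, duration, and target URL are assumptions; the 60-second duration matches the one-minute test length mentioned later in the thread):

```
$ wrk -t8 -c10000 -d60s -R30000 --latency http://lady-of-pain:8080/
```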
That is, 10k connections at a rate of 30k req/s, which means 3 req/s per connection. Such a run yields the following latency histogram:
There are a few timeout errors, but note that we can't really keep the request rate at what we want; it sits at 12k req/s. The single core on the SUT maxes out, and queueing buildup happens. I've attached the histogram plot (note the x axis, which is compressed quite a lot so you can see the high latencies) and also included `nim`, which is another single-threaded solution, making for a fair comparison. The `go` solution uses all 8 cores, so it has considerably less work to do per core, i.e., that comparison isn't really fair. The `wai` solution is a Haskell framework, but it also utilizes all 8 cores.
I have yet to try an Lwt solution with cohttp. It may perform totally differently, but I'm not sure where to start from. I thought about starting off of the file server examples by Hannes, but I'm not really sure this is the way to go. Also, tuning of the OCaml system is something I'm very interested in.