Something is really broken in our http serving #1617
I see essentially the same results. The maximum transfer speed I'm seeing is about 6 Kbytes/sec, and I get about that for any thread count (ab -c#) from 2 up through 20. With a single ab thread, I get 5 Kbytes/sec.
The best thing we can do right now is consolidate the CSS and JS files and put all static assets on a CDN. That would bring the request count per page down to 1. That alone would be an eightfold improvement for users with a cold cache (the storm of new visitors we are expecting on Friday after Chad's interview is out).
I guess what I would really like is to get performance like this (it is a fairly high-profile site in CZ, like the Czech Google):
but even something like this would be OK :).
This issue appears to be fundamental to Aspen. Running gittip locally (which is far too often difficult; see #1619), I get worse throughput with more requesting threads.
The CPU on my 4-core laptop never got anywhere near 100%. My theory is that Aspen is, in effect, single-threaded. This behavior is much like Rails not too long ago. Heroku has made a lot of money because it allows one-request-at-a-time Rails apps to scale up easily with many dynos. I assume the quickest (though not cheap) fix is to add more Gittip Aspen servers. This requires some code fixes, to move user session handling (for Twitter and Bitbucket) out of memory.
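Moving sessions out of process memory basically means keeping the session token in a store that every app server can reach. Here is a minimal sketch of the idea, assuming a Redis instance is available; the store, key scheme, and function names are hypothetical, not what Gittip actually does:

```python
# Sketch only: a shared session store, so any of several app servers can
# validate a session cookie. Redis and the key scheme are assumptions here,
# not Gittip's actual implementation.
import json
import uuid

import redis

SESSION_TTL = 60 * 60 * 24  # one day, in seconds

store = redis.StrictRedis(host="localhost", port=6379, db=0)

def create_session(user_id):
    """Store the session server-side and return the token for the cookie."""
    token = uuid.uuid4().hex
    key = "session:%s" % token
    store.set(key, json.dumps({"user_id": user_id}))
    store.expire(key, SESSION_TTL)
    return token

def load_session(token):
    """Look the token up in the shared store; works from any app server."""
    raw = store.get("session:%s" % token)
    return json.loads(raw) if raw else None
```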
What is your value of MIN_THREADS in local.env? In production we are now running with 40. Under the hood, Aspen uses the HTTP server from CherryPy (Cheroot). I've never looked at it any closer, so I don't know anything about it. When I get home I plan to try a simple WSGI hello-world app under different WSGI containers/servers (like gunicorn or uWSGI) locally to see what difference the server makes.
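For that comparison, the hello-world app itself only needs to be a bare WSGI callable, along the lines of the sketch below (a generic benchmark stub, not Gittip code), with each server then pointed at it in turn:

```python
# hello.py -- minimal WSGI app for comparing servers (cheroot, gunicorn, uwsgi).
# This is a generic benchmark stub, not part of Gittip.

def application(environ, start_response):
    body = b"Hello, world!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Run it under the servers being compared and point ab at each, e.g.:
#   gunicorn --workers 4 --bind 127.0.0.1:8000 hello:application
#   uwsgi --http 127.0.0.1:8000 --wsgi-file hello.py --processes 4
```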
I hadn't even thought about MIN_THREADS. Looking now, it's set to the default of MIN_THREADS=10.
I've been saying for a while that Aspen is a little slow. I wanted to do some tests like the ones @zwn will be doing, but I never found the time. Good luck, let us know what happens!
Testing a single static page on a vanilla Aspen server gives much better performance. It is fast enough that I'm not sure I can tell how well it processes requests in parallel. (These tests were run on my 4-core i5 laptop.)
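One quick way to get a feel for whether requests really are handled in parallel, independent of ab, is to fire a batch of concurrent requests and compare the wall-clock time to the single-request latency. A rough sketch; the URL and counts are placeholders for whatever the local instance serves:

```python
# Rough concurrency check: if N concurrent requests take about as long as one,
# the server is handling them in parallel; if they take ~N times as long, it
# is effectively serializing them. Sketch only -- adjust URL and counts.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://127.0.0.1:8537/assets/reset.css"  # assumed local URL
N = 20

def fetch(_):
    return requests.get(URL).status_code

start = time.time()
with ThreadPoolExecutor(max_workers=N) as pool:
    statuses = list(pool.map(fetch, range(N)))
elapsed = time.time() - start

print("%d requests (statuses %s) in %.2fs -> %.1f req/sec"
      % (N, set(statuses), elapsed, N / elapsed))
```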
@bruceadams was your test using Postgression for your local Gittip?
@clone1018 yes, I used Postgression. I don't expect fetching a static page to exercise the database. I could capture database network traffic to be sure.
Since we serve the static page via Aspen as a web server, is there a chance we're opening a database connection?
We should also look at the effect of logging on performance. I have a tingly sense that lots of stdout chatter could slow us down.
Logging has the potential to have a huge impact on performance. The best logging frameworks jump through all kinds of hoops to avoid synchronization, excessive object creation, and other overhead.
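In Python, the usual trick is to keep the handler that actually writes to stdout or a file off the request threads by going through a queue. A sketch of that pattern, using logging.handlers.QueueHandler from the Python 3 standard library (on the Python 2 that Gittip currently runs, a backport such as logutils would be needed):

```python
# Sketch: request threads only enqueue log records; a single background
# listener thread does the actual (slow, synchronized) writing to stdout.
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded

# Handler used by request-handling code: a cheap, non-blocking enqueue.
queue_handler = logging.handlers.QueueHandler(log_queue)
root = logging.getLogger()
root.addHandler(queue_handler)
root.setLevel(logging.INFO)

# The listener owns the real handler and runs in its own thread.
console = logging.StreamHandler()
listener = logging.handlers.QueueListener(log_queue, console)
listener.start()

logging.info("this call returns as soon as the record is queued")
# ... at shutdown:
listener.stop()
```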
If we are, it'd be in a hook, and the places to check would be in
@bruceadams I am not getting the same results as you.
But the pattern is clear: the more threads, the worse. I also have a 4-core i5 laptop. I am able to get to 50% CPU utilization in total.
@zwn interesting. Do you have a local database? If yes, your faster results suggest the database is being hit when serving this static asset. (I haven't had a chance to check.) (Also, I could not use "localhost" because my machine treats "localhost" as IPv6 and something misbehaved.)
@bruceadams I have a local db. I am not sure what work we do against the db per request (I have looked, but we need some instrumentation to be sure).
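For the instrumentation, one low-effort option is psycopg2's LoggingConnection, which logs every statement it executes; counting log lines while fetching a page would answer the question. A sketch of wiring it up, assuming direct psycopg2 access (Gittip goes through its postgres wrapper, so the actual hookup point would differ):

```python
# Sketch: log every SQL statement so we can see what runs when a static
# asset like /assets/reset.css is fetched. Connection parameters are dummies.
import logging

import psycopg2
from psycopg2.extras import LoggingConnection

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("sql")

conn = psycopg2.connect(
    "dbname=gittip user=gittip host=localhost",   # placeholder DSN
    connection_factory=LoggingConnection,
)
conn.initialize(logger)  # LoggingConnection logs each statement to this logger

cur = conn.cursor()
cur.execute("SELECT 1")  # shows up in the log; real requests would show their queries
```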
Running
That is a different ballpark.
It is possible to mount Aspen as a WSGI app on gunicorn:
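In rough outline that might look like the sketch below; the import path and the way the Website gets built are assumptions about the Aspen version in use, not the exact recipe:

```python
# wsgi.py -- hypothetical entry point for serving an Aspen site from gunicorn.
# The import and constructor details are assumptions; check the docs for the
# Aspen version actually pinned by Gittip.
from aspen.website import Website

# Aspen's Website is (assumed here to be) a WSGI callable once configured
# with the project root / www root.
application = Website()

# Then something like:
#   gunicorn wsgi:application --workers 4 --bind 0.0.0.0:8537
```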
Watching network traffic to the database: yes, there is a round trip to the database for every fetch of assets/reset.css. I'll see if I can figure out where that is in the code (by stumbling around in the dark).
@bruceadams It's probably in a hook somewhere. Actually, it's probably related to authentication.
I'm in the process of setting up a load-testing environment locally. I have the FreeBSD machine I set up while my laptop was out of commission. I'm installing ab there to hit Gittip on my laptop over the LAN.
For kicks, I stood up an instance of Gittip at Digital Ocean (smallest droplet, NY1). It's reading from the production database and is configured with
In general, I'm seeing ~115 r/s for
Two possible ways to account for the discrepancy are SSL and a baseline load of ~2 req/sec, which are present on the Heroku dyno but not the DO droplet.
Testing locally, I am seeing ~177 r/s for
I ran two
This result seems to argue against the "baseline load" explanation for the slowness of reset.css at Heroku.
Using Gunicorn locally, I am seeing similar results as with Cheroot: ~170 r/s for
Edit: I may very well not have been running Gunicorn with multiple processes, cf. #1617 (comment).
Which, I suppose, is all to say that, yes, something is really broken in our http serving. 💔
BTW, here are the steps I followed to run Gittip with Gunicorn on port 8537:
I can see https://help.heroku.com/tickets/101694 (and don't have time to pay attention to this right now).
(Thanks for the confirmation, @bruceadams.)
Actually, the test we're concerned with here is this one (from your test run):
That tests loading a static page, not the homepage content. The expectation is that we'd see an order of magnitude more requests per second there.
Sure. Does this 'static' file get routed through the app process, maybe? This is probably where @kennethreitz can help more.
The original issue as I understand it is:
Yup. We've been using that as a benchmark to understand the performance characteristics of our app within Heroku. Our reasoning is that serving a static file out of our app server represents an upper bound on the performance we can expect. If we can only get ~8 req/sec for a static file from our app server at Heroku, then we're not going to see more than that for dynamic pages. For comparison, we are seeing performance from 64 to 177 req/sec for the same
If we switch the WSGI server we're using to Gunicorn, we get:
I just remembered that we have a QA instance of the app running at https://gittip.whit537.org/. I'm going to dust it off and see what we get there.
Okay!
Sorry if this isn't strictly germane to this ticket, but has anyone run gittip.com through http://webpagetest.org? A couple of things I noted:
Hopefully webpagetest can help identify and fix some performance issues.
New results using
We ran a test where @zwn loaded up gittip.whit537.org with this command:
He achieved ~6 req/sec over the 60 seconds, which is higher than our average load in production. I then ran
On a tip from @clone1018 (IRC), I just did a restart.
Let's see if we can land this. |
Whoa! Now I'm seeing ~200 req/sec from production:
Comparable performance from QA:
What changed?
@kennethreitz You tell us? :-) Very little in the way of code, just some JavaScript cleanup (irrelevant to
Per @kennethreitz in IRC, "noisy neighbors" (or other AWS weather factors?) could explain the difference between QA and production we were seeing at #1617 (comment). This explanation is consistent with the behavior at #1617 (comment), where performance in production increased after a restart, as well as with the better (and consistent between QA and production) performance we're seeing today.
So the worst of the problem is explained by using
@zwn We good to close?
Closing per IRC.
💃
!m *
Something is, IMO, horribly broken in Gittip's HTTP serving. The file 'reset.css' is under 1 kB in size. When testing the site with 'ab', it is easy to see that no matter the number of threads, with just 20 parallel connections we cannot get more than 5 requests per second. That is something like 4 kB/s of transfer speed for a plain small file fully cached in memory. BTW: that transfer speed is only about twice the size of our logs.
Just for perspective: I believe each browser uses up to 6 connections for a single domain. The homepage makes 8 requests to the main site. When someone on Twitter with 100 followers sends out a tweet and 1 out of 10 followers clicks the link, we get 10 users, 60 parallel connections, and 80 requests... and we slow down to a crawl :(
And I am not even mentioning that when hitting the homepage, the most db-optimized page on Gittip, we get not 5 but 1 request per second. So, if the site does not die right away (that is the better case), we go to 10s to 15s per request.