
gunicorn instrumentation via statsD and a new logger class #748

Merged 23 commits into benoitc:master on Jun 28, 2014

Conversation

@alq666 (Contributor) commented May 14, 2014

(This is another attempt at getting statsD instrumentation into gunicorn; the first attempt was #672.)

What is this pull request?

statsD is the de facto low-overhead instrumentation layer for many web stacks. I have come across a number of different ways to instrument gunicorn, none of which are part of the core; they all rely on external scripts run out-of-band to guess the number of requests processed or the amount of CPU consumed.

I am proposing that gunicorn optionally submit metrics to a local (or remote) statsD instance via the --logger-class gunicorn.instrument.statsd.Statsd option. Because statsD uses UDP as its transport, whether the receiving end is up or down does not affect the performance of the gunicorn application (unlike other stat collection schemes).
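
For illustration, a minimal way to switch this on could look like the following config-file sketch (the bind address, worker count and application module are placeholders, not anything mandated by this PR; logger_class is the config-file equivalent of the --logger-class command-line option):

    # gunicorn_conf.py -- illustrative sketch enabling the statsD-aware logger
    bind = "127.0.0.1:8000"
    workers = 4
    logger_class = "gunicorn.instrument.statsd.Statsd"

which would then be launched with something like gunicorn -c gunicorn_conf.py test:app.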

Here is an example of such a metric being graphed:

[screenshot: metric_explorer___datadog-2]

How does this work?

The arbiter and the base worker use a new logger that can take additional arguments to turn a log message into a metric. In addition, certain known logging levels are automatically turned into metrics.

In particular, the access method is overridden to capture the duration and count of each gunicorn request.

The new logger class inherits from gunicorn.glogging.Logger, so existing logging behavior is preserved. No external dependencies are introduced.
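
To make the mechanism concrete, here is a rough sketch (not the code in this PR) of how a gunicorn.glogging.Logger subclass can mirror log events and request timings to a statsD daemon over UDP; the metric names, the hard-coded address and the helper method are all assumptions for illustration:

    # sketch of a statsD-aware logger; metric names and address are placeholders
    import socket

    from gunicorn.glogging import Logger

    class StatsdLikeLogger(Logger):
        """Mirror selected log events and access timings to statsD over UDP."""

        def __init__(self, cfg):
            Logger.__init__(self, cfg)
            # UDP: a down or missing receiver never blocks the application
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.addr = ("127.0.0.1", 8125)  # placeholder statsD address

        def critical(self, msg, *args, **kwargs):
            Logger.critical(self, msg, *args, **kwargs)
            self._send("gunicorn.log.critical:1|c")  # count serious events

        def access(self, resp, req, environ, request_time):
            # keep the normal access log line, then emit timing and count metrics
            Logger.access(self, resp, req, environ, request_time)
            duration_ms = (request_time.seconds * 1000.0
                           + request_time.microseconds / 1000.0)
            self._send("gunicorn.request.duration:%f|ms" % duration_ms)
            self._send("gunicorn.requests:1|c")

        def _send(self, payload):
            try:
                self.sock.sendto(payload.encode("ascii"), self.addr)
            except Exception:
                # instrumentation must never take the worker down
                pass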

Who is behind this?

I use gunicorn at Datadog and have found statsD an indispensable aid in measuring the performance of high-traffic applications. There are no company-specific bits in the pull request, of course.

@tilgovi (Collaborator) commented May 14, 2014

This looks like a pretty beautiful solution. I had forgotten how easy it is to send data to statsd.

I wish we had this at my last company!

@alq666 (Contributor, Author) commented May 14, 2014

Thanks @tilgovi

@tilgovi (Collaborator) commented May 18, 2014

A few questions:

  • Can you imagine any value in having the extra keyword arguments available for log formatting?
  • If yes, should we use the extra keyword argument that log calls already take?
  • Should the worker count log in the arbiter be at debug level? It's expected not to change except through administrator action, and manage_workers is called every timeout cycle, so that seems like unnecessary log noise (though I understand info level is pretty high, so maybe it's okay).
  • Should the logger check the log level or should we assume that anyone using statsD wants the metrics collected no matter the log level?
  • What do you think about logging resp.status (the status code)?

@alq666 (Contributor, Author) commented May 19, 2014

@tilgovi

  • I did not know about extra; I can indeed stuff my keywords into extra and save a function call for each log entry (see the sketch just below this list).
  • With respect to logging level and instrumentation, you are correct that there is a coupling: INFO or greater has to be enabled to get some instrumentation going. My thinking behind this was two-fold:
    1. to provide instrumentation that makes sense out of the box, without having to set up anything special (save command-line arguments). As such, DEBUG statements don't send anything to statsD and ERROR/CRITICAL/WARNING only send limited metrics.
    2. to equate instrumentation with informational logging: instrumentation is most useful in production, where INFO is likely to be enabled (but not DEBUG).
  • resp.status is certainly worth capturing as a rate; I'll add that.
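
For example, a metric-carrying log call could then look roughly like this; the extra key names (metric, value, mtype) are illustrative, not necessarily the ones this change settles on:

    # hypothetical call site in the arbiter: the statsD logger reads these
    # fields off the log record and emits a gauge; other loggers ignore them
    self.log.info("{0} workers".format(num_workers),
                  extra={"metric": "gunicorn.workers",
                         "value": num_workers,
                         "mtype": "gauge"})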

@alq666 (Contributor, Author) commented May 31, 2014

@tilgovi I've made the changes we discussed (using extra and capturing resp.status as a rate).

I have also added a test to verify that even if statsD instrumentation fails, the logger continues to work.

    def __init__(self, status):
        self.status = status

    def test_statsd_fail():
Collaborator: These don't actually assert anything, do they?

Collaborator: That's fine, I guess. I see you've wrapped all the methods in the logger in except handlers.

Contributor (Author): Right, if the exception bubbles back up, which it should not, the test will fail.

Collaborator: Oh! Right :)
In tests/test_009-statsd.py:

        self.failp = failp
        self.msgs = []  # accumulate messages for later inspection

    def send(self, msg):
        if self.failp:
            raise TestException("Should not interrupt the logger")
        self.msgs.append(msg)

    def reset(self):
        self.msgs = []

class MockResponse(object):
    def __init__(self, status):
        self.status = status

def test_statsd_fail():

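As an aside, the shape of such a "must not raise" test is roughly the following; this is a hypothetical sketch, not the actual body of test_statsd_fail, and it assumes the logger keeps its UDP socket in a sock attribute:

    # hypothetical: drive the statsD logger with a socket that always raises;
    # the check passes as long as no TestException escapes the logger
    from gunicorn.config import Config
    from gunicorn.instrument.statsd import Statsd

    def exercise_failing_statsd():
        logger = Statsd(Config())
        logger.sock = MockSocket(failp=True)   # every send() raises TestException
        logger.info("No impact on logging")
        logger.warning("No impact on logging")
        logger.error("No impact on logging")
        logger.critical("No impact on logging")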

@tilgovi (Collaborator) commented May 31, 2014

@benoitc this looks alright to me. Thoughts?

@tilgovi (Collaborator) commented Jun 10, 2014

@benoitc any reason not to include this in the next release?

@benoitc (Owner) commented Jun 12, 2014

The patch looks fine to me, but I would change the way it is enabled before integrating it. Instead of passing the class to the logger-class option and adding a separate option to set the host, I would automatically enable the statsd logger when the --logs-statsd-to option is used. Also, since it would be a supported backend, we should be able to refer to it by a short name with the --logging-class option; while we are at it, I would rename that option to --logging-backend. Renaming --logs-statsd-to to --statsd-host also sounds better to me.

If it's done by tomorrow we can include it in R19.0. If not, it will wait for the next release (planned this month or in July).

Thoughts?

@benoitc (Owner) commented Jun 12, 2014

Let's not hold up 19.0 for this any longer; this feature will be added in 19.1 this month. We should put it in master ASAP.

@alq666 (Contributor, Author) commented Jun 12, 2014

@benoitc I can implement the suggested changes over the next week or so.

@benoitc (Owner) commented Jun 12, 2014

@alq666 cool! will merge once everything is OK :)

alq666 added 3 commits on June 18, 2014 (21:33), including:
"No need to specify a logger class. --statsd-to is enough to trigger the Statsd logger automatically"
@alq666 (Contributor, Author) commented Jun 19, 2014

@benoitc @tilgovi I changed the option to --statsd-host. That by itself is enough to trigger the statsD logger automatically, so there is no need to pass --logger-class to use it. I left the --logger-class option as-is to keep this simpler.
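
For reference, enabling the instrumentation then becomes a single extra flag on the usual command line, e.g. something along the lines of gunicorn --statsd-host=localhost:8125 --workers=4 test:app (host and port here being placeholders for a local statsD daemon).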

@tilgovi (Collaborator) commented Jun 19, 2014

Looking great.

The readme example needs to be updated to reflect the extras keyword use, yeah?

@alq666 (Contributor, Author) commented Jun 22, 2014

@tilgovi should be good now.

@matrixise (Collaborator) commented:

+1. Can we merge this branch?

In gunicorn/arbiter.py:

    @@ -482,6 +482,11 @@ def manage_workers(self):
                (pid, _) = workers.pop(0)
                self.kill_worker(pid, signal.SIGTERM)

            self.log.info("{0} workers".format(len(workers)),
Owner: I can't test it right now, but does this change work with the default logger?

Contributor (Author): It does work with the default logger indeed:

Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Starting gunicorn 19.1.0
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Listening at: http://127.0.0.1:8000 (17090)
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Using worker: sync
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17095] Booting worker with pid: 17095
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17096] Booting worker with pid: 17096
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17097] Booting worker with pid: 17097
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17098] Booting worker with pid: 17098
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17090] 4 workers

using gunicorn --log-syslog --log-level=info --workers=4 test:app as a test.

benoitc added a commit referencing this pull request on Jun 28, 2014: "gunicorn instrumentation via statsD and a new logger class"

@benoitc merged commit ceda9c5 into benoitc:master on Jun 28, 2014
@benoitc (Owner) commented Jun 28, 2014

merged thanks!

@srgshames commented:

Really cool feature!

I was wondering if there's a way to differentiate between multiple applications? Is there a way to assign a name/prefix to an instance of a gunicorn app to be reflected in the metric name?

@tilgovi (Collaborator) commented Sep 9, 2014

@srgshames good thought.
@alq666 do you think we should consider prepending the app name to the metrics? Or appending? It's easily available as self.cfg.proc_name from the logger.

@alq666 (Contributor, Author) commented Sep 9, 2014

I can certainly stick it in the middle in the normal case, just after gunicorn but before the actual metric name.

In the case where the statsd implementation supports explicit tags (like Datadog's), I will add the app name as a tag.
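
A sketch of what that could look like inside the logger follows; the name layout and the tag syntax are assumptions for illustration (Datadog-style statsd tags are appended as a |#key:value suffix):

    # hypothetical metric-name construction inside the statsD logger
    name = "gunicorn.%s.requests" % self.cfg.proc_name   # app name in the middle
    payload = "%s:1|c" % name
    # with a tag-aware backend (e.g. dogstatsd) the app name could be a tag instead:
    # payload = "gunicorn.requests:1|c|#app:%s" % self.cfg.proc_name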
