
Only log '[num] workers' message when it changes. #1078

Merged
3 commits merged on Aug 19, 2015

Conversation

rwe (Contributor)

rwe commented Jul 12, 2015

Otherwise when debug logging is on, the message prints every second even with no system activity, drowning out more useful log messages:

[2015-07-12 17:39:25 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:26 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:27 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:28 +0000] [19970] [DEBUG] 3 workers
[2015-07-12 17:39:29 +0000] [19970] [DEBUG] 3 workers
...
[2015-07-12 17:39:50 +0000] [19970] [DEBUG] 3 workers

@@ -473,6 +473,8 @@ def manage_workers(self):
Maintain the number of workers by spawning or killing
as required.
"""
orig_num_workers = self.num_workers

if len(self.WORKERS.keys()) < self.num_workers:
benoitc (Owner):

I would rather set a flag here when a change has been made and test against that, instead of comparing numbers. Thoughts?

rwe (Contributor, Author):

In my experience, using a mutable has_changed-type flag tends to open up a class of errors where the flag is not set appropriately after a modification, particularly when the modification happens in a separate method call (can kill_worker/spawn_workers modify num_workers? Pretty sure. Do they always? I don't know). That kind of bug crops up more during future code changes and maintenance.

It also makes the if-statement harder to reason about: does the flag mean that the value being printed changed? That it could have changed? That it should have? If I see multiple "3 workers" messages in sequence, is that expected or a bug?

This is a short method, but those are the reasons I tend to prefer the state-comparison approach when the comparison isn't computationally expensive.
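As an illustration of the trade-off being discussed, here is a toy sketch of the state-comparison style (all names here are invented for the example; this is not gunicorn's actual arbiter code):

```python
class Manager:
    """Toy stand-in for the arbiter; names are illustrative only."""

    def __init__(self, target):
        self.target = target   # desired worker count
        self.workers = []      # live worker stand-ins
        self.logged = []       # captured debug messages

    def manage_workers(self):
        # State-comparison style: remember the count before any changes...
        before = len(self.workers)
        while len(self.workers) < self.target:
            self.workers.append(object())   # "spawn"
        while len(self.workers) > self.target:
            self.workers.pop()              # "kill"
        # ...and log only if the count actually differs afterward,
        # no matter which code path changed it.
        if len(self.workers) != before:
            self.logged.append("%d workers" % len(self.workers))

m = Manager(target=3)
m.manage_workers()   # spawns three workers and logs once
m.manage_workers()   # steady state: nothing changes, nothing is logged
```

The point of the sketch: no flag can be forgotten, because the log condition is derived directly from observed state.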

rwe (Contributor, Author):

More concretely, as someone unfamiliar with the codebase, I didn't know if spawn_workers would necessarily change that number, or could be capped by memory or some other configuration value.

It could be expected to try to spawn each time but not necessarily succeed at increasing that number; I'd have to look deeper to see if that was a (long-term) guarantee.

(I hope that makes it clear where my head was at here)

Collaborator:

I think your reasoning is sound, but if the flag is localized to this function then it's pretty fail-proof.

At the top of the function, just set count_changed = False.
In each of the loops you can set count_changed = True.
At the bottom of the function you can check count_changed.
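A minimal sketch of that localized-flag recipe, again using invented stand-in names rather than gunicorn's real code:

```python
class Manager:
    """Toy stand-in for the arbiter; names are illustrative only."""

    def __init__(self, target):
        self.target = target   # desired worker count
        self.workers = []      # live worker stand-ins
        self.logged = []       # captured debug messages

    def manage_workers(self):
        # Flag style, localized to this one function as described above.
        count_changed = False
        while len(self.workers) < self.target:
            self.workers.append(object())   # "spawn"
            count_changed = True
        while len(self.workers) > self.target:
            self.workers.pop()              # "kill"
            count_changed = True
        if count_changed:
            self.logged.append("%d workers" % len(self.workers))

m = Manager(target=3)
m.manage_workers()   # spawns three workers and logs once
m.manage_workers()   # no spawn/kill this round, so no log line
```

Because the flag never escapes the function, the failure mode rwe describes (a far-away method forgetting to set it) cannot occur here.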

Collaborator:

I actually think I slightly prefer what I just described for exactly the reasons you say. While it should be clear from the code that self.num_workers must change or the loops would not terminate, it's perhaps clearest to just set a flag when we're attempting to change the number.

It's conceivable that self.num_workers could change between the loops so that it is first increased and then decreased, so if the edge case of logging the same number twice bothers us then your way is indeed better.

Collaborator:

What if kill_worker moved the worker from self.WORKERS to a dead-worker list? I don't think the ESRCH block is necessary, because reap_workers should still get the SIGCHLD.

Then the manage_workers becomes three steps:

Collaborator:

Scratch that last part, not three steps. manage_workers can stay the same, I think.

Collaborator:

@erydo maybe let's make a PR to clean this dance up a bit and then let's rebase this. If you're up for it.

rwe (Contributor, Author):

Sure thing, I'll give it a stab.

rwe (Contributor, Author):

See #1084.

rwe commented Jul 13, 2015

This PR is on hold until #1084 is completed.

Otherwise when debug logging is on, the message prints every second even
with no system activity.
rwe commented Jul 19, 2015

(Rebasing against current master)

rwe force-pushed the reduce-arbiter-noise branch from 258eba0 to 09357ed on July 19, 2015
This is easier and safer than only logging when we detect that
self.WORKERS has changed or that `spawn_worker` or `kill_worker` has
been done.
rwe commented Jul 19, 2015

@tilgovi @benoitc — I've updated this to no longer rely on synchronicity/asynchronicity of spawn_worker, kill_worker, or self.WORKERS. It just stores the last number that was logged and checks against that.

This was possibly the right approach to begin with (no error conditions!) and decouples it from #1084, which might take some more time to merge in.
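The approach described can be sketched roughly like this (a toy model with hypothetical names, not the actual patch):

```python
class Manager:
    """Toy model of the last-logged-value approach; names are illustrative."""

    def __init__(self):
        self.WORKERS = {}          # pid -> worker, as in the arbiter
        self._last_logged = None   # last worker count we actually logged
        self.logged = []           # captured debug messages

    def _log_worker_count(self):
        # Log only when the count differs from the last value logged,
        # regardless of who changed self.WORKERS, or when, or how.
        n = len(self.WORKERS)
        if n != self._last_logged:
            self.logged.append("%d workers" % n)
            self._last_logged = n

m = Manager()
m.WORKERS = {1: "w", 2: "w", 3: "w"}
m._log_worker_count()   # logs "3 workers"
m._log_worker_count()   # same count: nothing logged
del m.WORKERS[3]
m._log_worker_count()   # logs "2 workers"
```

This is why it has no edge cases: the check can be called as often as you like, from anywhere, and it only emits on an actual change.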

benoitc commented Jul 23, 2015

I quite like the idea :) While we are here, maybe we could also log only where the number of workers really changes, in spawn_workers and when we reap them. Something like the patch below. I didn't test it, though. Thoughts?

diff --git a/gunicorn/arbiter.py b/gunicorn/arbiter.py
index b7ee05d..85fa5ea 100644
--- a/gunicorn/arbiter.py
+++ b/gunicorn/arbiter.py
@@ -55,6 +55,7 @@ class Arbiter(object):
         os.environ["SERVER_SOFTWARE"] = SERVER_SOFTWARE

         self._num_workers = None
+        self.worker_count = None
         self.setup(app)

         self.pidfile = None
@@ -464,6 +465,7 @@ class Arbiter(object):
                     if not worker:
                         continue
                     worker.tmp.close()
+                    self._log_numworkers()
         except OSError as e:
             if e.errno != errno.ECHILD:
                 raise
@@ -475,6 +477,7 @@ class Arbiter(object):
         """
         if len(self.WORKERS.keys()) < self.num_workers:
             self.spawn_workers()
+            self._log_numworkers()

         workers = self.WORKERS.items()
         workers = sorted(workers, key=lambda w: w[1].age)
@@ -482,10 +485,6 @@ class Arbiter(object):
             (pid, _) = workers.pop(0)
             self.kill_worker(pid, signal.SIGTERM)

-        self.log.debug("{0} workers".format(len(workers)),
-                       extra={"metric": "gunicorn.workers",
-                              "value": len(workers),
-                              "mtype": "gauge"})

     def spawn_worker(self):
         self.worker_age += 1
@@ -563,8 +562,20 @@ class Arbiter(object):
                 try:
                     worker = self.WORKERS.pop(pid)
                     worker.tmp.close()
+                    self._log_numworkers()
                     self.cfg.worker_exit(self, worker)
                     return
                 except (KeyError, OSError):
                     return
             raise
+
+    def _log_numworkers(self):
+        nworker_count = len(self.WORKERS)
+        if self.worker_count != nworker_count:
+            self.log.debug("{0} workers".format(nworker_count),
+                    extra={"metric": "gunicorn.workers",
+                        "value": nworker_count,
+                        "mtype": "gauge"})
+            self.worker_count = nworker_count
+

@@ -51,6 +51,8 @@ class Arbiter(object):
if name[:3] == "SIG" and name[3] != "_"
)

last_logged_worker_count = None
benoitc (Owner):

Shouldn't it probably be an instance variable?

rwe (Contributor, Author):

The instance variable is created once it's assigned to for the first time. I'd be fine moving it into __init__.
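For illustration, the Python behavior being referred to, where a class-level default is shadowed by an instance attribute on first assignment, looks like this minimal example (the class and attribute names here just mirror the discussion):

```python
class Arbiter:
    # Class-level default; shared by all instances until one assigns its own.
    last_logged_worker_count = None

a = Arbiter()
# Reads fall back to the class attribute; no instance attribute exists yet.
assert a.last_logged_worker_count is None
assert "last_logged_worker_count" not in vars(a)

# The first assignment creates the instance attribute, shadowing the class one.
a.last_logged_worker_count = 3
assert "last_logged_worker_count" in vars(a)
assert Arbiter.last_logged_worker_count is None   # class default unchanged
```

Either spelling works; initializing in __init__ simply makes the attribute an instance variable from the start.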

rwe commented Jul 23, 2015

@benoitc — I would make the same argument as I made at the beginning of this PR. The patch I've proposed here has no edge cases and is resilient to refactorings instead of trying to find and maintain every place that modifies self.WORKERS.

Additionally, the patch in your comment would log every change in manage_workers which could be quite noisy if it's spawning/killing multiple workers (e.g. on startup you'd get a log message with an incrementing number for each worker). My motivation for this PR in the first place was to reduce the unnecessary noise.

benoitc commented Jul 23, 2015

Hmm, the patch in manage_workers only logs if new workers need to be spawned, and only after that's done. Did I miss something?

The point of logging there and in reap_workers is to log only when the number really has changed, not before it changes on kill. I think that would be more accurate, although we could optimize the reap_workers case. That is at least the intention :)

rwe commented Jul 23, 2015

Ah, you're right, I misread your patch! For some reason I misremembered spawning as being in a loop like killing is. I'd still stand by the maintenance argument, though: manage_workers gets called regularly. Logging the worker count there is the least complex, least error-prone implementation of logging the current active worker count.

If it's desirable to also log once workers finish dying, I'd suggest that that change should be made separately, and should likely be based off #1084, which consolidates that worker cleanup work.

benoitc commented Jul 24, 2015

All of this depends on what this metric is expected to measure: the number of active workers, or the number of workers alive (running). Maybe we should indeed have two metrics. I guess you're right that your patch is enough to report the number of active workers, logged as such.

I commented on the patch with one last change; then let's commit it :)

@@ -55,6 +55,8 @@ def __init__(self, app):
os.environ["SERVER_SOFTWARE"] = SERVER_SOFTWARE

self._num_workers = None
self._last_logged_active_worker_count = None
benoitc (Owner):

The last_logged part is not needed; let's just call it self._active_worker_count.

benoitc commented Jul 29, 2015

OK, if no one objects I will merge this PR and do the renaming right after. OK?

tilgovi commented Jul 29, 2015

👍


benoitc added a commit that referenced this pull request Aug 19, 2015
Only log '[num] workers' message when it changes.
benoitc merged commit a132ca9 into benoitc:master on Aug 19, 2015
mjjbell pushed a commit to mjjbell/gunicorn that referenced this pull request Mar 16, 2018
Only log '[num] workers' message when it changes.