Only log '[num] workers' message when it changes. #1078
Conversation
@@ -473,6 +473,8 @@ def manage_workers(self):
         Maintain the number of workers by spawning or killing
         as required.
         """
+        orig_num_workers = self.num_workers
+
         if len(self.WORKERS.keys()) < self.num_workers:
I would rather tag here if a change has been done and test against it instead of comparing numbers. Thoughts?
In my experience, using a mutable has_changed-type flag tends to open up a class of errors where the flag does not get set appropriately after modification. Particularly when it happens in a separate method call (can kill_worker/spawn_workers modify num_workers? Pretty sure. Do they always? I don't know). That kind of bug crops up more during future code changes/maintenance.
It also makes the if-statement harder to reason about: does this flag represent that the value being printed out changed? That it could have changed? Should have? If I see multiple "3 workers" messages in sequence, is that expected or a bug?
This is a short method, but those are the reasons I tend to prefer the state-comparison approach when it's not a computationally difficult one.
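To make the contrast concrete, here is a rough sketch of the state-comparison pattern being argued for. This is not the PR's exact diff (which is truncated above); the snapshot and comparison details here are assumptions for illustration only.

    def manage_workers(self):
        workers_before = len(self.WORKERS)  # snapshot of the value we log

        # ... spawn or kill workers as required (unchanged) ...

        # Compare current state against the snapshot instead of maintaining
        # a mutable has_changed flag in each branch.
        if len(self.WORKERS) != workers_before:
            self.log.debug("{0} workers".format(len(self.WORKERS)),
                           extra={"metric": "gunicorn.workers",
                                  "value": len(self.WORKERS),
                                  "mtype": "gauge"})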
More concretely, as someone unfamiliar with the codebase, I didn't know if spawn_workers would necessarily change that number, or could be capped by memory or some other configuration value. It could be expected to try to spawn each time but not necessarily succeed at increasing that number; I'd have to look deeper to see if that was a (long-term) guarantee.
(I hope that makes it clear where my head was at here.)
I think your reasoning is sound, but if the flag is localized to this function then it's pretty fail-proof.
At the top of the function, just set count_changed = False.
In each of the loops you can set count_changed = True.
At the bottom of the function you can check count_changed.
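A minimal sketch of that flag-based version of manage_workers, as I read the suggestion (not code from this PR):

    def manage_workers(self):
        """\
        Maintain the number of workers by spawning or killing
        as required.
        """
        count_changed = False

        if len(self.WORKERS.keys()) < self.num_workers:
            self.spawn_workers()
            count_changed = True

        workers = sorted(self.WORKERS.items(), key=lambda w: w[1].age)
        while len(workers) > self.num_workers:
            (pid, _) = workers.pop(0)
            self.kill_worker(pid, signal.SIGTERM)
            count_changed = True

        # Log only if we attempted to change the number of workers.
        if count_changed:
            self.log.debug("{0} workers".format(len(workers)),
                           extra={"metric": "gunicorn.workers",
                                  "value": len(workers),
                                  "mtype": "gauge"})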
I actually think I slightly prefer what I just described, for exactly the reasons you say. While it should be clear from the code that self.num_workers must change or the loops would not terminate, it's perhaps clearest to just set a flag when we're attempting to change the number.
It's conceivable that self.num_workers could change between the loops so that it is first increased and then decreased, so if the edge case of logging the same number twice bothers us then your way is indeed better.
What if kill_worker moved the worker from self.WORKERS to a dead worker list? I don't think the ESRCH block is necessary because reap_workers should still get the SIGCHLD.
Then manage_workers becomes three steps:
Scratch that last part, not three steps. manage_workers can stay the same, I think.
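For concreteness, a purely hypothetical sketch of the dead-worker-list idea floated above; self.DEAD_WORKERS and the exact bookkeeping are assumptions, not code from this PR or from #1084:

    def kill_worker(self, pid, sig):
        # Hypothetical: remove the worker from the live table right away and
        # park it in a dead-worker dict, so the ESRCH branch is no longer
        # needed; reap_workers still receives SIGCHLD and does final cleanup.
        worker = self.WORKERS.pop(pid, None)
        if worker is not None:
            self.DEAD_WORKERS[pid] = worker  # assumed attribute, e.g. {} set in __init__
        os.kill(pid, sig)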
@erydo maybe let's make a PR to clean this dance up a bit and then let's rebase this. If you're up for it.
Sure thing, I'll give it a stab.
See #1084.
This PR is on hold until #1084 is completed.
Otherwise when debug logging is on, the message prints every second even with no system activity.
(Rebasing against current master.) Force-pushed from 258eba0 to 09357ed.
This is easier and safer than only logging when we detect that self.WORKERS has changed or that `spawn_worker` or `kill_worker` has been done.
@tilgovi @benoitc — I've updated this to no longer rely on the synchronicity/asynchronicity of spawn_worker/kill_worker. This was possibly the right approach to begin with (no error conditions!) and it decouples this from #1084, which might take some more time to merge in.
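Sketched out, the idea in this revision is roughly the following (simplified; variable names and exact placement are assumptions, with the real attribute shown in the diff hunks further down):

    class Arbiter(object):

        # class-level default; compared against on every pass
        last_logged_worker_count = None

        def manage_workers(self):
            # ... spawn or kill workers as required ...

            active = len(self.WORKERS)
            if self.last_logged_worker_count != active:
                self.log.debug("{0} workers".format(active),
                               extra={"metric": "gunicorn.workers",
                                      "value": active,
                                      "mtype": "gauge"})
                self.last_logged_worker_count = active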
Quite like the idea :) While we are here, maybe we could also just log where the number of workers really changes: in spawn workers and when we reap them. Something like the patch below. I didn't test it though. Thoughts?

diff --git a/gunicorn/arbiter.py b/gunicorn/arbiter.py
index b7ee05d..85fa5ea 100644
--- a/gunicorn/arbiter.py
+++ b/gunicorn/arbiter.py
@@ -55,6 +55,7 @@ class Arbiter(object):
os.environ["SERVER_SOFTWARE"] = SERVER_SOFTWARE
self._num_workers = None
+ self.worker_count = None
self.setup(app)
self.pidfile = None
@@ -464,6 +465,7 @@ class Arbiter(object):
if not worker:
continue
worker.tmp.close()
+ self._log_numworkers()
except OSError as e:
if e.errno != errno.ECHILD:
raise
@@ -475,6 +477,7 @@ class Arbiter(object):
"""
if len(self.WORKERS.keys()) < self.num_workers:
self.spawn_workers()
+ self._log_numworkers()
workers = self.WORKERS.items()
workers = sorted(workers, key=lambda w: w[1].age)
@@ -482,10 +485,6 @@ class Arbiter(object):
(pid, _) = workers.pop(0)
self.kill_worker(pid, signal.SIGTERM)
- self.log.debug("{0} workers".format(len(workers)),
- extra={"metric": "gunicorn.workers",
- "value": len(workers),
- "mtype": "gauge"})
def spawn_worker(self):
self.worker_age += 1
@@ -563,8 +562,20 @@ class Arbiter(object):
try:
worker = self.WORKERS.pop(pid)
worker.tmp.close()
+ self._log_numworkers()
self.cfg.worker_exit(self, worker)
return
except (KeyError, OSError):
return
raise
+
+    def _log_numworkers(self):
+        nworker_count = len(self.WORKERS)
+        if self.worker_count != nworker_count:
+            self.log.debug("{0} workers".format(nworker_count),
+                           extra={"metric": "gunicorn.workers",
+                                  "value": nworker_count,
+                                  "mtype": "gauge"})
+            self.worker_count = nworker_count
+
@@ -51,6 +51,8 @@ class Arbiter(object):
         if name[:3] == "SIG" and name[3] != "_"
     )
 
+    last_logged_worker_count = None
+
Shouldn't it probably be an instance variable?
The instance variable is created once it's assigned to for the first time. I'd be fine moving it into __init__.
@benoitc — I would make the same argument as I made at the beginning of this PR: the patch I've proposed here has no edge cases and is resilient to refactorings, instead of trying to find and maintain every place that modifies self.WORKERS. Additionally, the patch in your comment would log every change in self.WORKERS.
Hrm, the patch in manage_workers only logs if new workers need to be spawned, and after it. Did I miss smth? The point of logging there and in reap workers is to only log when the number really did change, not before it changes on kill. I think it would be more accurate, although we can optimise the reap workers case. This is at least the intention :)
Ah, you're right, I misread your patch! For some reason I misremembered spawning as being in a loop like killing is. I'd still stand by the maintenance argument, though: if it's desirable to also log once workers finish dying, I'd suggest that change be made separately, and it should likely be based on #1084, which consolidates that worker cleanup work.
All of this depends on what is expected from this metric: supervising the number of active workers, or supervising the number of workers alive (running). Maybe we should indeed have 2 metrics, and I guess you're right that your patch is enough to tell the number of active workers and log it as such. I commented on the patch with one last change, then let's commit it :)
@@ -55,6 +55,8 @@ def __init__(self, app):
         os.environ["SERVER_SOFTWARE"] = SERVER_SOFTWARE
 
         self._num_workers = None
+        self._last_logged_active_worker_count = None
last_logged is not needed. Let's just call it self._active_worker_count.
OK, if no one objects I will merge this PR and do the renaming right after. OK?
👍
Only log '[num] workers' message when it changes.
Otherwise when debug logging is on, the message prints every second even with no system activity, drowning out more useful log messages.