Restart agent if it uses too much memory: #429
Augment the Watchdog to restart the agent process not only when it has been running longer than the restart interval, but also when it exceeds a memory threshold. The obvious way to do this would be to call setrlimit with a maximum RSS size and let the process die with a MemoryError when the limit is exceeded, but the Tornado library the agent uses swallows MemoryErrors and lets the process continue. To ensure the process is terminated, the Watchdog's reset method checks the current memory usage and aborts the process if the threshold is exceeded.
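A minimal sketch of the reset-time check described above, assuming a periodically called `reset()` method. The class name `MemoryLimitedWatchdog`, the `get_rss_kb` and `on_exceeded` parameters, and the choice of `os._exit` as the abort action are all hypothetical, not the PR's actual code. Current RSS is read with `ps -o rss=` (the trailing `=` suppresses the header, so no `tail` is needed), which mirrors the `ps` call in the diff.

```python
import os
import subprocess
import sys


def current_rss_kb(pid=None):
    """Return the current resident set size of `pid` (default: this process) in KB."""
    pid = os.getpid() if pid is None else pid
    # "rss=" prints the value with an empty header, so the output is one number.
    out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(pid)])
    return int(out.strip())


class MemoryLimitedWatchdog(object):
    """Hypothetical sketch: abort the process once RSS exceeds a threshold."""

    def __init__(self, max_mem_mb, get_rss_kb=current_rss_kb, on_exceeded=None):
        self.max_mem_kb = max_mem_mb * 1024
        self.get_rss_kb = get_rss_kb
        # Default action is a hard exit; restarting is left to whatever
        # supervises the agent process.
        self.on_exceeded = on_exceeded or self._abort

    def _abort(self, rss_kb):
        sys.stderr.write("RSS %d KB exceeds limit of %d KB, exiting\n"
                         % (rss_kb, self.max_mem_kb))
        sys.stderr.flush()
        # Unlike sys.exit, os._exit raises no exception, so no except clause
        # (Tornado's included) can intercept it.
        os._exit(1)

    def reset(self):
        """Called periodically; returns True if memory usage is within the limit."""
        rss_kb = self.get_rss_kb()
        if rss_kb > self.max_mem_kb:
            self.on_exceeded(rss_kb)
            return False
        # The agent's real Watchdog would also re-arm its restart timer here (omitted).
        return True
```

In use, something like `MemoryLimitedWatchdog(max_mem_mb=256).reset()` would be called from the agent's main loop. Injecting `get_rss_kb` and `on_exceeded` keeps the check testable without actually exhausting memory or killing the test process.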
Looks like the hard-coded timeout and memory limit in the new test don't work on the Travis test VM.
@echohead I'm wondering whether we should put it here or make it part of /etc/security/limits.conf.d/ so as to catch every single case of going over quota. The nice thing about the systemic approach in limits.conf is that we get it for free, but it requires a reboot to take effect (and it has been tested for us by generations of Linux users). Is that an approach you've considered?
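The limits.conf route relies on the same kernel mechanism as the setrlimit idea in the PR description: pam_limits applies the configured limits with setrlimit when a session starts. The sketch below applies such a cap programmatically with Python's `resource` module inside a child interpreter, so the effect can be observed without touching system configuration. It uses `RLIMIT_AS` (address space) because Linux does not enforce `RLIMIT_RSS`; the 512 MB figure and the helper name `allocate_under_limit` are arbitrary, and a Linux host is assumed. Note the outcome: an allocation past the cap raises MemoryError instead of killing the process, which is the exception the PR description says Tornado swallows.

```python
import subprocess
import sys

# Child program: cap its own address space, then try to allocate past the cap.
CHILD = r"""
import resource, sys
limit = int(sys.argv[1])
# RLIMIT_AS caps virtual address space; RLIMIT_RSS is not enforced on modern Linux.
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
try:
    blob = bytearray(2 * limit)  # deliberately larger than the whole cap
except MemoryError:
    print("MemoryError")
else:
    print("allocated")
"""


def allocate_under_limit(limit_bytes):
    """Run CHILD in a fresh interpreter and report how the allocation ended."""
    out = subprocess.check_output([sys.executable, "-c", CHILD, str(limit_bytes)])
    return out.decode().strip()


if __name__ == "__main__":
    print(allocate_under_limit(512 * 1024 * 1024))
```

Running the child in a separate interpreter keeps the lowered limit from affecting the caller, since rlimits are per process and cannot be raised again past a lowered hard limit by an unprivileged user.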
Why not use an event listener in supervisord (http://supervisord.org/events.html#configuring-an-event-listener)? There is one in the superlance plugin: https://superlance.readthedocs.org/en/latest/memmon.html
I do like the supervisor plugin, but sadly it doesn't cover CentOS 5 users, since they don't run with supervisor. Good to know about the memmon plugin, though.
@alq666, putting it under /etc/security/limits.conf.d would make sense to me. I was mostly throwing this out there for discussion. Every approach has its tradeoffs, but this one has a couple of nice properties:
Also, this approach was chosen to achieve the desired behavior with minimal modification to the agent:
```
@@ -208,6 +214,11 @@ def self_destruct(signum, frame):

    def reset(self):
        # self destruct if using too much memory, as tornado will swallow MemoryErrors
        mem_usage_kb = int(os.popen('ps -p %d -o %s | tail -1' % (os.getpid(), 'rss')).read())
```
This code won't work on Windows.
I think you should use the Python resource module: http://docs.python.org/2/library/resource.html#resource-usage
@clofresh any thoughts?
@remh The Watchdog isn't used on Windows at all so it should be okay.
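For comparison, the reviewer's suggestion of the `resource` module could look roughly like this sketch; the function names are hypothetical. Two caveats apply: the `resource` module is itself Unix-only, so it would not help on Windows either, and `getrusage` reports the peak RSS (`ru_maxrss`) rather than the current value the `ps` call returns.

```python
import resource
import sys


def peak_rss_kb():
    """Peak (not current) resident set size of this process, in kilobytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        peak //= 1024
    return peak


def over_threshold(max_mem_mb):
    """True once the process's peak RSS has ever exceeded max_mem_mb."""
    return peak_rss_kb() > max_mem_mb * 1024
```

Because the peak never decreases, a threshold check built on it errs on the side of restarting: once the process has spiked past the limit, it will be restarted even if its memory has since been released. It does avoid spawning a `ps` subprocess on every reset.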
Thanks @mastrolinux, it looks good!