Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Live salt-master Profiling with SIGUSR2 fails #24276

Closed
marbx opened this issue May 31, 2015 · 26 comments
Closed

Live salt-master Profiling with SIGUSR2 fails #24276

marbx opened this issue May 31, 2015 · 26 comments
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt P3 Priority 3 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@marbx
Copy link
Contributor

marbx commented May 31, 2015

According to http://docs.saltstack.com/en/latest/topics/troubleshooting/master.html#live-salt-master-profiling
killall -SIGUSR2 salt-master
should turn on profiling.

Instead, the command turnes salt-master into a zombie process.


salt --versions-report
                  Salt: 2014.7.4
                Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
                Jinja2: 2.6
              M2Crypto: 0.21.1
        msgpack-python: 0.1.10
          msgpack-pure: Not Installed
              pycrypto: 2.6
               libnacl: Not Installed
                PyYAML: 3.10
                 ioflo: Not Installed
                 PyZMQ: 13.1.0
                  RAET: Not Installed
                   ZMQ: 3.2.3
                  Mako: 0.7.0
 Debian source package: 2014.7.4+ds-1~bpo70+1

@jacksontj
Copy link
Contributor

If you look in the master logs, were there any exceptions? I've used this feature a few times and haven't had issues with it.

@marbx
Copy link
Contributor Author

marbx commented Jun 3, 2015

/var/log/salt/master was empty

I started the salt-master in the foreground with
sudo service salt-master stop && sudo salt-master -l info
and got a to-screen logging that partly also goes to /var/log/salt/master

I issued
killall -SIGUSR2 salt-master

On the screen I instantly got

Process Reactor-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/dist-packages/salt/utils/event.py", line 687, in run
    for data in self.event.iter_events(full=True):
  File "/usr/lib/python2.7/dist-packages/salt/utils/event.py", line 374, in iter_events
    data = self.get_event(tag=tag, full=full)
  File "/usr/lib/python2.7/dist-packages/salt/utils/event.py", line 345, in get_event
    ret = self._get_event(wait, tag, pending_tags)
  File "/usr/lib/python2.7/dist-packages/salt/utils/event.py", line 288, in _get_event
    socks = dict(self.poller.poll(wait * 1000))
  File "/usr/lib/python2.7/dist-packages/zmq/sugar/poll.py", line 97, in poll
    return zmq_poll(list(self.sockets.items()), timeout=timeout)
  File "_poll.pyx", line 116, in zmq.core._poll.zmq_poll (zmq/core/_poll.c:1598)
  File "checkrc.pxd", line 21, in zmq.core.checkrc._check_rc (zmq/core/_poll.c:1965)
ZMQError: Interrupted system call

/var/log/salt/master does not contain the above traceback, only a KeyboardInterrupt Exception.

One salt-master process turned into a zombie

I stopped the salt-master service
sudo service salt-master stop

And started the salt-master service
sudo service salt-master start

jacksontj added a commit to jacksontj/salt that referenced this issue Jun 4, 2015
poller.poll ends up calling the poll system call under the hood. If the process which called poll() gets a signal it will raise `ZMQError: Interrupted system call` which we should catch and move continue.
@jacksontj
Copy link
Contributor

Relatively easy fix (#24405). Basically that process that threw the backtrace was the reactor. That process sits and watches the event but and then reacts to those events. The problem was that the call to poll() wasn't within the try/except block, meaning that if it got a signal while calling poll() it would throw this exception (which for a relatively quiet master-- this is where it will spend all of its time).

Generally speaking I'd suggest being more specific with the SIGUSR2 (catching the process you intend) unless the goal is to get all of the profiling data-- which this will do. In case you didn't know if you install python-setproctitle it will actually set the name (in ps) for each salt-master process to what it actually does.

thatch45 added a commit that referenced this issue Jun 4, 2015
@cachedout
Copy link
Contributor

Nice catch, @jacksontj !

@jfindlay jfindlay added Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Core relates to code central or existential to Salt P3 Priority 3 fixed-pls-verify fix is linked, bug author to confirm fix labels Jun 5, 2015
@jfindlay jfindlay added this to the Approved milestone Jun 5, 2015
@jfindlay
Copy link
Contributor

jfindlay commented Jun 5, 2015

@markuskramerIgitt, close this issue if you think @jacksontj's fix solved your problem, thanks.

@marbx
Copy link
Contributor Author

marbx commented Jun 6, 2015

Hello @jacksontj,
thank you a lot for the fix!
I indeed use reactor and orchestration.

Thank you also for your suggestion to send SIGUSR2 only to one process.
I can easily identify the only process that causes high load because it has 17-19 threads and consumes about 3 times as much CPU than all other salt-master processes, which only have 7-9 threads, each.
(Over 24h we generally need to restart the salt-master because at that time it remains at 100% CPU while it does not need much CPU after restart) I hope to get more clarity through profiling. What can I do for clarification in the meantime?

Thank you a third time for pointing to setproctitle, which I never heard of.
I look forward to use it, but I have to install it first.

Are these the correct steps for installation on Debian? Do I have to configure something?

sudo apt-get install python-setuptools
sudo easy_install setproctitle

@marbx marbx closed this as completed Jun 6, 2015
@marbx
Copy link
Contributor Author

marbx commented Jun 6, 2015

Hello @jfindlay
Which Salt version will contain the fix from @jacksontj?

@marbx marbx reopened this Jun 6, 2015
@jfindlay
Copy link
Contributor

jfindlay commented Jun 6, 2015

@markuskramerIgitt, the next release will be 2015.5.3, which will come out in about a month.

@rallytime
Copy link
Contributor

@markuskramerIgitt Do you consider this issue resolved, now that 2015.5.3 (and 2015.5.5) have been released? Or does more work need to be done here?

@marbx
Copy link
Contributor Author

marbx commented Sep 13, 2015

Hi @rallytime, thank you for the reminder.
With 2015.5.3 (this is the highest packaged version on Debian) I see an improvement, but the issue is not resolved.
The salt-master no longer turns into a zombie but I cannot find the yuppi result.

I reread https://docs.saltstack.com/en/latest/topics/troubleshooting/master.html#live-salt-master-profiling

Running a background salt-master
Issuing sudo killall -SIGUSR2 salt-master
The salt-master continues to work.
Issuing some salt commands
Issuing a second sudo killall -SIGUSR2 salt-master
There is no file at /tmp.
/var/log/salt/master is empty

Running a foreground salt-master by issuing
sudo service salt-master stop && sudo salt-master -l info
Issuing sudo killall -SIGUSR2 salt-master
The foreground salt-master does not "report filename for the results"
The salt-master continues to work.
Issuing some salt commands
Issuing a second sudo killall -SIGUSR2 salt-master
Terminating the foreground salt-master with Ctrl-C
There is no file at /tmp.
/var/log/salt/master does not mention yuppi

sudo salt --versions-report
                  Salt: 2015.5.3
                Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
                Jinja2: 2.6
              M2Crypto: 0.21.1
        msgpack-python: 0.1.10
          msgpack-pure: Not Installed
              pycrypto: 2.6
               libnacl: Not Installed
                PyYAML: 3.10
                 ioflo: Not Installed
                 PyZMQ: 13.1.0
                  RAET: Not Installed
                   ZMQ: 3.2.3
                  Mako: 0.7.0
               Tornado: Not Installed
 Debian source package: 2015.5.3+ds-1~bpo70+2

@rallytime
Copy link
Contributor

Thanks for the update @markuskramerIgitt. I'll remove the Fixed Pending Verification label.

@jacksontj or @cachedout: just an FYI ping on this one.

@rallytime rallytime removed the fixed-pls-verify fix is linked, bug author to confirm fix label Sep 14, 2015
@marbx
Copy link
Contributor Author

marbx commented Sep 16, 2015

Hello @rallytime, I would gladly verify the fix - I just don't find the yappi output and I don't know how to look for it.

Btw, I installed setproctitle on Debian with:

apt-get -V install python-pip
apt-get -V install python-dev
pip install setproctitle
service salt-master stop
service salt-master start

resulting in this htop output:

/sbin/init
├─ /usr/bin/python /usr/bin/salt-master ProcessManager
│  ├─ /usr/bin/python /usr/bin/salt-master ReqServer_ProcessManager
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorkerQueue
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorker
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorker
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorker
│  │  ├─ /usr/bin/python /usr/bin/salt-master MWorker
│  │  └─ /usr/bin/python /usr/bin/salt-master MWorker
│  ├─ /usr/bin/python /usr/bin/salt-master EventPublisher
│  ├─ /usr/bin/python /usr/bin/salt-master Publisher
│  └─ /usr/bin/python /usr/bin/salt-master Maintenance

@stale
Copy link

stale bot commented Jan 3, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 3, 2018
@stale stale bot closed this as completed Jan 10, 2018
@marbx
Copy link
Contributor Author

marbx commented May 29, 2019

How do I find the profiling result, please?

When the salt-master runs in the foreground and receives killall -SIGUSR2 salt-master, it does not “report filename for the results”, but instead

[DEBUG   ] Writing master key cache
[DEBUG   ] This salt-master instance has accepted 2 minion keys.

There is no file under /tmp.

salt --versions-report
Salt Version:
Salt: 2018.3.4

Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.5 (default, Apr 9 2019, 14:30:50)
python-gnupg: Not Installed
PyYAML: 3.11
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4

System Versions:
dist: centos 7.6.1810 Core
locale: UTF-8
machine: x86_64
release: 3.10.0-957.10.1.el7.x86_64
system: Linux
version: CentOS Linux 7.6.1810 Core

@marbx
Copy link
Contributor Author

marbx commented May 30, 2019

@jacksontj , how did you find the profile results?

@marbx
Copy link
Contributor Author

marbx commented May 30, 2019

@rallytime could you please reopen this issue? Thank you

@marbx
Copy link
Contributor Author

marbx commented Jun 12, 2019

@rallytime , @gtmanfred or @SaltStackSupport could you please reopen this issue and kindly advise how to find the profiler result? Thank you

Instead the location of the profiler result, killall -SIGUSR2 salt-master only leads to

[DEBUG   ] Writing master key cache
[DEBUG   ] This salt-master instance has accepted 2 minion keys.

@gtmanfred
Copy link
Contributor

@markuskramerIgitt neither @rallytime nor I work for Saltstack anymore.

@saltstack/team-core can you take a look at this?

@marbx
Copy link
Contributor Author

marbx commented Jun 12, 2019 via email

@waynew waynew reopened this Jun 12, 2019
@stale
Copy link

stale bot commented Jun 12, 2019

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Jun 12, 2019
@waynew
Copy link
Contributor

waynew commented Jun 12, 2019

@markuskramerIgitt I'll take a look at this

@waynew
Copy link
Contributor

waynew commented Jun 13, 2019

@markuskramerIgitt it looks like stderr for the master process has to be attached to a tty. Make sure you have yappi installed, but then all you have to do is this:

# salt-master

In one terminal, and then in the other:

# killall -SIGUSR2 salt-master
# killall -SIGUSR2 salt-master

What you should then see in your other terminal window is something like this:

[root@c0ff859c66ab salt]# salt-master
Profiling started
Saved profiling data to: /tmp/callgrind.salt-1560451978-918

I'm going to go ahead and close this under the assumption that this will work for you - but if it doesn't, please let me know and I'll reopen.

@waynew waynew closed this as completed Jun 13, 2019
@marbx marbx mentioned this issue Jun 14, 2019
@marbx
Copy link
Contributor Author

marbx commented Jun 14, 2019

Thank you, @waynew
Do you agree that this should be documented?

@waynew
Copy link
Contributor

waynew commented Jun 14, 2019

https://docs.saltstack.com/en/latest/topics/troubleshooting/master.html#live-python-debug-output

It appears to be:

To pass a SIGUSR1 to the master, first make sure the master is running in the foreground. Stop the service if it is running as a daemon, and start it in the foreground like so:

...

When faced with performance problems one can turn on master process profiling by sending it SIGUSR2.

...

This will activate yappi profiler inside salt-master code, then after some time one must send SIGUSR2 again to stop profiling and save results to file. If run in foreground salt-master will report filename for the results, which are usually located under /tmp on Unix-based OSes and c:\temp on windows.

I'm not sure how that could be improved 🤷‍♂

@marbx
Copy link
Contributor Author

marbx commented Jun 17, 2019

@waynew by "this" I meant your comment "Make sure you have yappi installed." and the pull request above, which I assumed you noticed, has been accepted: https://github.com/saltstack/salt/pull/53481/files

@waynew
Copy link
Contributor

waynew commented Jun 17, 2019

Ah, I see. Yeah, that makes sense. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt P3 Priority 3 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests

7 participants