👌 IMPROVE: Check for recycled circus PID #4858

Merged
chrisjsewell merged 2 commits into aiidateam:develop from bugfix/daemon-pid-recycling
Jul 28, 2021

Conversation

dev-zero
Contributor

No description provided.

@dev-zero dev-zero force-pushed the bugfix/daemon-pid-recycling branch from b45c11a to b0bec30 Compare April 16, 2021 09:00
@ramirezfranciscof
Member

Hey @dev-zero, thanks for the contribution! It seems quite straightforward code-wise, but could you explain the exact objective of this change and what kind of problem it is addressing in general? (Maybe you already did this in some issue, in which case, could you link it?)

I guess most of the functionality can be deduced from the change itself, but I just want to be sure that intent aligns with execution. 😅

@dev-zero
Contributor Author

@ramirezfranciscof in that part of the code we look up whether a process exists for the recorded PID. If there is none, we can assume the daemon crashed and we have to restart it. If there is one, we have to be more careful, since the PID may have been recycled (perhaps after a restart of the machine, but also on long-running machines). When a process is found, we have to check whether it is in fact a circus process and, if it is, whether it was started by the same user who is requesting the start. The last check covers a real corner case: it would mean that another user on the same machine is running an aiida daemon and by chance got the PID that previously belonged to this user's aiida daemon.
Besides, one case is still not caught: a user running multiple aiida daemons from different virtualenvs/conda environments.
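
To make the decision order above concrete, here is a minimal sketch in Python. It is not the code from this PR: `ProcessInfo` and `pid_file_is_stale` are hypothetical names, and matching on the substring `circus` in the command line is an assumption made for illustration. In a real check, the process details would have to be queried from the operating system for the recorded PID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessInfo:
    """Hypothetical snapshot of the process currently holding the recorded PID."""

    command_line: str
    username: str


def pid_file_is_stale(found: Optional[ProcessInfo], current_user: str) -> bool:
    """Return True if the recorded PID no longer refers to this user's daemon."""
    # No process for the recorded PID: the daemon most likely crashed.
    if found is None:
        return True
    # The PID was recycled by an unrelated (non-circus) process.
    # Assumption: a circus process can be recognised by its command line.
    if 'circus' not in found.command_line:
        return True
    # The PID was recycled by a circus process that belongs to another user.
    if found.username != current_user:
        return True
    # A circus process owned by the same user: treated as our daemon.
    return False
```

Note that a circus process started by the same user from a different virtualenv/conda environment still ends up in the last branch, which is the case that remains uncaught.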

@dev-zero dev-zero force-pushed the bugfix/daemon-pid-recycling branch from b0bec30 to 505e32f Compare April 26, 2021 12:12
@codecov

codecov bot commented Apr 26, 2021

Codecov Report

Merging #4858 (9fb013d) into develop (91c1c0b) will decrease coverage by 0.57%.
The diff coverage is 50.00%.

❗ Current head 9fb013d differs from pull request most recent head f9d3a0e. Consider uploading reports for the commit f9d3a0e to get more accurate results

@@             Coverage Diff             @@
##           develop    #4858      +/-   ##
===========================================
- Coverage    80.23%   79.66%   -0.56%     
===========================================
  Files          515      523       +8     
  Lines        36746    37170     +424     
===========================================
+ Hits         29478    29608     +130     
- Misses        7268     7562     +294     
Flag Coverage Δ
django 74.26% <50.00%> (-0.44%) ⬇️
sqlalchemy 73.18% <50.00%> (-0.44%) ⬇️


Impacted Files Coverage Δ
aiida/cmdline/utils/daemon.py 70.15% <50.00%> (-0.61%) ⬇️
aiida/__init__.py 59.46% <0.00%> (-30.01%) ⬇️
aiida/orm/computers.py 68.37% <0.00%> (-13.05%) ⬇️
aiida/engine/processes/workchains/restart.py 67.28% <0.00%> (-11.04%) ⬇️
aiida/engine/processes/workchains/workchain.py 85.90% <0.00%> (-7.08%) ⬇️
aiida/tools/graph/deletions.py 83.68% <0.00%> (-6.57%) ⬇️
aiida/schedulers/scheduler.py 76.25% <0.00%> (-6.50%) ⬇️
aiida/engine/processes/workchains/awaitable.py 90.00% <0.00%> (-5.00%) ⬇️
aiida/restapi/common/utils.py 74.08% <0.00%> (-4.90%) ⬇️
aiida/engine/daemon/execmanager.py 62.93% <0.00%> (-4.01%) ⬇️
... and 94 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f99f1e8...f9d3a0e.

unkcpz
unkcpz previously approved these changes Jul 13, 2021
Member

@unkcpz unkcpz left a comment

@dev-zero thanks, it looks good to me.

I have another thought about this delete_stale_pid_file check. Since it always comes right after the client creation (except in cmd_daemon::stop, shown below, but I guess it's fine to move the line earlier there), maybe it is safe to put this check inside get_daemon_client? (This needs discussion, but it is not required for this PR.)

:

    if not client.is_daemon_running:
        echo.echo('Daemon was not running')
        continue
    delete_stale_pid_file(client)
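
As a rough illustration of the idea of folding the check into client creation, a sketch could look like the following. Everything here is a stand-in written for this example: `FakeDaemonClient`, `remove_stale_pid_record` and `get_checked_client` are hypothetical names, not the aiida API, and the check body is only a placeholder.

```python
class FakeDaemonClient:
    """Hypothetical stand-in for the daemon client; not the real aiida class."""

    def __init__(self, profile_name: str) -> None:
        self.profile_name = profile_name
        self.stale_check_done = False


def remove_stale_pid_record(client: FakeDaemonClient) -> None:
    """Placeholder for the stale-PID check; here it only records that it ran."""
    client.stale_check_done = True


def get_checked_client(profile_name: str) -> FakeDaemonClient:
    """Create a client and run the stale-PID check before handing it out."""
    client = FakeDaemonClient(profile_name)
    remove_stale_pid_record(client)
    return client
```

With such a factory, callers like the stop command would no longer need their own call to the check, because every client they receive has already been through it.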

the pid of the stale process could be recycled as the daemon of someone else
@dev-zero
Copy link
Contributor Author

@chrisjsewell can this be merged despite the coverage fail?

@chrisjsewell chrisjsewell changed the title from "cmdline/daemon: make sure found pid is from this user" to "👌 IMPROVE: Check for recycled circus PID" Jul 28, 2021
@chrisjsewell chrisjsewell merged commit 61b893b into aiidateam:develop Jul 28, 2021

4 participants