Fix daemon in docker #1246

Merged

@giovannipizzi giovannipizzi commented Mar 8, 2018

Daemonization was not done correctly. This fixes #1068

There were the following problems:

  • The child's file descriptors were attached with a PIPE, so if the parent was killed, the child was killed as well.
  • [important] The new process (celery) was not put in a different process group. When the calling bash script ended, a SIGHUP was sent and propagated to celery, which then shut itself down. (By the way, this triggered a weird internal error: celery tried to restart, but this hit a bug in billiard, which exited with error code 70, something that should never happen; see https://github.com/celery/billiard/blob/78a5b4592446466afe1020b49b01918cdeaeb9f0/billiard/common.py#L122 ). This is why the problem in "Daemon not running in Docker with aiida 0.11.0" #1068 was intermittent: sometimes celery was able to recover and sometimes not, in a fully non-reproducible fashion.
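
To illustrate the two points above, here is a minimal sketch of starting a child process so that it survives the caller. It is not the code from this PR: the function name `start_detached` and its arguments are hypothetical, and it assumes Python 3 on a POSIX system. The output goes to a log file instead of a PIPE, and the child is started in a new session (and therefore a new process group), so a SIGHUP sent to the caller's group does not reach it.

```python
import os
import subprocess


def start_detached(cmd, log_path):
    """Start cmd detached from the caller and return the Popen object.

    Hypothetical helper for illustration only (not part of AiiDA or celery).
    """
    # Append output to a log file instead of using subprocess.PIPE, so the
    # child does not depend on the parent staying alive to read the pipe.
    with open(log_path, "ab") as log, open(os.devnull, "rb") as devnull:
        proc = subprocess.Popen(
            cmd,
            stdin=devnull,
            stdout=log,
            stderr=subprocess.STDOUT,
            # Run the child in a new session, hence a new process group, so
            # signals sent to the caller's process group do not reach it.
            start_new_session=True,
            close_fds=True,
        )
    # The child keeps its own copies of the descriptors; ours are closed here.
    return proc
```

For example, `start_detached(["celery", "worker", "--app", "tasks"], "/tmp/celery.log")` would start the worker in its own session (the log path here is made up). On Python 2, which has no `start_new_session` argument, passing `preexec_fn=os.setsid` to `subprocess.Popen` has the same effect.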

This also solves an old issue: anything printed to STDOUT or STDERR was swallowed and disappeared from all logs (cause: the way celery and the logger interfered with each other, removing/replacing log handlers). By the way, fixing this exposed the internal error with error code 70 discussed above and helped me debug the problem.

For reference, these were the logs printed when the top process ended (i.e. at the end of the docker exec running the bash script that does the verdi setup, computer setup, verdi daemon start, ...):

Restarting celery worker (/home/aiida/.local/bin/celery worker --app tasks --loglevel INFO --beat --schedule /home/aiida/.aiida/daemon/celerybeat-schedule --pidfile /home/aiida/.aiida/daemon/log/celery.pid)
[2018-03-07 23:09:26,766: ERROR/MainProcess] Process 'Worker-5' pid:82 exited with 'exitcode 70'
[2018-03-07 23:09:26,766: ERROR/MainProcess] Process 'Worker-4' pid:81 exited with 'exitcode 70'
[2018-03-07 23:09:26,766: ERROR/MainProcess] Process 'Worker-3' pid:80 exited with 'exitcode 70'
[2018-03-07 23:09:26,766: ERROR/MainProcess] Process 'Worker-2' pid:79 exited with 'exitcode 70'
[2018-03-07 23:09:29,389: INFO/MainProcess] beat: Shutting down...

@ltalirz ltalirz left a comment

Fantastic, thanks a lot!

@ltalirz ltalirz merged commit dfadd60 into aiidateam:release_v0.11.1 Mar 8, 2018
@giovannipizzi giovannipizzi deleted the fix_1068_daemon_in_docker branch April 26, 2018 08:49