Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[process.py] Handle scenario where system clock rolls backward #1047

Merged
merged 5 commits into from
Mar 13, 2019

Conversation

jleveque
Copy link
Contributor

  • Resolves State transition can take far too long if system clock is set to an earlier time #1043

  • If the system clock is set to an earlier time after a process is started but before it has entered the RUNNING state, supervisorctl start <process_name> can hang for a very long time waiting for the current system time to become greater than self.laststart + self.config.startsecs, depending on how far backward the system time has moved.

  • This pull request resolves the issue above and also attempts to address a couple other potential issues in process.py with system time moving backward.

  • Also added related unit tests

@mnaberez
Copy link
Member

mnaberez commented Feb 7, 2018

Thanks for looking into this issue and for your work on this pull request (with tests!). I definitely think we need to fix this issue. A concern that I have reading this diff is the way these checks are inlined. There is now as much code for checking time rollback as there is actual state transition logic. I think this makes the transition logic more difficult to follow. Can we separate these concerns somehow, e.g. can we do something like check for rollback once at the top of transition and then call a separate method to fix up all the timestamps if rollback is detected?

@jleveque jleveque force-pushed the system_clock_rollback_fix branch from 39bab51 to 8b0d1a3 Compare February 13, 2018 00:29
@jleveque jleveque force-pushed the system_clock_rollback_fix branch from 8b0d1a3 to 75e9804 Compare February 13, 2018 00:30
@jleveque
Copy link
Contributor Author

jleveque commented Mar 6, 2018

I pushed some changes a few weeks ago in an attempt to address your comments above. Please review the latest changes when you have a moment.

@jleveque
Copy link
Contributor Author

Update: It appears I missed at least one case where system clock rollback can still cause issues. With this patch, the issue is encountered far less frequently, but it appears it is not complete.

@mnaberez: What are your thoughts on using monotonic time throughout supervisor? I think this would prevent the need for checking for system clock rollback and subsequent adjustments.

@mnaberez
Copy link
Member

@mnaberez: What are your thoughts on using monotonic time throughout supervisor? I think this would prevent the need for checking for system clock rollback and subsequent adjustments.

I think it's a good idea but it's not available on all versions of Python that Supervisor supports. There was a PR submitted that added monotonic time for old Python versions using ctypes hacking. I didn't think that was a good idea for production but I suggested we could use monotonic time if it was available (#468 (comment)). I had solicited feedback on that at that time but didn't get any.

@nikos-github
Copy link

@mnaberez If you believe it's a must to keep supporting Python versions older than 3 when 3 has been around for 10 years now, then no argument there (I personally would drop the support but my take on this has no bearing in that decision).

It's important however to bring stability to supervisord in my opinion hence I would recommend moving ahead with the suggestion you made in #468 (comment). Is this something acceptable and can be done relatively quickly?

@mnaberez
Copy link
Member

If you believe it's a must to keep supporting Python versions older than 3 when 3 has been around for 10 years now,

Supervisor is over 14 years old and for most of that time, it ran only on Python 2. Consequently, it still has a large installed base of Python 2 users today. We've made significant efforts to support both 2 and 3 due to our installed base, and we'll continue to support them so they receive bug fixes until they can transition to 3.

It's important however to bring stability to supervisord in my opinion hence I would recommend moving ahead with the suggestion you made in #468 (comment).

This patch (#1047) or something like it should probably done to fix the issue on installations where monotonic time is not available. It should be fine to also add something like #468 (comment). When monotonic time is available, the checks here would just not be triggered.

@jleveque
Copy link
Contributor Author

@mnaberez: Have you looked into the package I linked above: https://pypi.org/project/monotonic/? I do not have experience with it, but apparently it is an implementation of monotonic time for both Python 2 and 3:

On Python 3.3 or newer, monotonic will be an alias of time.monotonic from the standard library. On older versions, it will fall back to an equivalent implementation:

Linux, BSD, AIX: clock_gettime(3)
Windows: GetTickCount or GetTickCount64
OS X: mach_absolute_time

@mnaberez
Copy link
Member

mnaberez commented Dec 3, 2018

Have you looked into the package I linked above

I did. I looked at the source and it uses the same ctypes hacking as the PR that I mentioned. Supervisor runs on a large number of production servers so I'm hesitant to do that sort of thing. It might be fine but I don't really have a way to know. I'd rather be conservative for production use. I'm more inclined do something like this patch for Python 2 and use the built-in monotonic time on Python 3.

@jleveque
Copy link
Contributor Author

jleveque commented Mar 1, 2019

Added a patch (and related test) to prevent a crash when a process has already entered the RUNNING state but then clock rolled back to be less than startsecs. The crash would present itself with log messages similar to the following:

Traceback (most recent call last):
File "/usr/bin/supervisord", line 9, in <module>
load_entry_point('supervisor==3.3.3', 'console_scripts', 'supervisord')()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 363, in main
go(options)
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 373, in go
d.main()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 84, in main
self.run()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 101, in run
self.runforever()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 253, in runforever
self.reap()
File "/usr/lib/python2.7/dist-packages/supervisor/supervisord.py", line 289, in reap
process.finish(pid, sts)
File "/usr/lib/python2.7/dist-packages/supervisor/process.py", line 527, in finish
self._assertInState(ProcessStates.STARTING)
File "/usr/lib/python2.7/dist-packages/supervisor/process.py", line 179, in _assertInState
self.config.name, current_state, allowable_states))
AssertionError: Assertion failed for start.sh: RUNNING not in STARTING

@mnaberez mnaberez merged commit c04d3d5 into Supervisor:master Mar 13, 2019
@jleveque jleveque deleted the system_clock_rollback_fix branch March 18, 2019 22:26
saiarcot895 added a commit to saiarcot895/sonic-buildimage that referenced this pull request Oct 14, 2024
…tart before ntp-config service (sonic-net#2335)"

Supervisor now handles the case where the time jumps back, now that
Supervisor/supervisor#1047 has been merged.

This reverts commit 298d2ad.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

State transition can take far too long if system clock is set to an earlier time
3 participants