[supervisor] Add patch to prevent 'supervisorctl start' command from hanging if system time has rolled backward #1311

Merged
jleveque merged 2 commits into sonic-net:master from supervisor_time_rollback_workaround
Jan 18, 2018


jleveque
Contributor

@jleveque jleveque commented Jan 16, 2018

If the system time rolls backward after supervisorctl start <process_name> has been called but while the process is still in the STARTING state (i.e., it has not yet entered the RUNNING state), the supervisorctl start <process_name> command can hang for a very long time, waiting for the system time to reach self.laststart + startsecs. How long it hangs depends on how far backward the system time has rolled.

This patch creates a temporary workaround to mitigate this issue by resetting self.laststart to the current system time if it is ever determined that the system time has rolled backward.
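
The hang and the workaround can be illustrated with a minimal sketch. This is a simplified model, not Supervisor's actual code: the `StartingProcess` class, its `is_running` method, the `guard_rollback` flag, and the `seconds_until_running` helper are hypothetical names introduced for illustration. Only `laststart` and `startsecs` come from the description above.

```python
class StartingProcess:
    """Toy model of a process in the STARTING state (hypothetical, not Supervisor code)."""

    def __init__(self, laststart, startsecs, guard_rollback=False):
        self.laststart = laststart          # timestamp recorded when the process was spawned
        self.startsecs = startsecs          # how long it must stay up to count as RUNNING
        self.guard_rollback = guard_rollback

    def is_running(self, now):
        # Workaround: if the clock moved backward, restart the timing window from "now".
        if self.guard_rollback and now < self.laststart:
            self.laststart = now
        # The process counts as RUNNING once it has stayed up longer than startsecs.
        return now - self.laststart > self.startsecs


def seconds_until_running(proc, start_now, limit=100000):
    """Advance a simulated clock one second at a time until the process is RUNNING."""
    for elapsed in range(limit):
        if proc.is_running(start_now + elapsed):
            return elapsed
    return None
```

In this model, with `laststart = 10000`, `startsecs = 10`, and the clock rolled back one hour to 6400, the unguarded process needs 3611 simulated seconds to reach RUNNING, while the guarded one needs 11.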

I have opened an issue on the Supervisor GitHub repo (Supervisor/supervisor#1043). Once the Supervisor folks merge an official solution, I will remove this patch and pull in the latest upstream changes.

@jleveque jleveque self-assigned this Jan 16, 2018
@jleveque jleveque requested review from lguohan and stcheng January 16, 2018 21:04

+ # If system clock has moved backward, reset self.laststart to current system time
+ if now < self.laststart:
+     self.laststart = now;
Collaborator

@lguohan lguohan Jan 17, 2018

I am OK with this particular fix. However, I think the time travel can cause other problems in the code.

I checked the supervisord code, and it looks like there are other places that save the current time to a value and use it later, like self.delay. It could have the same problem in the back-off process.

If you are in the backoff state and the system clock moves backward, then you need to wait a long time for the clock to catch up, and the process won't restart for a long time.

Another example is last_dispatch.

This does not sound like a trivial problem. I also see there are some unit tests added. I think we need to fix all the problems I see above and add unit test code.
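
One way the same kind of guard might be applied to a saved deadline such as self.delay is sketched below. This is only an illustration under stated assumptions: it assumes the saved value is an absolute timestamp that the code later compares against the current time, and the `clamp_deadline` function and its `max_wait` parameter are hypothetical names, not part of Supervisor.

```python
def clamp_deadline(now, deadline, max_wait):
    """Return a deadline that is never more than max_wait seconds after now.

    Hypothetical helper: if the clock has rolled backward, a previously saved
    deadline can end up much further in the future than any wait the code
    would ever have scheduled, so it is pulled back to now + max_wait.
    """
    if deadline - now > max_wait:
        return now + max_wait
    return deadline
```

For example, a deadline of 10005 saved before a rollback to 6400 is pulled back to 6405 when `max_wait` is 5, while a deadline that is already within `max_wait` of the current time is left unchanged.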

Collaborator

@lguohan lguohan left a comment

need a more comprehensive fix and unit test code

@jleveque jleveque merged commit 0fa64cc into sonic-net:master Jan 18, 2018
@jleveque jleveque deleted the supervisor_time_rollback_workaround branch January 18, 2018 19:44
zhenggen-xu added a commit to zhenggen-xu/sonic-buildimage that referenced this pull request Oct 17, 2019
* github:
  [minigraph]: Set hostname in all default minigraphs to 'sonic' (sonic-net#1333)
  Install sonic-platform-common package in platform-monitor docker for ledd (sonic-net#1330)
  Prevent supervisor from restarting configdb-load.sh (sonic-net#1324)
  [scripts]: Fix issues with checking status of the DB. Use one approach everywhere. (sonic-net#1323)
  [Arista7260cx3] Add platform specific reboot tool (sonic-net#1318)
  Install azure cli into docker-sonic-mgmt (sonic-net#1322)
  [sonic-py-swsssdk]: Update submodule pointer (sonic-net#1319)
  [supervisor] Add patch to prevent 'supervisorctl start' command from hanging if system time has rolled backward (sonic-net#1311)
  Move platform-specific hardware plugin base packages to sonic-platform-common submodule (sonic-net#1301)
  [baseimage]: Add missing dependency of igb & ixgbe (sonic-net#1316)
  [snmpagent]: Update sonic-snmpagent submodule (sonic-net#1314)
  Run docker containers with /tmp and /var/tmp mounted to tmpfs (sonic-net#1313)
  [Broadcom]: Update Boradcom SAI package to 3.0.3.3-3 (sonic-net#1312)
  [submodule]: Update sairedis (sonic-net#1310)
  [snmpagent]: Update sonic-snmpagent submodule (sonic-net#1308)
  [baseimage]: add mkfs.ext3 and fsck.ext3 in initrd to support ext3 partition (sonic-net#1306)
  [submodule]: update sonic-sairedis to enable syncd-rpc (sonic-net#1304)
  [device]: Fix Mellanox sku check (sonic-net#1303)
  Add support for Accton AS7712-32X platform (sonic-net#1299)
  [build]: build libsaithrift-dev and docker-ptf-[platform] (sonic-net#1300)
  [libsaithrift-dev]: Enable building libsaithrift-dev and pythonthrift libraries (sonic-net#1296)
  [Platform] Update switch configuration files and download link for Ingrasys S9130-32X/S9230-64X (sonic-net#1295)
  [Delta]: Add psuutil support for ag9032v1 (sonic-net#1298)
  Revert "[Dell S6100, Z9100] psusutil sysfs attribute changes for hwmon (sonic-net#1264)" (sonic-net#1297)
  [Dell S6100, Z9100] psusutil sysfs attribute changes for hwmon (sonic-net#1264)
  [Platform]As7712-32x update for sensors test (sonic-net#1292)
  Revert "[DHCP relay]: Add patch to always undef VLAN_TCI_PRESENT so as not to treat VLAN-tagged packets differently (sonic-net#1254)" (sonic-net#1291)
  [[submodule]: Update swss-common (sonic-net#1289)
  [baseimage]: Install sysfsutils package into SONiC host system (sonic-net#1290)
  Add caclmgrd and related files to translate and install control plane ACL rules (sonic-net#1240)
  [mellanox]: Update Mellanox buffers configuration (sonic-net#1263)
  [platform]: chmod 0644 for *.mk files (sonic-net#1284)
  [arista]: Update Arista platform modules and mount libraries to snmp docker (sonic-net#1283)
  [platform]: chmod a+x for debian/rules for platform-modules-delta (sonic-net#1282)
  Let debootstrap uses the same sources link as apt (sonic-net#1279)
  [doc]: update sonic-buildimage clone instructions (sonic-net#1278)
  [image]: Explicitly specify kernel_version as string (sonic-net#1280)
  Disable autosuspend for USB devices, preventing usb drives to be stopped and then renamed (sonic-net#1275)
  [platform]: As7712 32x add fancontrol (sonic-net#1270)
  [Platform] Add psuutil support for Ingrasys S9130-32X (sonic-net#1273)
  [submodules]: Update swss and utilitiles modules (sonic-net#1276)
  [Platform] Add psuutil and update submodule for Ingrasys S9100-32X, S8810-32Q, S9200-64X on master branch (sonic-net#1271)
  [centec]: support sai1.0 (sonic-net#1268)
  [build]: add build badge for nephos platform (sonic-net#1267)
  [build]: allow to use http(s) proxy in the build (sonic-net#1265)
  [Accton AS7816-64X] Add new platform and device for AS7816-64X. (sonic-net#1260)
  [Platform] Add Ingrasys S9130-32X and S9230-64X with Nephos Switch ASIC (sonic-net#1245)
  Add 'make reset' target with warning prompt to reset git repo and submodules (sonic-net#1258)
  [sudoers] Add 'docker ps' to READ_ONLY_CMDS (sonic-net#1259)
  Add set/get lpmode and mode_rst feature for qsfp (sonic-net#1261)
  [build] allow user to override the default number of build jobs (sonic-net#1255)
  [build] make second Accton Debian package extra package of the first one (sonic-net#1257)
  [arista] Delete sysfs entries for all Arista Digital Power Monitor/Management devices (sonic-net#1256)
  [DHCP relay]: Add patch to always undef VLAN_TCI_PRESENT so as not to treat VLAN-tagged packets differently (sonic-net#1254)
  [snmp]: Save S/N in state DB prior to starting service (sonic-net#1246)
  [device/accton] Correct exception function name (sonic-net#1249)
  [DHCP relay]: Fix circuit ID and remote ID bugs (sonic-net#1248)
  [sonic-py-swsssdk]: Update submodule pointer (sonic-net#1253)
  [swss]: update swss submodule (sonic-net#1244)
  [broadcom]: update sai to 3.0.3.3-1 (sonic-net#1243)
mssonicbld added a commit that referenced this pull request Oct 25, 2023
…tically (#16979)

#### Why I did it
src/sonic-sairedis
```
* eaa2bda - (HEAD -> master, origin/master, origin/HEAD) Update SAI submodule to latest (#1311) (12 hours ago) [Kamil Cudnik]
```
#### How I did it
#### How to verify it
#### Description for the changelog