Skip to content

Commit

Permalink
[Services] Restart database service upon unexpected critical process …
Browse files Browse the repository at this point in the history
…exit. (sonic-net#4138)

* [database] Implement the auto-restart feature for database container.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [database] Remove the duplicate dependency in service files. Since we
already have updategraph ---> config_setup ---> database, we do not need
explicitly add database.service in all other container service files.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [event listener] Reorganize the line 73 in event listener script.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [database] update the file sflow.service.j2 to remove the duplicate
dependency.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [event listener] Add comments in event listener.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [event listener] Update the comments in line 56.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

* [event listener] Add parentheses for if statement in line 76 in event listener.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
  • Loading branch information
yozhao101 authored and pphuchar committed Mar 9, 2020
1 parent 7924df6 commit 08f96f0
Show file tree
Hide file tree
Showing 6 changed files with 37 additions and 18 deletions.
2 changes: 2 additions & 0 deletions dockers/docker-database/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -36,5 +36,7 @@ COPY ["supervisord.conf.j2", "/usr/share/sonic/templates/"]
COPY ["docker-database-init.sh", "/usr/local/bin/"]
COPY ["ping_pong_db_insts", "/usr/local/bin/"]
COPY ["database_config.json", "/etc/default/sonic-db/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor"]

ENTRYPOINT ["/usr/local/bin/docker-database-init.sh"]
1 change: 1 addition & 0 deletions dockers/docker-database/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
redis
7 changes: 7 additions & 0 deletions dockers/docker-database/supervisord.conf.j2
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,13 @@ logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener --container-name database
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected


[program:rsyslogd]
command=/bin/bash -c "rm -f /var/run/rsyslogd.pid && /usr/sbin/rsyslogd -n"
priority=1
Expand Down
4 changes: 4 additions & 0 deletions files/build_templates/single_instance/database.service.j2
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,16 @@ Description=Database container
Requires=docker.service
After=docker.service
After=rc-local.service
StartLimitIntervalSec=1200
StartLimitBurst=3

[Service]
User=root
ExecStartPre=/usr/bin/{{docker_container_name}}.sh start
ExecStart=/usr/bin/{{docker_container_name}}.sh wait
ExecStop=/usr/bin/{{docker_container_name}}.sh stop
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target
40 changes: 22 additions & 18 deletions files/scripts/supervisor-proc-exit-listener
Original file line number Diff line number Diff line change
Expand Up @@ -52,24 +52,28 @@ def main(argv):
processname = payload_headers['processname']
groupname = payload_headers['groupname']

config_db = swsssdk.ConfigDBConnector()
config_db.connect()
container_features_table = config_db.get_table(CONTAINER_FEATURE_TABLE_NAME)
if not container_features_table:
syslog.syslog(syslog.LOG_ERR, "Unable to retrieve container features table from Config DB. Exiting...")
sys.exit(2)

if not container_features_table.has_key(container_name):
syslog.syslog(syslog.LOG_ERR, "Unable to retrieve features for container '{}'. Exiting...".format(container_name))
sys.exit(3)

restart_feature = container_features_table[container_name].get('auto_restart')
if not restart_feature:
syslog.syslog(syslog.LOG_ERR, "Unable to determine auto-restart feature status for container '{}'. Exiting...".format(container_name))
sys.exit(4)

# If auto-restart feature is enabled and a critical process exited unexpectedly, terminate supervisor
if restart_feature == 'enabled' and expected == 0 and (processname in critical_processes or groupname in critical_processes):
# Read the status of auto-restart feature from Config_DB.
if container_name != 'database':
config_db = swsssdk.ConfigDBConnector()
config_db.connect()
container_features_table = config_db.get_table(CONTAINER_FEATURE_TABLE_NAME)
if not container_features_table:
syslog.syslog(syslog.LOG_ERR, "Unable to retrieve container features table from Config DB. Exiting...")
sys.exit(2)

if not container_features_table.has_key(container_name):
syslog.syslog(syslog.LOG_ERR, "Unable to retrieve features for container '{}'. Exiting...".format(container_name))
sys.exit(3)

restart_feature = container_features_table[container_name].get('auto_restart')
if not restart_feature:
syslog.syslog(syslog.LOG_ERR, "Unable to determine auto-restart feature status for container '{}'. Exiting...".format(container_name))
sys.exit(4)

# If container is database or auto-restart feature is enabled and at the same time
# a critical process exited unexpectedly, terminate supervisor
if ((container_name == 'database' or restart_feature == 'enabled') and expected == 0 and
(processname in critical_processes or groupname in critical_processes)):
MSG_FORMAT_STR = "Process {} exited unxepectedly. Terminating supervisor..."
msg = MSG_FORMAT_STR.format(payload_headers['processname'])
syslog.syslog(syslog.LOG_INFO, msg)
Expand Down
1 change: 1 addition & 0 deletions rules/docker-database.mk
Original file line number Diff line number Diff line change
Expand Up @@ -28,3 +28,4 @@ $(DOCKER_DATABASE)_RUN_OPT += -v /etc/sonic:/etc/sonic:ro

$(DOCKER_DATABASE)_BASE_IMAGE_FILES += redis-cli:/usr/bin/redis-cli
$(DOCKER_DATABASE)_BASE_IMAGE_FILES += monit_database:/etc/monit/conf.d
$(DOCKER_DATABASE)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)

0 comments on commit 08f96f0

Please sign in to comment.