Improve container checker for gnmi/telemetry container (#18529)
### Why I did it
We have replaced the telemetry container with the gnmi container, but the telemetry feature is still enabled after an upgrade.
The container_checker script reads the feature table and checks whether each enabled container is running. Telemetry is enabled there, but there is no telemetry container to find.
Disabling telemetry in the feature table is hard to do reliably across warm and cold reboots, because both db_migrator and minigraph.py would need to check which Docker images are present.

### How I did it
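I modified the container_checker script to handle the telemetry entry in the feature table as follows:
- If the docker-sonic-telemetry image exists, check the telemetry container.
- If the docker-sonic-telemetry image is missing but the docker-sonic-gnmi image exists, check the gnmi container instead.
- If neither the docker-sonic-telemetry image nor the docker-sonic-gnmi image exists, skip the telemetry check.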
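The three rules above amount to a small decision function. This is a minimal sketch, not code from the commit: `resolve_telemetry_container` is a hypothetical name, and the `image_exists` callable stands in for the script's `check_docker_image()` so the rule can be exercised without a Docker daemon.

```python
from typing import Callable, Optional

# Image names come from the commit; the helper below is hypothetical.
TELEMETRY_IMAGE = "docker-sonic-telemetry"
GNMI_IMAGE = "docker-sonic-gnmi"


def resolve_telemetry_container(image_exists: Callable[[str], bool]) -> Optional[str]:
    """Return the container to check for the telemetry feature, or None to skip it.

    image_exists stands in for check_docker_image() (an assumption of this sketch).
    """
    if image_exists(TELEMETRY_IMAGE):
        return "telemetry"  # legacy image present: keep checking telemetry
    if image_exists(GNMI_IMAGE):
        return "gnmi"  # telemetry replaced by gnmi: check gnmi instead
    return None  # neither image present: skip the check


if __name__ == "__main__":
    scenarios = [{TELEMETRY_IMAGE, GNMI_IMAGE}, {GNMI_IMAGE}, set()]
    for images in scenarios:
        target = resolve_telemetry_container(lambda name: name in images)
        print(sorted(images), "->", target)
```

Checking the telemetry image first keeps the old behavior on images that still ship it; only images without it fall back to gnmi.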

#### How to verify it
Run end-to-end tests covering both cold reboot and warm reboot.
ganglyu authored and mssonicbld committed Apr 9, 2024
1 parent 68a0991 commit 5d35e6b
Showing 1 changed file with 30 additions and 0 deletions.
30 changes: 30 additions & 0 deletions files/image_config/monit/container_checker
@@ -25,6 +25,19 @@ from swsscommon import swsscommon
EVENTS_PUBLISHER_SOURCE = "sonic-events-host"
EVENTS_PUBLISHER_TAG = "event-down-ctr"

def check_docker_image(image_name):
    """
    @summary: This function will check if docker image exists.
    @return: True if the image exists, otherwise False.
    """
    try:
        DOCKER_CLIENT = docker.DockerClient(base_url='unix://var/run/docker.sock')
        DOCKER_CLIENT.images.get(image_name)
        return True
    except (docker.errors.ImageNotFound, docker.errors.APIError) as err:
        print("Failed to get image '{}'. Error: '{}'".format(image_name, err))
        return False

def get_expected_running_containers():
    """
    @summary: This function will get the expected running & always-enabled containers by following the rule:
@@ -55,7 +68,24 @@ def get_expected_running_containers():
    # it will be removed from exception list.
    run_all_instance_list = ['database', 'bgp']

    container_list = []
    for container_name in feature_table.keys():
        # slim image does not have telemetry container and corresponding docker image
        if container_name == "telemetry":
            ret = check_docker_image("docker-sonic-telemetry")
            if not ret:
                # If telemetry container image is not present, check gnmi container image
                # If gnmi container image is not present, ignore telemetry container check
                # if gnmi container image is present, check gnmi container instead of telemetry
                ret = check_docker_image("docker-sonic-gnmi")
                if not ret:
                    print("Ignoring telemetry container check on image which has no corresponding docker image")
                else:
                    container_list.append("gnmi")
                continue
        container_list.append(container_name)

    for container_name in container_list:
        if feature_table[container_name]["state"] not in ["disabled", "always_disabled"]:
            if multi_asic.is_multi_asic():
                if feature_table[container_name].get("has_global_scope", "True") == "True":

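The two loops in the second hunk can be reproduced over a plain dictionary to test the substitution in isolation. This is an illustrative sketch under stated assumptions: `expected_containers` is a hypothetical name, the `feature_table` dict stands in for the feature table the script reads, `image_exists` replaces `check_docker_image()` (the log message is dropped), multi-ASIC handling is omitted, and results are collected into a set (assumed to match how the script gathers expected containers) so a gnmi name contributed by both the telemetry entry and gnmi's own entry is counted once. Like the commit, the second pass reads gnmi's own feature entry, so the sketch assumes that entry exists.

```python
from typing import Callable, Dict, Set

DISABLED_STATES = ("disabled", "always_disabled")


def expected_containers(
    feature_table: Dict[str, Dict[str, str]],
    image_exists: Callable[[str], bool],
) -> Set[str]:
    """Hypothetical, simplified version of the two loops in the second hunk."""
    # Pass 1: build container_list, substituting or dropping telemetry.
    container_list = []
    for container_name in feature_table.keys():
        if container_name == "telemetry":
            if not image_exists("docker-sonic-telemetry"):
                if image_exists("docker-sonic-gnmi"):
                    container_list.append("gnmi")
                continue
        container_list.append(container_name)

    # Pass 2: keep containers whose own feature entry is not disabled.
    # Indexing feature_table directly, as the commit does, assumes every
    # name in container_list (including a substituted "gnmi") has an entry.
    expected = set()
    for container_name in container_list:
        if feature_table[container_name]["state"] not in DISABLED_STATES:
            expected.add(container_name)
    return expected


if __name__ == "__main__":
    table = {
        "bgp": {"state": "enabled"},
        "telemetry": {"state": "enabled"},
        "gnmi": {"state": "enabled"},
        "dhcp_relay": {"state": "disabled"},
    }

    def only_gnmi(name: str) -> bool:
        return name == "docker-sonic-gnmi"

    print(sorted(expected_containers(table, only_gnmi)))  # prints ['bgp', 'gnmi']
```

Once telemetry is swapped for gnmi, the telemetry entry's own state is never consulted; in that case it is gnmi's entry in the feature table that decides whether the check runs.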