Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pytest] Test the feature of monitoring critical processes by Supervisord. #2955

Merged
merged 25 commits into from
Feb 27, 2021

Conversation

yozhao101
Copy link
Contributor

@yozhao101 yozhao101 commented Feb 8, 2021

Signed-off-by: Yong Zhao yozhao@microsoft.com

Description of PR

Summary:
This PR aims to test the feature of monitoring the critical processes by Supervisord.

Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • [ x] Test case(new/improvement)

Approach

What is the motivation for this PR?

The motivation for this PR is to test the feature of monitoring the critical processes by Supervisord. This monitoring feature required Supervisord should write an alerting message into syslog if a critical process exited unexpectedly and the autorestart of container hosting this process was disabled.

How did you do it?

The logic of this PR is as follows:
Step 1: Records the autorestart status of all containers.
Step 2: Gets the number of ASICs on DUT.
Step 3: Gets all the containers which were enabled on DUT and at the same time for each enabled container, retrieve a list of namespaces which this container should be running.
Step 4: Disables the autorestart of all enabled containers.
Step 5: Manually stops all the critical process in all enabled containers on DUT.
Step 6: Waits for 70 seconds for the alerting messages to appear.
Step 7: Checks whether the expected alerting messages appeared in syslog or not.
Step 8: Executes the config_reload(...) to restart containers.
Step 9: Checks whether all critical processes in enabled containers are running.
Step 10: Restores the autorestart status of all containers.
Step 11: Does the post-check.

How did you verify/test it?

I tested this PR on a physical testbed with single ASIC.

Any platform specific information?

N/A

Supported testbed topology if it's a new test case?

N/A

Documentation

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 changed the title [pytest] Test the feature of monitoring critical process by Supervisord. [pytest] Test the feature of monitoring critical processes by Supervisord. Feb 8, 2021
@lgtm-com
Copy link

lgtm-com bot commented Feb 8, 2021

This pull request introduces 1 alert when merging 9af1b25 into 2b00e37 - view on LGTM.com

new alerts:

  • 1 for Syntax error

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 marked this pull request as ready for review February 9, 2021 23:14
@yozhao101 yozhao101 requested a review from a team as a code owner February 9, 2021 23:14
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
"""
logger.info("Checking the alerting messages from syslog...")
command_output = duthost.shell("sudo cat /var/log/syslog | grep '.*ERR.*supervisor-proc-exit-listener'",
module_ignore_errors=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use log analyzer to achieve this?

Copy link
Contributor Author

@yozhao101 yozhao101 Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix. Another pytest PR has the same issue and will also fix.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
1.Use `num_asics()` to get number of asics.
2.Use `sonic-db-cli CONFIG_DB keys 'FEATURE|*' to get the container
list.
3.Move two functions to `tests/common/helpers/dut_utils.py`.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 18, 2021

This pull request introduces 1 alert when merging 418ebed into e893f73 - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times

the DUTs installed with 201911 or old image version.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 18, 2021

This pull request introduces 1 alert when merging f0c549c into 856c106 - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 19, 2021

This pull request introduces 1 alert when merging b0ae37a into cb81ec8 - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 19, 2021

This pull request introduces 1 alert when merging 9cfca51 into a1a713e - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 19, 2021

This pull request introduces 1 alert when merging f0af91f into a1a713e - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@lgtm-com
Copy link

lgtm-com bot commented Feb 24, 2021

This pull request introduces 1 alert when merging 2bf80f0 into 795d794 - view on LGTM.com

new alerts:

  • 1 for Variable defined multiple times

@lguohan
Copy link
Contributor

lguohan commented Feb 25, 2021

check the lgtm alerts?

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
return is_enabled, has_global_scope, has_per_asic_scope


def parse_feature_table(duthost, num_asics, skip_containers):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function parse_feature_table(...) was used to get running containers list and which namespaces currently they are running. Currently this function was already removed and replaced by the function get_containers_namespace_ids(...) in which I used the get_feature_status(...) (https://github.com/Azure/sonic-mgmt/blob/67d5be22c740636c132047340d41fce163b9cee1/tests/common/devices.py#L972) to get the running container list.

tests/kvmtest.sh Outdated
@@ -154,7 +155,8 @@ test_t1_lag() {
lldp/test_lldp.py \
route/test_default_route.py \
platform_tests/test_cpu_memory_usage.py \
bgp/test_bgpmon.py"
bgp/test_bgpmon.py \
monitoring_processes/test_monitoring_critical_processes.py"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest renaming the directory to process_monitoring and the file to test_critical_process_monitoring.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
critical process monitoring in kvmtest.sh.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 merged commit eccce7d into sonic-net:master Feb 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants