-
Notifications
You must be signed in to change notification settings - Fork 739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pmon daemon regression test] Correct the log messages with right daemon name and fix term_and_start_status failure #3922
Conversation
can you update your pr description and title to reflect the update? |
@sujinmkang could you please specify why is it needed to disable all features' autostart before running these tests ? |
@antonptashnik originally I don't want to make the PMON container restart during this test since this regression test of some daemons in the critical_process can make the restart the PMON container. I just want to focus on the daemon behavior when it has its running status change. Do you have any concern or suggestion? |
So you say some of these tests can cause PMON to restart and for the tests to notice this moment you disable PMON autorestart for it to stay stopped, right ? If so I understand why we disable autorestart for PMON but I suppose autorestart for other features can be left as is |
since we want to make sure that pmon and any other containers are not restarted during this test.
@@ -145,26 +113,27 @@ def test_pmon_ledd_term_and_start_status(check_daemon_status, duthosts, rand_one | |||
@summary: This test case is to check the ledd terminated and restarted status | |||
""" | |||
duthost = duthosts[rand_one_dut_hostname] | |||
|
|||
if "201811" in duthost.os_version or "201911" in duthost.os_version: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sujinmkang what is the difference in behavior for SIGTERM across different version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor master and 202012 daemon has the signal_handler implemented to return the exitcode (128 + signal_number) but previous version just return exit code 0 for sigterm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to skip this test for 201811 and 201911? If not, it would be better we handle them differently here or make the exitcode same across 201811, 201911. What about 202012?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor this change was based on the discussion during 202012 issue triage. If there would be any back port happening, then we will need another changes for this.
"Pcied expected terminated status is {} but is {}".format(expected_exited_status, daemon_status)) | ||
pytest_assert(daemon_pid == -1, | ||
"Pcied expected pid is -1 but is {}".format(daemon_pid)) | ||
logger.info("{} daemon got killed unexpectedly and it is in {} with pid {}".format(daemon_name, daemon_status, daemon_pid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you logging the info without checking the status if the daemon actually killed or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The daemon_status check is followed right after but this is just added for debugging purpose. I can remove this part.
@@ -180,16 +149,17 @@ def test_pmon_ledd_kill_and_start_status(check_daemon_status, duthosts, rand_one | |||
duthost.stop_pmon_daemon(daemon_name, SIG_KILL, pre_daemon_pid) | |||
|
|||
daemon_status, daemon_pid = duthost.get_pmon_daemon_status(daemon_name) | |||
logger.info("{} daemon got killed unexpectedly and it is {} with pid {}".format(daemon_name, daemon_status, daemon_pid)) | |||
logger.info("{} daemon got killed unexpectedly and it is in {} with pid {}".format(daemon_name, daemon_status, daemon_pid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check the daemon_status before logging "daemon got killed"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The daemon_status check is followed right after but this is just added for debugging purpose. I can remove this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better put a log "sending SIG_KILL to daemon {}" before calling stop_pmon_daemon() for debugging and remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test name(test_pmon_ledd_kill_and_start_status) is already indicating this is sending SIG_KILL command.
@@ -154,26 +121,27 @@ def test_pmon_pcied_term_and_start_status(check_daemon_status, duthosts, rand_on | |||
@summary: This test case is to check the pcied terminated and restarted status | |||
""" | |||
duthost = duthosts[rand_one_dut_hostname] | |||
|
|||
if "201811" in duthost.os_version or "201911" in duthost.os_version: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIGTERM signal is standard signal and the behavior should be same for all POSIX compliant systems. Can you elaborate what different behavior are you seeing across different os version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor please see the previous comment.
…mon name and fix term_and_start_status failure (sonic-net#3922) What is the motivation for this PR? Needs to fix pmon daemon regression test issues How did you do it? Use daemon_name instead of the specific daemon name for log message Support for both old/new db key support for checking the db value. Skip term_and_start_status test cases for 201911 and 201811 since it behaves differently Fix the expected daemon status after sending sig_term based on latest daemon behavior for term_and_start_status Remove disable_and_enable_autorestart to make sure the daemon regression test doesn't affect any other containers. How did you verify/test it? Run test_ledd.py on arista 7050 platform which reports issue.
Description of PR
Summary:
Fixes #(sonic-net/sonic-buildimage#8239)
Type of change
Back port request
Approach
What is the motivation for this PR?
Needs to fix pmon daemon regression test issues
How did you do it?
daemon_name
instead of the specific daemon name for log messageHow did you verify/test it?
Run test_ledd.py on arista 7050 platform which reports issue.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation