Improve orchagent watchdog UT to handle orchagent not pause case. (#11136)

### Description of PR
Improve the orchagent watchdog UT to handle the case where orchagent is not paused.

### Type of change

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [x] Test case(new/improvement)

## Approach
#### What is the motivation for this PR?
system_health/test_watchdog.py::test_orchagent_watchdog randomly fails on some devices because orchagent is not paused by the UT.
The root cause has not been found yet, so this PR adds debug information and also improves the code.

#### How did you do it?
Improve the code that gets the orchagent PID so that it only picks the orchagent process run by the root user.
Add a debug log for the case where pausing orchagent fails.
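
As a rough illustration of the two ideas above (selecting only the root-owned orchagent process and recognising a paused process from its state column), here is a minimal sketch. It is not the code in this commit: the helper names, the sample listing, and the assumption that the text comes from something like `ps -eo pid,user,stat,args` are all hypothetical.

```python
# Hypothetical helpers; the commit itself shells out to `pidof` and greps `ps` output.

def parse_ps(ps_output):
    """Parse text shaped like `ps -eo pid,user,stat,args` (header line skipped)."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:
        fields = line.split(None, 3)  # pid, user, stat, full command line
        if len(fields) < 4:
            continue
        pid, user, stat, args = fields
        rows.append({"pid": int(pid), "user": user, "stat": stat, "args": args})
    return rows


def root_orchagent_pids(ps_output, binary="/usr/bin/orchagent"):
    """Return PIDs of processes owned by root whose executable is `binary`."""
    return [
        row["pid"]
        for row in parse_ps(ps_output)
        if row["user"] == "root" and row["args"].split()[0] == binary
    ]


def is_stopped(stat):
    """A state code starting with 'T' means the process is stopped."""
    return stat.startswith("T")


# Made-up sample listing, for illustration only.
SAMPLE = """\
  PID USER     STAT COMMAND
  124 root     Tl   /usr/bin/orchagent
  310 admin    S+   grep orchagent
  412 root     Sl   /usr/bin/syncd
"""

pids = root_orchagent_pids(SAMPLE)
stats = {row["pid"]: row["stat"] for row in parse_ps(SAMPLE)}
print(pids, [is_stopped(stats[pid]) for pid in pids])
```

Filtering on both the owner and the executable path avoids matching helper processes such as a `grep orchagent` started by another user.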

#### How did you verify/test it?
All UTs pass.
liuh-80 authored Jan 6, 2024
1 parent ba8303b commit 818f10b
Showing 1 changed file with 40 additions and 8 deletions.
48 changes: 40 additions & 8 deletions tests/system_health/test_watchdog.py
@@ -16,14 +16,46 @@

 @pytest.fixture
 def pause_orchagent(duthost):
-    # find orchagent pid
-    pid = duthost.shell(
-                    r"pgrep orchagent",
-                    module_ignore_errors=True)['stdout']
-    logger.info('Get orchagent pid: {}'.format(pid))
-
-    # pause orchagent and clear syslog
-    duthost.shell(r"sudo kill -STOP {}".format(pid), module_ignore_errors=True)
+    pid = None
+    retry = 3
+    while True:
+        retry -= 1
+        # find orchagent pid: https://www.man7.org/linux/man-pages/man1/pidof.1.html
+        pid_result = duthost.shell(
+                        r"pidof orchagent",
+                        module_ignore_errors=True)
+
+        rc = pid_result['rc']
+        if rc == 1:
+            logger.info('Get orchagent pid failed: {}'.format(pid_result))
+
+            if retry <= 0:
+                # break UT because orchagent pause failed
+                pytest.fail("Can't pause Orchagent by pid.")
+            else:
+                continue
+
+        pid = pid_result['stdout']
+        logger.info('Get orchagent pid: {}'.format(pid))
+
+        # pause orchagent
+        duthost.shell(r"sudo kill -STOP {}".format(pid), module_ignore_errors=True)
+
+        # validate orchagent paused, the stat colum should be Tl:
+        # root         124  0.3  1.6 596616 63600 pts/0    Tl   02:33   0:06 /usr/bin/orchagent
+        result = check_process_status(duthost, "'Tl.*/usr/bin/orchagent''")
+        if result:
+            # continue UT when Orchagent paused
+            break
+        else:
+            # collect log for investigation not paused reason
+            duthost.shell(r"sudo ps -auxww", module_ignore_errors=True)
+            duthost.shell(r"sudo cat /var/log/syslog | grep orchagent", module_ignore_errors=True)
+
+            if retry <= 0:
+                # break UT because orchagent pause failed
+                pytest.fail("Can't pause Orchagent by pid.")
+
     duthost.shell(r"sudo truncate -s 0 /var/log/syslog", module_ignore_errors=True)

     yield
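
The control flow of the new fixture (look up the PID, send STOP, verify, collect debug output, retry a bounded number of times) can be sketched independently of the testbed. The version below is only an illustration under stated assumptions: `pause_with_retry` and its callable parameters are hypothetical names, the callables stand in for the `duthost.shell` calls, and a plain `RuntimeError` replaces `pytest.fail` so the snippet runs on its own.

```python
# Hypothetical restructuring of the retry loop; not the code in this commit.

def pause_with_retry(find_pid, send_stop, is_paused, collect_debug, retries=3):
    """Try up to `retries` times to pause a process and confirm it is paused.

    find_pid()      -> PID, or None when the lookup fails
    send_stop(pid)  -> sends the STOP signal
    is_paused(pid)  -> True once the process is reported as stopped
    collect_debug() -> gathers logs after a failed attempt
    """
    for _ in range(retries):
        pid = find_pid()
        if pid is None:
            continue  # lookup failed, spend one attempt and try again
        send_stop(pid)
        if is_paused(pid):
            return pid
        collect_debug()
    raise RuntimeError("Can't pause process after {} attempts".format(retries))


# Usage with fakes: the first verification fails, the second succeeds.
checks = {"count": 0}
debug_runs = []


def flaky_is_paused(pid):
    checks["count"] += 1
    return checks["count"] >= 2


paused_pid = pause_with_retry(
    find_pid=lambda: 124,
    send_stop=lambda pid: None,
    is_paused=flaky_is_paused,
    collect_debug=lambda: debug_runs.append("ps"),
)
print(paused_pid, checks["count"], len(debug_runs))
```

Passing the steps in as callables keeps the retry policy in one place and lets it be exercised without a device, at the cost of one more layer of indirection than the inline loop in the fixture.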