[bug fix][test_container_checker] change config of Monit to stabilize the test #7
Description of PR
Summary:
The Monit sampling interval (60s) is too long relative to the syncd container restart time (sometimes only about 30s), and the alert message rule is too strict. As a result, Monit sometimes cannot observe syncd down 2 times within 2 minutes, and no syncd alert messages appear in syslog. Changing the relevant Monit config stabilizes the test.
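The timing argument can be checked with a small calculation. The sketch below is illustrative only: the function name `polls_during_outage` and the simplified model (Monit polls at a fixed interval, with the first poll landing at some offset after syncd goes down) are assumptions, not part of the test code.

```python
def polls_during_outage(outage_s, interval_s, first_poll_offset_s=0.0):
    """Count periodic polls that land inside the outage window [0, outage_s).

    Hypothetical model: the container goes down at t=0, comes back at
    t=outage_s, and the monitor polls every interval_s seconds starting
    at first_poll_offset_s.
    """
    count = 0
    t = first_poll_offset_s
    while t < outage_s:
        count += 1
        t += interval_s
    return count


if __name__ == "__main__":
    # 60s interval, 30s restart: at most one poll sees syncd down.
    print(polls_during_outage(30, 60, first_poll_offset_s=0))
    print(polls_during_outage(30, 60, first_poll_offset_s=45))
    # 10s interval, 30s restart: several polls see syncd down.
    print(polls_during_outage(30, 10, first_poll_offset_s=5))
```

Under this model, a 60s interval never yields two failed checks during a 30s restart, whereas a 10s interval yields several regardless of where the first poll falls.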
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
Stabilize test_container_checker by changing some config of Monit.
How did you do it?
Changing the sampling interval to 10s in /etc/monit/monitrc ensures that Monit can detect the syncd container being down.
Changing the start delay to 10s in /etc/monit/monitrc ensures that Monit starts sooner than syncd.
Relaxing the alerting rule in /etc/monit/conf.d/sonic-host makes it easier for alert messages to be sent.
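The monitrc changes above could be sketched as a text rewrite like the following. This is a minimal illustration, not the PR's actual implementation: the helper name `shorten_monit_timing` and the sample file contents are hypothetical, and it assumes the file uses Monit's `set daemon N` / `with start delay N` statements on uncommented lines. The sonic-host alert rule is not covered here, since its exact wording is not given in this description.

```python
import re


def shorten_monit_timing(monitrc_text, interval_s=10, start_delay_s=10):
    """Return monitrc text with the poll interval and start delay replaced.

    Only uncommented 'set daemon N' and 'with start delay N' lines are
    rewritten; everything else is left untouched.
    """
    text = re.sub(
        r"^([ \t]*set[ \t]+daemon[ \t]+)\d+",
        lambda m: m.group(1) + str(interval_s),
        monitrc_text,
        flags=re.MULTILINE,
    )
    text = re.sub(
        r"^([ \t]*with[ \t]+start[ \t]+delay[ \t]+)\d+",
        lambda m: m.group(1) + str(start_delay_s),
        text,
        flags=re.MULTILINE,
    )
    return text


if __name__ == "__main__":
    # Hypothetical monitrc excerpt, for illustration only.
    sample = "set daemon 60\n  with start delay 300\n"
    print(shorten_monit_timing(sample))
```

In a test, the original file would normally be backed up first and restored afterwards, with Monit restarted after each change so the new timing takes effect.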
How did you verify/test it?
Ran the test:
py.test container_checker/test_container_checker.py --inventory "../ansible/inventory, ../ansible/veos" --host-pattern arc-switch1025 --module-path ../ansible/library/ --testbed arc-switch1025-t0 --testbed_file ../ansible/testbed.csv --allow_recover
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation