On T2 / multi-ASIC chassis, eventd sometimes fails to start after a reboot, leaving the system in a degraded state according to systemd. The failure is attributed to rsyslogd failing on its initial start, which cascades into eventd failing to start. rsyslogd eventually restarts and comes up fine, but eventd does not have auto-restart configured, so it stays down. Manually restarting the service recovers the system.
rsyslogd fails on its initial start with "Network is unreachable", likely because it comes up before the docker service is ready, i.e. it races with docker (on multi-ASIC chassis, rsyslogd attaches to the docker0 interface to pull the logs). Once rsyslogd is auto-restarted, which happens after the docker interface / service is up, it starts up fine.
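To illustrate the suspected race, here is a minimal Python sketch of a "wait until docker0 exists" gate. It is not taken from SONiC. The helper names `interface_exists` and `wait_for_interface` are hypothetical, and checking `/sys/class/net/<name>` is only a proxy: an interface can exist before it has an address or route, so this alone may not rule out "Network is unreachable".

```python
import os
import time


def interface_exists(name):
    """Return True if the kernel exposes a network interface called `name`."""
    return os.path.exists(f"/sys/class/net/{name}")


def wait_for_interface(name, timeout=30.0, interval=0.5,
                       exists=interface_exists,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until `name` shows up or `timeout` seconds elapse.

    Returns True if the interface appeared, False on timeout.
    `exists`, `sleep` and `clock` are injectable so the loop can be
    exercised without real interfaces or real waiting.
    """
    deadline = clock() + timeout
    while True:
        if exists(name):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller could run `wait_for_interface("docker0")` before launching anything that sends to the docker bridge. Whether gating rsyslogd this way is acceptable depends on how much early-boot logging can be delayed.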
See below for journalctl output for rsyslogd:
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: omfwd/udp: socket 11: sendto() error: Network is unreachable [v8.2302.0 try https://www.rsyslog.com/e/2354 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: omfwd: socket 11: error 101 sending via udp: Network is unreachable [v8.2302.0 try https://www.rsyslog.com/e/2354 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: action 'action-19-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2302.0 try https://www.rsyslog.com/e/2007 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: action 'action-19-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2302.0 try https://www.rsyslog.com/e/2359 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: omfwd/udp: socket 11: sendto() error: Network is unreachable [v8.2302.0 try https://www.rsyslog.com/e/2354 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: omfwd: socket 11: error 101 sending via udp: Network is unreachable [v8.2302.0 try https://www.rsyslog.com/e/2354 ]
Nov 12 06:14:38 str2-7250-lc2-1 rsyslogd[988]: action 'action-19-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2302.0 try https://www.rsyslog.com/e/2007 ]
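When triaging a longer journal, it can help to count how often each rsyslog error reference appears. The following is a rough sketch that assumes the line layout shown above, with a trailing `https://www.rsyslog.com/e/<code> ]` suffix. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches the numeric code in the trailing rsyslog error reference, e.g. ".../e/2354 ]".
CODE_RE = re.compile(r"rsyslogd\[\d+\]: .*https://www\.rsyslog\.com/e/(\d+) \]\s*$")


def count_error_codes(lines):
    """Count rsyslog error reference codes across journal lines."""
    counts = Counter()
    for line in lines:
        match = CODE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def saw_network_unreachable(lines):
    """Return True if any rsyslogd line reports 'Network is unreachable'."""
    return any("rsyslogd[" in line and "Network is unreachable" in line
               for line in lines)
```

Feeding it the output of `journalctl -u rsyslog` (the unit name is an assumption here) for the boot in question would show whether the omfwd send failures dominate the early log.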
Steps to reproduce the issue:
Reboot a T2 / multi-ASIC chassis.
Run systemctl and see that eventd is down.
Note: this can be semi-reliably reproduced with the platform_tests/test_reload_config::test_reload_configuration_checks test, although it is flaky because it is a race condition.
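The manual check in the steps above could be scripted roughly as follows. This is a sketch, not part of the test suite. It assumes the unit is named `eventd` and relies on `systemctl is-active <unit>`, which prints the unit state (for example `active`, `inactive`, or `failed`). The helper names are hypothetical.

```python
import subprocess


def unit_state(unit, run=subprocess.run):
    """Return the state string that `systemctl is-active <unit>` prints."""
    result = run(["systemctl", "is-active", unit],
                 capture_output=True, text=True, check=False)
    return result.stdout.strip()


def is_down(unit, run=subprocess.run):
    """Return True if the unit is in any state other than 'active'."""
    return unit_state(unit, run) != "active"
```

After a reboot, `is_down("eventd")` returning True would match the failure described here. Because the problem is a race, a single passing run does not prove the issue is absent.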
Describe the results you received:
eventd fails to start due to rsyslogd failing on initial start
Describe the results you expected:
eventd either auto-restarts on failure, or services are sequenced such that rsyslogd starts successfully on first invocation (dependencies can be tricky here, as we don't want to drop logs on boot).
Output of show version:
On a Nokia 7250 chassis / 202405:
SONiC Software Version: SONiC.internal-202405.107587339-62ab6b6719
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: 62ab6b6719
Output of show techsupport:
(paste your output here or download and attach the file here )
Additional information you deem important (e.g. issue happens only occasionally):
This is probably related to issues #20544 and #20521
Hi @wumiaont, what is the status of this? I see you have closed your PR without merging - is there another fix in the works? Thanks.
We believe #20248 is the culprit, and #20947 reverted its changes. We have not seen this issue since #20947, so my fix is not needed.