Fix an issue that rsyslog-config service starts failed #20840

Closed
wants to merge 5 commits into from


wumiaont
Contributor

@wumiaont wumiaont commented Nov 18, 2024

Why I did it

rsyslog-config is a one-shot service, and it sometimes fails after a config reload. We found this on our OC test bed, on a supervisor card with a full set of fabric cards. The failing test case is platform_tests/test_reload_config.py::test_reload_configuration_checks:

assert wait_until(300, 20, 0, config_system_checks_passed, duthost, delayed_services)
E AssertionError
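
To make the failing assertion easier to read, here is a minimal sketch of a polling helper with the same call shape. It assumes the three numeric arguments are the timeout, the polling interval, and an initial delay, all in seconds. The name `wait_until_sketch` and the injectable `sleep`/`clock` parameters are hypothetical and are not the test framework's actual implementation.

```python
import time
from typing import Any, Callable


def wait_until_sketch(timeout: float, interval: float, delay: float,
                      condition: Callable[..., bool], *args: Any,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll condition(*args) until it returns True or `timeout` seconds elapse.

    Hypothetical stand-in for the helper used in the test; `sleep` and `clock`
    are injectable only so the sketch can be exercised without real waiting.
    """
    if delay > 0:
        sleep(delay)  # optional wait before the first check
    deadline = clock() + timeout
    while True:
        if condition(*args):
            return True
        if clock() >= deadline:
            return False  # condition never held within the timeout
        sleep(interval)
```

Under these assumptions, `wait_until(300, 20, 0, ...)` returning False means the system checks never passed in roughly five minutes of polling every 20 seconds, which is consistent with rsyslog-config.service staying in the failed state shown below.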

After config reload:
admin@ixre-cpm-chassis19:~$ sudo systemctl status rsyslog-config.service
× rsyslog-config.service - Update rsyslog configuration
Loaded: loaded (/lib/systemd/system/rsyslog-config.service; enabled-runtime; preset: enabled)
Active: failed (Result: exit-code) since Thu 2024-11-14 17:06:12 UTC; 9min ago
Main PID: 11123 (code=exited, status=1/FAILURE)

Nov 14 17:06:11 ixre-cpm-chassis19 systemd[1]: Starting rsyslog-config.service - Update rsyslog configuration...
Nov 14 17:06:12 ixre-cpm-chassis19 systemctl[11277]: Job for rsyslog.service failed because the control process exited with error code.
Nov 14 17:06:12 ixre-cpm-chassis19 systemctl[11277]: See "systemctl status rsyslog.service" and "journalctl -xeu rsyslog.service" for details.
Nov 14 17:06:12 ixre-cpm-chassis19 systemd[1]: rsyslog-config.service: Main process exited, code=exited, status=1/FAILURE
Nov 14 17:06:12 ixre-cpm-chassis19 systemd[1]: rsyslog-config.service: Failed with result 'exit-code'.
Nov 14 17:06:12 ixre-cpm-chassis19 systemd[1]: Failed to start rsyslog-config.service - Update rsyslog configuration.

admin@ixre-cpm-chassis19:~$ sudo systemctl list-units --state=failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● rsyslog-config.service loaded failed failed Update rsyslog configuration

LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Add "Restart=on-failure" to rsyslog-config.service so that systemd restarts the service if it ever fails to start.

How to verify it

With the fix, the issue is no longer seen on the same setup where we originally observed it.
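
The same check could be scripted rather than read by eye. The sketch below is only an illustration and not part of this PR: it assumes `systemctl` is on the PATH and uses its `--plain` and `--no-legend` options to get one unit per line. The function names `parse_failed_units` and `failed_units` are hypothetical.

```python
import subprocess
from typing import List


def parse_failed_units(listing: str) -> List[str]:
    """Extract unit names from `systemctl list-units --state=failed` output.

    The first whitespace-separated field containing a dot is taken as the
    unit name, so a leading status glyph (if present) is skipped.
    """
    units = []
    for line in listing.splitlines():
        name = next((field for field in line.split() if "." in field), None)
        if name is not None:
            units.append(name)
    return units


def failed_units() -> List[str]:
    """Ask systemd for the units currently in the failed state."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=False,
    )
    return parse_failed_units(result.stdout)
```

After a config reload on a fixed image, `"rsyslog-config.service" not in failed_units()` would be the expected result.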

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202405

Tested branch (Please provide the tested image version)

202405

@wumiaont wumiaont requested a review from lguohan as a code owner November 18, 2024 18:55
@wumiaont
Contributor Author

After a config reload, the rsyslog-config service fails because rsyslog itself fails to start. Issues #20775, #20544, and #20521 have already been reported about eventd failing to start because rsyslog failed to start.

Possible solutions:

  1. As in this PR, make the rsyslog-config service retry on failure. With this fix, "sudo systemctl list-units --state=failed" no longer shows the rsyslog-config service as failed.

  2. Find and resolve the reason rsyslog fails on config reload or reboot. If rsyslog never fails, rsyslog-config will not fail either.

@wumiaont wumiaont closed this Dec 9, 2024
2 participants