Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rsyslogd in teamd container start failed with empty rsyslog.conf caused by sonic-cfggen generate conf failed #13674

Open
lerry-lee opened this issue Feb 6, 2023 · 7 comments
Labels
Triaged this issue has been triaged

Comments

@lerry-lee
Copy link
Contributor

lerry-lee commented Feb 6, 2023

Description

In t0-sonic and multi-asic KVM-test, sometimes test_pretest.py::test_disable_rsyslog_rate_limit will fail. Error message like below:

test_pretest.py::test_disable_rsyslog_rate_limit[vlab-02] 
-------------------------------- live log call ---------------------------------
06:18:51 __init__.pytest_runtest_call             L0040 ERROR  | Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/_pytest/python.py", line 1464, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/hooks.py", line 286, in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/manager.py", line 93, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pluggy/manager.py", line 87, in <lambda>
    firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 208, in _multicall
    return outcome.get_result()
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 81, in get_result
    _reraise(*ex)  # noqa
  File "/usr/local/lib/python2.7/dist-packages/pluggy/callers.py", line 187, in _multicall
    res = hook_impl.function(*args)
  File "/usr/local/lib/python2.7/dist-packages/_pytest/python.py", line 174, in pytest_pyfunc_call
    testfunction(**testargs)
  File "/var/src/sonic-mgmt/tests/test_pretest.py", line 172, in test_disable_rsyslog_rate_limit
    duthost.modify_syslog_rate_limit(feature_name, rl_option='disable')
  File "/var/src/sonic-mgmt/tests/common/devices/multi_asic.py", line 476, in modify_syslog_rate_limit
    self.sonichost.shell_cmds(cmds=cmds)
  File "/var/src/sonic-mgmt/tests/common/devices/base.py", line 90, in _run
    raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res)
RunAnsibleModuleFail: run module shell_cmds failed, Ansible Results =>
{"changed": false, "cmds": ["docker exec -i teamd sed -i 's/^\\$SystemLogRateLimit/#\\$SystemLogRateLimit/g' /etc/rsyslog.conf", "docker exec -i teamd supervisorctl restart rsyslogd"], "delta": "0:00:02.559184", "end": "2023-02-02 06:18:50.970187", "failed": true, "msg": "At least running one of the commands failed", "results": [{"cmd": "docker exec -i teamd sed -i 's/^\\$SystemLogRateLimit/#\\$SystemLogRateLimit/g' /etc/rsyslog.conf", "rc": 0, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, {"cmd": "docker exec -i teamd supervisorctl restart rsyslogd", "rc": 7, "stderr": "", "stderr_lines": [], "stdout": "rsyslogd: ERROR (not running)\nrsyslogd: ERROR (spawn error)\n", "stdout_lines": ["rsyslogd: ERROR (not running)", "rsyslogd: ERROR (spawn error)"]}], "start": "2023-02-02 06:18:48.411003"}

The reason is that rsyslogd process can not enter running state because its' rsyslog.conf is empty. Further, we found that, in teamd.sh, when start teamd, it will update rsyslog.conf use sonic-cfggen, related code is below:

sonic-cfggen -d -t /usr/share/sonic/templates/rsyslog-container.conf.j2 -a "{\"target_ip\": \"$TARGET_IP\", \"container_name\": \"$CONTAINER_NAME\" }"  > $TMP_FILE
...
docker cp $TMP_FILE ${DOCKERNAME}:/etc/rsyslog.conf

And if this command fails, it will generate an empty file and copy it to teamd container.
Currently, it uses -d parameter which means read from configDB. We see the sonic-cfggen failed reason is:
image

But in the aspect of the service dependency control, we found that teamd will start after database. So, the connection to Redis meets some problems.

Steps to reproduce the issue:

Describe the results you received:

Describe the results you expected:

Output of show version:

(paste your output here)

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@lerry-lee
Copy link
Contributor Author

PR #13669 is a hotfix to unblock some PR tests.

@stephenxs
Copy link
Collaborator

Hi @lerry-lee
Did you see the traceback from the syslog when the issue occurred?
I observed a similar issue #13791 which I also suspect sonic-cfggen aborted somehow but I didn't see the traceback from the log.

@lerry-lee
Copy link
Contributor Author

lerry-lee commented Feb 14, 2023

Hi @lerry-lee Did you see the traceback from the syslog when the issue occurred? I observed a similar issue #13791 which I also suspect sonic-cfggen aborted somehow but I didn't see the traceback from the log.
@stephenxs
After issue occurred, I checked the syslog and saw the error msg like the screenshot in the top description. I saw your observed issue also related to sonic-cfggen -d, I think it might be some related cause, because -d requires Redis, so I think some problems occurred when getting connection to Redis.

@stephenxs
Copy link
Collaborator

Hi @lerry-lee Did you see the traceback from the syslog when the issue occurred? I observed a similar issue #13791 which I also suspect sonic-cfggen aborted somehow but I didn't see the traceback from the log.
@stephenxs
After issue occurred, I checked the syslog and saw the error msg like the screenshot in the top description. I saw your observed issue also related to sonic-cfggen -d, I think it might be some related cause, because -d requires Redis, so I think some problems occurred when getting connection to Redis.

Thanks.
I also suspect they share the same cause. The only difference is that I didn't see the traceback from the dumps.
I'm going to try using -a to avoid redis db access

@arlakshm
Copy link
Contributor

can we run the tcpdump on lo interface when the team starts to check the packet dump on why the connection to redis is failing.

@arlakshm
Copy link
Contributor

please update the description with the testcase where this failure is seen.

@arlakshm arlakshm added the Triaged this issue has been triaged label Feb 15, 2023
@lerry-lee
Copy link
Contributor Author

please update the description with the testcase where this failure is seen.

Refined.

@lerry-lee lerry-lee changed the title Rsyslogd start failed cause by rsyslog.conf is emtpy in teamd container Rsyslogd in teamd container start failed with empty rsyslog.conf caused by sonic-cfggen generate conf failed Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Triaged this issue has been triaged
Projects
None yet
Development

No branches or pull requests

3 participants