sonic-cfggen is consuming a lot of CPU during switch startup #4553
Comments
@tahmed-dev has been working to decrease CPU load at boot time due to sonic-cfggen. Reassigning this issue.
@tahmed-dev, are all fixes for this in the 201911 branch and ready for verification by @stepanblyschak? Thanks.
@rlhui Yes! All the low-hanging-fruit fixes went into master. Do we have plans to port those fixes to the 201911 branch?
@tahmed-dev, I believe some PRs are in the 201911 branch already. Are you saying this is still an issue for 201911?
@rlhui, I'll defer to @stepanblyschak on whether he still sees this issue on 201911. Here are the remaining PRs: /pull/5250, /pull/5203, /pull/5200, /pull/5178, /pull/5176, /pull/5175, /pull/5174, /pull/5166, /pull/4937, and this commit
Following tests on 201911_T0 by @stepanblyschak, we definitely see major improvements compared to when the issue was raised. All the above PRs were cherry-picked to the 201911_T0 branch but are not yet in 201911.
@liat-grozovik, great, good to know. The 201911 branch needs to be under somewhat tighter control at this moment, accepting critical bug fixes only. We can assess this late this week or early next week. Thanks.
Description
During switch bootup, sonic-cfggen is called over 100 times from different places in different SONiC containers. It consumes a lot of CPU, mainly because the jinja2 and natsort Python packages compile a lot of regular expressions at import time. This makes containers start very slowly and impacts cold/fast/warm boot timings.
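The import-time cost described above can be estimated outside of a full boot. The following is a minimal sketch, not part of sonic-cfggen: the helper names `time_import` and `import_overhead` are hypothetical, and it assumes the packages of interest are installed for the interpreter being measured. It starts a fresh Python process per measurement, so the result includes interpreter startup, which `import_overhead` subtracts using a trivial import as a baseline.

```python
import subprocess
import sys
import time


def time_import(module, runs=3):
    """Best wall-clock time (seconds) to start Python and import `module`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # A fresh interpreter per run, so nothing is cached in sys.modules.
        subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            check=True,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best


def import_overhead(module, runs=3):
    """Approximate cost of importing `module` beyond bare interpreter startup."""
    baseline = time_import("sys", runs)  # `sys` is built in, so this is ~startup only
    return time_import(module, runs) - baseline


if __name__ == "__main__":
    for name in ("jinja2", "natsort"):
        try:
            print(f"{name}: {import_overhead(name) * 1000:.1f} ms")
        except subprocess.CalledProcessError:
            print(f"{name}: not importable in this environment")
```

Multiplying the per-import overhead by the number of sonic-cfggen invocations gives a rough lower bound on the CPU time spent on imports alone; the actual figure depends on the platform CPU.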
Steps to reproduce the issue:
No specific steps; just perform any kind of reload/reboot and run a profiling tool (bootchart, or perf with flamegraphs).
Describe the results you received:
sonic-cfggen is a very CPU-intensive utility, yet it is used everywhere, causing slow startup.
Fast boot suffers because the platform SDK may not be able to perform switch init and reconfiguration fast enough if other CPU-intensive tasks are running in parallel.
Fast/warm boot suffers because switch control plane downtime increases.
Describe the results you expected:
sonic-cfggen should be optimized. Otherwise, the more templates that need to be generated, the more other tasks in the system are delayed.
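One general way to cut import-time cost in a frequently invoked tool is to defer a heavy import until the code path that needs it actually runs. The sketch below only illustrates that idea and does not claim to be what the fixes for this issue do. The functions `print_var` and `render_template` are hypothetical, and `string.Template` from the standard library stands in for a heavier engine such as jinja2 so that the example runs anywhere.

```python
def print_var(config, key):
    """Cheap path: a plain lookup that needs no template engine at all."""
    return str(config[key])


def render_template(text, config):
    """Expensive path: the template engine is imported only when rendering."""
    # Deferred import. `string.Template` is a stand-in (assumption) for a
    # heavier package; with a real engine, callers that never render a
    # template would never pay its import cost.
    from string import Template

    return Template(text).substitute(config)


if __name__ == "__main__":
    cfg = {"hostname": "sonic", "asic": "example"}
    print(print_var(cfg, "hostname"))
    print(render_template("host=$hostname asic=$asic", cfg))
```

With this layout, invocations that only read a variable skip the engine import entirely, which matters most when the tool is started many times during boot.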
Additional information you deem important (e.g. issue happens only occasionally):
This is very platform specific; results will differ depending on the platform CPU.
Output of show version: The version is a debug version compiled with SONIC_PROFILING_ON=y and '-fno-omit-frame-pointer'.
Attached are a system perf recording taken during bootup and the flamegraph generated from it. Perf was started at the /etc/rc.local phase with the command:
perf_4.9 record -F 99 -a -g -o /home/admin/perf -- sleep 100 &
system-perf.svg.gz
We can see a lot of sonic-cfggen samples collected, more than any critical SONiC component, like SDK, syncd, orchagent or redis-server.
Bootchart plot (https://elinux.org/Bootchart)
We can see sonic-cfggen executions during SDK start and configuration.
Attach debug file (sudo generate_dump): sonic_dump.tar.gz