[warm-upgrade][202012] Slow Celestica platform init in rc.local causes lacp-teardown #10152

Closed
vaibhavhd opened this issue Mar 4, 2022 · 3 comments

Comments

@vaibhavhd
Contributor

Description

202012 Warm upgrade failure on Dx010 TOR.

Steps to reproduce the issue:

  1. Warm upgrade 6100 device from any older image to new 202012 image.
  2. If a test is running, the test will catch the failure. Otherwise, to catch this manually, check for signs of LAG flaps in syslog.

Describe the results you received:

We are hitting issues when warm-upgrading Celestica devices running SONiC from any image to a 202012 branch image.

Short description of the issue:

  1. Warm upgrade fails on the TOR because one or more LAGs flap.
  2. The LAGs flap because the 90s LACP session timeout expires, and lacp-teardown is initiated from the T1 neighbors.
  3. The LACP session takes more than 90s to resume because the reboot process takes longer than before in the 202012 warm boot path.
  4. While investigating this, I found that:
    a. The degradation is seen specifically in the first-boot steps in rc.local: installing and enabling platform-modules takes a lot of time in the 202012 branch.
    b. For comparison, the time taken for rc.local processing is:
      i. Same-image warm reboot: ~3s.
      ii. Cross-branch or in-branch warm “upgrades” to a 202012 image: ~30s.
    c. This difference in the boot path is the degradation in the 202012 upgrade scenario, which causes points 1 and 2 above.

Note that this is specific to the 202012 branch: I tried a 201811 in-branch upgrade and saw that the rc.local processing time is much shorter.

This is a blocker for warm upgrades, so we need a fast resolution.

Questions:

  1. Why does platform initialization (enable platform-modules-dx010) take longer in 202012 than in 201811?
  2. Can we reduce this time? Is it possible to defer some of the operations in this step until after warmboot completes?
  3. An error is seen when installing the Python2 package: a) do we need this installation at all, and b) why is the ERROR seen?

Describe the results you expected:

No LAG should flap after warmreboot.

Unblocked, shorter rc.local processing.

Additional information you deem important (e.g. issue happens only occasionally):

dx010-202012-54-54-warm.txt
dx010-202012-53-54-warm.txt

@vaibhavhd
Contributor Author

Bad case: 202012 image 53 to 202012 image 54:

Mar  1 22:13:41 sonic systemd[1]: Starting /etc/rc.local Compatibility...
..
Mar  1 22:13:46 sonic rc.local[559]: Unpacking platform-modules-dx010 (0.9) ...
Mar  1 22:13:46 sonic rc.local[559]: Setting up platform-modules-dx010 (0.9) ...
Mar  1 22:13:47 sonic rc.local[612]: Synchronizing state of platform-modules-dx010.service with SysV service script with /lib/systemd/systemd-sysv-install.

>>>>   Mar  1 22:13:47 sonic rc.local[612]: Executing: /lib/systemd/systemd-sysv-install enable platform-modules-dx010
>>>>   Mar  1 22:14:05 sonic rc.local[994]: ERROR: sonic_platform-1.0-py2-none-any.whl is not a supported wheel on this platform.

Mar  1 22:14:07 sonic rc.local[999]: Processing /usr/share/sonic/device/x86_64-cel_seastone-r0/sonic_platform-1.0-py3-none-any.whl
Mar  1 22:14:07 sonic rc.local[999]: Installing collected packages: sonic-platform
Mar  1 22:14:08 sonic rc.local[999]: Successfully installed sonic-platform-1.0
Mar  1 22:14:09 sonic rc.local[434]: + sync
...
Mar  1 22:14:10 sonic rc.local[434]: + exit 0
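
The stall in the log above can be located mechanically by measuring the gap between consecutive rc.local lines. The following is a minimal sketch, not part of the original report: the helper names (`parse_time`, `largest_gaps`) and the assumed year are hypothetical, and it assumes an English locale for month names and syslog timestamps with one-second resolution.

```python
import re
from datetime import datetime

# Matches "Mar  1 22:13:47 <host> rc.local[612]:" anywhere in the line, so
# lines prefixed with ">>>>" markers are handled as well.
STAMP = re.compile(
    r"([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+rc\.local\[\d+\]:"
)

SAMPLE = """\
Mar  1 22:13:46 sonic rc.local[559]: Setting up platform-modules-dx010 (0.9) ...
>>>>   Mar  1 22:13:47 sonic rc.local[612]: Executing: /lib/systemd/systemd-sysv-install enable platform-modules-dx010
>>>>   Mar  1 22:14:05 sonic rc.local[994]: ERROR: sonic_platform-1.0-py2-none-any.whl is not a supported wheel on this platform.
Mar  1 22:14:07 sonic rc.local[999]: Processing /usr/share/sonic/device/x86_64-cel_seastone-r0/sonic_platform-1.0-py3-none-any.whl
Mar  1 22:14:08 sonic rc.local[999]: Successfully installed sonic-platform-1.0
"""


def parse_time(line, year=2022):
    """Return the syslog timestamp of an rc.local line, or None.

    Syslog omits the year, so it is supplied by the caller (assumption).
    """
    match = STAMP.search(line)
    if match is None:
        return None
    month, day, clock = match.groups()
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def largest_gaps(lines, top=3):
    """Return up to `top` (gap_seconds, line_before_gap) pairs, largest first."""
    entries = [(parse_time(line), line) for line in lines]
    entries = [(stamp, line) for stamp, line in entries if stamp is not None]
    gaps = [
        ((later - earlier).total_seconds(), line)
        for (earlier, line), (later, _next_line) in zip(entries, entries[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]


if __name__ == "__main__":
    for seconds, line in largest_gaps(SAMPLE.splitlines()):
        print(f"{seconds:5.0f}s after: {line.strip()}")
```

On the excerpt above, the largest gap is the 18 seconds that follow the `systemd-sysv-install enable platform-modules-dx010` line.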

@qnos
Contributor

qnos commented Mar 15, 2022

Questions:

  1. Why does platform initialization (enable platform-modules-dx010) take longer in 202012 than in 201811?
    A: More drivers were added in 202012, which added more sleeps to the init script, so the 202012 branch takes more time than 201811.
  2. Can we reduce this time? Is it possible to defer some of the operations in this step until after warmboot completes?
    A: I have optimized the sleep time in the dx010 sonic platform init script, reducing the sleep durations while keeping the drivers working well.
  3. An error is seen when installing the Python2 package: a) do we need this installation at all, and b) why is the ERROR seen?
    A: The 202012 branch DX010 SONiC platform code has already been ported to python3 and python2 is deprecated, so we can remove the installation of the python2 wheel. This would save about 1-2 seconds in the init script.
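
As background for answer 3: a wheel filename encodes a Python tag (`py2` or `py3` in the logs above), and pip running under Python 3 rejects a wheel tagged only `py2`, which is consistent with the ERROR line. The sketch below illustrates that filename check only. It is a simplification under stated assumptions: real pip compares the full tag triple (Python, ABI, platform), and the helper names here are hypothetical, not part of pip or the SONiC scripts.

```python
import sys


def wheel_python_tags(filename):
    """Return the Python tags encoded in a wheel filename.

    Wheel filenames follow
    {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl,
    and the Python tag may be compressed, for example "py2.py3".
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return parts[-3].split(".")


def matches_python_major(filename, major=sys.version_info.major):
    """Return True if any Python tag targets the given major version.

    Simplified check (assumption): only the major version in "pyN"/"cpN"
    tags is compared; ABI and platform tags are ignored.
    """
    prefixes = (f"py{major}", f"cp{major}")
    return any(tag.startswith(prefixes) for tag in wheel_python_tags(filename))


if __name__ == "__main__":
    for name in (
        "sonic_platform-1.0-py2-none-any.whl",
        "sonic_platform-1.0-py3-none-any.whl",
    ):
        verdict = "install" if matches_python_major(name, major=3) else "skip"
        print(f"{verdict}: {name}")
```

A first-boot script that applied a filter like this would skip the py2 wheel silently instead of spending time on an installation that cannot succeed.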

PR #10237 was raised to fix the issue:
#10237

This fix reduces the time taken to set up the dx010 platform modules by about 8-9s.

How to verify it

  1. Check the warm reboot log; warm reboot is 8-9s faster than before.
[945232.436622] kexec_core: Starting new kernel                                                                                                                                                          
[    5.838762] rc.local[461]: + sed -e s/build_version: //g;s/'//g                                                                                                                                       
[    5.863209] rc.local[460]: + grep build_version                                                                                                                                                       
[    5.870373] rc.local[459]: + cat /etc/sonic/sonic_version.yml          
[    5.882392] rc.local[444]: + SONIC_VERSION=CLS-202012-f93d1f64a2_220311_0001                                                                                                                          
[    5.900265] rc.local[444]: + FIRST_BOOT_FILE=/host/image-CLS-202012-f93d1f64a2_220311_0001/platform/firsttime                                                                                         
[    5.919713] rc.local[444]: + SONIC_CONFIG_DIR=/host/image-CLS-202012-f93d1f64a2_220311_0001/sonic-config                                                                                              
[    5.939963] rc.local[444]: + SONIC_ENV_FILE=/host/image-CLS-202012-f93d1f64a2_220311_0001/sonic-config/sonic-environment                                                                              
[    5.959775] rc.local[444]: + [ -d /host/image-CLS-202012-f93d1f64a2_220311_0001/sonic-config -a -f /host/image-CLS-202012-f93d1f64a2_220311_0001/sonic-config/sonic-environment ]                     
[    5.991680] rc.local[444]: + echo moving file /host/image-CLS-202012-f93d1f64a2_220311_0001/sonic-config/sonic-environment to /etc/sonic                                                              
[    6.015772] rc.local[444]: moving file /host/image-CLS-202012-f93d1f64a2_220311_0001/sonic-config/sonic-environment to /etc/sonic                                                                     
[    6.039698] rc.local[444]: + mv /host/image-CLS-202012-f93d1f64a2_220311_0001/sonic-config/sonic-environment /etc/sonic                                                                               
[    6.063968] rc.local[444]: + logger SONiC version CLS-202012-f93d1f64a2_220311_0001 starting up...                                                                                                    
[    6.075689] rc.local[444]: + grub_installation_needed=                                                                                                                                                
[    6.084995] rc.local[444]: + [ ! -e /host/machine.conf ]                                                                                                                                              
[    6.099820] rc.local[444]: + migrate_nos_configuration                                                                                                                                                
[    6.115707] rc.local[444]: + rm -rf /host/migration                                                                                                                                                   
[    6.136062] rc.local[444]: + mkdir -p /host/migration                                                                                                                                                 
[    6.154299] kdump-tools[428]: Starting kdump-tools:                                                                                                                                                   
[    6.169527] rc.local[502]: + cat /proc/cmdline                                                                                                                                                        
[    6.175936] kdump-tools[465]: no crashkernel= parameter in the kernel cmdline ...                                                                                                                     
[    6.199530] kdump-tools[512]:  failed!                                                                                                                                                                
[    6.214259] rc.local[444]: + set -- BOOT_IMAGE=/image-CLS-202012-f93d1f64a2_220311_0001/boot/vmlinuz-4.19.0-12-2-amd64 root=UUID=9695eb38-b94c-46d6-ada1-c2a790617c0e rw console=tty0 console=ttyS0,115200n8 quiet intel_idle.max_cstate=0 net.ifnames=0 biosdevname=0 loop=image-CLS-202012-f93d1f64a2_220311_0001/fs.squashfs loopfstype=squashfs systemd.unified_cgroup_hierarchy=0 apparmor=1 security=apparmor varlog_size=4096 usbcore.autosuspend=-1 module_blacklist=gpio_ich SONIC_BOOT_TYPE=warm
[    6.271805] rc.local[444]: + [ -n  ]                                                                                                                                                                  
[    6.283662] rc.local[444]: + . /host/machine.conf                                            
[    6.299681] rc.local[444]: + onie_version=2014.08.0.0.6                                                                                                                                               
[    6.315716] rc.local[444]: + onie_vendor_id=12244                                                                                                                                                     
[    6.332318] rc.local[444]: + onie_platform=x86_64-cel_seastone-r0                                                                                                                                     
[    6.349204] rc.local[444]: + onie_machine=cel_seastone                                                                                                                                                
[    6.363709] rc.local[444]: + onie_machine_rev=0                                                                                                                                                       
[    6.379679] rc.local[444]: + onie_arch=x86_64                                                                                                                                                         
[    6.400461] rc.local[444]: + onie_config_version=1                                                                                                                                                    
[    6.415694] rc.local[444]: + onie_build_date=2016-07-19T12:10-0400                                                                                                                                    
[    6.431697] rc.local[444]: + onie_partition_type=gpt                                                                                                                                                  
[    6.447759] rc.local[444]: + onie_kernel_version=3.2.35                                                                                                                                               
[    6.463741] rc.local[444]: + program_console_speed                                                                                                                                                    
[    6.472723] rc.local[505]: + grep -Eo console=ttyS[0-9]+,[0-9]+                                                                                                                                       
[    6.492563] rc.local[504]: + cat /proc/cmdline                                                                                                                                                        
[    6.508658] rc.local[506]: + cut -d , -f2                                                                                                                                                             
[    6.526088] rc.local[444]: + speed=115200                                                                                                                                                             
[    6.543885] rc.local[444]: + [ -z 115200 ]                                                                                                                                                            
[    6.555790] rc.local[444]: + CONSOLE_SPEED=115200                                                                                                                                                     
[    6.562351] rc.local[507]: + grep agetty /lib/systemd/system/serial-getty@.service                                                                                                                    
[    6.585391] rc.local[508]: + grep keep-baud                                                                                                                                                           
[    6.603647] rc.local[508]: ExecStart=-/sbin/agetty -o '-p -- \\u' --keep-baud 115200,57600,38400,9600 %I $TERM                                                                                        
[    6.624179] rc.local[444]: + [ 0 = 0 ]                                                                                                                                                                
[    6.639718] rc.local[444]: + sed -i s|\-\-keep\-baud .* %I| 115200 %I|g /lib/systemd/system/serial-getty@.service                                                                                     
[    6.659655] rc.local[444]: + systemctl daemon-reload                                                                                                                                                  
[    6.675673] rc.local[444]: + [ -f /host/image-CLS-202012-f93d1f64a2_220311_0001/platform/firsttime ]      
[    6.699699] rc.local[444]: + echo First boot detected. Performing first boot tasks...                                                                                                                 
[    6.719684] rc.local[444]: First boot detected. Performing first boot tasks...                                                                                                                        
[    6.739674] rc.local[444]: + [ -n  ]                                                                                                                                                                  
[    6.751663] rc.local[444]: + [ -n x86_64-cel_seastone-r0 ]                                                                                                                                            
[    6.767624] rc.local[444]: + platform=x86_64-cel_seastone-r0                                                                                                                                          
[    6.783625] rc.local[444]: + [ -d /host/old_config ]                                                                                                                                                  
[    6.799618] rc.local[444]: + mv -f /host/old_config /etc/sonic/                                                                                                                                       
[    6.816864] rc.local[444]: + rm -rf /etc/sonic/old_config/old_config                                                                                                                                  
[    6.835670] rc.local[444]: + touch /tmp/pending_config_migration                                                                                                                                      
[    6.851630] rc.local[444]: + touch /tmp/notify_firstboot_to_platform                                                                                                                                  
[    6.867609] rc.local[444]: + [ ! -d /host/reboot-cause/platform ]                                                                                                                                     
[    6.884516] rc.local[444]: + [ -d /host/image-CLS-202012-f93d1f64a2_220311_0001/platform/x86_64-cel_seastone-r0 ]                                                                                     
[    6.907732] rc.local[444]: + dpkg -i /host/image-CLS-202012-f93d1f64a2_220311_0001/platform/x86_64-cel_seastone-r0/platform-modules-dx010_0.9_amd64.deb                                               
[    6.933086] rc.local[551]: Selecting previously unselected package platform-modules-dx010.                                                                                                            
[    7.518234] rc.local[551]: (Reading database ... 30055 files and directories currently installed.)                                                                                                    
[    7.539760] rc.local[551]: Preparing to unpack .../platform-modules-dx010_0.9_amd64.deb ...                                                                                                           
[    7.563705] rc.local[551]: Unpacking platform-modules-dx010 (0.9) ...                                                                                                                                 
[    8.090707] rc.local[551]: Setting up platform-modules-dx010 (0.9) ...                                                                                                                                
[   12.777431] rc.local[611]: Synchronizing state of platform-modules-dx010.service with SysV service script with /lib/systemd/systemd-sysv-install.                                                     
[   12.807716] rc.local[611]: Executing: /lib/systemd/systemd-sysv-install enable platform-modules-dx010                                                                                                 
[   23.526475] rc.local[993]: Processing /usr/share/sonic/device/x86_64-cel_seastone-r0/sonic_platform-1.0-py3-none-any.whl                                                                              
[   24.213094] rc.local[993]: Installing collected packages: sonic-platform                                                                                                                              
[   24.376318] rc.local[993]: Successfully installed sonic-platform-1.0 
[   24.391815] rc.local[993]: WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[   26.458719] rc.local[444]: + sync                                                                                                                                                                     
[   27.503835] rc.local[444]: + [ -n x86_64-cel_seastone-r0 ]                                                                                                                                            
[   27.519767] rc.local[444]: + [ -n  ]                                                                                                                                                                  
[   27.531894] rc.local[444]: + mkdir -p /var/platform                                                                                                                                                   
[   27.547755] rc.local[444]: + ebtables_config                                                                                                                                                          
[   27.563785] rc.local[444]: + /usr/sbin/ebtables-restore                                                                                                                                               
[   27.579849] rc.local[444]: + /usr/sbin/ebtables -t filter --atomic-file /etc/ebtables.filter --atomic-save                                                                                            
[   27.599756] rc.local[444]: + sed -i -e s/__PLATFORM__/x86_64-cel_seastone-r0/g /etc/default/kdump-tools                                                                                               
[   27.619761] rc.local[444]: + firsttime_exit                                                                                                                                                           
[   27.635769] rc.local[444]: + rm -rf /host/image-CLS-202012-f93d1f64a2_220311_0001/platform/firsttime                                                                                                  
[   27.655752] rc.local[444]: + exit 0
  2. Set up a port channel between two DUTs, warm upgrade one DUT, and confirm that no LAG flap appears in syslog.
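
For verification step 1, the console log carries kernel-style timestamps in seconds since boot, so the duration of a single rc.local step can be read off directly. The following is a minimal sketch for doing that. The function names are hypothetical, and it assumes every line of interest has the form `[ <seconds>] rc.local[<pid>]: <message>`.

```python
import re

# "[   12.807716] rc.local[611]: Executing: ..." -> (12.807716, "Executing: ...")
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+rc\.local\[\d+\]:\s*(.*)$")

SAMPLE = """\
[    5.838762] rc.local[461]: + sed -e s/build_version: //g;s/'//g
[    6.154299] kdump-tools[428]: Starting kdump-tools:
[   12.807716] rc.local[611]: Executing: /lib/systemd/systemd-sysv-install enable platform-modules-dx010
[   23.526475] rc.local[993]: Processing /usr/share/sonic/device/x86_64-cel_seastone-r0/sonic_platform-1.0-py3-none-any.whl
[   27.655752] rc.local[444]: + exit 0
"""


def rc_local_events(lines):
    """Return (seconds_since_boot, message) for each rc.local line."""
    events = []
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def step_duration(events, marker):
    """Seconds between the first message containing `marker` and the next rc.local message."""
    for (start, message), (end, _next_message) in zip(events, events[1:]):
        if marker in message:
            return end - start
    return None


def total_duration(events):
    """Seconds between the first and the last rc.local message."""
    return events[-1][0] - events[0][0] if events else 0.0


if __name__ == "__main__":
    events = rc_local_events(SAMPLE.splitlines())
    enable = step_duration(events, "systemd-sysv-install enable")
    print(f"enable platform-modules-dx010: {enable:.1f}s")
    print(f"rc.local first to last line:   {total_duration(events):.1f}s")
```

On the excerpt above, the enable step takes about 10.7s, compared with the roughly 18s gap after the same line in the "bad case" syslog earlier in the thread.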

sujinmkang pushed a commit that referenced this issue Mar 21, 2022
…#10237)

Why I did it
To fix issue #10152 for dx010.
202012 warm upgrade causes lacp-teardown on Dx010 TOR. The platform code initializes slowly, causing an LACP timeout.

How I did it
Remove the python2 sonic platform wheel which is deprecated.
Optimize the dx010 sonic platform script to speed up the init process.
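
The commit message says only that the sleeps in the init script were shortened. One common alternative to a fixed sleep, shown here purely as an illustration and not as what PR #10237 does, is to poll for the resource a driver is expected to create and continue as soon as it appears. The function name, the paths, and the timing values below are all hypothetical.

```python
import os
import time


def wait_for_path(path, timeout_s=5.0, interval_s=0.05):
    """Poll until `path` exists or `timeout_s` elapses.

    Returns True as soon as the path appears, False on timeout. Unlike a
    fixed sleep, this waits only as long as the device actually needs.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical sysfs node; substitute whatever the driver really exposes.
    node = "/sys/bus/i2c/devices/i2c-1"
    print("ready" if wait_for_path(node, timeout_s=2.0) else "timed out")
```

The trade-off is that polling needs a reliable readiness signal for each driver, whereas a tuned sleep needs only a known-good duration.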

How to verify it
Check the warm reboot log, warm reboot is 8-9s faster than before.

Signed-off-by: Eric Zhu <erzhu@celestica.com>
@vaibhavhd
Contributor Author

#10237 closes this issue. Thanks Celestica team!

Blueve pushed a commit that referenced this issue Apr 22, 2022
…#10313)

* Optimize dx010 sonic platform init script to speed up init process
* Merge issue #10152: [warm-upgrade][202012] Slow Celestica platform init
in rc.local causes lacp-teardown fix into master branch

Signed-off-by: Eric Zhu <erzhu@celestica.com>