[warm-reboot] Use kexec_file_load instead of kexec_load when available #2608
Merged
saiarcot895
merged 1 commit into
sonic-net:master
from
saiarcot895:kexec-use-file-load-when-available
Jan 19, 2023
On some dev VMs, warm reboot on a VS image fails. Specifically, after kexec is called and the new kernel starts, the new kernel tries to load the initramfs, but fails to do so for whatever reason. There may be messages about gzip decompression failing and that it's corrupted.

After some experimentation, it was found that when first loading the new kernel and initramfs into memory, using the `kexec_file_load` syscall (`-s` flag in kexec) worked fine, whereas using the default `kexec_load` syscall resulted in a failure. It's unknown why `kexec_file_load` worked fine when `kexec_load` didn't; there shouldn't be any difference for non-secure boot kernels, as far as I can tell. What was seen, however, was that when taking a KVM dump in the failure case, the memory that stored the initramfs had differences compared to what was on disk. It's unknown what caused these differences.

As a workaround (and as a bit of a feature enhancement), use the `-a` flag with kexec, which tells it to use `kexec_file_load` if available, and `kexec_load` if it's not available or otherwise fails. armhf doesn't support `kexec_file_load`, whereas arm64 gained support for `kexec_file_load` in the 5.19 kernel (we're currently on 5.10). `amd64` has supported `kexec_file_load` since 3.17.

This also makes it possible to do kexec on secure boot systems, where the kernel image must be loaded via `kexec_file_load`.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
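As a rough illustration of the flag choice described above, the sketch below builds a kexec command line for staging a kernel and initramfs. It is not the code from the warm-reboot script: the helper name `build_kexec_load_cmd`, the mode names, and the paths are hypothetical. The `-s` and `-a` flags are the ones named in the description; `-c` (force `kexec_load`) and the `--load`, `--initrd=` and `--append=` options are taken from the kexec-tools manual page.

```python
import shlex
from typing import List

# Hypothetical helper for illustration only; not taken from the warm-reboot script.
# Maps a loading strategy to the kexec-tools flag that selects the syscall.
SYSCALL_FLAGS = {
    "auto": "-a",    # try kexec_file_load first, fall back to kexec_load
    "file": "-s",    # kexec_file_load only
    "legacy": "-c",  # kexec_load only
}


def build_kexec_load_cmd(kernel: str, initrd: str, cmdline: str,
                         mode: str = "auto") -> List[str]:
    """Return the argv that stages a kernel and initramfs with kexec."""
    if mode not in SYSCALL_FLAGS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(SYSCALL_FLAGS)}"
        )
    return [
        "kexec",
        SYSCALL_FLAGS[mode],
        "--load", kernel,
        f"--initrd={initrd}",
        f"--append={cmdline}",
    ]


if __name__ == "__main__":
    argv = build_kexec_load_cmd(
        "/boot/vmlinuz-example",     # placeholder kernel path
        "/boot/initrd.img-example",  # placeholder initramfs path
        "root=/dev/sda1 ro",         # placeholder kernel command line
    )
    # Print the command instead of running it; staging a kernel needs root.
    print(shlex.join(argv))
```

Defaulting to `"auto"` mirrors the reasoning in the description: platforms without `kexec_file_load` (armhf, or arm64 on the current kernel) keep working through the fallback, while secure boot systems get the file-based syscall they require.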
yxieca
approved these changes
Jan 19, 2023
vadymhlushko-mlnx
added a commit
to vadymhlushko-mlnx/sonic-buildimage
that referenced
this pull request
Jan 24, 2023
Update sonic-utilities submodule pointer to include the following:
* fba87f4 Revert (sonic-net/sonic-utilities#2599)
* d6d7ab3 [warm-reboot] Use kexec_file_load instead of kexec_load when available (sonic-net/sonic-utilities#2608)
* db4683d fix show techsupport error (sonic-net/sonic-utilities#2597)
* 3d8e9c6 [GCU] Prohibit removal of PFC_WD POLL_INTERVAL field (sonic-net/sonic-utilities#2545)
* 163e766 [techsupport] include APPL_STATE_DB dump (sonic-net/sonic-utilities#2607)
* 8703773 YANG Validation for ConfigDB Updates: RADIUS_SERVER (sonic-net/sonic-utilities#2604)
* c2d746d Remove TODO comment which is no longer relevant (sonic-net/sonic-utilities#2600)
* f09da99 [show] Add bgpraw to show run all (sonic-net/sonic-utilities#2537)
* 39ac564 Extend fast-reboot STATE_DB entry timer (sonic-net/sonic-utilities#2577)

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
vadymhlushko-mlnx
added a commit
to vadymhlushko-mlnx/sonic-buildimage
that referenced
this pull request
Jan 25, 2023
Update sonic-utilities submodule pointer to include the following:
* f4f857e [GCU] Ignore bgpraw in GCU applier (sonic-net/sonic-utilities#2623)
* b5ac600 [muxcable][config] Add support to enable/disable ceasing to be an advertisement interface when service is stopped (sonic-net/sonic-utilities#2622)
* 981f953 [chassis][voq] Add show fabric reachability command. (sonic-net/sonic-utilities#2528)
* fba87f4 Revert (sonic-net/sonic-utilities#2599)
* d6d7ab3 [warm-reboot] Use kexec_file_load instead of kexec_load when available (sonic-net/sonic-utilities#2608)
* db4683d fix show techsupport error (sonic-net/sonic-utilities#2597)
* 3d8e9c6 [GCU] Prohibit removal of PFC_WD POLL_INTERVAL field (sonic-net/sonic-utilities#2545)
* 163e766 [techsupport] include APPL_STATE_DB dump (sonic-net/sonic-utilities#2607)
* 8703773 YANG Validation for ConfigDB Updates: RADIUS_SERVER (sonic-net/sonic-utilities#2604)
* c2d746d Remove TODO comment which is no longer relevant (sonic-net/sonic-utilities#2600)
* f09da99 [show] Add bgpraw to show run all (sonic-net/sonic-utilities#2537)
* 39ac564 Extend fast-reboot STATE_DB entry timer (sonic-net/sonic-utilities#2577)

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
qiluo-msft
pushed a commit
that referenced
this pull request
Mar 20, 2023
qiluo-msft
pushed a commit
to sonic-net/sonic-buildimage
that referenced
this pull request
Mar 21, 2023
This PR includes the following commits

```
5b0f0fc [202012][dhcp_relay] Fix dhcp_relay restart error while add/del vlan (sonic-net/sonic-utilities#2688)
48fd842 [show][muxcable] increase timeout for displaying HW_STATUS (sonic-net/sonic-utilities#2712)
f0a9f4f [dhcp_relay] Add show/clear/counter cli for dhcp_relay (sonic-net/sonic-utilities#2719)
8627944 Revert "[202012] Update load minigraph to load backend acl" (sonic-net/sonic-utilities#2736)
93c7d43 [warm-reboot] Use kexec_file_load instead of kexec_load when available (sonic-net/sonic-utilities#2608)
cc78747 [warm/fast-reboot] Backup logs from tmpfs to disk during fast/warm shutdown (sonic-net/sonic-utilities#2714)
```
isabelmsft
pushed a commit
to isabelmsft/sonic-utilities
that referenced
this pull request
Mar 23, 2023
isabelmsft
added a commit
to isabelmsft/sonic-utilities
that referenced
this pull request
Mar 23, 2023
commit 1d54781a1f90bda156b06b0734805babfba88b6d Merge: 460c7f39 c704b71c Author: isabelmsft <isabel.li@microsoft.com> Date: Thu Mar 23 07:32:32 2023 +0000 Merge branch 'mux_mclag' of https://github.com/isabelmsft/sonic-utilities into mux_mclag commit 460c7f390d352b1a0090708fde2f5ca2ace99209 Author: isabelmsft <isabel.li@microsoft.com> Date: Thu Mar 23 07:22:54 2023 +0000 fix UT commit d3e7f22a806d238b20e7e9db1cdfb1afc5d04ae1 Author: isabelmsft <isabel.li@microsoft.com> Date: Thu Mar 23 05:32:03 2023 +0000 fix UT commit e2660efe7f6de2531d966a8bf207b04456747374 Author: isabelmsft <isabel.li@microsoft.com> Date: Thu Mar 23 04:37:26 2023 +0000 add UT commit 68cc589f4d20e60461bf76cbe67cad931f10c7c2 Author: isabelmsft <isabel.li@microsoft.com> Date: Thu Mar 23 00:55:15 2023 +0000 add UT commit f55ea00bb1fd1d4827c67110498de6d49990d4d1 Author: Mai Bui <maibui@microsoft.com> Date: Tue Mar 21 00:25:39 2023 -0400 Revert "Replace pickle by json (#2636)" (#2746) This reverts commit 54e26359fccf45d2e40800cf5598a725798634cd. Due to https://github.com/sonic-net/sonic-buildimage/issues/14089 Signed-off-by: Mai Bui <maibui@microsoft.com> commit 3b842c1b215020b24e5934b618d8cb51542e4088 Author: abdosi <58047199+abdosi@users.noreply.github.com> Date: Fri Mar 17 16:27:48 2023 -0700 Fix the `show interface counters` throwing exception on device with no external interfaces (#2703) Fix the `show interface counters` throwing exception issue where device do not have any external ports and all are internal links (ethernet or fabric) which is possible in chassis commit ce9245d90a3ccdf903d34ba6966224b29de5d15b Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com> Date: Fri Mar 17 09:10:47 2023 +0200 [route_check] remove check-frr_patch mock (#2732) The test fails with python3.7 (works in 3.9) when stopping patch which hasn't been started. We can always mock check_output call and if FRR_ROUTES is not defined return empty dictionary by the mock. 
#### What I did Removed check_frr_patch mock to fix UT running on python3.7 #### How I did it Removed the mock #### How to verify it Run unit test in stretch env commit 370aa30fc3f51918d4d0c36c9dc2c79f54214e67 Author: Neetha John <nejo@microsoft.com> Date: Thu Mar 16 17:31:49 2023 -0700 Revert "Update load minigraph to load backend acl (#2236)" (#2735) This reverts commit 1518ca92df1e794222bf45100246c8ef956d7af6. commit e4415b5ed4ea3100580ee9aaf8060587b8f96611 Author: Vivek <vivekreddykarri98@gmail.com> Date: Tue Mar 14 17:55:40 2023 -0700 Update the ref guide to reflect the vlan brief output (#2731) What I did show vlan brief will only be showing dhcpv4 addresses and not dhcpv6 destination Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com> commit 093c964c576e28188ddb0181af1fcc6b7a3adfc5 Author: Aryeh Feigin <101218333+arfeigin@users.noreply.github.com> Date: Tue Mar 14 22:13:51 2023 +0200 Fix fast-reboot DB migration (#2734) Fix DB migrator logic for migrating fast-reboot table, fixing #2621 db_migrator. How I did it Checking if fast-reboot table exists in DB. How to verify it Verified manually, migrating after fast-reboot and after cold/warm reboot. commit 16baa1a1ddac85ab1db559a27d47b566b65d78e8 Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com> Date: Tue Mar 14 21:01:52 2023 +0800 Enhance the logic to wait for all buffer tables to be removed in _clear_qos (#2720) - What I did This is an enhancement of PR #2503 - How I did it On top of waiting for BUFFER_POOL_TABLE to be cleared from APPL_DB, we need to wait for KEY_SET and DEL_SET as well. KEY_SET and DEL_SET are designed to accommodate the APPL_DB entries that were updated by manager daemons but have not yet been handled by the orchagent. In this case, even if the buffer tables are empty, entries in KEY_SET or DEL_SET will be in the buffer tables later on. So, we need to wait for key set tables as well. Do not delay for traditional buffer manager because it does not remove any buffer table. 
Provide a CLI option to print the detailed message if there is any table item which still exists - How to verify it Manually test and unit test - Previous command output (if the output of a command-line utility has changed) Running command: /usr/local/bin/sonic-cfggen -d --write-to-db -t /usr/share/sonic/device/x86_64-mlnx_msn2410-r0/ACS-MSN2410/buffers_dynamic.json.j2,config-db -t /usr/share/sonic/device/x86_64-mlnx_msn2410-r0/ACS-MSN2410/qos.json.j2,config-db -y /etc/sonic/sonic_version.yml - New command output (if the output of a command-line utility has changed) Only with option --verbose there are new output. Without the option, the output is the same as it is. admin@mtbc-sonic-01-2410:~$ sudo config qos reload --verbose Some entries matching BUFFER_*_TABLE:* still exist: BUFFER_QUEUE_TABLE:Ethernet108:0-2 Some entries matching BUFFER_*_SET still exist: BUFFER_PG_TABLE_KEY_SET Some entries matching BUFFER_*_TABLE:* still exist: BUFFER_QUEUE_TABLE:Ethernet108:0-2 Some entries matching BUFFER_*_SET still exist: BUFFER_PG_TABLE_KEY_SET Some entries matching BUFFER_*_TABLE:* still exist: BUFFER_QUEUE_TABLE:Ethernet108:0-2 Running command: /usr/local/bin/sonic-cfggen -d --write-to-db -t /usr/share/sonic/device/x86_64-mlnx_msn2410-r0/ACS-MSN2410/buffers_dynamic.json.j2,config-db -t /usr/share/sonic/device/x86_64-mlnx_msn2410-r0/ACS-MSN2410/qos.json.j2,config-db -y /etc/sonic/sonic_version.yml commit 81b4fcaa7f79976fdd5da07077e21902679579fb Author: Aryeh Feigin <101218333+arfeigin@users.noreply.github.com> Date: Fri Mar 10 18:41:30 2023 +0200 Remove timer from FAST_REBOOT STATE_DB entry and use finalizer (#2621) This should come along with sonic-buildimage PR (sonic-net/sonic-buildimage#13484) implementing fast-reboot finalizing logic in finalize-warmboot script and other submodules PRs utilizing the change. 
This PR should come along with the following PRs as well: sonic-net/sonic-swss-common#742 sonic-net/sonic-platform-daemons#335 sonic-net/sonic-sairedis#1196 This set of PRs solves the issue sonic-net/sonic-buildimage#13251 What I did Remove the timer used to clear fast-reboot entry from state-db, instead it will be cleared by fast-reboot finalize function implemented inside finalize-warmboot script (which will be invoked since fast-reboot is using warm-reboot infrastructure). As well instead of having "1" as the value for fast-reboot entry in state-db and deleting it when done it is now modified to set enable/disable according to the context. As well all scripts reading this entry should be modified to the new value options. How I did it Removed the timer usage in the fast-reboot script and adding fast-reboot finalize logic to warm-reboot in the linked PR. Use "enable/disable" instead of "1" as the entry value. How to verify it Run fast-reboot and check that the state-db entry for fast-reboot is being deleted after finalizing fast-reboot and not by an expiring timer. commit 9693c990191143605c74fe98c5a0f099598238fe Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com> Date: Fri Mar 10 04:07:25 2023 +0200 [route_check] fix IPv6 address handling (#2722) *In case user has configured an IPv6 address on an interface in CONFIG DB in non simplified form like 2000:31:0:0::1/64 it is present in a simplified form in ASIC_DB. This leads to route_check failure since it just compares strings. commit e65ffce059fc4164a59c17774346764232f2c10d Author: jhli-cisco <93410383+jhli-cisco@users.noreply.github.com> Date: Wed Mar 8 18:03:50 2023 -0800 update fast-reboot (#2728) commit 4f24b1137a00f596bf520fdf159ac8c4c6bb63c6 Author: jingwenxie <jingwenxie@microsoft.com> Date: Thu Mar 9 09:12:19 2023 +0800 [GCU] Add vlanintf-validator (#2697) What I did Fix the bug of GCU vlan interface modification. It should call ip neigh flush dev after removing interface ip. 
The fix is basically following config CLI's tradition. How I did it Add vlanintf service validator to check if extra step of ip neigh flush is needed. How to verify it GCU E2E test in dualtor testbed. commit 40f4254c87f33145c121fc182601702df7fceced Author: Liu Shilong <shilongliu@microsoft.com> Date: Thu Mar 9 06:57:05 2023 +0800 Check SONiC dependencies before installation. (#2716) #### What I did SONiC related packages shouldn't be intalled from Pypi. It is security compliance requirement. Check SONiC related packages when using setup.py. commit 793b14ac75042e86f9f38852b9c2eafdf981ab18 Author: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com> Date: Wed Mar 8 13:28:59 2023 -0800 Improve show acl commands (#2667) * Add status for ACL_TABLE and ACL_RULE in STATE_DB commit 3d24b00fcf0159e77eab656f793e9267f323fcbb Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com> Date: Wed Mar 8 00:19:03 2023 -0800 [GCU] Add PFC_WD RDMA validator (#2619) commit dcccec9df35cd76045f0c623d058d0c87fcc3fe6 Author: vdahiya12 <67608553+vdahiya12@users.noreply.github.com> Date: Tue Mar 7 15:19:53 2023 -0800 [show][muxcable] increase timeout for displaying HW_STATUS (#2712) What I did probe mux direction not always return success. Sample output of: while [ 1 ]; do date; show mux hwmode muxdirection; show mux status; sleep 1; done Mon 27 Feb 2023 03:12:25 PM UTC Port Direction Presence ----------- ----------- ---------- Ethernet16 unknown True PORT STATUS HEALTH HWSTATUS LAST_SWITCHOVER_TIME ----------- -------- -------- ------------ --------------------------- Ethernet16 standby healthy inconsistent 2023-Feb-25 07:55:18.269177 If we increase the timeout to 0.5 secs to get the values back from ycabled, this will remove the inconsistency issue, and display the consistent values, because while telemetry is going on, the time to get actual mux value takes significantly longer than 0.1 seconds. 
PORT STATUS HEALTH HWSTATUS LAST_SWITCHOVER_TIME ----------- -------- -------- ------------ --------------------------- Ethernet16 standby healthy consistent 2023-Feb-25 07:55:18.269177 How I did it How to verify it Manually run changes on setup worst-case CLI return time could be 16 seconds for 32 ports. on avg each port is 200 mSec if telemetry is going, but on average show command will return in < 1 sec for all 32 ports. Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com> commit 75bb60fe4f22b2c0831e7b31e5675df0cd01ff7d Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com> Date: Tue Mar 7 14:42:50 2023 -0800 YANG validation for ConfigDB Updates: MIRROR_SESSION use case (#2430) commit cf3f0ce86b3fd4f7b7548331aab8cc3337663e5d Author: kellyyeh <42761586+kellyyeh@users.noreply.github.com> Date: Tue Mar 7 10:47:13 2023 -0800 Fix non-zero status exit on non secure boot system (#2715) What I did Warm-reboot fails on kvm due to non-zero exit upon command bootctl status 2>/dev/null | grep -c "Secure Boot: enabled" How I did it Added || true to return 0 when previous command fails. Added CHECK_SECURE_UPGRADE_ENABLED to check output of previous command Added debug logs How to verify it Run warm-reboot on kvm and physical device when increased verbosity. Expects debug log to indicate secure/non secure boot. Successful warm reboot commit 74d6d77c3ae6cc255bf18755bd902ff7d86ace67 Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com> Date: Tue Mar 7 20:23:07 2023 +0200 [route_check] implement a check for FRR routes not marked offloaded (#2531) * [route_check] implement a check for FRR routes not marked offloaded * Implemented a route_check functioality that will check "show ip route json" output from FRR and will ensure that all routes are marked as offloaded. If some routes are not offloaded for 15 sec, this is considered as an issue and a mitigation logic is invoked. 
commit 36e98b3ddf584790a4f7e343c4fbe0895ef9bc85 Author: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com> Date: Mon Mar 6 10:56:51 2023 -0800 [warm/fast-reboot] Backup logs from tmpfs to disk during fast/warm shutdown (#2714) Goal: Preserve logs during TOR upgrades and shutdown Need: Below PRs moved logs from disk to tmpfs for specific hwskus. Due to these changes, shutdown path logs are now lost. The logs in shutdown path are crucial for debug purposes. sonic-net/sonic-buildimage#13805 sonic-net/sonic-buildimage#13587 sonic-net/sonic-buildimage#13587 How I did it Check if logs are on tmpfs. If yes, backup logs from /var/log How to verify it Verified on a physical device - logs on tmfs are backed up for past 30 minutes. commit a1c3bd55eea983aae197282e10ac8099492a6194 Author: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com> Date: Fri Mar 3 12:45:40 2023 -0800 [db_migrator] Add missing attribute 'weight' to route entries in APPL DB (#2691) Fixes: 201911 to 202205 warm upgrade failure in fpmsyncd reconciliation due to missing weight attr in routes. (sonic-net/sonic-buildimage#12625) How I did it Check for missing attribute weight in APPLDB route entries. If found missing this attribute is added with empty value. How to verify it Verified on physical device. 201911 to 202205 upgrade worked fine. commit 696da1878f2e275d8cf2fbb17881d63ca01df32a Author: Liu Shilong <shilongliu@microsoft.com> Date: Thu Mar 2 15:36:57 2023 +0800 [ci] Fix pipeline issue caused by sonic-slave-* change. (#2709) What I did These 3 packages maybe purged by default. Do not block pipeline. Download deb/whl packages only to accelerate download process. 
How I did it How to verify it commit bf24267fddc95e8d83ef5908e0eab30ddd6c3ac1 Author: Yaqiang Zhu <yaqiangzhu@microsoft.com> Date: Wed Mar 1 10:05:04 2023 +0800 [dhcp_relay] Fix dhcp_relay restart error while add/del vlan (#2688) Why I did In device that doesn't have dhcp_relay service, restart dhcp_relay after add/del vlan would encounter failed How I did it Add support to check whether device is support dhcp_relay service. How to verify it 1. Unit test 2. Build and install in device Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com> commit 484f5943931eef5ac1bd22467eca648aacbeabd3 Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com> Date: Mon Feb 27 23:49:01 2023 -0800 [GCU] Add Sample Unit Test for RDMA Headroom Pool Size Tuning (#2692) * add rdma gcu unit test * fix comment * clean unused code * clean format * extend to mock patchapplier, in place of changeapplier * replace tabs with spaces commit fa291e1078be3676130c99bcec840c88c221bf8e Author: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com> Date: Mon Feb 27 17:49:34 2023 +0800 Add begin logs to config reload/config minigraph/warm-reboot/fast-reboot (#2694) - What I did Add more logs for config reload/config minigraph/warm-reboot/fast/reboot to identify in the log (notice level) what was the command executed which could cause a service affect. - How I did it Add more logs for config reload/config minigraph/warm-reboot/fast/reboot. - How to verify it Manual test commit d58c4fbcbb5dd3b1be004926bf0584c2594049d7 Author: StormLiangMS <89824293+StormLiangMS@users.noreply.github.com> Date: Mon Feb 27 11:14:54 2023 +0800 Revert "Secure upgrade (#2337)" (#2675) This reverts commit 6fe8599216afb1c302e77c52235c4849be6042b2. 
commit 15a59c93093e779479a47e79f8bd4d5772d1fbdd Author: vdahiya12 <67608553+vdahiya12@users.noreply.github.com> Date: Fri Feb 24 12:46:36 2023 -0800 [show][muxcable] add some new commands health, reset-cause, queue_info support for muxcable (#2414) This PR adds the support for adding some utility commands for muxacble This includes commands for health, operationtime, queueinfo, resetcause vdahiya@sonic:~$ show mux health Ethernet4 PORT ATTR HEALTH --------- --------------- -------- Ethernet4 health_check Ok vdahiya@sonic:~$ show mux health Ethernet4 --json { "health_check": "Ok" } vdahiya@sonic:~$ show mux operation Ethernet4 --json { "operation_time": "22:22" } vdahiya@sonic:~$ show mux operation Ethernet4 PORT ATTR OPERATION_TIME --------- -------------- ---------------- Ethernet4 operation_time 22:22 vdahiya@sonic:~$ vdahiya@sonic:~$ show mux resetcause Ethernet4 PORT ATTR RESETCAUSE --------- ----------- ------------ Ethernet4 reset_cause 0 vdahiya@sonic:~$ show mux resetcause Ethernet4 --json { "reset_cause": "0" } vdahiya@sonic:~$ show mux queueinfo Ethernet4 --json { "Remote": "{'VSC': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 0, 'node_size': 0}, 'UART1': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 209870, 'node_size': 1682183}, 'UART2': {'r_ptr': 13262, 'w_ptr': 3, 'total_count': 0, 'free_count': 0, 'buff_addr': 12, 'node_size': 0}}", "Local": "{'VSC': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 0, 'node_size': 0}, 'UART1': {'r_ptr': 0, 'w_ptr': 0, 'total_count': 0, 'free_count': 0, 'buff_addr': 209870, 'node_size': 1682183}, 'UART2': {'r_ptr': 13262, 'w_ptr': 3, 'total_count': 0, 'free_count': 0, 'buff_addr': 12, 'node_size': 0}}" } commit 07675feb09544f095e9a867634a16d1dee825a69 Author: Mai Bui <maibui@microsoft.com> Date: Fri Feb 24 12:26:32 2023 -0500 Replace pickle by json (#2636) Signed-off-by: maipbui <maibui@microsoft.com> #### What I did `pickle` can lead to 
code execution vulnerabilities. We recommend serializing the relevant data as JSON instead. #### How I did it Replace `pickle` with `json`. #### How to verify it Pass UT; manual test. commit 56a9d69bc79eda9d67953ed21fd42221b58ee04d Author: Yaqiang Zhu <yaqiangzhu@microsoft.com> Date: Thu Feb 16 02:31:01 2023 +0800 [dhcp_relay] Remove add field of vlanid to DHCP_RELAY table while add vlan (#2678) What I did Stopped adding the vlanid field to the DHCP_RELAY table when a VLAN is added, because it conflicts with the YANG model. How I did it Removed the code that adds the vlanid field to the DHCP_RELAY table when a VLAN is added. How to verify it By unit tests. Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com> commit 8f7f8bd1810328fc0faa85b23f2033aa3fc61191 Author: davidpil2002 <91657985+davidpil2002@users.noreply.github.com> Date: Tue Feb 14 11:38:53 2023 +0200 Add support of secure warm-boot (#2532) - What I did Add support for secure warm-boot to SONiC. Warm-boot loads a new kernel without doing a full (cold) boot, by loading the new kernel and executing it with the Linux kexec command. As a result, even when the Secure Boot feature is enabled, a user (possibly a malicious one) can still load an unsigned kernel; secure warm-boot support was added to prevent that. More description of this feature can be found in the Secure Boot HLD: sonic-net/SONiC#1028 - How I did it Linux already supports this, so I enabled it with the following steps: I added some special flags in the Linux kernel that apply when the user builds sonic-buildimage with the secure boot feature enabled. I added the "-s" flag to the kexec command. Note: more details are in the HLD above. - How to verify it * Good flow: manually install a new secure image (a SONiC image that was built with the Secure Boot flag enabled) with sonic-installer; after the secure image is installed, run warm-reboot and check that the new kernel is really loaded and switched.
* Bad flow: Do the same steps 1-2 as a good flow but with an insecure image (SONiC image that was built without setting Secure Boot enabled) After the insecure image is installed, and triggered warm-boot you should get an error that the new unsigned kernel from the unsecured image was not loaded. Automation test - TBD commit a05ce562e37463a7ff8d8c012aca347c8bb45e03 Author: Yaqiang Zhu <yaqiangzhu@microsoft.com> Date: Tue Feb 14 09:18:37 2023 +0800 [doc] Add docs for dhcp_relay show/clear cli (#2649) What I did Add docs for dhcp_realy show/clear cli How I did it Add docs for dhcp_realy show/clear cli Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com> commit 3228979b2aa0de90444f385a8f6f1c8c66fd0e09 Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com> Date: Mon Feb 13 11:04:58 2023 -0800 [portstat CLI] don't print reminder if use json format (#2670) * no print if use json format * add print for chassis commit b741628f5f30283b40b75b784e1daf57671ae6d8 Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com> Date: Mon Feb 13 13:03:12 2023 +0200 [generate_dump] Revert "Revert generate_dump optimization PR's #2599", add fixes for empty /dump forder and symbolic links (#2645) - What I did 0ee19e5 Revert Revert the show-techsupport optimization PR's #2599 c8940ad Add a fix for the empty /dump folder inside the final tar archive generated by the show techsupport CLI command. 8a8668c Add a fix to not follow the symbolic links to avoid duplicate files inside the final tar archive generated by the show techsupport CLI command. - How I did it Modify the scripts/generate_dump script. - How to verify it 1. 
Manual verification do the show techsupport CLI command and save output original.tar.gz (with original generate_dump script) do the show techsupport CLI command and save output fixes.tar.gz (with the generate_dump script modified by this PR) unpack both archives original.tar.gz and fixes.tar.gz compare both directories with ncdu & diff --brief --recursive original fixes Linux utilities 2. Run the community tests sonic-mgmt/tests/show_techsupport Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com> commit 96d5c2d5fcc1967b0f5f517ccc490e3b95be3585 Author: Yaqiang Zhu <yaqiangzhu@microsoft.com> Date: Fri Feb 10 17:49:38 2023 +0800 [vlan] Refresh dhcpv6_relay config while adding/deleting a vlan (#2660) What I did Currently, add/del a vlan doesn't change related dhcpv6_relay config, which is incorrect. How I did it 1. Add dhcp_relay table init entry while adding vlan 2. Delete dhcp_relay related config while deleting vlan 3. Add unitest How to verify it 1. By unitest 2. install whl and run cli Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com> commit a090523a9ef07eaab176893b7eaa660930fa5dbf Author: jingwenxie <jingwenxie@microsoft.com> Date: Fri Feb 10 09:13:51 2023 +0800 [GCU] protect loopback0 from deletion (#2638) What I did Refer to sonic-net/sonic-buildimage#11171, protect loopback0 from deletion How I did it Add patch checker to fail the validation when remove loopback0 How to verify it Unit test commit 18a3d00ad160fd7d890c3f8061cc84b96374f7a3 Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com> Date: Thu Feb 9 05:20:11 2023 +0200 [config/show] Add command to control pending FIB suppression (#2495) * [config/show] Add command to control pending FIB suppression What I did I added a command config suppress-pending-fib that will allow user to enable/disable this feature. Once it is enabled, BGP will wait for route to be programmed to HW before announcing the route to the peers. 
I also added a corresponding show command that prints the status of this feature. commit 5244e3b5cbc5d6708f56401219a4257d47b4b0f7 Author: mihirpat1 <112018033+mihirpat1@users.noreply.github.com> Date: Wed Feb 8 16:39:00 2023 -0800 Add transceiver info CLI support to show output from TRANSCEIVER_INFO for ZR (#2630) * Add transceiver info CLI support to show output from TRANSCEIVER_INFO for ZR Signed-off-by: Mihir Patel <patelmi@microsoft.com> * Added test case for info CLI * Updated command reference * Resolved merged conflicts * Made convert_sfp_info_to_output_string generic for CMIS and non CMIS and added test case to address PR comment * Resolved test_multi_asic_interface_status_all failure * Addressed PR comments --------- Signed-off-by: Mihir Patel <patelmi@microsoft.com> commit 05aedd558dbe901b873e2e2c8e11afc15a67db85 Author: vdahiya12 <67608553+vdahiya12@users.noreply.github.com> Date: Tue Feb 7 12:30:18 2023 -0800 [show] add support for gRPC show commands for `active-active` (#2629) Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com This PR adds support for show mux hwmode muxdirection as well as show mux grpc muxdirection to show the state of gRPC connected to the SoCs for 'active-active' acble type vdahiya@sonic:~$ show mux grpc muxdirection Port Direction Presence PeerDirection ConnectivityState --------- ----------- ---------- --------------- ------------------- Ethernet0 active False active READY vdahiya@sonic:~$ vdahiya@sonic:~$ show mux grpc muxdirection --json { "HWMODE": { "Ethernet0": { "Direction": "active", "Presence": "False", "PeerDirection": "active", "ConnectivityState": "READY" } } } What I did Added support for the commands. 
How I did it How to verify it UT and running the changes on Testbed commit 9512ccd2d2863d7bcb5e7f42cf60b0be39c61c70 Author: Sudharsan Dhamal Gopalarathnam <dgsudharsan@users.noreply.github.com> Date: Tue Feb 7 12:14:49 2023 -0800 [sai_failure_dump]Invoking dump during SAI failure (#2633) * Added logic in techsupport script to collect SAI failure dump commit 4971b7b71067e86c7f86591efc86993aa0c0ce1d Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com> Date: Tue Feb 7 18:07:52 2023 +0200 [db_migrator] make LOG_LEVEL_DB migration more robust (#2651) It could be that LOG_LEVEL_DB includes some invalid data and/or a KEY_SET that is not cleaned up due to an issue, for example we observed _gearsyncd_KEY_SET set included in the LOG_LEVEL_DB and preserved in warm reboot. However, this key is not of type hash which leads to an exception and migration failure. The migration logic should be more robust allowing users to upgrade even though some daemon has left overs in the LOG_LEVEL_DB or invalid data is written. - What I did To fix migration issue that leads to device configuration being lost. - How I did it Wrap the logic in try/except/finally. - How to verify it 202205 -> 202211/master upgrade. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com> commit d80ec9722880d7b8a6786a27696bff97ae30b903 Author: siqbal1986 <shahzad.iqbal@microsoft.com> Date: Mon Feb 6 12:00:09 2023 -0800 Fixed a bug in "show vnet routes all" causing screen overrun. 
(#2644) Signed-off-by: siqbal1486 <shahzad.iqbal@microsoft.com> commit 6b567168bc971cac112681596d828c919d252bc8 Author: mihirpat1 <112018033+mihirpat1@users.noreply.github.com> Date: Wed Feb 1 13:48:57 2023 -0800 show logging CLI support for logs stored in tmpfs (#2641) * show logging CLI support for logs stored in tmpfs Signed-off-by: Mihir Patel <patelmi@microsoft.com> * Fixed testcase failures * Reverted unwanted change in a file * Added testcase for syslog.1 in log.tmpfs directory * mend --------- Signed-off-by: Mihir Patel <patelmi@microsoft.com> commit 38e5caadb7caebedb9237a9cd87c927bd6637fe5 Author: jfeng-arista <98421150+jfeng-arista@users.noreply.github.com> Date: Wed Feb 1 11:29:49 2023 -0800 [chassis][voq] Add asic id for linecards so "show fabric counters queue/port" can work. (#2499) * Add asic id for linecards so "show fabric counters queue/port" can work. * Add test coverage --------- Signed-off-by: Jie Feng <jfeng@arista.com> commit 78e5f179772fc951732a33191865efabea77c965 Author: longhuan-cisco <84595962+longhuan-cisco@users.noreply.github.com> Date: Wed Feb 1 11:12:41 2023 -0800 Add Transceiver PM basic CLI support to show output from TRANSCEIVER_PM table for ZR (#2615) * Transceiver PM basic CLI support to show output from TRANSCEIVER_PM table * Fix alert typo * Fix display format and add cd short link * Add doc for pm * Update Command-Reference.md commit 8a7609930cae97934719609b42d61ad153c3350d Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com> Date: Wed Feb 1 09:33:14 2023 -0800 [masic support] 'show run bgp' support for multi-asic (#2427) Support 'show run bgp' for multi-asics Add mock tables and UTs for single-asic, multi-asic, bgp not running cases commit 370fe81229f3fbea29d5bf5b9ee2347824056d80 Author: kartik-arista <61531803+kartik-arista@users.noreply.github.com> Date: Tue Jan 31 10:19:26 2023 -0800 Making 'show feature autorestart' more resilient to missing auto_restart config in CONFIG_DB (#2592) Fixes BUG 762723 commit 
e6d880a0249f1f2e0b9d4ef2412e84e9a31b45a2 Author: Yaqiang Zhu <yaqiangzhu@microsoft.com> Date: Mon Jan 30 21:07:12 2023 -0800 [doc] Update docs for dhcp_relay config cli (#2598) What I did Updated docs about dhcp_relay config cli How I did it Updated docs about dhcp_relay config cli Signed-off-by: Yaqiang Zhu <yaqiangzhu@microsoft.com> commit 9865dda9b7075bc9c788cba893cba329a0548e24 Author: abdosi <58047199+abdosi@users.noreply.github.com> Date: Mon Jan 30 17:52:50 2023 -0800 Skip saidump for Spine Router as this can take more than 5 sec (#2637) To address sonic-net/sonic-buildimage#13561 skip saidump on T2 platforms for time-being. commit 56d41f2581157c31a09da365515ac9df9ebb540b Author: ycoheNvidia <99744138+ycoheNvidia@users.noreply.github.com> Date: Mon Jan 30 23:28:15 2023 +0200 Secure upgrade (#2337) #### What I did Added support for secure upgrade #### How I did it It includes image signing during build (in sonic buildimage repo) and verification during image install (in sonic-utilities). HLD can be found in the following PR: https://github.com/sonic-net/SONiC/pull/1024 #### How to verify it Feature is used to allow image was not modified since built from vendor. During installation, image can be verified with a signature attached to it. In order for image verification - image must be signed - need to provide signing key and certificate (paths in SECURE_UPGRADE_DEV_SIGNING_KEY and SECURE_UPGRADE_DEV_SIGNING_CERT in rules/config) during build , and during image install, need to enable secure boot flag in bios, and signing_certificate should be available in bios. #### Feature dependencies In order for this feature to work smoothly, need to have secure boot feature implemented as well. The Secure boot feature will be merged in the near future. 
sonic-buildimage PR: https://github.com/sonic-net/sonic-buildimage/pull/11862 commit 0744b19b7321aa33269ee7a76937f21e44c2750c Author: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com> Date: Tue Jan 31 02:15:01 2023 +0800 [system-health] Fix issue: show system-health CLI crashes (#2635) - What I did Fix issue: show system-health CLI crashes root@switch:/home/admin# show system-health summary Traceback (most recent call last): File "/usr/local/bin/show", line 8, in <module> sys.exit(cli()) File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__ return self.main(*args, **kwargs) File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main rv = self.invoke(ctx) File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke return ctx.invoke(self.callback, **ctx.params) File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke return callback(*args, **kwargs) File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 113, in summary _, chassis, stat = get_system_health_status() File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 10, in get_system_health_status if os.environ["UTILITIES_UNIT_TESTING"] == "1": File "/usr/lib/python3.9/os.py", line 679, in __getitem__ raise KeyError(key) from None KeyError: 'UTILITIES_UNIT_TESTING' - How I did it Use dict.get instead of [] operator. 
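The traceback above ends in a `KeyError` because `os.environ["UTILITIES_UNIT_TESTING"]` is indexed while the variable is unset. A minimal sketch of the difference between the two lookups follows; the helper name `is_unit_testing` is hypothetical and is not taken from `show/system_health.py`, and a plain dict stands in for the process environment.

```python
import os


def is_unit_testing(environ=os.environ):
    """Return True only when UTILITIES_UNIT_TESTING is set to "1".

    Hypothetical helper for illustration. Mapping.get() returns None for a
    missing key, whereas environ[key] raises KeyError.
    """
    return environ.get("UTILITIES_UNIT_TESTING") == "1"


if __name__ == "__main__":
    # A plain dict stands in for the real environment.
    print(is_unit_testing({}))  # unset variable: no exception is raised
    print(is_unit_testing({"UTILITIES_UNIT_TESTING": "1"}))
```

With the `[]` operator, the first call would raise instead of returning a boolean, which matches the crash reported on a production switch where the variable is never exported.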
- How to verify it Manual test commit c3c4905bb1dac2fd201f4647b730b29424e20013 Author: anamehra <54692434+anamehra@users.noreply.github.com> Date: Mon Jan 30 10:05:10 2023 -0800 Fixed admin state config CLI for Backport interfaces (#2557) Fixed admin state config CLI for Backport interfaces Fixes sonic-net/sonic-buildimage#13057 commit 3d53f9930084c87bec12c498d8af625ae04a2a05 Author: zhixzhu <44230426+zhixzhu@users.noreply.github.com> Date: Tue Jan 31 02:02:33 2023 +0800 suppport multi asic for show queue counter (#2439) Added option -n for both "show queue counter" and "queuestat", using multi_asic module in queuestat to query database of specified namespace. Removed function get_queue_port() to decrease the times of connecting database. commit 5556fafc85edaa1d16276e02e7b34959033ffb29 Author: Baorong Liu <96146196+baorliu@users.noreply.github.com> Date: Fri Jan 27 11:19:23 2023 -0800 [show_bfd] add local discriminator in show bfd command (#2625) commit 17609919fd461521090113d1de6a77d5062905c9 Author: jingwenxie <jingwenxie@microsoft.com> Date: Fri Jan 27 15:48:15 2023 +0800 [GCU] Ignore bgpraw table in GCU operation (#2628) What I did After the previous fix #2623 , GCU still fails in the rollback operation. The bgpraw table should be discard in all GCU operation. Thus, I change get_config_db_as_json function to crop out "bgpraw" table. How I did it Pop "bgpraw" table if exists. 
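The "pop if exists" step described above can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the helper name `crop_bgpraw` and the sample tables are hypothetical, and the real change is made inside `get_config_db_as_json`, whose surrounding code is not shown in the commit message.

```python
import copy


def crop_bgpraw(config):
    """Return a copy of a config dict without the "bgpraw" table.

    Hypothetical helper for illustration only.
    """
    cropped = copy.deepcopy(config)
    # pop() with a default is a no-op when the key is absent,
    # so configs without "bgpraw" pass through unchanged.
    cropped.pop("bgpraw", None)
    return cropped


if __name__ == "__main__":
    sample = {
        "BGP_NEIGHBOR": {"10.0.0.1": {"asn": "65100"}},
        "bgpraw": "router bgp 65100\n",
    }
    print(sorted(crop_bgpraw(sample)))
    print(sorted(crop_bgpraw({"BGP_NEIGHBOR": {}})))
```

Passing a default to `pop` avoids a `KeyError` on devices whose configuration has no `bgpraw` entry, so the same code path serves apply-patch and rollback.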
How to verify it Unittest commit 3db8c009a87e246f6f2e16e5e9f22aca264d4c51 Author: Dante (Kuo-Jung) Su <dante.su@broadcom.com> Date: Thu Jan 26 01:30:55 2023 +0800 Add interface link-training command into the CLI doc (#2257) * LT Admin/Oper: Use 'N/A' when the data is unavailable Signed-off-by: Dante Su <dante.su@broadcom.com> * fix test failure Signed-off-by: Dante Su <dante.su@broadcom.com> * fix coverage failure Signed-off-by: Dante Su <dante.su@broadcom.com> * [doc]: Update Command-Reference.md (#2257) Add interface link-training command into the CLI doc Use 'N/A' if link-training attribute is not supported in the SAI. Signed-off-by: Dante Su <dante.su@broadcom.com> Signed-off-by: Dante Su <dante.su@broadcom.com> commit 28b255afedb04a9214ca8a7bf10c38c5c64d4c48 Author: jingwenxie <jingwenxie@microsoft.com> Date: Wed Jan 25 08:51:16 2023 +0800 [GCU] Ignore bgpraw in GCU applier (#2623) What I did show run all output will include bgpraw for business needs. GCU ipv6 test will update BGP_NEIGHBOR table which caused bgpraw content change, which will make the apply-patch operation fail. The solution is to add bgpraw to ignored tables. How I did it Add new added bgpraw table to ignored backend table. How to verify it Existing Unit test and local E2E GCU test. commit ff5167a1c4f2289b1c7b5cf23c802fa3ccde673a Author: Jing Zhang <zhangjing@microsoft.com> Date: Mon Jan 23 15:49:32 2023 -0800 [muxcable][config] Add support to enable/disable ceasing to be an advertisement interface when `radv` service is stopped (#2622) This PR is to add CLI support to enable or disable the feature to send out a good-bye packet when radv service is stopped on active-active dualtor devices. sign-off: Jing Zhang zhangjing@microsoft.com commit ed1d3c99b60bf8547342a1f98f349eac264fe887 Author: jfeng-arista <98421150+jfeng-arista@users.noreply.github.com> Date: Mon Jan 23 13:23:31 2023 -0800 [chassis][voq] Add "show fabric reachability" command. 
(#2528) What I did Added "show fabric reachability" command. The output of this command : Local Link Remote Module Remote Link Status ------------ --------------- ------------- -------- 0 304 171 up 1 304 156 up 2 304 147 up Added test for the change at tests/fabricstat_test.py. The test is at sonic-net/sonic-mgmt#6620 commit 049bacf95babe50d32d90c68cf7b4825f5a64b46 Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com> Date: Mon Jan 23 17:39:58 2023 +0200 Revert (#2599) b34a540c [generate_dump] Fix for deletion flow for all secret files from show-techsupport dump (#2571) 258ffa09 [generate_dump] Optimize the execution time of 'show techsupport' CLI by parallel function execution (#2512) 572c8cff Optimize the execution time of the 'show techsupport' script to 5-10%, (#2504) This reverts commits b34a540cca5555ab3aa74e19e81f24c2a20d311b 258ffa0928ce2c74ebdc180e13c6476dc2534983 572c8cffdddb7683e158d36067398600a71512ea commit fafb0dfef95607b5b7dc2da0307ebb2bcd4508bf Author: Saikrishna Arcot <sarcot@microsoft.com> Date: Thu Jan 19 14:42:14 2023 -0800 [warm-reboot] Use kexec_file_load instead of kexec_load when available (#2608) On some dev VMs, warm reboot on a VS image fails. Specifically, after kexec is called and the new kernel starts, the new kernel tries to load the initramfs, but fails to do so for whatever reason. There may be messages about gzip decompression failing and that it's corrupted. After some experimentation, it was found that when first loading the new kernel and initramfs into memory, using the `kexec_file_load` syscall (`-s` flag in kexec) worked fine, whereas using the default `kexec_load` syscall resulted in a failure. It's unknown why `kexec_file_load` worked fine when `kexec_load` didn't; there shouldn't be any difference for non-secure boot kernels, as far as I can tell. 
What was seen, however, was that when taking a KVM dump in the failure case, the memory that stored the initramfs had differences compared to what was on disk. It's unknown what caused these differences.

As a workaround (and as a bit of a feature enhancement), use the `-a` flag with kexec, which tells it to use `kexec_file_load` if available, and `kexec_load` if it's not available or otherwise fails. armhf doesn't support `kexec_file_load`, whereas arm64 gained support for `kexec_file_load` in the 5.19 kernel (we're currently on 5.10). `amd64` has supported `kexec_file_load` since 3.17.

This also makes it possible to do kexec on secure boot systems, where the kernel image must be loaded via `kexec_file_load`.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

commit 954d9e9f7b1678cc794af34ef1ef782bec8e2ee4
Author: pettershao-ragilenetworks <81281940+pettershao-ragilenetworks@users.noreply.github.com>
Date: Fri Jan 20 06:17:18 2023 +0800

fix show techsupport error (#2597)

* Modify the order of "--allow-process-stop" option, it belongs to 'generate_dump'.
commit 3c8a9309e5a409dd008b84159ea3924209dbf0bf Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com> Date: Thu Jan 19 14:01:17 2023 -0600 [GCU] Prohibit removal of PFC_WD POLL_INTERVAL field (#2545) commit bde706b846e0c47e748ed3491177b3d5ad054175 Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com> Date: Thu Jan 19 17:33:38 2023 +0200 [techsupport] include APPL_STATE_DB dump (#2607) - What I did I added APPL_STATE_DB to techsupport dump - How I did it Added a call to save APPL_STATE_DB - How to verify it Run techsupport and verify dump/APPL_STATE_DB.json Signed-off-by: Stepan Blyschak <stepanb@nvidia.com> commit cb3d462db82894eb38f1f3f6edd7f39f5a09a060 Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com> Date: Tue Jan 17 15:31:46 2023 -0600 YANG Validation for ConfigDB Updates: RADIUS_SERVER (#2604) #### What I did Add YANG validation using GCU for writes to RADIUS_SERVER table in ConfigDB #### How I did it Using same method as https://github.com/sonic-net/sonic-utilities/pull/2190/files, extend to RADIUS table #### How to verify it verified testing on virtual switch CLI, unit tests commit b01737974e227040e5c3f0e1c48a4b4e8839c4e3 Author: Lior Avramov <73036155+liorghub@users.noreply.github.com> Date: Tue Jan 17 18:37:54 2023 +0200 Remove TODO comment which is no longer relevant (#2600) commit 521ecfd54317291014c584ecf7c11997381ab7c8 Author: jingwenxie <jingwenxie@microsoft.com> Date: Sat Jan 14 09:34:36 2023 +0800 [show] Add bgpraw to show run all (#2537) #### What I did Add bgpraw output to `show runningconfiguration all` ``` Requirements: 1. current `show runningconfig` will print all the ConfigDB in a json format, we need to add a new key-value into the json output "bgpraw" with a long string value 2. The long string value should be the output of `vtysh -c "show run"`. It is normally multiline string, may include special characters like \". Need to make sure the escaping properly 3. 
We do not need to insert the key-value into ConfigDB is not existing there 4. If ConfigDB already has the key-value, we do not need to override it by vtysh command output 5. Not break multi-asic use ``` #### How I did it Generate bgpraw output then append it to `show runnningconfiguration all`'s output #### How to verify it Mannual test #### Previous command output (if the output of a command-line utility has changed) ``` admin@vlab-01:~$ show run all { "ACL_TABLE": { ...... "WRED_PROFILE": { "AZURE_LOSSLESS": { "ecn": "ecn_all", "green_drop_probability": "5", "green_max_threshold": "2097152", "green_min_threshold": "1048576", "red_drop_probability": "5", "red_max_threshold": "2097152", "red_min_threshold": "1048576", "wred_green_enable": "true", "wred_red_enable": "true", "wred_yellow_enable": "true", "yellow_drop_probability": "5", "yellow_max_threshold": "2097152", "yellow_min_threshold": "1048576" } } } ``` #### New command output (if the output of a command-line utility has changed) ``` admin@vlab-01:~$ show run all { "ACL_TABLE": { ...... "WRED_PROFILE": { "AZURE_LOSSLESS": { "ecn": "ecn_all", "green_drop_probability": "5", "green_max_threshold": "2097152", "green_min_threshold": "1048576", "red_drop_probability": "5", "red_max_threshold": "2097152", "red_min_threshold": "1048576", "wred_green_enable": "true", "wred_red_enable": "true", "wred_yellow_enable": "true", "yellow_drop_probability": "5", "yellow_max_threshold": "2097152", "yellow_min_threshold": "1048576" } }, "bgpraw": "Building configuration...\n\nCurrent configuration......end\n" } ``` commit 83295189cab227d640839c0079207bf17b6442d8 Author: Aryeh Feigin <101218333+arfeigin@users.noreply.github.com> Date: Fri Jan 13 05:47:22 2023 +0200 Extend fast-reboot STATE_DB entry timer (#2577) *Due to an issue of fallback from to cold-boot when using upgrade with fast-reboot combined with FW upgrade a short term solution is to extend the timer. 
The long-term solution, replacing the timer with a fast-reboot finalizer, is in progress.

commit 68a11e77212c09d87d98b4a4724f57e06e6442da
Author: Aryeh Feigin <101218333+arfeigin@users.noreply.github.com>
Date: Wed Jan 11 10:18:07 2023 +0200

Preserve copp tables through DB migration (#2524)

This PR should be merged together with sonic-net/sonic-swss#2548 and is required for 202205 and 202211. This PR implements the "[fastboot] Preserve CoPP table" HLD to improve the fastboot flow (sonic-net/SONiC#1107).

- What I did
Preserve COPP table contents through DB migration. (Mellanox only)
- How I did it
Skipped deleting of COPP tables in DB migrator.
- How to verify it
Observe that COPP table contents are preserved right after reboot.

commit c236b83a7afea4c5479c7cb18555f301847f080c
Author: CliveNi <clive.ni@cloudlight.com.hk>
Date: Tue Jan 10 01:11:19 2023 +0800

[sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947) (#2349)

* [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947)
- Description
Check whether the running image has switched after CDB_run during the firmware upgrade process.
- Motivation and Context
CDB_run may cause several seconds of NACK or stretching on the i2c bus, depending on the module vendor's implementation. Checking the status after CDB_run keeps the flow compatible with different implementations.

* Update unit tests for sfputil.
Test: Create an "is_fw_switch_done" test. The function is expected to return 1 when 'status' == True and the running image ('result'[1, 5]) differs from the committed one ('result'[2, 6]); otherwise it returns -1.

* [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947)
- Description
Add error checks to the "is_fw_switch_done" function. Update unit tests for "is_fw_switch_done".
- Motivation and Context
Check the status of the images to avoid committing an image with a wrong status.

* [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947)
Fix: The error code was compared with the wrong variable.
Refactor: Rename variables to better suit their purpose.
Refactor: Remove an if case that has little to do with the function.
Feat: Add "echo" to display the detailed result.

* Update unit tests for sfputil.

* [sfputil] Firmware download/upgrade CLI support for QSFP-DD (#1947)
Feat: Reduce the check frequency in "is_fw_switch_done".
Refactor: Remove a repeated line.

commit 5ac55f06fc3efcfc02450ff33410b1df2e290ddd
Author: Qi Luo <qiluo-msft@users.noreply.github.com>
Date: Fri Jan 6 17:37:51 2023 -0800

Revert "sonic-utilities: Update config reload() to verify formatting of an input file (#2529)" (#2586)

This reverts commit 42f51c26d1d0017f3211904ca19c023b5d784463.

Reverts sonic-net/sonic-utilities#2529

Reason: There are use cases like `config reload /dev/stdin`, for example [L2 Switch mode · sonic-net/SONiC Wiki (github.com)](https://github.com/sonic-net/SONiC/wiki/L2-Switch-mode). The original PR would read the input file twice, so /dev/stdin does not work.

commit 2dc17968b6fa95289aa98fa30ff57eb87afaf231
Author: wenyiz2021 <91497961+wenyiz2021@users.noreply.github.com>
Date: Fri Jan 6 15:24:02 2023 -0800

[masic] 'show interfaces counters' reminds to use '-d all' option to check for internal links (#2466)

Print a reminder to check internal links on multi-asic platforms.

Signed-off-by: Wenyi Zhang <wenyizhang@microsoft.com>

commit 551836f524504cbcf7e9066bfa64104912a545c1
Author: Jing Zhang <zhangjing@microsoft.com>
Date: Fri Jan 6 13:28:14 2023 -0800

[storyteller] add link prober state change to story teller (#2585)

What I did
Add a linkprober category to story teller. It will reflect dualtor heartbeat events.

sign-off: Jing Zhang zhangjing@microsoft.com

How to verify it
Tested on a dualtor device; was able to grep link prober state change events.
commit bfe85fdbd6f4244a0c4d5903a3e6cf75e87f68e6
Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
Date: Tue Jan 3 11:21:52 2023 +0200

[generate_dump] Fix for deletion flow for all secret files from show-techsupport dump (#2571)

- What I did
Fixed a deletion flow for all secret files in the tech support dump.
- How I did it
Delete files by using the find and rm Linux utilities.
- How to verify it
Run the show_techsupport/test_techsupport_no_secret.py

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>

commit 80162b0bf02d6dff88c503a7c7310a7b0a287531
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date: Mon Jan 2 15:01:09 2023 +0200

[sonic_installer] use /etc/resolv.conf from the host when migrating packages (#2573)

- What I did
SONiC package migration has been failing due to the lack of DNS configuration for registries domain names. I used /etc/resolv.conf from host OS when migrating.
- How I did it
Copy /etc/resolv.conf into new image filesystem during migration, then, restore it back.
- How to verify it
Run sonic-installer install.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

commit f22d6b0067d570b550b43ed98693cf23bf82a35b
Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Date: Thu Dec 29 15:37:38 2022 +0800

[Mellanox] Change severity to NOTICE in Mellanox buffer migrator when unable to fetch DEVICE_METADATA due to empty CONFIG_DB during initialization (#2569)

- What I did
It is expected that db_migrator is not able to fetch DEVICE_METADATA when it is invoked before the CONFIG_DB is initialized. In this case, we should not use ERROR to log the message since it's not an error. Change the severity to NOTICE
- How I did it
Change the severity.
- How to verify it
Manually test.
Signed-off-by: Stephen Sun <stephens@nvidia.com> commit 18f9ae1b0e1b4a02646f389b696553074867dcbc Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com> Date: Mon Dec 26 16:00:31 2022 +0800 Fix issue: unconfigured PGs are displayed in watermarkstat (#2556) - What I did All the PGs between minimal and maximal indexes are displayed regardless of whether they are configured. Originally, watermark counters were enabled for all PGs, so there is no issue. Now, watermark counters are enabled only for PGs with buffer configured, eg. if PG 0/2/3/4/6, is configured, PG 0-6 will be displayed, which is confusing by giving users a feeling that PG 7 is lost - How I did it Display valid PGs only - How to verify it Manually test and unit test. - Previous command output (if the output of a command-line utility has changed) Port PG0 PG1 PG2 PG3 PG4 ----------- ----- ----- ----- ----- ----- Ethernet0 0 0 0 0 0 Ethernet2 0 0 0 0 0 Ethernet8 0 0 0 0 0 Ethernet10 0 0 0 0 0 Ethernet16 0 0 0 0 0 Ethernet18 0 0 0 0 0 Ethernet32 0 0 0 0 0 - New command output (if the output of a command-line utility has changed) PG1 won't be displayed if it is not configured Port PG0 PG3 PG4 ----------- ----- ----- ----- Ethernet0 0 0 0 Ethernet2 0 0 0 Ethernet8 0 0 0 Ethernet10 0 0 0 Ethernet16 0 0 0 Ethernet18 0 0 0 Ethernet32 0 0 0 Signed-off-by: Stephen Sun <stephens@nvidia.com> commit 78566674edd15dd8aa618fcde520a7b170452840 Author: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com> Date: Tue Dec 20 17:05:23 2022 +0800 [Command Ref] Add doc for syslog rate limit (#2508) - What I did Add command reference doc for syslog rate limit feature - How I did it Add command reference doc for syslog rate limit feature - How to verify it Manual check Previous command output (if the output of a command-line utility has changed) New command output (if the output of a command-line utility has changed) admin@sonic:~$ show syslog rate-limit-container SERVICE INTERVAL BURST -------------- 
---------- ------- bgp 0 0 database 300 20000 lldp 300 20000 mgmt-framework 300 20000 pmon 300 20000 radv 300 20000 snmp 300 20000 swss 300 20000 syncd 300 20000 teamd 300 20000 telemetry 300 20000 admin@sonic:~$ show syslog rate-limit-container bgp SERVICE INTERVAL BURST -------------- ---------- ------- bgp 0 0 commit 5181264f203e1fa5f74ca069ca2a58f4e192d718 Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com> Date: Tue Dec 20 11:04:02 2022 +0200 [generate_dump] Optimize the execution time of 'show techsupport' CLI by parallel function execution (#2512) - What I did Optimize the execution time of the 'show techsupport' script. - How I did it The show techsupport CLI command calls the generate_dump bash script. In the script, there are a many functions that do the next scenario: 1. Run some CLI command 2. Save output from step 1 to the temporary file 3. Append the temporary file from step 2 to the `/var/dump/sonic_dump_XXXX.tar` file 4. Delete the temporary file from step 2 This PR will add the execution of these functions in parallel manner. Also, it will not spawn too many processes to not waste all CPU time. - How to verify it First test scenario Run the `time show techsupport` CLI command and compare the execution time to the original script (with no parallelism), the execution time will be decreased by 10-20%. Second test scenario 1. Stuck the FW by using next commands a. mcra /dev/mst/mt52100_pci_cr0 0xa01e4 0x10 b. mcra /dev/mst/mt52100_pci_cr0 0xa05e4 0x10 c. mcra /dev/mst/mt52100_pci_cr0 0xa07e4 0x10 d. mcra /dev/mst/mt52100_pci_cr0 0xa09e4 0x10 e. mcra /dev/mst/mt52100_pci_cr0 0xa0be4 0x10 f. mcra /dev/mst/mt52100_pci_cr0 0xa0de4 0x10 g. mcra /dev/mst/mt52100_pci_cr0 0xa0fe4 0x10 2. Run the `time show techsupport` CLI command and compare the execution time to the original script (with no parallelism), the execution time will be decreased by up to 50% because inside the script we launch CLI commands with `timeout --foreground 5m`. 
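The idea described above (run the collection steps concurrently, cap the number of simultaneous processes, and bound each command with a timeout) can be sketched in Python. This is not the PR's implementation: `generate_dump` is a bash script, and the function names, the worker count of 4, and the placeholder commands below are assumptions made for illustration. Only the 5-minute timeout mirrors the `timeout --foreground 5m` mentioned in the commit message.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_and_capture(cmd, timeout_s):
    """Run one command and return (cmd, stdout), or (cmd, None) on timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return cmd, done.stdout
    except subprocess.TimeoutExpired:
        # A hung command costs at most timeout_s and does not block the others.
        return cmd, None


def collect(commands, max_workers=4, timeout_s=300):
    """Run commands concurrently, never more than max_workers at once.

    Results come back in the same order as the input commands.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: run_and_capture(c, timeout_s), commands))


if __name__ == "__main__":
    # Placeholder commands; the real script runs SONiC CLI commands.
    cmds = [[sys.executable, "-c", f"print({i})"] for i in range(3)]
    for cmd, out in collect(cmds, max_workers=2, timeout_s=30):
        print(cmd[-1], "->", out)
```

When several commands hang, as in the stuck-firmware scenario, their timeouts overlap instead of adding up, which is consistent with the larger saving reported for that case.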
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>

commit 1ca3fedc4575c04b6578e6c5c66dac353be27072
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date: Mon Dec 19 07:32:18 2022 +0200

[timer.unit.j2] use wanted-by in timer unit (#2546)

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

commit b3e7f1c07d18542d700b8162c164fbd24544f505
Author: Preetham <51771885+preetham-singh@users.noreply.github.com>
Date: Fri Dec 16 23:38:03 2022 +0530

Fixes #12170: Delete subinterface and recreate the subinterface in (#2513)

* Fixes #12170: Delete subinterface and recreate the subinterface in default-vrf while unbinding subinterface from user defined vrf.

commit 44c5d1c23a1632cfb316ae1d93b3e4cbeeb3934e
Author: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
Date: Thu Dec 15 23:21:58 2022 -0800

[db_migrator] Fix migration of Loopback data: handle all Loopback interfaces (#2560)

Fix the issue where cross-branch upgrades (base DB version 1_0_1) lead to an OA crash due to a duplicate IP2ME route being added when there is more than one Loopback interface. The issue happens because, in the current implementation, lo is hardcoded to be replaced with Loopback0. When the base image's APP DB has more than one IP assigned to the lo interface, upon migration all the IPs are assigned to the same loopback, Loopback0. This is incorrect, as in newer images different IPs are assigned to distinct Loopback interfaces.

How to verify it
Verified on a physical testbed that this change fixes the OA crash issue. Also added a unit test to catch this issue in PR tests.

commit ccefd454dd53c6332815b28a58fa20fd24215fdc
Author: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
Date: Thu Dec 15 09:55:08 2022 +0200

Optimize the execution time of the 'show techsupport' script to 5-10%, (#2504)

- What I did
Optimize the execution time of the 'show techsupport' script by 5-10%.
- How I did it The show techsupport CLI command calls the generate_dump bash script. In the script, there are a many functions that do the next scenario: 1. Run some CLI command 2. Save output from step 1 to the temporary file 3. Append the temporary file from step 2 to the `/var/dump/sonic_dump_XXXX.tar` file 4. Delete the temporary file from step 2 This PR removes the 3 and 4 step from those functions and creates a new function save_to_tar() which will add to .tar archive the whole directory with temporary files (which means it will not spawn a tar -v -rhf ... process for each temporary file) - How to verify it Run the time show techsupport CLI command and compare the execution time to the original script, the execution time will be decreased by 5-10%. Signed-off-by: Vadym Hlushko <vadymh@nvidia.com> commit 4f825d9849a7def94fa3dfe9be8f22a88f50aa1f Author: Jing Zhang <zhangjing@microsoft.com> Date: Wed Dec 14 10:13:43 2022 -0800 [muxcable][show] update `show mux tunnel-route` to separate ASIC and kernel into two columns (#2553) Stemming from sonic-net/sonic-swss#2557. This PR is to update show mux tunnel-route command to show status of both ASIC and kernel tunnel routes. 
sign-off: Jing Zhang zhangjing@microsoft.com

What I did
How I did it
How to verify it

Previous command output (if the output of a command-line utility has changed)
Only check Kernel Route, if removing tunnel route for server_ipv4 in kernel, it won't show in CMD output:

    zhangjing@********************:~$ show mux tunnel-route Ethernet4
    PORT       DEST_TYPE    DEST_ADDRESS
    ---------  -----------  --------------------------------
    Ethernet4  server_ipv6  2603:10b0:d11:8614::a32:9112/128

New command output (if the output of a command-line utility has changed)
Check both ASIC and APP DB for tunnel route status

    zhangjing@********************:~$ show mux tunnel-route Ethernet4
    PORT       DEST_TYPE    DEST_ADDRESS                      kernel    asic
    ---------  -----------  --------------------------------  --------  ------
    Ethernet4  server_ipv4  10.50.145.18/32                   -         added
    Ethernet4  server_ipv6  2603:10b0:d11:8614::a32:9112/128  added     added

commit d5465ed5b22ead0101ad2aaabf44050773968cfd
Author: Sudharsan Dhamal Gopalarathnam <dgsudharsan@users.noreply.github.com>
Date: Tue Dec 13 23:27:57 2022 -0800

    [show]Fix show route return code on error (#2542)

    - What I did
    Fix show route return command to return error code on failure cases. The parameter return_cmd=True in run_command will suppress the return code and return success even in error scenarios.

    - How I did it
    When run command is called with return_cmd = True, modified its return to include return code, which can then be used to assess if there is an error by the caller

    - How to verify it
    Added UT to verify it

    - Previous command output (if the output of a command-line utility has changed)
    root@sonic:/home/admin# show ip route 123
    % Unknown command: show ip route 123
    root@sonic:/home/admin# echo $?
    0

    - New command output (if the output of a command-line utility has changed)
    root@sonic:/home/admin# show ip route 123
    % Unknown command: show ip route 123
    root@sonic:/home/admin# echo $?
    1

commit 14936d7ef6a46745dd9d8b6c07e0de476695cd6e
Author: Lawrence Lee <lawlee@microsoft.com>
Date: Mon Dec 12 17:07:27 2022 -0800

    [route_check]: Ignore ASIC only SOC IPs (#2548)

    * [tests]: Improve route check test
    - Split test into separate methods based on functionality being tested
    - Parametrize main test method for better granularity when viewing results/running test cases
    - Add config DB mocking support
    - Move some setup/teardown code to fixtures for better consistency
    - Extract test data to separate file
    - Ignore routes for SOC IPs that are only present in the ASIC
    - Add test case to cover ASIC only SOC IPs

    Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

commit 609f18fed063cf5c299328e2f6ca36c907cc1883
Author: isabelmsft <67024108+isabelmsft@users.noreply.github.com>
Date: Thu Dec 8 11:12:50 2022 -0600

    YANG Validation for ConfigDB Updates: WARM_RESTART, SFLOW_SESSION, SFLOW, VXLAN_TUNNEL, VXLAN_EVPN_NVO, VXLAN_TUNNEL_MAP, MGMT_VRF_CONFIG, CABLE_LENGTH, VRF tables (#2526)

commit 83aa5fb9e7671ee9871af4ccee764ad5ef84cf0f
Author: isabelmsft <isabel.li@microsoft.com>
Date: Fri Mar 17 19:15:47 2023 +0000

    UT coverage

commit da9db448985ce138059e8c18b986a8c8e70035d1
Author: isabelmsft <isabel.li@microsoft.com>
Date: Tue Feb 7 02:17:25 2023 +0000

    add sflow collector

commit c4caaf80adec48a2010c3be3e8fb0fedf696da9b
Author: isabelmsft <isabel.li@microsoft.com>
Date: Mon Feb 6 21:54:44 2023 +0000

    fix UT

commit cda464a32738d37e71589dc0ac982be43908734d
Author: isabelmsft <isabel.li@microsoft.com>
Date: Sat Feb 4 09:06:35 2023 +0000

    fix UT

commit 1251c805a6adc370c0aeb37dd06d3927a712468b
Author: isabelmsft <isabel.li@microsoft.com>
Date: Wed Feb 1 02:38:19 2023 +0000

    fix UT

commit c704b71c893f17062f2b7eab54ebc9ad65fa2a6c
Author: isabelmsft <isabel.li@microsoft.com>
Date: Thu Mar 23 07:22:54 2023 +0000

    fix UT

commit f83abb274c4e02ee31e3df0353cbbc784d9601a0
Author: isabelmsft <isabel.li@microsoft.com>
Date: Thu Mar 23 05:32:03 2023 +0000

    fix UT

commit 966a0e0fe75cc542c6ca4f3dcca54f9b5d54e25b
Author: isabelmsft <isabel.li@microsoft.com>
Date: Thu Mar 23 04:37:26 2023 +0000

    add UT

commit 92a5dc20feb6e19a0515c1ce7e41c75efe5281ae
Author: isabelmsft <isabel.li@microsoft.com>
Date: Thu Mar 23 00:55:15 2023 +0000

    add UT

commit 34a61f3bb7c25fb2b3c8aedceeb311f5c3c8ef84
Merge: 582bac06 10f31ea6
Author: isabelmsft <isabel.li@microsoft.com>
Date: Wed Mar 22 22:11:09 2023 +0000

    Merge remote-tracking branch 'origin/master' into mux_mclag

commit 10f31ea6fb0876f913cfcfce8c95011e675a99f6
Author: Mai Bui <maibui@microsoft.com>
Date: Tue Mar 21 00:25:39 2023 -0400

    Revert "Replace pickle by json (#2636)" (#2746)

    This reverts commit 54e26359fccf45d2e40800cf5598a725798634cd.

    Due to https://github.com/sonic-net/sonic-buildimage/issues/14089

    Signed-off-by: Mai Bui <maibui@microsoft.com>

commit 05fa7513355cf333818c480fade157bdff969811
Author: abdosi <58047199+abdosi@users.noreply.github.com>
Date: Fri Mar 17 16:27:48 2023 -0700

    Fix the `show interface counters` throwing exception on device with no external interfaces (#2703)

    Fix the `show interface counters` throwing exception issue where device do not have any external ports and all are internal links (ethernet or fabric) which is possible in chassis

commit 582bac065ee067db6ad06ca71296fc70a4ebcb57
Author: isabelmsft <isabel.li@microsoft.com>
Date: Fri Mar 17 19:15:47 2023 +0000

    UT coverage

commit f27dea0cfdefbdcfc03d19136e4ae47ea72fd51f
Author: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
Date: Fri Mar 17 09:10:47 2023 +0200

    [route_check] remove check-frr_patch mock (#2732)

    The test fails with python3.7 (works in 3.9) when stopping patch which hasn't been started. We can always mock check_output call and if FRR_ROUTES is not defined return empty dictionary by the mock.

    #### What I did
    Removed check_frr_patch mock to fix UT running on python3.7

    #### How I did it
    Removed the mock

    #### How to verify it
    Run unit test in st…
yxieca pushed a commit that referenced this pull request Mar 27, 2023

#2608)

On some dev VMs, warm reboot on a VS image fails. Specifically, after kexec is called and the new kernel starts, the new kernel tries to load the initramfs, but fails to do so for whatever reason. There may be messages about gzip decompression failing and that it's corrupted.

After some experimentation, it was found that when first loading the new kernel and initramfs into memory, using the `kexec_file_load` syscall (`-s` flag in kexec) worked fine, whereas using the default `kexec_load` syscall resulted in a failure. It's unknown why `kexec_file_load` worked fine when `kexec_load` didn't; there shouldn't be any difference for non-secure boot kernels, as far as I can tell. What was seen, however, was that when taking a KVM dump in the failure case, the memory that stored the initramfs had differences compared to what was on disk. It's unknown what caused these differences.

As a workaround (and as a bit of a feature enhancement), use the `-a` flag with kexec, which tells it to use `kexec_file_load` if available, and `kexec_load` if it's not available or otherwise fails. armhf doesn't support `kexec_file_load`, whereas arm64 gained support for `kexec_file_load` in the 5.19 kernel (we're currently on 5.10). `amd64` has supported `kexec_file_load` since 3.17.

This also makes it possible to do kexec on secure boot systems, where the kernel image must be loaded via `kexec_file_load`.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
kellyyeh pushed a commit to kellyyeh/sonic-utilities that referenced this pull request May 10, 2023

…available (sonic-net#2608)"

This reverts commit 93c7d43.
kellyyeh added a commit to kellyyeh/sonic-utilities that referenced this pull request May 10, 2023

…available (sonic-net#2608)"

This reverts commit 93c7d43.
Signed-off-by: Saikrishna Arcot sarcot@microsoft.com
What I did
On some dev VMs, warm reboot on a VS image fails. Specifically, after kexec is called and the new kernel starts, the new kernel tries to load the initramfs, but fails to do so for whatever reason. There may be messages about gzip decompression failing and that it's corrupted.

After some experimentation, it was found that when first loading the new kernel and initramfs into memory, using the `kexec_file_load` syscall (`-s` flag in kexec) worked fine, whereas using the default `kexec_load` syscall resulted in a failure. It's unknown why `kexec_file_load` worked fine when `kexec_load` didn't; there shouldn't be any difference for non-secure boot kernels, as far as I can tell. What was seen, however, was that when taking a KVM dump in the failure case, the memory that stored the initramfs had differences compared to what was on disk. It's unknown what caused these differences.

As a workaround (and as a bit of a feature enhancement), use the `-a` flag with kexec, which tells it to use `kexec_file_load` if available, and `kexec_load` if it's not available or otherwise fails. armhf doesn't support `kexec_file_load`, whereas arm64 gained support for `kexec_file_load` in the 5.19 kernel (we're currently on 5.10). `amd64` has supported `kexec_file_load` since 3.17.

This also makes it possible to do kexec on secure boot systems, where the kernel image must be loaded via `kexec_file_load`.

How I did it
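The workaround described in "What I did" can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the function name `load_kernel`, the `DRY_RUN` variable, and the example paths are hypothetical. Only the kexec flags (`-a`, `-l`, `--initrd`, `--append`) come from kexec-tools itself. By default the function only prints the command it would run, so it is safe to try on a machine without root access.

```shell
# Hypothetical helper (not from the PR): stage a new kernel for kexec.
#   $1  path to the kernel image
#   $2  path to the initramfs
#   $3  kernel command line
load_kernel() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): only print the command that would be executed.
        printf 'kexec -a -l %s --initrd=%s --append=%s\n' "$1" "$2" "$3"
    else
        # -a: try the kexec_file_load syscall first and fall back to
        #     kexec_load if it is unavailable or fails.
        # -l: load the kernel now; the jump happens later via `kexec -e`.
        kexec -a -l "$1" --initrd="$2" --append="$3"
    fi
}
```

Calling `load_kernel /boot/vmlinuz /boot/initrd.img "console=ttyS0"` with `DRY_RUN` unset prints the command. Setting `DRY_RUN=0` runs it, which needs root and an installed kexec-tools.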
How to verify it
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)