-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address Dell issue#46 : Adding MUX reset logic to fix probe failures #2356
Conversation
@stcheng, any update on this pr? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please check the comments, thanks!
# "echo [mux driver] <i2c-mux-address> > <i2c-bus/operation>" i2c-channel-first | ||
# where operation = "new_device" | ||
# i2c-channel-first is the first of the 8 channels that this mux should create | ||
i2c_mux_create() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for me, this piece of code is a little bit hard to understand. since the i2c tree is fixed on each platform, it would be better to have some more predefined numbers instead of number manipulation here.
the function could be simpler like i2c_mux_create <root> <sub_root> <first>
so that in the script you only need to call the function e.g. i2c_mux_create 4 0x71 10 or i2c_mux_create 7 0x71 50 or i2c_mux_create 7 0x72 58
the echo command and the details of the retry could be inside the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this.
i2c_bus_num=`echo $(basename $i2c_bus) | cut -d'-' -f 2` | ||
i2c_mux=`echo "$1" | cut -d' ' -f 3 | sed 's/^0x//'` | ||
i2c_mux_channel_first=$i2c_bus/i2c-$2 | ||
i2c_mux_channel_last=$i2c_bus/i2c-$(expr $2 + $MAX_MUX_CHANNELS - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the above logic is too complex to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
# where operation = "new_device" | ||
# i2c-channel-first is the first of the 8 channels that this mux should create | ||
i2c_mux_create() { | ||
local MAX_MUX_CHANNEL_RETRY=5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel 5 is too large, could you change to 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this
break; | ||
else | ||
# If the channel did not get created, remove the mux, reset the mux tree and retry | ||
i2c_config "echo 0x$i2c_mux > /sys/bus/i2c/devices/i2c-$devnum/i2c-$i2c_bus_num/delete_device" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you rename the function i2c_config
to i2c_mux_remove
? and make the function symmetric with i2c_mux_remove
?
else | ||
# If the channel did not get created, remove the mux, reset the mux tree and retry | ||
i2c_config "echo 0x$i2c_mux > /sys/bus/i2c/devices/i2c-$devnum/i2c-$i2c_bus_num/delete_device" | ||
reset_muxes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before resetting the mux, could you add one line specific log indicating which mux failed to be initialized? what's the root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added debug with mux details.
@@ -191,6 +195,28 @@ xcvr_presence_interrupts() { | |||
esac | |||
} | |||
|
|||
# Reset the mux tree | |||
reset_muxes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for resetting the muxes, is it possible to reset only one mux per the failure? or do we need to reset all the muxes? if we reset all the muxes, do we need to re-build the whole i2c tree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to reset only the specific mux that are part of this channel - however the complexity of that logic is not warranted. Resetting the muxes does not necessitate rebuilding the kernel's i2c tree.
By the way, could you also change the title of the pull request to indicate what is the change of the pull request? e.g. Adding MUX reset logic to fix probe failure issue. |
agree with shuotian. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you change the title of the commit to remove the "Address Dell issue#46 :" part?
you could add some more details in the commit message.
What I did This PR is to fix the issue of adding mux_acl_rule into IngressTableDrop. The error log is Jun 25 08:02:37.159020 svcstr-7050-acs-4 ERR swss#orchagent: :- validateAclRuleMatch: Match SAI_ACL_ENTRY_ATTR_FIELD_IN_PORTS in rule mux_acl_rule is not supported by table IngressTableDrop PR sonic-net#2341 added support for different matching field in different stage (INGRESS/EGRESS). For example, SAI_ACL_ENTRY_ATTR_FIELD_IN_PORTS is only supported at INGRESS stage. However, PR sonic-net#2341 only handled one path for creating ACL table, that is by CONFIG_DB entry. There is a case that addAclTable is directly called from other orch, such as MuxOrch. In that case, the stage dependent matcing field is not added. As a resule, we will see the above error logs. To address the issue, I moved the call of addStageMandatoryMatchFields from doAclTableTask to addAclTable to ensure addStageMandatoryMatchFields is always called. Please be noted that addMandatoryActions is called from both doAclTableTask and addAclTable to ensure the validation of ACL table is passing. Why I did it To fix ACL rule issue for mux. How I verified it Verified by running test_pfcwd Verified by checking syslog Signed-off-by: bingwang <wang.bing@microsoft.com>
…x-kernel] advance submodule head linkmgrd: * ab5b2c1 2022-09-02 | Fix mux config (sonic-net#128) (HEAD -> 202205, github/202205) [Longxiang Lyu] utilities: * 7de9305 2022-09-07 | [generate dump]Added error message when saisdkdump fails (sonic-net#2356) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam] * c5b0a6d 2022-09-07 | [counterpoll]Fixing counterpoll show for tunnel and acl stats (sonic-net#2355) [Sudharsan Dhamal Gopalarathnam] * 1452b44 2022-09-05 | [GCU] Fix missing backend in dry run (sonic-net#2347) [jingwenxie] * bc7b845 2022-09-04 | Add Password Hardening CLI support (sonic-net#2338) [davidpil2002] * 55e8948 2022-09-06 | [fast-reboot]Avoid stopping masked services during fast-reboot (sonic-net#2335) [Sudharsan Dhamal Gopalarathnam] * f7d69d4 2022-08-30 | Replace cmp in acl_loader with operator.eq (sonic-net#2328) [Zhaohui Sun] * 4054ebb 2022-09-05 | Add verification for override (sonic-net#2305) [jingwenxie] * 729d811 2022-05-30 | Fix sonic-installer and 'show version' command crash when database docker not running issue. (sonic-net#2183) [Hua Liu] platform-daemons: * 36ba7c0 2022-09-07 | [ycable] cleanup logic for creating grpc future ready (sonic-net#289) (HEAD -> 202205) [vdahiya12] * 2a9db73 2022-09-01 | [ycabled] fix insert events from xcvrd;cleanup some mux toggle logic (sonic-net#287) [vdahiya12] platform-common: * d7c990d 2022-09-03 | [CMIS] 'get_transceiver_info' should return 'None' when CMIS cable EEPROM is not ready (sonic-net#305) (HEAD -> 202205) [Kebo Liu] linux-kernel: * 25ea052 2022-08-31 | [patch]: Add accpt_untracked_na kernel param (sonic-net#292) (HEAD -> 202205) [Lawrence Lee] Signed-off-by: Ying Xie <ying.xie@microsoft.com>
…x-kernel] advance submodule head (#12025) linkmgrd: * ab5b2c1 2022-09-02 | Fix mux config (#128) (HEAD -> 202205, github/202205) [Longxiang Lyu] utilities: * 7de9305 2022-09-07 | [generate dump]Added error message when saisdkdump fails (#2356) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam] * c5b0a6d 2022-09-07 | [counterpoll]Fixing counterpoll show for tunnel and acl stats (#2355) [Sudharsan Dhamal Gopalarathnam] * 1452b44 2022-09-05 | [GCU] Fix missing backend in dry run (#2347) [jingwenxie] * bc7b845 2022-09-04 | Add Password Hardening CLI support (#2338) [davidpil2002] * 55e8948 2022-09-06 | [fast-reboot]Avoid stopping masked services during fast-reboot (#2335) [Sudharsan Dhamal Gopalarathnam] * f7d69d4 2022-08-30 | Replace cmp in acl_loader with operator.eq (#2328) [Zhaohui Sun] * 4054ebb 2022-09-05 | Add verification for override (#2305) [jingwenxie] * 729d811 2022-05-30 | Fix sonic-installer and 'show version' command crash when database docker not running issue. (#2183) [Hua Liu] platform-daemons: * 36ba7c0 2022-09-07 | [ycable] cleanup logic for creating grpc future ready (#289) (HEAD -> 202205) [vdahiya12] * 2a9db73 2022-09-01 | [ycabled] fix insert events from xcvrd;cleanup some mux toggle logic (#287) [vdahiya12] platform-common: * d7c990d 2022-09-03 | [CMIS] 'get_transceiver_info' should return 'None' when CMIS cable EEPROM is not ready (#305) (HEAD -> 202205) [Kebo Liu] linux-kernel: * 25ea052 2022-08-31 | [patch]: Add accpt_untracked_na kernel param (#292) (HEAD -> 202205) [Lawrence Lee] Signed-off-by: Ying Xie <ying.xie@microsoft.com> Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Update sonic-utilities submodule pointer to include the following: * 4d377a6 [subinterface]Added additional checks in portchannel and subinterface commands ([sonic-net#2345](sonic-net/sonic-utilities#2345)) * bbcdf2e disk_check: Publish event for RO state ([sonic-net#2320](sonic-net/sonic-utilities#2320)) * 3fd537b Support the bandit check by GitHub Action ([sonic-net#2358](sonic-net/sonic-utilities#2358)) * 491d3d3 [generate dump]Added error message when saisdkdump fails ([sonic-net#2356](sonic-net/sonic-utilities#2356)) * 6830e01 [counterpoll]Fixing counterpoll show for tunnel and acl stats ([sonic-net#2355](sonic-net/sonic-utilities#2355)) * 3be2ad7 [fast-reboot]Avoid stopping masked services during fast-reboot ([sonic-net#2335](sonic-net/sonic-utilities#2335)) * 0e1b0cf [GCU] Fix missing backend in dry run ([sonic-net#2347](sonic-net/sonic-utilities#2347)) * 676c31b Add verification for override ([sonic-net#2305](sonic-net/sonic-utilities#2305)) * 48997c2 Add Password Hardening CLI support ([sonic-net#2338](sonic-net/sonic-utilities#2338)) * 414e239 update unit tests for swap ([#locato](https://github.com/Azure/sonic-utilities/pull/locato)) * a91a492 consider swap checking memory in ([#stalle](https://github.com/Azure/sonic-utilities/pull/stalle)) * f0ce586 [route_check]: Ignore standalone tunnel routes ([sonic-net#2325](sonic-net/sonic-utilities#2325)) Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
Update sonic-utilities submodule pointer to include the following: * 0a7557b [minigraph] add option to specify golden path in load_minigraph ([sonic-net#2350](sonic-net/sonic-utilities#2350)) * 322aefc [GCU]Remove GCU unique lane check for duplicate lanes platforms ([sonic-net#2343](sonic-net/sonic-utilities#2343)) * 7099fff [fastboot] fastboot enhancement: Use warm-boot infrastructure for fast-boot ([sonic-net#2286](sonic-net/sonic-utilities#2286)) * 09026ed [warm-reboot] fix warm-reboot when /tmp/cache is missing ([sonic-net#2367](sonic-net/sonic-utilities#2367)) * a3c404c Fix typo in platform_sfputil_helper.is_rj45_port ([sonic-net#2374](sonic-net/sonic-utilities#2374)) * 637d834 Vnet_route_check Vxlan tunnel route update. ([sonic-net#2281](sonic-net/sonic-utilities#2281)) * 29a3e51 Added support for tunnel route status in show vnet routes all. ([sonic-net#2341](sonic-net/sonic-utilities#2341)) * 1ac584b Use 'default' VRF when VRF name is not provided ([sonic-net#2368](sonic-net/sonic-utilities#2368)) * 4d377a6 [subinterface]Added additional checks in portchannel and subinterface commands ([sonic-net#2345](sonic-net/sonic-utilities#2345)) * bbcdf2e disk_check: Publish event for RO state ([sonic-net#2320](sonic-net/sonic-utilities#2320)) * 3fd537b Support the bandit check by GitHub Action ([sonic-net#2358](sonic-net/sonic-utilities#2358)) * 491d3d3 [generate dump]Added error message when saisdkdump fails ([sonic-net#2356](sonic-net/sonic-utilities#2356)) * 6830e01 [counterpoll]Fixing counterpoll show for tunnel and acl stats ([sonic-net#2355](sonic-net/sonic-utilities#2355)) * 3be2ad7 [fast-reboot]Avoid stopping masked services during fast-reboot ([sonic-net#2335](sonic-net/sonic-utilities#2335)) * 0e1b0cf [GCU] Fix missing backend in dry run ([sonic-net#2347](sonic-net/sonic-utilities#2347)) * 676c31b Add verification for override ([sonic-net#2305](sonic-net/sonic-utilities#2305)) * 48997c2 Add Password Hardening CLI support ([sonic-net#2338](sonic-net/sonic-utilities#2338)) * 414e239 update unit tests for swap ([#locato](https://github.com/sonic-net/sonic-utilities/pull/locato)) * a91a492 consider swap checking memory in ([#stalle](https://github.com/sonic-net/sonic-utilities/pull/stalle)) * f0ce586 [route_check]: Ignore standalone tunnel routes ([sonic-net#2325](sonic-net/sonic-utilities#2325)) Signed-off-by: dprital <drorp@nvidia.com>
0a7557bd9 [minigraph] add option to specify golden path in load_minigraph (#2350) 322aefc37 [GCU]Remove GCU unique lane check for duplicate lanes platforms (#2343) 7099fffa7 [fastboot] fastboot enhancement: Use warm-boot infrastructure for fast-boot (#2286) 09026edbb [warm-reboot] fix warm-reboot when /tmp/cache is missing (#2367) a3c404c74 Fix typo in platform_sfputil_helper.is_rj45_port (#2374) 637d834ce Vnet_route_check Vxlan tunnel route update. (#2281) 29a3e5180 Added support for tunnel route status in show vnet routes all. (#2341) 1ac584bb3 Use 'default' VRF when VRF name is not provided (#2368) 4d377a620 [subinterface]Added additional checks in portchannel and subinterface commands (#2345) bbcdf2ed7 disk_check: Publish event for RO state (#2320) 3fd537b0a Support the bandit check by GitHub Action (#2358) 491d3d380 [generate dump]Added error message when saisdkdump fails (#2356) 6830e01ec [counterpoll]Fixing counterpoll show for tunnel and acl stats (#2355) 3be2ad7de [fast-reboot]Avoid stopping masked services during fast-reboot (#2335) 0e1b0cf20 [GCU] Fix missing backend in dry run (#2347) 676c31bd0 Add verification for override (#2305) 48997c266 Add Password Hardening CLI support (#2338) 414e239ea update unit tests for swap allocator a91a4922f consider swap checking memory in installer f0ce58635 [route_check]: Ignore standalone tunnel routes (#2325)
Related work items: sonic-net#2151, sonic-net#2194, sonic-net#2224, sonic-net#2237, sonic-net#2264, sonic-net#2281, sonic-net#2286, sonic-net#2297, sonic-net#2299, sonic-net#2305, sonic-net#2325, sonic-net#2335, sonic-net#2338, sonic-net#2341, sonic-net#2343, sonic-net#2347, sonic-net#2350, sonic-net#2355, sonic-net#2356, sonic-net#2358, sonic-net#2360, sonic-net#2363, sonic-net#2367, sonic-net#2368, sonic-net#2370, sonic-net#2374, sonic-net#2392, sonic-net#2398, sonic-net#2408, sonic-net#2414, sonic-net#2415, sonic-net#2419, sonic-net#2421, sonic-net#2422, sonic-net#2423, sonic-net#2426, sonic-net#2427, sonic-net#2430, sonic-net#2431, sonic-net#2433, sonic-net#2434, sonic-net#2436, sonic-net#2437, sonic-net#2441, sonic-net#2444, sonic-net#2445, sonic-net#2446, sonic-net#2456, sonic-net#2458, sonic-net#2460, sonic-net#2461, sonic-net#2463, sonic-net#2472, sonic-net#2475, sonic-net#11877, sonic-net#12024, sonic-net#12065, sonic-net#12097, sonic-net#12130, sonic-net#12209, sonic-net#12217, sonic-net#12244, sonic-net#12251, sonic-net#12255, sonic-net#12276, sonic-net#12284
0a7557bd9 [minigraph] add option to specify golden path in load_minigraph (sonic-net#2350) 322aefc37 [GCU]Remove GCU unique lane check for duplicate lanes platforms (sonic-net#2343) 7099fffa7 [fastboot] fastboot enhancement: Use warm-boot infrastructure for fast-boot (sonic-net#2286) 09026edbb [warm-reboot] fix warm-reboot when /tmp/cache is missing (sonic-net#2367) a3c404c74 Fix typo in platform_sfputil_helper.is_rj45_port (sonic-net#2374) 637d834ce Vnet_route_check Vxlan tunnel route update. (sonic-net#2281) 29a3e5180 Added support for tunnel route status in show vnet routes all. (sonic-net#2341) 1ac584bb3 Use 'default' VRF when VRF name is not provided (sonic-net#2368) 4d377a620 [subinterface]Added additional checks in portchannel and subinterface commands (sonic-net#2345) bbcdf2ed7 disk_check: Publish event for RO state (sonic-net#2320) 3fd537b0a Support the bandit check by GitHub Action (sonic-net#2358) 491d3d380 [generate dump]Added error message when saisdkdump fails (sonic-net#2356) 6830e01ec [counterpoll]Fixing counterpoll show for tunnel and acl stats (sonic-net#2355) 3be2ad7de [fast-reboot]Avoid stopping masked services during fast-reboot (sonic-net#2335) 0e1b0cf20 [GCU] Fix missing backend in dry run (sonic-net#2347) 676c31bd0 Add verification for override (sonic-net#2305) 48997c266 Add Password Hardening CLI support (sonic-net#2338) 414e239ea update unit tests for swap allocator a91a4922f consider swap checking memory in installer f0ce58635 [route_check]: Ignore standalone tunnel routes (sonic-net#2325)
- What I did
Address Dell issue #46 : sonic-net/sonic-platform-modules-dell#46
- How I did it
After adding a mux, perform extra sanity checks to verify if the channels do get created. If not, retry with an increasing delay per retry upto 2 secs.
- How to verify it
Kept a mux under reset to attempt to add mux which results in a similar symptoms as was seen when the probe failed. Verified the mux reset during the retry as well as the sanity checks post every mux creation.
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)