Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mellanox] Redirect ethtool stderr to subprocess for better error log #150

Closed
wants to merge 25 commits into from

Conversation

Junchao-Mellanox
Copy link
Owner

Why I did it

ethtool print error logs when EEPROM of a SFP is not available. It prints error like this:

INFO pmon#/supervisord: xcvrd Cannot get module EEPROM information: Input/output error
INFO pmon#/supervisord: xcvrd Cannot get Module EEPROM data: Invalid argument

However, this log does not contain the relevant SFP index which is hard for developer/qa to find the exactly SFP.

How I did it

Redirect ethtool stderr to subprocess and log it better

How to verify it

Manual test

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

abdosi and others added 23 commits August 31, 2022 10:48
…ic-net#11900)

Why I did it:
API get_device_runtime_metadata() added by sonic-net#11795 uses merge operator for dict but that is supported only for python version >=3.9. This API will be be used by scrips eg:hostcfgd which is still build for buster which does not have python 3.9 support.
…esting with cisco 8102 (sonic-net#11901)

Why I did it
After PFC interop testing between 8102 and 7050cx3, data packet losses were observed on the Rx ports of the 7050cx3 (inflow from 8102) during testing. This was primarily due to the slower response times to react to PFC pause packets for the 8102, when receiving such frames from neighboring devices. To solve for the packet drops, the 7050cx3 pg headroom size has to be increased to 160kB.

How I did it
Modified the xoff threshold value to 160kB in the pg_profile file to allow for the buffer manager to read that value when building the image, and configuring the device

How to verify it
run "mmuconfig -l" once image is built


Signed-off-by: dojha <devojha@microsoft.com>
…et#11821)

[mux] Exit to write standby state to `active-active` ports

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
How I did it
Create YANG Model for SONiC console related features.

How to verify it
Add tests

Signed-off-by: Jing Kan jika@microsoft.com
Why I did it
In latest syncd container, it is installed bullseye, can't find command '/usr/bin/python'.
Some scripts such as test_copp still calls /usr/bin/python in syncd.
Submitted the change in sonic-net#11807 for syncd docker, but it's better to add it in bullseye base docker.

How I did it
Install python-is-python3 package in bullseye base docker to resolve this issue, whatever run python or python3, it will run /usr/bin/python3, will not cause the error of can't find command '/usr/bin/python'

How to verify it
run python in syncd container.

Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
Why I did it
Address issue sonic-net#10970

sign-off: Jing Zhang zhangjing@microsoft.com

How I did it
Add sonic-mux-cable.yang and unit tests.

How to verify it
Compile Compile target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl and target/python-wheels/sonic_yang_models-1.0-py3-none-any.whl.
Pass sonic-config-engine unit test.
Which release branch to backport (provide reason below if selected)
 201811
 201911
 202006
 202012
 202106
 202111
 202205
Description for the changelog
Link to config_db schema for YANG module changes
https://github.com/sonic-net/sonic-buildimage/blob/f8fe41a0238b8a7b9e32ae42262f41b63050c55f/src/sonic-yang-models/doc/Configuration.md#mux_cable
Why I did it
Z9432F Update the kernel dependency of platform module

How I did it
Modified the kernel version to current latest 5.10.0-12-2
After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete any subsequent NA messages for these neighbors is able to resolve the entry in the cache.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
* [mux] skip mux operations during warm shutdown

- Enhance write_standby.py script to skip actions during warm shutdown.
- Expand the support to BGP service.
- MuX support was added by a previous PR.
- don't skip action during warm recovery

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
* Define whether a test is required by code

Why I did it
Define whether a test job is required before merging by code.
Let the failure of multi-asic and t0-sonic don't block pr merge.
The 'required' configuration can be modified by the owner in the future.

How I did it
Required:
t1-lag, t0
Not required:
multi-asic, t0-sonic

How to verify it
AZP itself verifies it.

Signed-off-by: jianquanye@microsoft.com
Include following commit:

- 443253f [patch]: Add accpt_untracked_na kernel param (sonic-net#292)

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
* 651f52b (HEAD, origin/master, origin/HEAD, master) Change syslog level of Event Publish (sonic-net#677)
* aca253a Add routing rule table for DASH (sonic-net#668)
With this PR in, you flap BGP and use events_tool to see the published events.
With telemetry PR #111 in and corresponding submodule update done in buildimage, one could run gnmi_cli to capture BGP flap events.
…-net#11923)

- Why I did it
Remove extra empty lines in the SN2201 pg_profile_lookup.ini to make it aligned with other platforms.
This extra empty line could confuse some test cases which need to parse this file.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
Update sonic-utilities submodule pointer to include the following:

Replace cmp in acl_loader with operator.eq (sonic-net#2328)
Subinterface vrf bind issue fix (sonic-net#2211)
[VRF]Adding CLI checks to ensure Vrf is valid in interface bind and static route commands (sonic-net#2333)
[doc]: Add MACsec CLI doc (sonic-net#2334)
[sonic-package-manager] Drop 'expires_in' (sonic-net#2002)
Handle non-front-panel ports in is_rj45_port (sonic-net#2327)
[service_mgmt]: Fix fetch MULTI_INST_DEPENDENT bug in service_mgmt.sh.j2 (sonic-net#2319)
correct an error by changing "show bgp summary" to "show bfd summary" (sonic-net#2324)
Update VRF unbind command (sonic-net#2331)
Fix issue: port_type is referenced before initialized (sonic-net#2323)
Fix issue: exception in is_rj45_port in multi ASIC env (sonic-net#2313)
Delete .DS_Store (sonic-net#2244)
Fix bug with checking VRF's routes in route_check.py (sonic-net#2301)
[decode-syseeprom] Fix setting use_db based on support_eeprom_db (sonic-net#2270)
Fix vrf UT failed issue (sonic-net#2309)
add lacp_rate to portchannel (sonic-net#2036)
- Why I did it
Update sonic-platform-daemons submodule pointer to include the following:

[ycabled] enable telemetry for 'active-active'; fix gRPC portid ordering (sonic-net#284)
[ycabled] remove some spurious logs (sonic-net#282)
Correct the peer forwarding state table (sonic-net#281)
add psu input voltage and current (sonic-net#276)
[ycabled] add capability to enable/disable telemetry (sonic-net#279)

Signed-off-by: dprital <drorp@nvidia.com>
It could happen that a container has already crashed but docker-wait-any
will wait forever till it starts. It should, however, immediately exit
to make the serivce restart.

#### Why I did it

It is observed in some circumstances that the auto-restart mechanism does not work. Specifically for ```swss.service```, ```orchagent``` had crashed before ```docker-wait-any``` started in ```swss.sh```. This led ```docker-wait-any``` wait forever for ```swss``` to be in ```"Running"``` state and it results in:

```
CONTAINER ID   IMAGE                                COMMAND                  CREATED        STATUS                    PORTS     NAMES
1abef1ecebff   bcbca2b74df6                         "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         what-just-happened
3c924d405cd5   docker-lldp:latest                   "/usr/bin/docker-lld…"   22 hours ago   Up 22 hours                         lldp
eb2b12a98c13   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   22 hours ago   Up 22 hours                         radv
d6aac4a46974   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         mgmt-framework
d880fd07aab9   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   22 hours ago   Up 22 hours                         pmon
75f9e22d4fdd   docker-snmp:latest                   "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         snmp
76d570a4bd1c   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         telemetry
ee49f50344b3   docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         syncd
1f0b0bab3687   docker-teamd:latest                  "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         teamd
917aeeaf9722   docker-orchagent:latest              "/usr/bin/docker-ini…"   22 hours ago   Exited (0) 22 hours ago             swss
81a4d3e820e8   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   22 hours ago   Up 22 hours                         bgp
f6eee8be282c   docker-database:latest               "/usr/local/bin/dock…"   22 hours ago   Up 22 hours                         database
```

The check for ```"Running"``` state is not needed because for cold boot case we do ```start_peer_and_dependent_services``` and for warm boot case the loop will retry to wait for container if this container is doing warm boot:
https://github.com/stepanblyschak/sonic-buildimage/blob/d01a91a569c9d545b30e8f81994b02d0c2513971/files/image_config/misc/docker-wait-any#L56

#### How I did it

Removed the check for ```"Running"```.

#### How to verify it

Kill swss before ```docker-wait-any``` is reached and verify auto restart will restart swss serivce.
Some PR builds fails to find this file. Remove it temporarily until we root cause it
Why I did it
To support clear MACsec counters by sonic-clear macsec

How I did it
Add macsec sub-command in sonic-clear to cache the current macsec stats, and in the show macsec command to check the cache and return the diff with cache file.

How to verify it

admin@vlab-02:~$ show macsec  Ethernet0
MACsec port(Ethernet0)
---------------------  -----------
cipher_suite           GCM-AES-128
enable                 true
enable_encrypt         true
enable_protect         true
enable_replay_protect  false
replay_window          0
send_sci               true
---------------------  -----------
        MACsec Egress SC (52540067daa70001)
        -----------  -
        encoding_an  0
        -----------  -
                MACsec Egress SA (0)
                -------------------------------------  --------------------------------
                auth_key                               9DDD4C69220A1FA9B6763F229B75CB6F
                next_pn                                1
                sak                                    BA86574D054FCF48B9CD7CF54F21304A
                salt                                   000000000000000000000000
                ssci                                   0
                SAI_MACSEC_SA_ATTR_CURRENT_XPN         52
                SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED    0
                SAI_MACSEC_SA_STAT_OCTETS_PROTECTED    0
                SAI_MACSEC_SA_STAT_OUT_PKTS_ENCRYPTED  0
                SAI_MACSEC_SA_STAT_OUT_PKTS_PROTECTED  0
                -------------------------------------  --------------------------------
        MACsec Ingress SC (525400d4fd3f0001)
                MACsec Ingress SA (0)
                ---------------------------------------  --------------------------------
                active                                   true
                auth_key                                 9DDD4C69220A1FA9B6763F229B75CB6F
                lowest_acceptable_pn                     1
                sak                                      BA86574D054FCF48B9CD7CF54F21304A
                salt                                     000000000000000000000000
                ssci                                     0
                SAI_MACSEC_SA_ATTR_CURRENT_XPN           56
                SAI_MACSEC_SA_STAT_IN_PKTS_DELAYED       0
                SAI_MACSEC_SA_STAT_IN_PKTS_INVALID       0
                SAI_MACSEC_SA_STAT_IN_PKTS_LATE          0
                SAI_MACSEC_SA_STAT_IN_PKTS_NOT_USING_SA  0
                SAI_MACSEC_SA_STAT_IN_PKTS_NOT_VALID     0
                SAI_MACSEC_SA_STAT_IN_PKTS_OK            0
                SAI_MACSEC_SA_STAT_IN_PKTS_UNCHECKED     0
                SAI_MACSEC_SA_STAT_IN_PKTS_UNUSED_SA     0
                SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED      0
                SAI_MACSEC_SA_STAT_OCTETS_PROTECTED      0
                ---------------------------------------  --------------------------------

admin@vlab-02:~$ sonic-clear macsec
Clear MACsec counters

admin@vlab-02:~$ show macsec  Ethernet0
MACsec port(Ethernet0)
---------------------  -----------
cipher_suite           GCM-AES-128
enable                 true
enable_encrypt         true
enable_protect         true
enable_replay_protect  false
replay_window          0
send_sci               true
---------------------  -----------
        MACsec Egress SC (52540067daa70001)
        -----------  -
        encoding_an  0
        -----------  -
                MACsec Egress SA (0)
                -------------------------------------  --------------------------------
                auth_key                               9DDD4C69220A1FA9B6763F229B75CB6F
                next_pn                                1
                sak                                    BA86574D054FCF48B9CD7CF54F21304A
                salt                                   000000000000000000000000
                ssci                                   0
                SAI_MACSEC_SA_ATTR_CURRENT_XPN         52
                SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED    0
                SAI_MACSEC_SA_STAT_OCTETS_PROTECTED    0
                SAI_MACSEC_SA_STAT_OUT_PKTS_ENCRYPTED  0
                SAI_MACSEC_SA_STAT_OUT_PKTS_PROTECTED  0
                -------------------------------------  --------------------------------
        MACsec Ingress SC (525400d4fd3f0001)
                MACsec Ingress SA (0)
                ---------------------------------------  --------------------------------
                active                                   true
                auth_key                                 9DDD4C69220A1FA9B6763F229B75CB6F
                lowest_acceptable_pn                     1
                sak                                      BA86574D054FCF48B9CD7CF54F21304A
                salt                                     000000000000000000000000
                ssci                                     0
                SAI_MACSEC_SA_ATTR_CURRENT_XPN           0 <---this counters was cleared.
                SAI_MACSEC_SA_STAT_IN_PKTS_DELAYED       0
                SAI_MACSEC_SA_STAT_IN_PKTS_INVALID       0
                SAI_MACSEC_SA_STAT_IN_PKTS_LATE          0
                SAI_MACSEC_SA_STAT_IN_PKTS_NOT_USING_SA  0
                SAI_MACSEC_SA_STAT_IN_PKTS_NOT_VALID     0
                SAI_MACSEC_SA_STAT_IN_PKTS_OK            0
                SAI_MACSEC_SA_STAT_IN_PKTS_UNCHECKED     0
                SAI_MACSEC_SA_STAT_IN_PKTS_UNUSED_SA     0
                SAI_MACSEC_SA_STAT_OCTETS_ENCRYPTED      0
                SAI_MACSEC_SA_STAT_OCTETS_PROTECTED      0
                ---------------------------------------  --------------------------------


Signed-off-by: Ze Gan <ganze718@gmail.com>
Co-authored-by: Judy Joseph <jujoseph@microsoft.com>
* Remove faulty pipeline URLs
* Miscellaneous minor fixes
*[Yang] support for static-route yang model sonic-net#11932
@Junchao-Mellanox Junchao-Mellanox deleted the redirect-ethtool-error branch September 26, 2022 02:12
Junchao-Mellanox pushed a commit that referenced this pull request Nov 4, 2022
…n] update submodule head (sonic-net#12594)

linkmgrd:
* 8e9e49b 2022-11-02 | [active-standby][active-active] update link prober stats updating frequency to 30s (#152) (HEAD -> 202205) [Jing Zhang]
* 1ad3f18 2022-11-01 | [Active-Active] periodically re-sync soc side admin forwarding state  (#151) [Jing Zhang]
* cfcdb76 2022-11-01 | [202205] incrementing icmp buffer size (#150) [Jing Zhang]

swss:
* ac7570a 2022-11-03 | [Dynamic buffer calculation][Mellanox] Enhance the logic to identify buffer pools and profiles (sonic-net#2498) (HEAD -> 202205) [Stephen Sun]

swss-common:
* 300fc8f 2022-10-31 | [sonic-db-cli] Fix sonic-db-cli crash when database config file not ready issue. (sonic-net#701) (HEAD -> 202205) [Hua Liu]

platform-daemon:
* cd1608a 2022-10-28 | [ycabled] add support for detach mode in 'active-active' topology (sonic-net#309) (HEAD -> 202205) [vdahiya12]
* 2a5f0f4 2022-10-27 | Added filtering logic to send filtered fields from DB event (sonic-net#307) [mihirpat1]

platform-common:
* b521882 2022-10-30 |  CmisApi::get_application_advertisement catch AttributeError as well (sonic-net#316) (HEAD -> 202205) [Stephen Sun]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Junchao-Mellanox pushed a commit that referenced this pull request Nov 11, 2022
[sonic-linkmgrd][master] submodule update

b3501d2 Jing Zhang Wed Nov 2 22:22:45 2022 -0700 [active-standby][active-active] update link prober stats updating frequency to 30s (#152)
5d546ec Jing Zhang Tue Nov 1 16:12:17 2022 -0700 [202205] incrementing icmp buffer size (#150)
76b128a Jing Zhang Tue Nov 1 12:06:21 2022 -0700 [Active-Active] periodically re-sync soc side admin forwarding state (#151)

sign-off: Jing Zhang zhangjing@microsoft.com
Junchao-Mellanox pushed a commit that referenced this pull request Jul 5, 2024
…ically (sonic-net#19176)

#### Why I did it
src/sonic-restapi
```
* 314e4f6 - (HEAD -> 202311, origin/202311) Fix race build option not supported issue in armhf (#150) (57 minutes ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Sep 25, 2024
…utomatically (sonic-net#20289)

#### Why I did it
src/sonic-host-services
```
* 37039d3 - (HEAD -> 202311, origin/202311) [BFD]Fix BFD blackout issue (#150) (7 hours ago) [Sudharsan Dhamal Gopalarathnam]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Oct 24, 2024
…utomatically (sonic-net#20217)

#### Why I did it
src/sonic-host-services
```
* dbb063a - (HEAD -> 202405, origin/202405) [BFD]Fix BFD blackout issue (#150) (52 minutes ago) [Sudharsan Dhamal Gopalarathnam]
* 8b3fcbe - Password Hardening: Add support to disable expiration date (#93) (52 minutes ago) [davidpil2002]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.