forked from sonic-net/sonic-buildimage
[Mellanox] Add SDK dump with new SAI implementation #8
Merged
This was referenced Mar 29, 2021
DavidZagury force-pushed the 202012-sdk-dumps branch 2 times, most recently from 6ce9244 to 9d9c409 (March 29, 2021 12:35)
Co-authored-by: mssonicbld <vsts@fv-az80-884.nqsemdo0cabejmrqkclmmohwag.dx.internal.cloudapp.net>
Add the latest SAI drop, REL_4.3.3.3, to SONiC. It addresses the following CSP cases:
CS00012058054: [4.3][IPinIP][TTL-PIPE] IPinIP TTL pipe mode is not working; it behaves as UNIFORM mode even when programmed as PIPE mode
CS00011227466: [4.3] Warmboot support with tunnel encap
) Fix the following issues:
Spectrum-2, Spectrum-3 | Port | Fix link issue when using 25 GbE rate between two ports while one is on a Spectrum-2-based system and the other is on a Spectrum-3-based system
All | warmboot | Fail to upgrade from earlier SONiC versions with official SDK/FW 4.4.2306 (was on SONiC 201911)
All | What-Just-Happened | When enabling or disabling WJH under high traffic load to the host CPU, in very specific and low-probability conditions, an error could occur that may result in loss of data, channel failure or, in extreme cases, SW failure
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
shlomibitton suggested changes on Mar 30, 2021
I think the correct path is "/var/log/mellanox/sdk-dumps" with a lowercase 'm'.
Did you generate an image with this commit? We need to verify there are no degradations with the new SDK/FW/SAI first.
DavidZagury force-pushed the 202012-sdk-dumps branch 3 times, most recently from e416442 to 32edf76 (March 30, 2021 14:31)
Build Marvell kernel driver for the Prestera SAI SDK
Builds the interrupt and DMA kernel drivers.
Removed the older method (a pre-compiled kernel module Debian package) and its makefile.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
) The file device/mellanox/x86_64-mlnx_msn4410-r0/plugins/sfputil.py is not a soft link to device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py, and it still uses Python 2 syntax, which causes some SFP CLI errors. This PR changes it to a soft link and adds 4410 support in device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py.
…7122) Bumps [lxml](https://github.com/lxml/lxml) from 4.6.2 to 4.6.3. - [Release notes](https://github.com/lxml/lxml/releases) - [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt) - [Commits](lxml/lxml@lxml-4.6.2...lxml-4.6.3) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Update unsupported SAI attr ('SAI_ACL_TABLE_ATTR_FIELD_OUTER_VLAN_ID') to fix issues on acl table create
This PR updates the following commits in sonic-platform-daemons:
260cf2d [xcvrd] change firmware information fields name inside MUX_CABLE_INFO table for Y cable (sonic-net#165)
cfa600f [thermalctld] Initialize fan led in thermalctld for the first run (sonic-net#167)
8509f43 [thermalctld] Refactor to allow for greater unit test coverage; Add more unit tests (sonic-net#157)
70f4e7b [syseepromd] Update warning message to be more informative (sonic-net#160)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
Integrate hw-management package V.7.0010.2002
Bug fixes:
Removing critical thermal zones to prevent unexpected software system shutdown:
* Kernel 4.9 - 0071-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
* Kernel 4.19 - 076-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
Removing redundant link for cpld3 for fixed systems (SN2100, SN2010).
Fix an issue with missed attribute for cpld3 (port CPLD) for SN2700, SN2410.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
…sonic-net#7164) The psample module was not loaded on barefoot platform. The loading of this module is a prerequisite for testing SFlow. * add `.gitignore` to the `barefoot` subdirectory to overwrite ignore "platform/**/debian/*" in the root directory
The default BGP connect retry timer is 120 seconds, so if the initial connection fails, a reconnection is attempted only after 120 seconds. This PR aims to allow more frequent retries.
…Metadata section of minigraph file (sonic-net#7166)
Backport of sonic-net#7031 to the 202012 branch
#### Why I did it
To enable parsing the `AutoNegotiation` element from the LinkMetadata section of the minigraph file.
#### How I did it
Parse the value of the `AutoNegotiation` element from the `LinkMetadata` section of the minigraph file. If the element is present, an `autoneg` key will be added to the port in the `PORT` table of Config DB with a value of either `0` or `1`.
If an `autoneg` value is present in port_config.ini, the value from the minigraph will take precedence, overriding that value.
Also remove the `AutoNegotiation` and `EnableAutoNegotiation` elements from the `DeviceInfo` section, as we will use the data in the `LinkMetadata` section to determine whether to enable auto-negotiation for a port.
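The precedence rule described above can be illustrated with a minimal sketch. The XML layout, the `true`/`false` to `1`/`0` mapping, and the helper names `parse_autoneg` and `merge_ports` are assumptions made for illustration; the real minigraph uses namespaces and a different nesting, and this is not the code from the PR.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a minigraph LinkMetadata section.
SAMPLE = """
<LinkMetadataDeclaration>
  <LinkMetadata>
    <Name>Ethernet0</Name>
    <AutoNegotiation>true</AutoNegotiation>
  </LinkMetadata>
  <LinkMetadata>
    <Name>Ethernet4</Name>
  </LinkMetadata>
</LinkMetadataDeclaration>
"""


def parse_autoneg(xml_text):
    """Return {port_name: '0' or '1'} for entries that carry AutoNegotiation."""
    result = {}
    root = ET.fromstring(xml_text.strip())
    for link in root.iter("LinkMetadata"):
        name = link.findtext("Name")
        value = link.findtext("AutoNegotiation")
        if name is None or value is None:
            continue  # element absent: no autoneg key is added for this port
        # Assumed mapping: "true" -> "1", anything else -> "0".
        result[name] = "1" if value.strip().lower() == "true" else "0"
    return result


def merge_ports(port_config, minigraph_autoneg):
    """Overlay minigraph autoneg values on port_config.ini values; minigraph wins."""
    merged = {port: dict(attrs) for port, attrs in port_config.items()}
    for port, autoneg in minigraph_autoneg.items():
        merged.setdefault(port, {})["autoneg"] = autoneg
    return merged


# Values as they might come from port_config.ini (illustrative).
port_config = {"Ethernet0": {"autoneg": "0"}, "Ethernet4": {"autoneg": "1"}}
ports = merge_ports(port_config, parse_autoneg(SAMPLE))
```

In this sketch `Ethernet0` ends up with `autoneg` set to `1` because the minigraph value overrides port_config.ini, while `Ethernet4` keeps its port_config.ini value because the minigraph entry has no `AutoNegotiation` element.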
Unit tests for thermalctld depend on sonic-platform-common as of sonic-net/sonic-platform-daemons#157
Unit tests for psud depend on sonic-platform-common as of sonic-net/sonic-platform-daemons#154
Temporary skip psud for Newport, for Barefoot needs. Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
…et#7153)
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
If a device reboot was caused by a kernel panic, we need to retrieve the key information and store it in the symlink file previous-reboot-cause.json. The CLI command show reboot-cause reads this file to get the reason for the previous reboot.
This PR is related to a PR in the sonic-utilities repo: sonic-net/sonic-utilities#1486
How I did it
The string variable previous_reboot_cause is parsed to check whether it contains the keyword Kernel Panic. If it does, the keyword and time information are stored in a dictionary.
How to verify it
I verified this change on a virtual testbed.
admin@vlab-01:/host/reboot-cause$ more previous-reboot-cause.json
{"gen_time": "2021_03_24_23_22_35", "cause": "Kernel Panic", "user": "N/A", "time": "Wed 24 Mar 2021 11:22:03 PM UTC", "comment": "N/A"}
admin@vlab-01:/host/reboot-cause$ show reboot-cause
Kernel Panic [Time: Wed 24 Mar 2021 11:22:03 PM UTC]
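A minimal sketch of the keyword check described above. The input string format (`Kernel Panic [Time: ...]`) and the helper name `build_reboot_cause` are assumptions for illustration; only the output keys are taken from the JSON shown in the verification step.

```python
import json


def build_reboot_cause(previous_reboot_cause, gen_time):
    """Return a reboot-cause dictionary if the string mentions a kernel panic."""
    if "Kernel Panic" not in previous_reboot_cause:
        return None
    # Assumed format: the timestamp is wrapped as "[Time: <timestamp>]".
    time_str = "N/A"
    marker = "[Time: "
    start = previous_reboot_cause.find(marker)
    end = previous_reboot_cause.rfind("]")
    if start != -1 and end > start:
        time_str = previous_reboot_cause[start + len(marker):end]
    return {
        "gen_time": gen_time,
        "cause": "Kernel Panic",
        "user": "N/A",
        "time": time_str,
        "comment": "N/A",
    }


info = build_reboot_cause(
    "Kernel Panic [Time: Wed 24 Mar 2021 11:22:03 PM UTC]",
    "2021_03_24_23_22_35",
)
serialized = json.dumps(info)  # what would be written to the JSON file
```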
c5be3ca [psud] Increase unit test coverage; Refactor mock platform (sonic-net#154)
450b7d7 Bug fix: the fields that are not supported by vendor should be "N/A" in STATE_DB (sonic-net#168)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
…_startup.py (sonic-net#7154) To improve management of docker-gbsyncd-vs. gbsyncd_startup.py simply spawned syncd processes and then exited. In that case, supervisord would no longer manage any processes in the container, and thus there was no way to know if a critical process had exited. I recently created gbsyncdmgrd to be a more complete, robust replacement for gbsyncd_startup.py. NOTE: This PR is dependent on the inclusion of gbsyncdmgrd in the sonic-sairedis repo. A submodule update is pending at sonic-net#7089
…exit listener; Set all event buffer sizes to 1024 (sonic-net#7203)
#### Why I did it
Backport of sonic-net#7083 to the 202012 branch.
To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:
```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```
This is basically an addendum to sonic-net#5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance that some containers (like swss, as in the example above) have enough processes running to overflow the default buffer size of 10.
This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).
I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, to keep things simple and unified and to allow headroom so that we will not need to adjust these values frequently, if at all.
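As a rough illustration of the change described above, the sketch below sets `buffer_size` to 1024 on every `[eventlistener:...]` section of a supervisord-style configuration. The sample commands, event lists, and the helper name `set_listener_buffer_size` are assumptions; the PR itself edits the configuration files directly rather than rewriting them with a script.

```python
import configparser
import io

# Hypothetical supervisord.conf excerpt; commands and values are illustrative.
SAMPLE_CONF = """\
[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener --container-name swss
events=PROCESS_STATE_EXITED

[eventlistener:dependent-startup]
command=python3 -m supervisord_dependent_startup
events=PROCESS_STATE
buffer_size=50

[program:orchagent]
command=/usr/bin/orchagent.sh
"""


def set_listener_buffer_size(conf_text, size=1024):
    """Return conf_text with buffer_size set on every eventlistener section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    for section in parser.sections():
        if section.startswith("eventlistener:"):
            parser[section]["buffer_size"] = str(size)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


updated_conf = set_listener_buffer_size(SAMPLE_CONF)
```

Sections that are not event listeners, such as the `[program:orchagent]` entry here, are left untouched.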
This reverts commit 50e4cc1.
…tilites submodules (sonic-net#7209)
sonic-swss
- [SFlowMgr] Sflow Crash on 200G ports handled (sonic-net#1683)
- Stablize the test case (sonic-net#1679)
- Remove PGs from an administratively down port. (sonic-net#1677)
sonic-swss-common
- fix getting hash from redis db (sonic-net#465)
- [dbconnector] Initialize redisContext (sonic-net#464)
sonic-utilities
- route_check: Fix hanging & logging level (sonic-net#1520)
- Add self timeout and crash if exceeded. (sonic-net#1502)
- [reboot] User-friendly reboot cause message for kernel panic (sonic-net#1486)
- [acl-loader]: do not add default deny rule for egress acl (sonic-net#1531)
Signed-off-by: Danny Allen <daall@microsoft.com>
…et#7543)
#### Why I did it
- When PSU data is loaded into the state DB, the following error is seen in the syslogs: "Failed to update PSU data - '<=' not supported between instances of 'float' and 'str'"
- The issue is not seen in the master image because the PSU API return type is different there.
#### How I did it
- Changed the return type in the PSU APIs.
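The error quoted above is what Python 3 raises when a float is compared with a string. The sketch below reproduces it and shows one way a numeric return type avoids it; the function names and sample values are hypothetical, and the actual PSU API changes in the PR may differ.

```python
def is_within_limit(value, limit):
    """Compare a reading against a limit, as a monitoring daemon might."""
    return value <= limit


def to_float(raw):
    """Hypothetical helper: return a PSU reading as a float instead of a string."""
    return float(raw)


# A string limit triggers the TypeError seen in the syslog message.
try:
    is_within_limit(11.9, "12.5")
    error = None
except TypeError as exc:
    error = str(exc)

# Returning a float from the API makes the same comparison valid.
fixed = is_within_limit(11.9, to_float("12.5"))
```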
…t#7547)
* [202012] Add SOC property to enable AN/LT on some platforms
Why I did it
To enable autonegotiation/link training on some Broadcom-based platforms (Arista 7060CX, 7260CX3, 7050cx3, Celestica DX010)
How I did it
Add the appropriate SOC property for enabling the feature to the Broadcom config files of the relevant platforms.
Also convert line endings to UNIX format for one Celestica file.
* Add 'phy_an_lt_msft' to BCM config file permitted list
cleanup the build commands after build finished.
The default value is 600 minutes, which is not enough when building multiple images for a platform; change it to 720 minutes.
…rm device facts. (sonic-net#7496)
- Why I did it
The current platform.json lacks some peripheral-device facts, such as chassis/fan/psu/drawer/thermal/component names and counts.
- How I did it
Add platform device facts to the platform.json file.
- How to verify it
Run the sonic-mgmt platform API tests that depend on these facts.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
…7535)
#### Why I did it
MSN4700 A1 and A0 use different sensor chips but keep the existing platform name *x86_64-mlnx_msn4700-r0*. This is a workaround to replace the sensor conf on MSN4700 A1/A0.
#### How I did it
Use a shell script to get the sensor conf path and copy that file to /etc/sensors.d/sensors.conf
…nic-net#7563)
* [202012][swss/swss-common/utilities/platform-daemons] Update submodule
sonic-swss
- [flex-counters] Delay flex counters stats init for faster boot time [202012] (sonic-net#1736)
sonic-swss-common
- [swig] allow threads (sonic-net#477)
sonic-utilities
- [sfpshow] Gracefully handle improper 'specification_compliance' field (sonic-net#1594)
sonic-platform-daemons
- [xcvrd] Change the y_cable presence logic to use "mux_cable" table as identifier from Config DB (sonic-net#176)
- [xcvrd] Enhance Media Settings (sonic-net#177)
Signed-off-by: Danny Allen <daall@microsoft.com>
Changed DellEMC Z9932f media settings from Vendor Name + PN method to common method.
rules/config.user allows overriding default properties without touching tracked files. This change makes sure all properties can be set and not just the ones used in slave.mk. Signed-off-by: Christian Svensson <blue@cmd.nu>
Why I did it
After PR sonic-net#7344, 'make init' and/or 'make reset' also build the sonic slave dockers. '-include rules/config.user' is supposed to be fine when the file is missing. However, when the file is missing, it generates a delayed error, which later causes make init and make reset to try to build the sonic slave dockers.
How I did it
Define a do-nothing target for config.user to catch the config.user build, thereby preventing other builds from being triggered unexpectedly.
How to verify it
Ran make init; it now only performs the submodule init.
…7475) Previously, a brief sleep was necessary in order to get Python threads to progress. The root cause of this has since been found and fixed in sonic-swss-common: sonic-net/sonic-swss-common#477. The submodule was updated here, so we can now safely remove this sleep. This PR should also be cherry-picked to the 202012 branch once the submodule is updated there to also include the fix.
…c-net#7474)
Why I did it
Finding running containers through "docker ps" breaks when Kubernetes deploys a container, as the names are mangled.
How I did it
The data is available from the FEATURE table, which takes care of Kubernetes deployments too.
How to verify it
Deploy a feature via Kubernetes and expect no error from container_check.
…sing files or sockets (sonic-net#7509)
fuser support is required because the new Cisco hardware watchdog plugin uses it to check whether anyone else is using the /dev/watchdogX resource. The actual validation happens in the platform code, but the package is required in the pmon container.
Currently /dev/watchdogX is used by the Cisco platform-monitor service. The Cisco chassis-level watchdog plugin uses "fuser" to claim the watchdog released by the platform-monitor service.
LED_PROC_INIT_SOC variable was incorrectly referenced as LED_SOC_INIT_SOC. Introduced in sonic-net#5483 Rather than fixing the typo, I decided to simplify the script, removing the need for the conditional altogether by moving the bcmcmd call inside the conditional which checks for the presence of LED_SOC_INIT_SOC.
https://github.com/mbj4668/pyang/blob/master/pyang/repository.py#L93 throws an exception with pip 21.1.
Add the ietf yang models explicitly to the build process to fix the test failure:
tests/test_sonic_yang_models.py .F [ 66%]
tests/yang_model_tests/test_yang_model.py . [100%]
Failed: pyang -f tree ./yang-models/*.yang > ./yang-models/sonic_yang_tree
----------------------------- Captured stderr call -----------------------------
./yang-models/sonic-acl.yang:8: error: module "ietf-inet-types" not found in search path
./yang-models/sonic-device_metadata.yang:8: error: module "ietf-yang-types" not found in search path
Signed-off-by: Guohan Lu <lguohan@gmail.com>
DavidZagury force-pushed the 202012-sdk-dumps branch from 32edf76 to 175e6db (May 11, 2021 07:30)
DavidZagury pushed a commit that referenced this pull request on Dec 8, 2021
Update Barefoot platform support for Bullseye and 5.10 kernel, and add python3-venv.
DavidZagury pushed a commit that referenced this pull request on Feb 27, 2022
* [BFN] Updated platform APIs impl
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
* Extended BFN platform SFP APIs implementation
* Update sfp.py
* [BFN] Extended SFP platform plugin implementation
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
* [BFN] Extended Fans platform plugin implementation
* [BFN] divided classes Fan and FanDrawer into 2 files
* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
  What I did
  Add get_model() function
  Add get_low_critical_threshold() function
  Change __get(...) function.
  How I did it
  The difference from the previous implementation of the __get(...) function is that it returns the real value, or -9999.9 if the value is not provided by the thrift API
* Add get_presence() function and revised __get() function
  Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
* [BFN] Updated PSU platform APIs impl
  Signed-off-by: Dmytro Lytvynenko <dmytrox.lytvynenko@intel.com>
* Added BFN PSU cache (#9)
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
* [BFN] Fans and Fantray platform APIs update (#7)
* [BFN] Updated SFP platform APIs (#10)
  Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
* [BFN] Updated platform API for thermal (#8)
* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
* Revert "[BFN] Fans and Fantray platform APIs update (#7)" (#11)
  This reverts commit c62a733.
* Add support health monitor system (#15)
  Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
* Update chassis.py
* [BFN] Updated FANs and FAN Tray platform API (#14)
* Fix fix_alignment (#17)
  Signed-off-by: Petro Bratash <petrox.bratash@intel.com>
* [BFN] Improvement show environment (#16)
* Added PSU temperature skip into platform.json (#18)
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
* Do not skip psud on Newport
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
* [BFN] fix fan status from Not OK to Ok (#19)
* [BFN] Updated SFP platform plugin (#13)
  Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
* [DPB] Fix typo for Ethernet0 2x200G[100G,40G] breakout mode (#21)
  Signed-off-by: Mykola Gerasymenko <mykolax.gerasymenko@intel.com>
* [barefoot] Tmp fix vendor_rev (#22)
  Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
* Fixed python issues in sonic_platform/fan_drawer.py
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
* Updated fan_drawer.py
* Fixing trailing white spaces in fan_drawer.py
* [BFN] Fix thrift for SFPs API
  Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
* In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
* [Newport] Thermal manager (#23)
* Signed-off-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
* Revert "In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue"
  This reverts commit 1e73127.
* Removed 'controllable' options from platform.json to fix factory default config generation
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
* Update thermal_manager.py
* Migrated SFP plugin to sonic_xcvr API (#30)
  Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
Co-authored-by: KostiantynYarovyiBf <kostiantynx.yarovyi@intel.com>
Co-authored-by: Vadym Yashchenko <vadymx.yashchenko@intel.com>
Co-authored-by: Dmytro Lytvynenko <dmytrox.lytvynenko@intel.com>
Co-authored-by: Volodymyr Boiko <volodymyrx.boiko@intel.com>
Co-authored-by: Petro Bratash <petrox.bratash@intel.com>
Co-authored-by: Mykola Gerasymenko <mykolax.gerasymenko@intel.com>
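The ast.literal_eval() issue mentioned in this commit message comes from the fact that JSON's `false` is not a Python literal. The sketch below shows the failure, the '0' workaround, and JSON parsing for comparison. The platform.json fragment and its field layout are hypothetical, chosen only to include a `controllable` option like the one the commits refer to.

```python
import ast
import json

# Hypothetical platform.json fragment; the real file layout may differ.
platform_json_text = '{"thermals": [{"name": "psu1:temp1", "controllable": false}]}'


def parse_with_literal_eval(text):
    """Try ast.literal_eval and return the exception instead of raising it."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        return exc


# 'false' is parsed as a bare name, which literal_eval rejects.
literal_result = parse_with_literal_eval(platform_json_text)

# A JSON parser understands 'false' and maps it to Python's False.
json_result = json.loads(platform_json_text)

# The workaround from the commit list: '0' is a valid Python literal.
workaround_result = ast.literal_eval(platform_json_text.replace("false", "0"))
```

This also suggests why the workaround could later be reverted once the option was removed from platform.json altogether: with no `false` in the file, nothing is left for ast.literal_eval() to reject.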
DavidZagury pushed a commit that referenced this pull request on Dec 10, 2024
To fix a statistical issue. The original fix was done in FRRouting/frr#17297. However, to accommodate 8.5.4, the patch in the PR was added.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4 0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5 0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6 route_next (node=<optimized out>) at ../lib/table.c:436
#7 route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8 0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020") at ../zebra/interface.c:312
#9 0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
Why I did it
To create an SDK dump on Mellanox devices when an SDK event occurs.
How I did it
Update the SDK and SAI to a new version that supports this feature.
Set the SKU keys needed to initialize the feature in SAI.
How to verify it
Simulate an SDK event and check that a dump is created in the expected path.
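The verification step above could be scripted along the following lines. This is only a sketch: the dump directory is taken from the path suggested in the review comment and is an assumption about the merged change, the helper names are hypothetical, and simulating the SDK event itself is left out because the PR does not describe how to do it. The demonstration at the bottom uses a temporary directory rather than the real path.

```python
from pathlib import Path
import tempfile

# Assumed dump location, based on the path suggested in the review comment.
SDK_DUMP_DIR = Path("/var/log/mellanox/sdk-dumps")


def snapshot(dump_dir):
    """Return the set of entry names currently present in dump_dir."""
    dump_dir = Path(dump_dir)
    return {p.name for p in dump_dir.iterdir()} if dump_dir.is_dir() else set()


def list_new_dumps(dump_dir, before):
    """Return sorted names of entries that appeared since the 'before' snapshot."""
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        return []
    return sorted(p.name for p in dump_dir.iterdir() if p.name not in before)


# Demonstration against a temporary directory instead of SDK_DUMP_DIR.
with tempfile.TemporaryDirectory() as demo_dir:
    before = snapshot(demo_dir)
    # Stand-in for "simulate SDK event": create a placeholder dump entry.
    (Path(demo_dir) / "example-dump").write_text("placeholder")
    created = list_new_dumps(demo_dir, before)
```

On a device, the same two helpers would be called with `SDK_DUMP_DIR` before and after triggering the event, and a non-empty result would indicate that a dump was written.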
Which release branch to backport (provide reason below if selected)
Description for the changelog
A picture of a cute animal (not mandatory but encouraged)