CMIS 'ConfigSuccess' failure while changing default AppSel code for 800G DR8/FR8 modules #459

Merged on May 17, 2024 (16 commits)

AnoopKamath (Contributor) commented Apr 5, 2024

Description

CMIS 'ConfigSuccess' failure while changing the default AppSel code for 800G DR8/FR8 modules.
The issue is seen with modules from these vendors:
Eoptolink
Finisar
Source-Photonics

Example: if a module supports 800G, 400G and 100G app codes and its default app mode is 800G, configuring the 2x400G target mode fails with a ConfigRejectedPartialDataPath error. The same failure occurs for the 8x100G app mode.

Motivation and Context

Extract from the CMIS spec, section 6.2.4.3 (Host Rules and Recommendations):
The host can change the width of a Data Path only while in the DPDeactivated state, i.e. the host must always transition an existing Data Path to DPDeactivated before selecting an Application with a different lane count. Any lane that becomes unused must be marked as such (AppSel = 0000b) or it must be assigned to a new valid Data Path (remaining in DPDeactivated state until eventually used).

Reset the AppSel value on all lanes when setting a non-default app value.
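The reset can be sketched roughly as follows. The three calls (`set_application`, `set_datapath_deinit`, `scs_apply_datapath_init`) and their `0xff` arguments are taken from the diff context quoted in this PR's review thread. Everything else is an assumption made so the sketch runs without hardware: `RecordingApi` is a hypothetical stand-in that only records calls, `reset_all_lanes` is a hypothetical helper name, and the reading of `set_application(0xff, 0, 0)` as (lane mask, AppSel, explicit control) is assumed rather than confirmed here.

```python
ALL_LANES = 0xFF  # bit i set = host lane i+1, so 0xFF covers all 8 lanes


class RecordingApi:
    """Hypothetical stand-in for the module API; it only records calls."""

    def __init__(self, apply_ok=True):
        self.calls = []
        self.apply_ok = apply_ok

    def set_application(self, lanemask, appl_code, ec=0):
        self.calls.append(("set_application", lanemask, appl_code, ec))

    def set_datapath_deinit(self, lanemask):
        self.calls.append(("set_datapath_deinit", lanemask))

    def scs_apply_datapath_init(self, lanemask):
        self.calls.append(("scs_apply_datapath_init", lanemask))
        return self.apply_ok


def reset_all_lanes(api):
    """Mark every lane unused (AppSel 0) and apply that to the module.

    Returns True if the apply step reported success, False otherwise.
    """
    api.set_application(ALL_LANES, 0, 0)  # stage AppSel 0 on all lanes
    api.set_datapath_deinit(ALL_LANES)    # request data path deinit on all lanes
    return bool(api.scs_apply_datapath_init(ALL_LANES))  # apply the staged set


api = RecordingApi()
ok = reset_all_lanes(api)
print(ok, [name for name, *_ in api.calls])
```

A caller would presumably treat a `False` return as a failed reset and retry or report the error; how xcvrd actually handles that case is not shown here.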

How Has This Been Tested?

Tested with modules from different vendors, switching between different modes:

root@sonic:/home/cisco# show logging xcvrd | grep Ethernet160
Apr  5 17:11:02.534867 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:CONFIG_DB Table:PORT fvp {'admin_status': 'up', 'alias': 'etp20a', 'index': '20', 'lanes': '8,9,10,11', 'mtu': '9100', 'speed': '400000', 'subport': '1'}
Apr  5 17:11:02.537697 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'down', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '40000,100000,200000,400000', 'supported_fecs': 'rs', 'host_tx_ready': 'true'}
Apr  5 17:11:02.538174 sonic WARNING pmon#xcvrd[29]: *** Ethernet160CONFIG_DBPORT handle_port_update_event() fvp {'admin_status': 'up', 'alias': 'etp20a', 'index': '20', 'lanes': '8,9,10,11', 'mtu': '9100', 'speed': '400000', 'subport': '1', 'key': 'Ethernet160', 'asic_id': 0, 'op': 'SET'}
Apr  5 17:11:02.550360 sonic WARNING pmon#xcvrd[29]: *** Ethernet160STATE_DBPORT_TABLE handle_port_update_event() fvp {'host_tx_ready': 'true', 'index': '-1', 'key': 'Ethernet160', 'asic_id': 0, 'op': 'SET'}
Apr  5 17:11:03.280329 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0x0, state=INSERTED, appl 0 host_lane_count 4 retries=0
Apr  5 17:11:03.341874 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting appl=3
Apr  5 17:11:03.403239 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting host_lanemask=0xf
Apr  5 17:11:03.524128 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting media_lanemask=0xf
Apr  5 17:11:03.531581 sonic NOTICE pmon#xcvrd[29]: CMIS: Changing from default AppSel 1 to non default AppSel code 3. Reset AppSel code for all lanes
Apr  5 17:11:03.556579 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: force Datapath reinit

Apr  5 17:11:10.936253 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:STATE_DB Table:TRANSCEIVER_INFO fvp {'host_electrical_interface': '800G L C2M (placeholder)', 'active_apsel_hostlane1': '0', 'model': 'EOLD-138HG-02-41', 'hardware_rev': '1.0', 'vendor_rev': '01', 'active_apsel_hostlane5': '0', 'active_apsel_hostlane3': '0', 'host_lane_assignment_option': '1', 'active_apsel_hostlane2': '0', 'ext_identifier': 'Power Class 8 (17.0W Max)', 'media_interface_code': 'Undefined', 'specification_compliance': 'sm_media_interface', 'application_advertisement': "{1: {'host_electrical_interface_id': '800G L C2M (placeholder)', 'module_media_interface_id': 'Undefined', 'media_lane_count': 8, 'host_lane_count': 8, 'host_lane_assignment_options': 1, 'media_lane_assignment_options': 1}, 2: {'host_electrical_interface_id': '800G S C2M (placeholder)', 'module_media_interface_id': 'Undefined', 'media_lane_count': 8, 'host_lane_count': 8, 'host_lane_assignment_options': 1, 'media_lane_assignment_options': 1}, 3: {'host_electrical_interface_id': '400GAUI-4-L C2M (Annex 120G)', 'module_media_interface_id': '400GBASE-DR4 (Cl 124)', 'media_lane_count': 4, 'host_lane_count': 4, 'host_lane_assignment_options': 17, 'media_lane_assignment_options': 17}, 4: {'host_electrical_interface_id': '400GAUI-4-S C2M (Annex 120G)', 'module_media_interface_id': '400GBASE-DR4 (Cl 124)', 'media_lane_count': 4, 'host_lane_count': 4, 'host_lane_assignment_options': 17, 'media_lane_assignment_options': 17}, 5: {'host_electrical_interface_id': '100GAUI-1-L C2M (Annex 120G)', 'module_media_interface_id': '100G-FR/100GBASE-FR1 (Cl 140)', 'media_lane_count': 1, 'host_lane_count': 1, 'host_lane_assignment_options': 255, 'media_lane_assignment_options': 255}, 6: {'host_electrical_interface_id': '100GAUI-1-S C2M (Annex 120G)', 'module_media_interface_id': '100G-FR/100GBASE-FR1 (Cl 140)', 'media_lane_count': 1, 'host_lane_count': 1, 'host_lane_assignment_options': 255, 
'media_lane_assignment_options': 255}}", 'vendor_oui': '70-ee-a3', 'active_apsel_hostlane6': '0', 'media_lane_count': '8', 'is_replaceable': 'True', 'cable_type': 'Length Cable Assembly(m)', 'connector': 'MPO 1x12', 'ext_rateselect_compliance': 'N/A', 'active_apsel_hostlane4': '0', 'vendor_date': '2023-02-24   ', 'host_lane_count': '8', 'encoding': 'N/A', 'nominal_bit_rate': '0', 'supported_max_tx_power': 'N/A', 'supported_min_laser_freq': 'N/A', 'active_apsel_hostlane8': '0', 'dom_capability': 'N/A', 'supported_max_laser_freq': 'N/A', 'type': 'QSFP-DD Double Density 8X Pluggable Transceiver', 'manufacturer': 'CISCO-EOPTOLINK ', 'media_interface_technology': '1310 nm EML', 'supported_min_tx_power': 'N/A', 'cmis_rev': '5.0', 'media_lane_assignment_option': '1', 'cable_length': '0.0', 'serial': 'EOP27080006     ', 'active_apsel_hostlane7': '0'}
Apr  5 17:11:10.936760 sonic WARNING pmon#xcvrd[29]: *** Ethernet160STATE_DBTRANSCEIVER_INFO handle_port_update_event() 
Apr  5 17:11:11.212687 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=INSERTED, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:11.274449 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting appl=3
Apr  5 17:11:11.336291 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting host_lanemask=0xf
Apr  5 17:11:11.459386 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting media_lanemask=0xf
Apr  5 17:11:11.473669 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: force Datapath reinit
Apr  5 17:11:33.323121 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_DEINIT, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:34.359393 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: DpDeinit duration 1.0 secs, modulePwrUp duration 10.0 secs
Apr  5 17:11:39.597517 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=AP_CONFIGURED, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:39.635618 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Apply Optics SI found for Vendor: CISCO-EOPTOLINK   PN: EOLD-138HG-02-41 lane speed: 100G
Apr  5 17:11:50.731261 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_INIT, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:50.746872 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: DpInit duration 10.0 secs
Apr  5 17:11:57.587755 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_TXON, appl 3 host_lane_count 4 retries=0
Apr  5 17:11:57.598835 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Turning ON tx power
Apr  5 17:12:03.315972 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=DP_ACTIVATION, appl 3 host_lane_count 4 retries=0
Apr  5 17:12:03.319969 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: READY
Apr  5 17:12:03.390218 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: updated TRANSCEIVER_INFO_TABLE [('active_apsel_hostlane1', '3'), ('active_apsel_hostlane2', '3'), ('active_apsel_hostlane3', '3'), ('active_apsel_hostlane4', '3'), ('host_lane_count', '4'), ('media_lane_count', '4')]
Apr  5 17:12:06.651543 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() :
Apr  5 17:12:06.651646 sonic WARNING pmon#xcvrd[29]: *** Ethernet160STATE_DBTRANSCEIVER_INFO handle_port_update_event() fvp 
Apr  5 17:12:06.655136 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: 400G, lanemask=0xf, state=INSERTED, appl 3 host_lane_count 4 retries=0
Apr  5 17:12:06.714680 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting appl=3
Apr  5 17:12:06.774654 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting host_lanemask=0xf
Apr  5 17:12:06.895745 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: Setting media_lanemask=0xf
Apr  5 17:12:06.917114 sonic NOTICE pmon#xcvrd[29]: CMIS: Ethernet160: no CMIS application update required...READY
Apr  5 17:12:14.943653 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '40000,100000,200000,400000', 'supported_fecs': 'rs', 'host_tx_ready': 'true', 'speed': '400000'}
Apr  5 17:12:14.943750 sonic WARNING pmon#xcvrd[29]: $$$ Ethernet160 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '40000,100000,200000,400000', 'supported_fecs': 'rs', 'host_tx_ready': 'true', 'speed': '400000'}
root@sonic:/home/cisco# 

subport_shutdown_ut.txt
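The `host_lane_assignment_options` values in the `application_advertisement` entry of the log (1, 17 and 255) are bitmasks: in CMIS, bit i set means the application may start on host lane i+1. The sketch below decodes them and derives a lane mask for a subport. `start_lanes` and `subport_lanemask` are hypothetical helper names, and the subport-to-lane mapping (subport 1 takes the lowest lanes, subports packed in order) is an assumption that merely matches the `'subport': '1'` and `host_lanemask=0xf` pair seen in the log.

```python
def start_lanes(assignment_options):
    """Return the 1-based host lanes on which an application may start."""
    return [lane + 1 for lane in range(8) if assignment_options & (1 << lane)]


def subport_lanemask(subport, host_lane_count):
    """Lane mask for a 1-based subport, assuming subports are packed in order."""
    base = (1 << host_lane_count) - 1
    return base << ((subport - 1) * host_lane_count)


# Values advertised by the module in the log: 800G, 400G and 100G applications.
for options in (1, 17, 255):
    print(options, start_lanes(options))

print(hex(subport_lanemask(1, 4)), hex(subport_lanemask(2, 4)))
```

Under these assumptions the 400G applications (options 17) can start on lanes 1 and 5, which is what makes a 2x400G breakout of the 8-lane module possible.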

Additional Information (Optional)

Reset the AppSel value on all lanes when setting a non-default app value.

prgeor (Collaborator) commented Apr 9, 2024

@AnoopKamath Please test:

  1. xcvrd restart
  2. Config interface shut/no-shut of 2x400G does not impact the other datapath

mihirpat1 (Contributor) left a comment

@AnoopKamath Can you please fix the build failure?

AnoopKamath (Contributor Author) commented May 14, 2024

In reply to @prgeor's request to test (1) xcvrd restart and (2) that config interface shut/no-shut of 2x400G does not impact the other datapath:

  1. Tested xcvrd restart and saw the modules go to the READY state.
  2. Tested config shut/no-shut on 4 different modules; it does not impact the other datapath.

Logs attached

Review thread on sonic-xcvrd/xcvrd/xcvrd.py (outdated), on the following diff context:
"""
api.set_application(0xff, 0, 0)
api.set_datapath_deinit(0xff)
if not api.scs_apply_datapath_init(0xff):
Collaborator commented:

@AnoopKamath this is NOT required

image

AnoopKamath (Contributor Author) replied:

Hi @prgeor,
If we are assigning a different app code to only some of the lanes, we should first set all lanes to unused (AppSel 0) and apply that (apply here means ApplyDPInit). When the 800G config is active (all lanes used), applying a partial config (only one 400G port) is clearly "partial" and "invalid", which is exactly how the module responds.
The host must either apply both 400G ports or apply a NULL config (to clear all lanes) before applying one 400G port. Entering DPDeactivated and setting the app only writes AppSel 0 to the Staged Control Set; we need ApplyDPInit to push it to the Active Control Set so that all lanes become unused. Just doing DPDeinit and setting the app to 0 does not solve the problem.
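The argument above can be illustrated with a toy model. This is only a sketch: `ToyModule`, its consistency rule (each used lane must sit in a full, aligned group of lanes sharing one AppSel) and the helper `try_400g_on_lanes_1_to_4` are hypothetical and much simpler than real module firmware, whose validation is vendor specific. The AppSel-to-lane-count mapping (1 = 8 lanes, 3 = 4 lanes) is taken from the advertisement in the log.

```python
CONFIG_SUCCESS = "ConfigSuccess"
CONFIG_REJECTED_PARTIAL = "ConfigRejectedPartialDataPath"

# AppSel -> host lane count (1 = 800G on 8 lanes, 3 = 400G on 4 lanes).
# AppSel 0 means "lane unused".
LANE_COUNT = {1: 8, 3: 4}


class ToyModule:
    """Toy model of a module with a Staged and an Active Control Set."""

    def __init__(self):
        self.active = [1] * 8  # default: one 8-lane 800G data path
        self.staged = [1] * 8

    def set_application(self, lanemask, appsel):
        """Write AppSel into the Staged Control Set only."""
        for lane in range(8):
            if lanemask & (1 << lane):
                self.staged[lane] = appsel

    def apply(self, lanemask):
        """Copy staged -> active for the masked lanes if the result is valid."""
        candidate = [
            self.staged[lane] if lanemask & (1 << lane) else self.active[lane]
            for lane in range(8)
        ]
        if not self._consistent(candidate):
            return CONFIG_REJECTED_PARTIAL
        self.active = candidate
        return CONFIG_SUCCESS

    @staticmethod
    def _consistent(config):
        """Every used lane must sit in a full, aligned group with one AppSel."""
        for lane, appsel in enumerate(config):
            if appsel == 0:
                continue
            width = LANE_COUNT[appsel]
            start = lane - lane % width
            if any(config[i] != appsel for i in range(start, start + width)):
                return False
        return True


def try_400g_on_lanes_1_to_4(module):
    module.set_application(0x0F, 3)
    return module.apply(0x0F)


# Case 1: go straight from 800G to one 400G port.
direct = try_400g_on_lanes_1_to_4(ToyModule())

# Case 2: stage AppSel 0 on all lanes AND apply it first.
m = ToyModule()
m.set_application(0xFF, 0)
reset_status = m.apply(0xFF)
after_reset = try_400g_on_lanes_1_to_4(m)

# Case 3: stage AppSel 0 on all lanes but never apply it.
m = ToyModule()
m.set_application(0xFF, 0)
staged_only = try_400g_on_lanes_1_to_4(m)

print(direct, reset_status, after_reset, staged_only)
```

In this toy model, cases 1 and 3 are rejected because lanes 5 to 8 still carry the 8-lane application in the active set, while case 2 succeeds because the all-lanes reset was applied before the 400G port was configured.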

image

prgeor (Collaborator) commented May 15, 2024

@AnoopKamath can you use the API from sonic-net/sonic-platform-common#471?

AnoopKamath (Contributor Author) commented, in reply to the request to use sonic-net/sonic-platform-common#471:

@prgeor: UT looks good after I patched in https://github.com/sonic-net/sonic-platform-common/pull/471/files. I will update the PR after you merge these changes. Thanks.

prgeor merged commit 9ffce20 into sonic-net:master on May 17, 2024
5 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-platform-daemons that referenced this pull request May 18, 2024
…0G DR8/FR8 modules (sonic-net#459)

* Update xcvrd.py

Reset AppSel value for all lanes when setting non default app value

* update reset_app_code mock test

* Update mock test for reset_app_code

* Add new api to reset app code

* Fix build failures and add return type

* Rename apis

* Address review comments

rename APIs

* remove decommission api from xcvrd

* Update xcvrd.py

* Update test_xcvrd.py

* Update test_xcvrd.py

* Update test_xcvrd.py

* Update test_xcvrd.py

* Add mock test to capture fail case

* Address review comments

mssonicbld (Collaborator) commented:
Cherry-pick PR to 202311: #490

mssonicbld pushed a commit that referenced this pull request May 18, 2024
…0G DR8/FR8 modules (#459)

zhenggen-xu commented:

@prgeor @lguohan can we back-port this to 202305? This impacts not only the 800G modules but also 400G modules with breakouts.
