From 8b7cf5cc7042fc5043708a14bf9de19700c3b15e Mon Sep 17 00:00:00 2001 From: jfeng-arista <98421150+jfeng-arista@users.noreply.github.com> Date: Wed, 4 Oct 2023 10:12:13 -0700 Subject: [PATCH] Update fabric link monitoring plan. (#1013) [HLD] Update fabric link monitoring design and test plan. --- doc/voq/fabric.md | 400 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 371 insertions(+), 29 deletions(-) diff --git a/doc/voq/fabric.md b/doc/voq/fabric.md index b3360f01f3..d963e304d0 100644 --- a/doc/voq/fabric.md +++ b/doc/voq/fabric.md @@ -1,19 +1,22 @@ -# Fabric port support on Sonic +# Fabric port support on SONiC # High Level Design Document -#### Rev 1 # Table of Contents * [List of Tables](#list-of-tables) * [List of Figures](#list-of-figures) * [Revision](#revision) -* [About this Manual](#about-this-manual) * [Scope](#scope) * [Definitions/Abbreviations](#definitionsabbreviations) +* [Overview](#overview) * [1 Requirements](#1-requirements) * [2 Design](#2-design) -* [3 Testing](#3-testing) -* [4 Future Work](#4-future-work) +* [3 SAI API](#3-sai-api) +* [4 Configuration and management](#4-configuration-and-management) +* [5 Warmboot and Fastboot Design Impact](#5-warmboot-and-fastboot-design-impact) +* [6 Testing](#6-testing) +* [7 Open/Action items - if any](#7-openaction-items---if-any) +* [8 Restrictions/Limitations](#8-restrictionslimitations) # List of Tables * [Table 1: Abbreviations](#definitionsabbreviations) @@ -27,10 +30,9 @@ | 1.1 | Sep-1 2020 | Ngoc Do, Eswaran Baskaran (Arista Networks) | Add hotswap handling | | 2 | Oct-20 2020 | Ngoc Do, Eswaran Baskaran (Arista Networks) | Update counter information | | 2.1 | Nov-17 2020 | Ngoc Do, Eswaran Baskaran (Arista Networks) | Minor update on container starts | - -# About this Manual - -This document provides an overview of the SONiC support for fabric ports that are present in a VOQ-based chassis. 
These fabric ports are used to interconnect the forwarding Network Processing Units within the VOQ chassis. +| 3 | Jun-3 2022 | Cheryl Sanchez, Jie Feng (Arista Networks) | Update on fabric link monitoring | +| 3.1 | Mar-30 2023 | Jie Feng (Arista Networks) | Update Overview, SAI API and Configuration and management section | +| 3.2 | May-01 2023 | Jie Feng (Arista Networks) | Update Counter tables information | # Scope @@ -50,6 +52,10 @@ This document builds on top of the VOQ chassis architecture discussed [here](htt | ASIC | Application Specific Integrated Circuit | In addition to NPUs, also includes fabric chips that could forward packets or cells. | | cell | Fabric Data Units | The data units that traverse a cell-based chassis fabric. | +# Overview + +This document provides an overview of the SONiC support for fabric ports that are present in a VOQ-based chassis. These fabric ports are used to interconnect the forwarding Network Processing Units within the VOQ chassis. + # 1 Requirements Fabric ports are used in systems in which there are multiple forwarding ASICs are required to be connected. Traffic passes from one front panel port in a forwarding ASIC over a fabric network to one or multiple front panel ports on one or other ASICs. The fabric network is formed using fabric ASICs. Fabric links on the fabric network connect fabric ports on forwarding ASICs to fabric ports on fabric ASICs. @@ -84,7 +90,7 @@ DEVICE_METADATA|localhost: { Each fabric ASIC must be assigned a unique switch_id. The SAI VOQ specification recommends that this number be assigned to be different than the switch_id assigned to the forwarding ASICs in the chassis. -Fabric port status will be polled periodically and stored in table STATE_DB|FABRIC_PORT_TABLE. Typically, fabric port status about a fabric port includes: +Fabric ports are numbered by their chip fabric port number, and their status is polled periodically and stored in table STATE_DB|FABRIC_PORT_TABLE. 
Typically, fabric port status about a fabric port includes: - Status: Up or down - If port is down, we may have some more information indicating reason e.g. CRC or misaligned @@ -94,10 +100,6 @@ Fabric port status will be polled periodically and stored in table STATE_DB|FABR STATE_DB:FABRIC_PORT_TABLE:{{fabric_port_name}} "lane": {{number}} "status": “up|down” - "crc": “yes” # if status: down - "misaligned": “yes” # if status: down - "remote_switch_id": {{number}} # if status: up - "remote_lane": {{number}} # if status: up ``` Fabric port statistics include the following port counters: @@ -113,7 +115,10 @@ Fabric port statistics include the following port counters: SAI_PORT_STAT_IF_OUT_FABRIC_DATA_UNITS ``` -FabricPortsOrch defines the port counters in FLEX_COUNTER_DB and syncd's existing FlexCounters thread periodically collects and saves these counters in COUNTER_DB. “show” cli commands read COUNTER_DB and display statistics information. +FabricPortsOrch defines the port counters in FLEX_COUNTER_DB and syncd's existing FlexCounters thread periodically collects and saves these counters in COUNTER_DB. The counter OID is obtained from sai_serialize_object_id of the port. The “show” CLI commands read COUNTER_DB and display statistics information. Example output of the CLI is in section 2.7. + +***Example*** +"FLEX_COUNTER_TABLE:FABRIC_PORT_STAT_COUNTER:oid:0x10000000000df" Fabric port also has a couple of queue counters. Similar to the port counters, the queue counters are also polled with FLEX_COUNTER_DB. ``` @@ -122,6 +127,9 @@ Fabric port also has a couple of queue counters. Similar to the port counters, t SAI_QUEUE_STAT_CURR_OCCUPANCY_LEVEL ``` +***Example*** +"FLEX_COUNTER_TABLE:FABRIC_QUEUE_STAT_COUNTER:oid:0x15000000000219" + Note that Linecard Sonic instances will also have STATE_DB|FABRIC_PORT_TABLE as well as port/queue counters because there are fabric ports in forwarding ASICs as well. 
## 2.3 System Initialization @@ -155,35 +163,369 @@ When a forwarding ASIC is initialized, the fabric ports are initialized by defau ## 2.7 Cli commands ``` -> show fabric counters -n [port_id] - -asic2 fabric port counter (number of fabric ports: 192) +> show fabric counters port + ASIC PORT STATE IN_CELL IN_OCTET OUT_CELL OUT_OCTET CRC FEC_CORRECTABLE FEC_UNCORRECTABLE SYMBOL_ERR +------ ------ ------- --------- ---------- ---------- ----------- ----- ----------------- ------------------- ------------ + 0 0 up 1 135 0 0 0 10 2009682570 0 + 0 1 down 0 0 0 0 0 0 5163529467 0 + 0 2 up 1 206 2 403 0 10 2015665810 0 +``` -PORT RxCells TxCells Crc Fec Corrected -------------------------------------------------------------------------- - 0 : 71660578 2 0 0 0 - 1 : 71659798 1 0 0 213 - 2 : 0 1 0 0 167 - 3 : 0 2 0 0 193 +``` +> show fabric counters queue + ASIC PORT STATE QUEUE_ID CURRENT_BYTE CURRENT_LEVEL WATERMARK_LEVEL +------ ------ ------- ---------- -------------- --------------- ----------------- + 0 0 up 0 0 0 24 + 0 1 down 0 0 0 24 + 0 2 down 0 0 0 24 + 0 3 up 0 0 0 24 ``` ### 2.7.1 Fabric Status -In a later phase, a `show fabric status` command will be added to show the remote switch ID and link ID for each fabric link of an ASIC. This will be obtained from the SAI_PORT_ATTR_FABRIC_REACHABILITY port attribute of the fabric port. Note that for fabric links that do not have a link partner because of the configuration of the chassis, this will show the status as `down`. The status will also be `down` for fabric links that are down due to some other physical error. To identify links that are down due to error vs links that are not expected to be up because of the chassis connectivity, we need to build up a list of expected fabric connectivity for each ASIC. This can be computed ahead of time based on the vendor configuration and populated in the minigraph. This will be implemented in a later phase. 
+In a later phase, a `show fabric reachability` command will be added to show the remote switch ID and link ID for each fabric link of an ASIC. The command will be added for both forwarding ASICs on Linecards and fabric ASICs on Fabric cards. This will be obtained from the SAI_PORT_ATTR_FABRIC_REACHABILITY port attribute of the fabric port. Note that fabric links that do not have a link partner because of the configuration of the chassis are not shown in the command output. + +``` +> show fabric reachability + +asic0 + Local Link Remote Module Remote Link Status +------------ --------------- ------------- -------- + 49 4 86 up + 50 2 87 up + 52 4 85 up + 54 2 93 up +.... +``` + +## 2.8 Fabric Link Monitor + +SONiC needs to monitor the fabric link status and take corresponding actions once an unhealthy link is detected to avoid traffic loss. Once the fabric link monitoring feature is enabled, SONiC needs to monitor the fabric capacity of a forwarding ASIC and take corresponding action once the capacity goes below the configured threshold. + +The design of fabric link monitor is intentionally scoped to use local component state such as information local to a linecard or information local to a supervisor. This design reduces the need for inter-component communication. + +### 2.8.1 Monitor Fabric Link Status + +Unhealthy fabric links may lead to traffic drops. Fabric link monitoring is an important tool to minimize traffic loss. The fabric link monitor algorithm monitors fabric link status and isolates the link if one or more criteria are true. An isolated fabric link is still up in the physical layer, but is taken out of service and does not distribute traffic. This feature is needed on both fabric ASICs and forwarding ASICs. + +#### 2.8.1.1 Fabric link monitoring criteria + +The fabric link monitoring algorithm checks two types of errors on a link: CRC errors and uncorrectable errors. 
+ +The criteria can be extended to include checking other errors later. + +#### 2.8.1.2 Monitoring algorithm + +Instead of reacting to the counter changes, Orchagent adds a new poller and periodically polls the status of all fabric links. By default, the total number of received cells, cells with CRC errors, and cells with uncorrectable errors are fetched from all serdes links periodically, and the error rates are calculated using these numbers. If any one of the error rates is above the threshold for a number of consecutive polls, the link is identified as an unhealthy link. The link is then automatically isolated so that it does not distribute traffic. + +#### 2.8.1.3 Cli commands + +Several commands will be added to set fabric link monitor config parameters. +``` +> config fabric port monitor error threshold <#crcCells> <#rxCells> +``` +The above command can be used to set a fabric link monitoring error threshold. + +#crcCells: Number of errors over specified number of received cells. +#rxCells: Total number of received cells in which errors are monitored. + +If errors are seen in more than #crcCells out of #rxCells received cells, the fabric link needs to be isolated. + +``` +> config fabric port monitor poll threshold isolation <#polls> +``` +The above command can be used to set the number of consecutive polls in which the threshold needs to be detected to isolate a link. + +``` +> config fabric port monitor poll threshold recovery <#polls> +``` +The above command sets the number of consecutive polls in which no error is detected to unisolate a link. + +``` +> config fabric port isolate [port_id] +``` + +``` +> config fabric port unisolate [port_id] +``` + +In addition to the fabric link monitoring algorithm, the above two commands are added. The commands can be used to manually isolate and unisolate a fabric link (i.e., take the link out of service and put the link back into service). The two commands help with debugging the system and allow force-isolating a fabric link. 
+ + +### 2.8.2 Monitor Fabric Capacity + +When the fabric link monitoring feature is enabled, some fabric links in a system may not be operational because the link is down or was isolated by the monitoring algorithm. As a result, the effective capacity of total fabric links may be less than the required bandwidth, and lead to performance degradation. A capacity monitoring algorithm in Orchagent is useful for alerting on capacity changes. This feature is for forwarding ASICs on Linecards. + +#### 2.8.2.1 Cli command + +``` +> config fabric monitor capacity threshold <50-100> +``` +The above command is used to configure a capacity threshold to trigger alerts when total fabric link capacity goes below it. + +A show command is added to display the fabric capacity on a system. + +``` +> show fabric monitor capacity +Monitored fabric capacity threshold: 90% + +ASIC Operating Total # % Last Event Last Time + Links of Links +----- ------ -------- ---- ---------- --------- +0 110 112 98 None Never +1 112 112 100 None Never +.... +``` + +#### 2.8.2.2 Monitoring algorithm + +Orchagent will track the total number of fabric links that are isolated. Once the number of operational fabric links falls below the configured threshold, Orchagent alerts users with a system log. This action is deliberately conservative in this document and can be extended in the future to other actions, such as shutting down the ASIC. + +### 2.8.3 Monitor Traffic on Fabric Links + +Monitoring traffic on fabric links is another important tool to diagnose fabric hardware issues. It is useful to identify when traffic is unbalanced among fabric links which are connected to the same forwarding ASIC. It can also help identify miswired links. + +#### 2.8.3.1 Cli command + +The following proposed CLI is used to show the traffic among fabric links on both fabric ASICs and forwarding ASICs. + +``` +> show fabric counters rate mbps + + Asic Link RX TX + ID + ------- ----- --------- ---------- + 0 1 0 36113 + .... 
+ 0 19 0 36107 + 0 20 0 36110 + .... +``` + +# 3 SAI API + +The fabric port monitoring adds a new attribute, SAI_PORT_ATTR_FABRIC_ISOLATE. The new API can be used to isolate fabric ports. + +# 4 Configuration and management + +## 4.1 Config DB Enhancements + +Two tables are added into CONFIG DB for this feature. + +The FABRIC_PORT table contains information on a fabric port's alias, isolated status, and lanes. Below is an example CONFIG DB snippet: + +``` +{ +"FABRIC_PORT": { + "Fabric0": { + "alias": "Fabric0", + "isolateStatus": "False", + "lanes": "0" + }, + "Fabric1": { + "alias": "Fabric1", + "isolateStatus": "False", + "lanes": "1" + } +} +``` + +The FABRIC_MONITOR table contains information related to fabric port monitoring. A sample CONFIG DB snippet is shown below. + +``` +{ +"FABRIC_MONITOR": { + "FABRIC_MONITOR_DATA": { + "monErrThreshCrcCells": "1", + "monErrThreshRxCells": "61035156", + "monPollThreshIsolation": "1", + "monPollThreshRecovery": "8" + } + } +} +``` + +## 4.2 CLI/YANG model Enhancements + +A new module, sonic-fabric-port, is added for the FABRIC_PORT table. Three new leaves are added to this module: isolateStatus, alias, and lanes. + +Snippet of sonic-fabric-port.yang: + +``` +module sonic-fabric-port{ + ... + container sonic-fabric-port { + container FABRIC_PORT { + description "FABRIC_PORT part of config_db.json"; + list FABRIC_PORT_LIST { + key "name"; + + leaf name { + type string { + length 1..128; + } + } + + leaf isolateStatus { + type string { + pattern "False|True"; + } + } + + leaf alias { + type string { + length 1..128; + } + } + + leaf lanes { + type string { + length 1..128; + } + } + } /* end of list FABRIC_PORT_LIST */ + } /* end of container FABRIC_PORT */ + } /* end of container sonic-fabric-port */ +} /* end of module sonic-fabric-port */ +``` + +Module sonic-fabric-monitor is added for FABRIC_MONITOR. New leaves are added as well for fabric port monitoring. 
+ +Snippet of sonic-fabric-monitor.yang: + +``` +module sonic-fabric-monitor{ + ... + description "FABRIC_MONITOR yang Module for SONiC OS"; + + container sonic-fabric-monitor { + container FABRIC_MONITOR { + description "FABRIC_MONITOR part of config_db.json"; + container FABRIC_MONITOR_DATA { + + leaf monErrThreshCrcCells { + type uint32; + default 1; + } + + leaf monErrThreshRxCells { + type uint32; + default 61035156; + } + + leaf monPollThreshIsolation { + type uint32; + default 1; + } + + leaf monPollThreshRecovery { + type uint32; + default 8; + } + } /* end of container FABRIC_MONITOR_DATA */ + } /* end of container FABRIC_MONITOR */ + } /* end of container sonic-fabric-monitor */ +} /* end of module sonic-fabric-monitor */ + +``` + + +## 4.3 CLI + +Several new CLI commands are added for this feature. + +Command to display fabric counters port. + +``` +> show fabric counters port +``` + +Command to display fabric counters queue. -# 3 Testing +``` +> show fabric counters queue +``` + +Command to display fabric status. + +``` +> show fabric reachability +``` + +Command to set a fabric link monitoring error threshold. + +``` +> config fabric port monitor error threshold <#crcCells> <#rxCells> +``` + +Command to set the number of consecutive polls in which the threshold needs to be detected to isolate a link. + +``` +> config fabric port monitor poll threshold isolation <#polls> +``` + +Command to set the number of consecutive polls in which no error is detected to unisolate a link. + +``` +> config fabric port monitor poll threshold recovery <#polls> +``` + +Commands to manually isolate and unisolate a fabric link. + +``` +> config fabric port isolate [port_id] + +> config fabric port unisolate [port_id] +``` + +Command to display the fabric link isolated status. + +``` +> show fabric isolation +``` + +Command to display the fabric capacity on a system. 
+ +``` +> show fabric monitor capacity +``` + +Command to configure a capacity threshold to trigger alerts when total fabric link capacity goes below it. + +``` +> config fabric monitor capacity threshold < threshold > +``` + +Command to show the traffic among fabric links. + +``` +> show fabric counters rate mbps +``` + + +# 5 Warmboot and Fastboot Design Impact + +The existing warmboot/fastboot feature is not affected by this design. + +# 6 Testing Fabric port testing will rely on sonic-mgmt tests that can run on chassis hardware. -- Test fabric port mapping: To verify the fabric mapping, we can inspect the remote switch ID that are saved in the STATE_DB and match that with the known chassis architecture. +- Test fabric port mapping: To verify the fabric mapping, we can inspect the remote switch IDs that are saved in the STATE_DB and match them with the known chassis architecture. More comprehensive information about this testing can be found in the Chassis Fabric Test Plan document, which is available at testplan/Chassis-fabric-test-plan.md. - Test traffic and counters: Send traffic through the chassis and verify traffic going through fabric ports via counters. -# 4 Future Work +- Test fabric port monitoring: + * Use the CLI to isolate/unisolate fabric ports, and verify whether the corresponding STATE_DB entries are updated. + * Create simulated errors (e.g., CRC errors) on a fabric port, and confirm that the algorithm takes appropriate action and updates the corresponding STATE_DB entries. + * Test fabric capacity monitoring: This test involves isolating/unisolating fabric ports on the system and checking that the 'show fabric monitor capacity' command updates its output correctly to reflect the changes. + +# 7 Open/Action items - if any - In this proposal, all fabric ports on fabric ASICs or forwarding ASICs that join to form the fabric network will be enabled even when there are no peer ports available. 
We could provide a config model for the platforms to express the expected fabric connectivity and turn off unnecessary fabric ports. - Fabric ports that do not have a peer port will show up as a ‘down’ port. Fabric ports that do have a peer port could also go ‘down’ and there is no current way to differentiate this from a fabric port that does not have a peer port. This can be detected if the config model can express the expected fabric connectivity. -- Monitor, detect and disable fabric ports that consistently show errors. +# 8 Restrictions/Limitations +TBD
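
Reviewer note on section 2.8.1.2: the polling logic described there can be sketched roughly as below. This is a minimal Python illustration, not the Orchagent implementation. The configuration fields mirror the FABRIC_MONITOR_DATA keys from section 4.1, while `MonitorConfig`, `LinkState`, and `poll_link` are hypothetical names introduced only for this example. It also assumes that the counters passed in are per-poll deltas and that "above the threshold" means strictly more than #crcCells errors per #rxCells received cells, applied to both error types.

```python
from dataclasses import dataclass


@dataclass
class MonitorConfig:
    # Field names follow FABRIC_MONITOR_DATA in CONFIG DB (section 4.1).
    mon_err_thresh_crc_cells: int = 1
    mon_err_thresh_rx_cells: int = 61035156
    mon_poll_thresh_isolation: int = 1
    mon_poll_thresh_recovery: int = 8


@dataclass
class LinkState:
    # Hypothetical per-link bookkeeping kept between polls.
    isolated: bool = False
    bad_polls: int = 0   # consecutive polls above the error threshold
    good_polls: int = 0  # consecutive polls with no errors at all


def poll_link(state, cfg, rx_cells, crc_cells, uncorrectable_cells):
    """Update one link from the counter deltas of a single poll.

    Returns True if the link is isolated after this poll.
    """
    def over_threshold(errors):
        # errors / rx_cells > crc_thresh / rx_thresh, cross-multiplied so
        # that no division is needed and rx_cells == 0 is handled safely.
        return (errors * cfg.mon_err_thresh_rx_cells
                > cfg.mon_err_thresh_crc_cells * rx_cells)

    above = over_threshold(crc_cells) or over_threshold(uncorrectable_cells)
    error_free = crc_cells == 0 and uncorrectable_cells == 0

    # Counters of consecutive polls reset as soon as the streak is broken.
    state.bad_polls = state.bad_polls + 1 if above else 0
    state.good_polls = state.good_polls + 1 if error_free else 0

    if not state.isolated and state.bad_polls >= cfg.mon_poll_thresh_isolation:
        state.isolated = True
        state.good_polls = 0
    elif state.isolated and state.good_polls >= cfg.mon_poll_thresh_recovery:
        state.isolated = False
        state.bad_polls = 0
    return state.isolated
```

Using integer cross-multiplication instead of a floating-point rate keeps the comparison exact for large cell counts such as the default 61035156. Whether recovery should require zero errors or merely a rate below the threshold is a design choice; the sketch follows the wording of the recovery command, which counts polls in which no error is detected.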