Skip to content

Commit

Permalink
Update SAI API, Configuration and management sections.
Browse files Browse the repository at this point in the history
  • Loading branch information
jfeng-arista authored and kenneth-arista committed Apr 4, 2023
1 parent 3aca6ec commit 57b1dd0
Showing 1 changed file with 230 additions and 13 deletions.
243 changes: 230 additions & 13 deletions doc/voq/fabric.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,22 @@
# Fabric port support on Sonic
# Fabric port support on SONiC

# High Level Design Document
#### Rev 1

# Table of Contents
* [List of Tables](#list-of-tables)
* [List of Figures](#list-of-figures)
* [Revision](#revision)
* [About this Manual](#about-this-manual)
* [Scope](#scope)
* [Definitions/Abbreviations](#definitionsabbreviations)
* [Overview](#overview)
* [1 Requirements](#1-requirements)
* [2 Design](#2-design)
* [3 Testing](#3-testing)
* [4 Future Work](#4-future-work)
* [3 SAI API](#3-sai-api)
* [4 Configuration and management](#4-configuration-and-management)
* [5 Warmboot and Fastboot Design Impact](#5-warmboot-and-fastboot-design-impact)
* [6 Testing](#6-testing)
* [7 Open/Action items - if any](#7-openaction-items---if-any)
* [8 Restrictions/Limitations](#8-restrictionslimitations)

# List of Tables
* [Table 1: Abbreviations](#definitionsabbreviations)
Expand All @@ -28,10 +31,7 @@
| 2 | Oct-20 2020 | Ngoc Do, Eswaran Baskaran (Arista Networks) | Update counter information |
| 2.1 | Nov-17 2020 | Ngoc Do, Eswaran Baskaran (Arista Networks) | Minor update on container starts |
| 3 | Jun-3 2022 | Cheryl Sanchez, Jie Feng (Arista Networks) | Update on fabric link monitoring |

# About this Manual

This document provides an overview of the SONiC support for fabric ports that are present in a VOQ-based chassis. These fabric ports are used to interconnect the forwarding Network Processing Units within the VOQ chassis.
| 3.1 | Mar-30 2023 | Jie Feng (Arista Networks) | Update Overview, SAI API and Configuration and management section |

# Scope

Expand All @@ -51,6 +51,10 @@ This document builds on top of the VOQ chassis architecture discussed [here](htt
| ASIC | Application Specific Integrated Circuit | In addition to NPUs, also includes fabric chips that could forward packets or cells. |
| cell | Fabric Data Units | The data units that traverse a cell-based chassis fabric. |

# Overview

This document provides an overview of the SONiC support for fabric ports that are present in a VOQ-based chassis. These fabric ports are used to interconnect the forwarding Network Processing Units within the VOQ chassis.

# 1 Requirements

Fabric ports are used in systems in which there are multiple forwarding ASICs are required to be connected. Traffic passes from one front panel port in a forwarding ASIC over a fabric network to one or multiple front panel ports on one or other ASICs. The fabric network is formed using fabric ASICs. Fabric links on the fabric network connect fabric ports on forwarding ASICs to fabric ports on fabric ASICs.
Expand Down Expand Up @@ -166,7 +170,7 @@ When a forwarding ASIC is initialized, the fabric ports are initialized by defau

### 2.7.1 Fabric Status

In a later phase, a `show fabric status` command will be added to show the remote switch ID and link ID for each fabric link of an ASIC. The command will be added for both forwarding ASICs on Linecards and fabric ASICs on Fabric cards. This will be obtained from the SAI_PORT_ATTR_FABRIC_REACHABILITY port attribute of the fabric port. Note that for fabric links that do not have a link partner because of the configuration of the chassis, this will show the status as `down`. The status will also be `down` for fabric links that are down due to some other physical error. To identify links that are down due to error vs links that are not expected to be up because of the chassis connectivity, we need to build up a list of expected fabric connectivity for each ASIC. This can be computed ahead of time based on the vendor configuration and populated in the minigraph. This will be implemented in a later phase.
In a later phase, a `show fabric reachability` command will be added to show the remote switch ID and link ID for each fabric link of an ASIC. The command will be added for both forwarding ASICs on Linecards and fabric ASICs on Fabric cards. This will be obtained from the SAI_PORT_ATTR_FABRIC_REACHABILITY port attribute of the fabric port. Note that for fabric links that do not have a link partner because of the configuration of the chassis, this will show the status as `down`. The status will also be `down` for fabric links that are down due to some other physical error. To identify links that are down due to error vs links that are not expected to be up because of the chassis connectivity, we need to build up a list of expected fabric connectivity for each ASIC. This can be computed ahead of time based on the vendor configuration and populated in the minigraph. This will be implemented in a later phase.

## 2.8 Fabric Link Monitor

Expand Down Expand Up @@ -272,17 +276,230 @@ The following proposed CLI is used to show the traffic among fabric links on bot
....
```

# 3 Testing
# 3 SAI API

The fabric port monitoring adds a new attribute, SAI_PORT_ATTR_FABRIC_ISOLATE. The new API can be used to isolate fabric ports.

# 4 Configuration and management

## 4.1 Config DB Enhancements

Two tables are added into CONFIG DB for this feature.

The FABRIC_PORT table contains information on a fabric port's alias, isolated status, and lanes. Below is an example CONFIG DB snippet:

```
{
"FABRIC_PORT": {
"Fabric0": {
"alias": "Fabric0",
"isolateStatus": "False",
"lanes": "0"
},
"Fabric1": {
"alias": "Fabric1",
"isolateStatus": "False",
"lanes": "1"
}
}
```

The FABRIC_MONITOR table contains information related to fabric port monitoring. An sample CONFIG DB snippet is shown below.

```
{
"FABRIC_MONITOR": {
"FABRIC_MONITOR_DATA": {
"monErrThreshCrcCells": "1",
"monErrThreshRxCells": "61035156",
"monPollThreshIsolation": "1",
"monPollThreshRecovery": "8"
}
}
}
```

## 4.2 CLI/YANG model Enhancements

A new module, sonic-fabric-port, is added for Fabric port table. Three new leaves added to this module, called isolateStatus, alias, and lanes.

Snippet of sonic-fabric-port.yang:

```
module sonic-fabric-port{
...
container sonic-fabric-port {
container FABRIC_PORT {
description "FABRIC_PORT part of config_db.json";
list FABRIC_PORT_LIST {
key "name";
leaf name {
type string {
length 1..128;
}
}
leaf isolateStatus {
type string {
pattern "False|True";
}
}
leaf alias {
type string {
length 1..128;
}
}
leaf lanes {
type string {
length 1..128;
}
}
} /* end of list FABRIC_PORT_LIST */
} /* end of container FABRIC_PORT */
} /* end of container sonic-fabric-port */
} /* end of module sonic-fabric-port */
```

Module sonic-fabric-monitor is added for FABRIC_MONITOR. New leaves are added as well for fabric port monitoring.

Snippet of sonic-fabric-monitor.yang:

```
module sonic-fabric-monitor{
...
description "FABRIC_MONITOR yang Module for SONiC OS";
container sonic-fabric-monitor {
container FABRIC_MONITOR {
description "FABRIC_MONITOR part of config_db.json";
container FABRIC_MONITOR_DATA {
leaf monErrThreshCrcCells {
type uint32;
default 1;
}
leaf monErrThreshRxCells {
type uint32;
default 61035156;
}
leaf monPollThreshIsolation {
type uint32;
default 1;
}
leaf monPollThreshRecovery {
type uint32;
default 8;
}
} /* end of container FABRIC_MONITOR_DATA */
} /* end of container FABRIC_MONITOR */
} /* end of container sonic-fabric-monitor */
} /* end of module sonic-fabric-monitor */
```


## 4.3 CLI

Several new CLI commands are added for this feature.

Command to display fabric counters port.

```
> show fabric counters port
```

Command to display fabric counters queue.

```
> show fabric counters queue
```

Command to display fabric status.

```
> show fabric reachability
```

Command to set a fabric link monitoring error threshold.

```
> config fabric port monitor error threshold <#crcCells> <#rxCells>
```

Command to set the number of consecutive polls in which the threshold needs to be detected to isolate a link.

```
> config fabric port monitor poll threshold isolation <#polls>
```

Command to set the number of consecutive polls in which no error is detected to unisolate a link.

```
> config fabric port monitor poll threshold recovery <#polls>
```

Commands to manually isolate and unisolate a fabric link.

```
> config fabric port isolate [port_id]
> config fabric port unisolate [port_id]
```

Command to display the fabric link isolated status.

```
> show fabric isolation
```

Command to display the fabric capacity on a system.

```
> show fabric monitor capacity
```

Command to configure a capacity threshold to trigger alerts when total fabric link capacity goes below it.

```
> config fabric monitor capacity threshold < threshold >
```

Command to show the traffic among fabric links.

```
> show fabric counters rate mbps
```


# 5 Warmboot and Fastboot Design Impact

The existing warmboot/fastboot feature is not affected due to this design.

# 6 Testing

Fabric port testing will rely on sonic-mgmt tests that can run on chassis hardware.

- Test fabric port mapping: To verify the fabric mapping, we can inspect the remote switch ID that are saved in the STATE_DB and match that with the known chassis architecture.
- Test fabric port mapping: To verify the fabric mapping, we can inspect the remote switch ID that are saved in the STATE_DB and match that with the known chassis architecture. More comprehensive information about this testing can be found in the Chassis Fabric Test Plan document, which is available at testplan/Chassis-fabric-test-plan.md.

- Test traffic and counters: Send traffic through the chassis and verify traffic going through fabric ports via counters.

# 4 Future Work
- Test fabric port monitoring:
* Use the CLI to isolate/unisolate fabric ports, and verify whether the corresponding STATE_DB entries are updated.
* Create simulated errors (e.g., CRC errors) on a fabric port, and confirm that the algorithm takes appropriate action and updates the corresponding STATE_DB entries.
* Test fabric capcity monitoring: This test involves isolating/unisolating fabric ports on the system and checking that the 'show fabric capacity' command updates its output correctly to reflect the changes.

# 7 Open/Action items - if any

- In this proposal, all fabric ports on fabric ASICs or forwarding ASICs that join to form the fabric network will be enabled even when there are no peer ports available. We could provide a config model for the platforms to express the expected fabric connectivity and turn off unnecessary fabric ports.

- Fabric ports that do not have a peer port will show up as a ‘down’ port. Fabric ports that do have a peer port could also go ‘down’ and there is no current way to differentiate this from a fabric port that does not have a peer port. This can be detected if the config model can express the expected fabric connectivity.

# 8 Restrictions/Limitations

TBD

0 comments on commit 57b1dd0

Please sign in to comment.