-
Notifications
You must be signed in to change notification settings - Fork 484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extended port operational status with various error and fault status #2060
Conversation
83ff60a
to
2f120bb
Compare
inc/saiport.h
Outdated
* @type bool | ||
* @flags READ_ONLY | ||
*/ | ||
SAI_PORT_ATTR_MAC_LOCAL_FAULT_STATUS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it better to add this as PORT_ATTR_FAULT_STATUS and then define enum such as mac_local_faluse, remote fault, and in the future could be other types of fault.
@prgeor @lguohan Ideally we should have the port errors as part of the existing oper_status_t port notification, is the consensus. typedef enum _sai_port_oper_status_t
.... |
@JaiOCP we decided to not use existing notification instead created new notification which can be used in future by adding new faults (apart from local/remote) faults. |
> @JaiOCP <https://github.com/JaiOCP> we decided to not use existing
notification instead created new notification which can be used in future
by adding new faults (apart from local/remote) faults.
Who decided not to use the existing notification?? Was there a review
meeting in the community or any offline discussion?
As I mentioned we had an offline meeting with folks from MSFT, NV, Marell
and there were many concerns raised with such an approach.
Then we need to answer all the questions that were raised in the small
group discussion along with folks from Nvidia.
1. How do we know a given error is cleared or is not asserted now.
2. How do we handle the async notifications? for e.g. race condition
between this error notification and port oper status notification
Probably some more.
…On Mon, Aug 19, 2024 at 12:12 PM Prince George ***@***.***> wrote:
@prgeor <https://github.com/prgeor> @lguohan <https://github.com/lguohan>
We discussed this topic offline with other vendors. There is a issue with
having a separate port error notification as the callback can reach in any
order along with port oper status. Besides this there is also the issue of
when the error state is cleared.
#2048 <#2048>
Ideally we should have the port errors as part of the existing
oper_status_t port notification, is the consensus. Please look at the
private branch (PR is not yet opened)
typedef enum _sai_port_oper_status_t { /** Unknown */
SAI_PORT_OPER_STATUS_UNKNOWN,
/** Up */
SAI_PORT_OPER_STATUS_UP,
/** Down */
SAI_PORT_OPER_STATUS_DOWN,
/** Test Running */
SAI_PORT_OPER_STATUS_TESTING,
/** Not Present */
SAI_PORT_OPER_STATUS_NOT_PRESENT,
/** Remote-fault observed on the port */
SAI_PORT_OPER_STATUS_REMOTE_FAULT,
/** Local-fault for example loss of signal and more */
SAI_PORT_OPER_STATUS_LOCAL_FAULT,
/** Pre-emphasis set failed when configuration is done*/
SAI_PORT_OPER_STATUS_PREEMPHASIS_SET_FAILED,
/** FEC set failed when configuration is done */
SAI_PORT_OPER_STATUS_FEC_SET_FAILED,
/** Speed set failed when configuration is done */
SAI_PORT_OPER_STATUS_SPEED_SET_FAILED,
/** Interface type set failed when configuration is done */
SAI_PORT_OPER_STATUS_IF_TYPE_SET_FAILED,
/** Media type set failed when configuration is done */
SAI_PORT_OPER_STATUS_MEDIA_TYPE_SET_FAILED,
....
@JaiOCP <https://github.com/JaiOCP> we decided to not use existing
notification instead created new notification which can be used in future
by adding new faults (apart from local/remote) faults.
—
Reply to this email directly, view it on GitHub
<#2060 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AKCSHLIS6VZZV5ZTN3MPCJ3ZSI7QJAVCNFSM6AAAAABL5T7KGGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJXGI2TSNRUGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.
|
inc/saiport.h
Outdated
typedef enum _sai_port_error_status_t | ||
{ | ||
/** No errors */ | ||
SAI_PORT_ERROR_STATUS_CLEAR = 0, | ||
|
||
SAI_PORT_ERROR_STATUS_MAC_LOCAL_FAULT = 1, | ||
|
||
SAI_PORT_ERROR_STATUS_MAC_REMOTE_FAULT = 2, | ||
|
||
SAI_PORT_ERROR_STATUS_FEC_SYNC_LOSS = 4, | ||
|
||
SAI_PORT_ERROR_STATUS_FEC_LOSS_ALIGNMENT_MARKER = 8 | ||
} sai_port_error_status_t; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already an enum sai_port_err_status_t that has similar status'. Can we reuse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshitgulati18 thanks for pointing out. sai_port_err_status_list_t
is a list of various port error which needs to be traversed to find out which ERROR is set. As the list grows it becomes less efficient whereas in this PR the proposal is to use 64-bit bitmap of errors which is more efficient. I guess 64 bit is sufficient to cover port related errors...
inc/saiport.h
Outdated
|
||
SAI_PORT_ERROR_AUTONEG_FAILED=0x8, | ||
|
||
SAI_PORT_ERROR_LINK_TRAINING_FAILED=0x10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplication, appears twice
SAI_PORT_ERROR_STATUS_FEC_SYNC_LOSS = 4, | ||
|
||
SAI_PORT_ERROR_STATUS_FEC_LOSS_ALIGNMENT_MARKER = 8 | ||
} sai_port_error_status_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add SAI_PORT_ERROR_STATUS_HIGH_SER and SAI_PORT_ERROR_STATUS_HIGH_BER
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eddyk-nvidia added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated?
/** | ||
* @brief Attribute bitmap data for #SAI_PORT_ATTR_ERROR_STATUS | ||
*/ | ||
typedef enum _sai_port_error_status_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are any other values from sai_port_err_status_t needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikeberesford may I know which one you are interested to be added here? I am not sure if I can duplicate the attributes here but I can try...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm personally mostly interested in local/remote fault, just want to make sure the others are not useful or are covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to Local/Remote fault, SAI_PORT_ERR_STATUS_CRC_RATE would also be good to include here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshitgulati18 added now
@mikeberesford may I know what is blocking? |
I believe we were going to mark sai_port_err_status_list_t/SAI_PORT_ATTR_ERR_STATUS_LIST as deprecated? |
@mikeberesford can you check now? |
Please add |
@mikeberesford added |
Still not done? I don't appear to be able to leave a comment on the line, but I still see:
|
@@ -1935,6 +1988,7 @@ typedef enum _sai_port_attr_t | |||
* | |||
* @type sai_port_err_status_list_t | |||
* @flags READ_ONLY | |||
* @deprecated true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikeberesford now added here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JaiOCP can you review?
@@ -1935,6 +1988,7 @@ typedef enum _sai_port_attr_t | |||
* | |||
* @type sai_port_err_status_list_t | |||
* @flags READ_ONLY | |||
* @deprecated true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@kcudnik can you please check if this PR can be merged? This has been approved by multiple reviewers. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a backward compatible change if NOS and SAI are not using the same header files. We need to capture this in release notes.
Signed-off-by: Prince George <prgeor@microsoft.com>
…teproject#2060) Port status change notification now adds additional field to indicate various port error/fault status that caused the port operational status to go down. A capability attribute is added to query the Switch's capability that can be used to know if the new fault/error status are supported in the port status change notification. sai_port_error_status_t bitmap can indicate more than one port error conditions by setting the corresponding error bit indicating why the port went down. Ideally, when the port operational status goes UP, the bitmap value should be ZERO i.e no errors. Adding the error status as part of the port link status change notification allows better correlation of port operational status with various port error/fault conditions events that may have caused the port down event. Signed-off-by: Prince George <prgeor@microsoft.com> Signed-off-by: siqbal1986 <shahzad.iqbal@microsoft.com>
@@ -89,6 +140,8 @@ typedef struct _sai_port_oper_status_notification_t | |||
/** Port operational status */ | |||
sai_port_oper_status_t port_state; | |||
|
|||
/** Bitmap of various port error or fault status */ | |||
sai_port_error_status_t port_error_status; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is breaking change not backward compatible, could cause some issues on sonic when deserialize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of adding this extra field we should make this RO attribute per port, and if user would be interested in port status, then it could query that attribute in notification using SAI api, then everything would be backward compatible, we can still move this around
* @brief Attribute data for #SAI_PORT_ATTR_ERROR_STATUS | ||
* Note enum values must be powers of 2 to be used as Bit mask to query multiple errors | ||
* | ||
* @flags free |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be changed to "strict" then it will force validator to chack if all enum are power of 2 to catch user mistakes
typedef enum _sai_port_error_status_t | ||
{ | ||
/** No errors */ | ||
SAI_PORT_ERROR_STATUS_CLEAR = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor @lguohan I added exception for that, but i would vote that this flag should be removed, since its not real flag, usually flags are used like this:
if (x & FLAG) == FLAG)
checks if flag is set, in this case if you use CLEAR value then this condidtion will be always true, and CLEAR is not part of actual flags that can be set, it seems like intention is to do this:
if (x == CLEAR) ...
but i think that is a bad design, more natural is first code snipper or eventually it could be done like this:
if (x) ... // some flags are set
do we need a live discussion on this or i can just remove this entry ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kcudnik sai_port_error_status_t
being used as enum or flag is up to the implementation. I can use enum value like below:-
I can use the enum value like so
if (port_error_status)
{
// Found Errors
}
else {
// No Errors
}
`
In another implementation I can look for only interested errors where flag could be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the same as i wrote in my comment, but there eis no need for SAI_PORT_ERROR_STATUS_CLEAR enum value zero, im proposing to remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kcudnik why do you say there is no need for SAI_PORT_ERROR_STATUS_CLEAR
? I am planning to use it when there is port UP event and gave the example above .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because that enum is not an actual flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kcudnik as I mentioned I am planning to use sai_port_error_status_t
as both as an enum value and also also as a flag. I don't want to check each and every flag rather at time I am interested only in few flags in which case I know the enum value and in some case I may be interested only to check one flag. It not just about CLEAR flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CLEAR is not a flag is definition of absence of any flags,, read my first comment in this thread
@lguohan @JaiOCP @rlhui @prgeor since we agreed to propose new notification extended struct count/attr_list which i prepared here: #2087 i propose to override/rewrite history without ever changing sai_port_oper_status_notifiation_t struct in the first place to be backward compatible all the time, does this sound good ? |
…teproject#2060) Port status change notification now adds additional field to indicate various port error/fault status that caused the port operational status to go down. A capability attribute is added to query the Switch's capability that can be used to know if the new fault/error status are supported in the port status change notification. sai_port_error_status_t bitmap can indicate more than one port error conditions by setting the corresponding error bit indicating why the port went down. Ideally, when the port operational status goes UP, the bitmap value should be ZERO i.e no errors. Adding the error status as part of the port link status change notification allows better correlation of port operational status with various port error/fault conditions events that may have caused the port down event. Signed-off-by: Prince George <prgeor@microsoft.com>
Port status change notification now adds additional field to indicate various port error/fault status that caused the port operational status to go down. A capability attribute is added to query the Switch's capability that can be used to know if the new fault/error status are supported in the port status change notification.
sai_port_error_status_t
bitmap can indicate more than one port error conditions by setting the corresponding error bit indicating why the port went down. Ideally, when the port operational status goes UP, the bitmap value should be ZERO i.e no errors.Adding the error status as part of the port link status change notification allows better correlation of port operational status with various port error/fault conditions events that may have caused the port down event.