-
Notifications
You must be signed in to change notification settings - Fork 531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sonic-swss]: Added flap counter and last flap time functionality. #928
base: master
Are you sure you want to change the base?
Conversation
retest this please |
5 similar comments
retest this please |
retest this please |
retest this please |
retest this please |
retest this please |
Please review the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting this enhancement. This piece of information is definitely an important one that is missing in the current SONiC system. I have the following suggestions that open for discussion:
- since the flap counter is initialized as 0, and there is no other components editing this variable, is it okay to store the number in orchagent without querying the value from the database to get the old value?
- could you add the flap_counter and last_flap_ts as variables for the struct Port?
While I do have some concerns on whether to put these pieces of information into which database, since the current operational status is already stored in the application database, I am fine with appending these two entries there for now. But we could also discuss on whether to migrate the information into state database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please check my comments
{ | ||
SWSS_LOG_ENTER(); | ||
|
||
time_t now = time(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the getTimestamp()
function instead
{ | ||
return; | ||
} | ||
updateDbPortLastFlapTime(new_tuples); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move these two functions into updatePortOperStatus
function instead of here.
under the two lines:
updateDbPortOperStatus(port, status);
port.m_oper_status = status;
@@ -2500,6 +2559,8 @@ bool PortsOrch::initializePort(Port &port) | |||
{ | |||
port.m_oper_status = SAI_PORT_OPER_STATUS_DOWN; | |||
} | |||
m_portTable->hset(port.m_alias, "flap_counter", "0"); | |||
m_portTable->hset(port.m_alias, "last_flap", "N/A"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should go to state db?
@@ -1239,14 +1241,71 @@ bool PortsOrch::setHostIntfsOperStatus(const Port& port, bool isUp) const | |||
return true; | |||
} | |||
|
|||
void PortsOrch::updateDbPortFlapCounter(const Port& port, vector<FieldValueTuple>& old_tuples, vector<FieldValueTuple>& new_tuples) const | |||
{ | |||
SWSS_LOG_ENTER(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tab = 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use redis incr and format your changes.
redis incr is nice but it requires much more changes. i suggest a design review first |
how to move forward on this one? schedule a call to discuss this feature? @prprakas to schedule? |
@lguohan will add a vs-test for this, I think that should be good to move ahead with this as per the communication from @zhenggen-xu |
@prprakas , can you fix simply comments we have requested, also can you share the ETA for adding vs test? |
- Added natmgr and natorch changes. - Added nat Zone related changes. - Added Warm reboot changes. Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni akhilesh.samineni@broadcom.com
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
…n_sonic Natmgrd changes in sonic-swss sub module to support NAT feature.
…sonic-net#1156) * [aclorch] Validate that provided IN/OUT_PORTS are physical interfaces - Add a check during the rule validation step to make sure provided IN/OUT_PORTS rules only include physical interfaces - Add vs test cases for invalid IN/OUT_PORTS inputs Signed-off-by: Danny Allen <daall@microsoft.com> * Clarify test names * Add extra delay for table teardown * Check if new tests are causing tests to fail * Check if new tests are causing tests to fail * Check if ProducerStateTable is having problems * Make delete ACL table test more strict * Remove invalid interface test cases * Undo change to delete test * Undo change to delete test
…ges_in_sonic Natsyncd changes in sonic-swss submodule to support NAT feature.
…es_in_sonic Orchagent changes in sonic-swss submodule to support NAT feature.
…t and queue stats (sonic-net#1170) - Updates portsorch to use FlexCounterManager for port and queue stats instead of direct redis operations - Updates FlexCounterManager to allow clients to enable the group in the constructor, for convenience - Updates the makefile for cfgmgr to include new flex_counter dependency from portsorch Signed-off-by: Danny Allen <daall@microsoft.com>
Signed-off-by: Danny Allen <daall@microsoft.com>
…asons (sonic-net#1173) - Refactor drop reason capability query to trim SAI prefixes - Store device capabilities in orchagent to perform safety checks Fixes sonic-net#1136 - Rather than depending on each ASIC vendor to follow the same error handling doctrine, this PR validates HW support in orchagent, which should be more reliable. Related to sonic-net/sonic-utilities#785 - In order to validate user input, we need to remove the SAI prefixes before we store the results. This removes the need for the CLI to perform these checks. Signed-off-by: Danny Allen <daall@microsoft.com>
…#1141) Clear loopback interfaces in kernel at intfmgrd starting except warmstart. Clear vrf interfaces in kernel at vrfmgrd starting except warmstart.
…warm reboot" issue (sonic-net#1171) * during warm-reboot teamd need to use same system-id before warm-reboot Signed-off-by: shine.chen <shine.chen@nephosinc.com> * refine per review comment Signed-off-by: shine.chen <shine.chen@mediatek.com> * skip saved pdu on negotiation stage Signed-off-by: shine.chen <shine.chen@mediatek.com> * reformat teammgr file
Signed-off-by: Danny Allen <daall@microsoft.com>
…onic-net#1158) Otherwise, we can see that DUT after warm reboot sends ARP packets with management IP (10.112.71.4) Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
…onic-net#1166) Originally portsorch was designed to remove LAG member from LAG when member gets disabled by teamd. This could lead to potential issues including flood to that port and loops, since after removal member becomes a switch port. Now, portsorch will make use of SAI_LAG_MEMBER_ATTR_INGRESS_DISABLE and SAI_LAG_MEMBER_ATTR_EGRESS_DISABLE to control collection/distribution through that LAG member port. With this new flow, teammgrd and teamsyncd are candidates to be refactored, especially teamsyncd warm reboot logic, since now we don't need to compare old, new lags and lag members. Teamsyncd's job is simply to signal "status" field update without waiting for reconciliation timer to expire. e.g. one port channel went down on peer: ``` admin@arc-switch1025:~$ show interfaces portchannel Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available, S - selected, D - deselected, * - not synced No. Team Dev Protocol Ports ----- --------------- ----------- -------------- 0001 PortChannel0001 LACP(A)(Up) Ethernet112(S) 0002 PortChannel0002 LACP(A)(Up) Ethernet116(S) 0003 PortChannel0003 LACP(A)(Up) Ethernet120(S) 0004 PortChannel0004 LACP(A)(Dw) Ethernet124(D) admin@arc-switch1025:~$ docker exec -it syncd sx_api_lag_dump.py LAG Members Table =============================================================================================================== | SWID| LAG Logical Port| LAG Oper State| PVID| Member Port ID| Port Label| Collector| Distributor| Oper State| =============================================================================================================== | 0 0x10000000 UP 1| 0x11900| 29| Enable| Enable| UP| =============================================================================================================== | 0 0x10000100 UP 1| 0x11b00| 30| Enable| Enable| UP| =============================================================================================================== | 0 0x10000200 UP 1| 0x11d00| 31| Enable| Enable| UP| =============================================================================================================== | 0 0x10000300 DOWN 1| 0x11f00| 32| Disable| Disable| UP| =============================================================================================================== ``` Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
…ite (sonic-net#1185) Signed-off-by: Danny Allen <daall@microsoft.com>
InitSystemAcl table is removed. ECN ACL rules: DSCP 0/8 mapping to yellow are removed. Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
retest this please |
Number of times a link got reset(up/down) has been added under this change, also the last flap timestamp for the link has also been added, they will be updated once the oper status of any port changes. 2 additional fields have been added to appl_db i.e. "flap_counter" and "last_flap_time" which willbe updated when "oper_status" is updated.
Signed-off-by: Prem Prakash prprakash@linkedin.com
What I did
Added flap counter and last flap timestamp feature
Why I did it
This would help in solving operational issues faster. Instead of relying on the logs to figure out a bad flapper we can see this feature under status command.
How I verified it
Tested it by building up the docker-orchagent.gz after my change, changes were required on sonic-utilities repo as well to see the output, I will raise a parallel PR for that.
flap_output.txt