[FDB] [202012] Fix fbdorch to properly handle syncd FDB FLUSH Notif #2401
…#2254)
Vlan delete couldn't be handled by OA when there is fdb learnt on the member and when the member is deleted
This inability of handling APPL_DB notif is affecting warm-restart. FDB Entry from State DB is not removed.
OA doesn't have the logic to handle consolidate flush notif coming from syncd
FdbOrch doesn't have logic to clear internal cache and decrement corresponding fdb counters during a flush notification
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- What I did
using a copy of FDBEntry fields (stored in FDBUpdate) instead of a reference since the reference gets invalidated in the storeFdbEntryState()
simplified clearFdbEntry() interface
- Why I did it
To fix the memory usage issue
The issue is that the SWSS_LOG_INFO() uses the mac&, port_alias&, and bv_id& which are invalidated in the storeFdbEntryState().
- How I verified it
Run the tests that were used to find the issues and checked the ASAN report
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
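The dangling-reference problem this commit describes can be illustrated with a minimal sketch. The types and names below (`FdbData`, `FdbCache`, `removeAndDescribe`) are hypothetical stand-ins, not the actual sonic-swss definitions. The only point is that fields needed for logging are copied before the cache entry is erased, rather than held as references into it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a cached FDB entry.
struct FdbData {
    std::string port_alias;
};

// Hypothetical cache keyed by MAC string.
struct FdbCache {
    std::map<std::string, FdbData> entries;

    // Erasing destroys the map element, so any reference into it dangles.
    void remove(const std::string& mac) { entries.erase(mac); }
};

// Copies the fields needed for the log message before the erase,
// so nothing refers to the destroyed element afterwards.
std::string removeAndDescribe(FdbCache& cache, const std::string& mac)
{
    auto it = cache.entries.find(mac);
    if (it == cache.entries.end()) {
        return "not found";
    }
    const std::string mac_copy = it->first;                // copy, not a reference
    const std::string port_alias = it->second.port_alias;  // copy, not a reference
    cache.remove(mac_copy);
    assert(cache.entries.count(mac_copy) == 0);
    return "removed " + mac_copy + " on " + port_alias;
}
```

Had `port_alias` been declared as `const std::string&`, the final string concatenation would read freed memory, which is the kind of access an ASAN run reports.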
This is a backport of PR #2254 from
The FDB flush behavior for 202012 was set for a particular SAI version, and backporting may have side effects. Since this is more than a bug fix, I think we may not need to backport it immediately.
The only flow this change affects is when a notification of type SAI_FDB_EVENT_FLUSHED comes from syncd. As I understand it, the SAI used in 202012 also suggests that a consolidated flush notification can be received from syncd: https://github.com/opencomputeproject/SAI/blob/v1.7/inc/saifdb.h#L292, and we've seen it on the latest 202012 image. Currently in 202012 this seems to be unsupported (only flush per port and per vlan is handled): https://github.com/sonic-net/sonic-swss/blob/202012/orchagent/fdborch.cpp#L448, and thus the corresponding state_db entries are not removed.
This is causing problems during warm-reboot:
So, I believe this fix is required.
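A rough sketch of what handling a consolidated flush could look like is given below. It rests on one assumption about the notification: a null bridge port id or a null bv_id acts as a wildcard, so both being null means "flush everything". The names `CachedFdbEntry`, `kNullObjectId`, and `handleFlush` are hypothetical and are not taken from fdborch.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

using ObjectId = std::uint64_t;
constexpr ObjectId kNullObjectId = 0;  // stands in for SAI_NULL_OBJECT_ID

// Hypothetical cached entry; the real orchagent cache is structured differently.
struct CachedFdbEntry {
    std::string mac;
    ObjectId bv_id;
    ObjectId bridge_port_id;
};

// Removes every cached entry matched by the flush scope and returns how many
// were removed. A null id is treated as a wildcard (assumption), so
// (null, null) flushes all entries, (null, port) flushes per port,
// (vlan, null) flushes per vlan, and (vlan, port) flushes per port and vlan.
int handleFlush(std::vector<CachedFdbEntry>& cache,
                ObjectId bv_id, ObjectId bridge_port_id)
{
    auto matches = [&](const CachedFdbEntry& e) {
        return (bv_id == kNullObjectId || e.bv_id == bv_id) &&
               (bridge_port_id == kNullObjectId || e.bridge_port_id == bridge_port_id);
    };
    auto first = std::remove_if(cache.begin(), cache.end(), matches);
    const int removed = static_cast<int>(std::distance(first, cache.end()));
    cache.erase(first, cache.end());
    // Postcondition: nothing in the flushed scope remains in the cache.
    assert(std::none_of(cache.begin(), cache.end(), matches));
    return removed;
}
```

In a real handler, the returned count (or the removed entries themselves) would be what drives the state_db cleanup and the per-port counter updates discussed in this thread.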
@prsunny do we have a plan to merge this PR?
@@ -5355,3 +5355,17 @@ std::unordered_set<std::string> PortsOrch::generateCounterStats(const string& ty
    }
    return counter_stats;
}

bool PortsOrch::decrFdbCount(const std::string& alias, int count)
I don't see it being done in the previous flow. Why is it newly done?
I believe m_fdb_count is decremented during the SAI_FDB_EVENT_AGED notification, so I think the same is required for SAI_FDB_EVENT_FLUSHED. https://github.com/sonic-net/sonic-swss/blob/202012/orchagent/fdborch.cpp#L227
Can you confirm if this is tested on 202012? I see the ref count as a new change and not part of the FLUSH handling itself for 202012.
So, when the corresponding fdb entry expires or is removed, some vendor SDKs generate an AGED notification and others generate a FLUSH notification.
In 202012, we have code which decrements m_fdb_count when there is an AGED notification: https://github.com/sonic-net/sonic-swss/blob/202012/orchagent/fdborch.cpp#L311. So when the capability to handle the FLUSH notification is added, I see no issue in decrementing m_fdb_count, as this keeps the implementation consistent.
As for testing this change, do you mean I should check whether m_fdb_count is decremented when there is an AGED notification?
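To make the consistency argument concrete, here is a minimal sketch of per-port FDB bookkeeping in which aged and flushed entries go through the same decrement path. Only the `decrFdbCount(const std::string& alias, int count)` signature comes from the diff above. The `PortTable` class, its `learn` and `fdbCount` helpers, and the clamp at zero are assumptions made for illustration, not the PortsOrch implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-port bookkeeping; the field name echoes m_fdb_count.
struct PortCounters {
    int fdb_count = 0;
};

class PortTable {
public:
    // Called when an entry is learned on the port.
    void learn(const std::string& alias) { ++ports_[alias].fdb_count; }

    // Same signature as in the diff. Returns false if the port is unknown.
    // Clamping at zero is an assumption of this sketch.
    bool decrFdbCount(const std::string& alias, int count)
    {
        assert(count >= 0);
        auto it = ports_.find(alias);
        if (it == ports_.end()) {
            return false;
        }
        it->second.fdb_count = std::max(0, it->second.fdb_count - count);
        return true;
    }

    // Returns the current count, or 0 for an unknown port.
    int fdbCount(const std::string& alias) const
    {
        auto it = ports_.find(alias);
        return it == ports_.end() ? 0 : it->second.fdb_count;
    }

private:
    std::map<std::string, PortCounters> ports_;
};
```

With this shape, an AGED handler would call `decrFdbCount(alias, 1)` per entry, while a FLUSHED handler could pass the number of entries it removed for that port in a single call.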
Update sonic-swss submodule pointer to include the following:
* [FDB] [202012] Fix fbdorch to properly handle syncd FDB FLUSH Notif ([sonic-net#2401](sonic-net/sonic-swss#2401))
* Support for platforms based on Clounix Networks' device ([sonic-net#2399](sonic-net/sonic-swss#2399))
Signed-off-by: dprital <drorp@nvidia.com>
…onic-net#2401)
* [FDB] Fix fbdorch to properly handle syncd FDB FLUSH Notif (sonic-net#2254)
Vlan delete couldn't be handled by OA when there is fdb learnt on the member and when the member is deleted
This inability of handling APPL_DB notif is affecting warm-restart. FDB Entry from State DB is not removed.
OA doesn't have the logic to handle consolidate flush notif coming from syncd
FdbOrch doesn't have logic to clear internal cache and decrement corresponding fdb counters during a flush notification
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
What I did
Backport the fdborch fix to properly handle syncd FDB flush notifications.
Why I did it
How I verified it
Details if related