-
Notifications
You must be signed in to change notification settings - Fork 531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Orchagent changes in sonic-swss submodule to support NAT feature. #1125
Orchagent changes in sonic-swss submodule to support NAT feature. #1125
Conversation
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
orchagent/natorch.cpp
Outdated
sai_attribute_t attr; | ||
memset(&attr, 0, sizeof(attr)); | ||
attr.id = SAI_SWITCH_ATTR_AVAILABLE_SNAT_ENTRY; | ||
maxAllowedNatEntries = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename this variable as we are using this to check the number SNAT entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed this variable to maxAllowedSNatEntries.
attr.id = SAI_SWITCH_ATTR_AVAILABLE_SNAT_ENTRY; | ||
maxAllowedNatEntries = 0; | ||
|
||
status = sai_switch_api->get_switch_attribute(gSwitchId, 1, &attr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move this SAI call to when NAT is enabled from CLI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get operation on a unimplemented SAI attribute doesn't result in crash, so retaining in current changes as discussed.
if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
{ | ||
SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be can do this check at the beginning of the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAT entries have to be cached irrespective of NAT is enabled or not.
if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
{ | ||
SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are counters being updated here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing dynamic entry is replaced by static entry as Static entry has higher priority, so decremented the dynamic and incremented the static counters.
if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
{ | ||
SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to clean up the entries which are added to HW as well ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will also delete the entries from hardware.
if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
{ | ||
SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, this notification comes when a L3 interface is added to NAT ZONE, so every time a interface is added to a NAT_ZONE. I have 2 questions:
- since all the conntrack entries are flushed, the entries in asic will also be removed right ? What will happen to the traffic, they will be dropped ?
- When we need to flush of conntrack entries from natorch can this be done from natmgr instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This notification comes when do "sonic-clear nat translations" is executed. Yes all the entries in the asic are removed. Current traffic flows will see a drop and relearned again and added to asic.
OA is the component that tries to keep the Kernel conntrack entries and the hardware entries in sync. So, it is doing the flush entries.
if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
{ | ||
SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current SONiC architecture, the kernel entries are not updated/added/deleted from orchagent. Can you move this functionality to natmgr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OA is responsible for checking the hardware activity/hitbit of each of the entry in its cache. Since the cache is in OA code, it is the appropriate place to update the timeouts for the corresponding entries in the kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand OA is checking the hitbit, but in the current SONiC architecture the kernel entries are no update from OA. The mgr modules normally updates the kernel entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Arvind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The natmgr has no idea about the hardware activity of nat entries, so it cannot update the entries timeouts in kernel, if natmgr has to update the timeouts, OA has to inform the natmgr through separate communication channel which is an overhead and has it issues with scaling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NAT entries are not updated real time, polling for hit-bit happens every 30 seconds, so adding another hop will not have a big impact on the performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, lets keep the current code for now. Let's open a issue to track this change. We can make the code changes in the next release.
@@ -73,6 +73,8 @@ static map<string, sai_hostif_trap_type_t> trap_id_map = { | |||
{"udld", SAI_HOSTIF_TRAP_TYPE_UDLD}, | |||
{"bfd", SAI_HOSTIF_TRAP_TYPE_BFD}, | |||
{"bfdv6", SAI_HOSTIF_TRAP_TYPE_BFDV6} | |||
{"src_nat_miss", SAI_HOSTIF_TRAP_TYPE_SNAT_MISS}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make sure that these trap ids are created only on platform which support NAT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done based on the feature check.
orchagent/Makefile.am
Outdated
chassisorch.cpp \ | ||
debugcounterorch.cpp | ||
|
||
orchagent_SOURCES += flex_counter/flex_counter_manager.cpp flex_counter/flex_counter_stat_manager.cpp | ||
orchagent_SOURCES += debug_counter/debug_counter.cpp debug_counter/drop_counter.cpp | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
orchagent/copporch.h
Outdated
@@ -50,6 +50,7 @@ class CoppOrch : public Orch | |||
protected: | |||
object_map m_trap_group_map; | |||
bool enable_sflow_trap; | |||
bool isNatSupported; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow the same style for variables as in file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this variable and using 'gIsNatSupported' global variable.
orchagent/intfsorch.h
Outdated
std::set<IpPrefix> getSubnetRoutes(); | ||
std::map<string, uint32_t> m_nat_zone; | ||
bool isNatSupported; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use same style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this variable and using 'gIsNatSupported' global variable.
orchagent/intfsorch.cpp
Outdated
@@ -1015,3 +1087,30 @@ void IntfsOrch::doTask(SelectableTimer &timer) | |||
} | |||
} | |||
} | |||
|
|||
void IntfsOrch::getNatSupportedInfo() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you are repeating this same function in multiple places. Can you do this query one time, like in switchorch, and use a single API to check if NAT is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Moved this functionality to main.cpp file and updating the global variable 'gIsNatSupported'.
orchagent/intfsorch.cpp
Outdated
m_nat_zone[alias] = nat_zone_id; | ||
if (isNatSupported) | ||
{ | ||
setRouterIntfsNatZoneId(port, nat_zone_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this call. The router interface is created only in setIntf. So this API is always going to return true
after its first check. You are actually setting the zone in as part of Line 808. Can you please check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User configures the new 'nat_zone' even after rif is created, this call is used to set the new 'nat_zone'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it but the set of RIF attribute is currently handled in setIntf. It is done for SUB_PORT currently but I think can be modified to set NAT zone for any RIF type. Another follow up is, why do we need m_nat_zone map seperate? Can this nat zone be part of the port structure, like m_rif_id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting NAT zone for Physical, Vlan and Lag interfaces only, so not handled in setIntf.
Yes, nat zone can be part of port structure, made changes to address it.
orchagent/intfsorch.cpp
Outdated
@@ -739,6 +796,21 @@ bool IntfsOrch::addRouterIntfs(sai_object_id_t vrf_id, Port &port) | |||
attr.value.u32 = port.m_mtu; | |||
attrs.push_back(attr); | |||
|
|||
if (isNatSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set operation on a unimplemented SAI attribute (SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID as of SAI 1.5) results in crash, to avoid it added this check here.
orchagent/intfsorch.cpp
Outdated
attr.id = SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID; | ||
if (m_nat_zone.find(port.m_alias) == m_nat_zone.end()) | ||
{ | ||
attr.value.u32 = DEFAULT_NAT_ZONE_ID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not required, its the SAI default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removed it.
orchagent/intfsorch.cpp
Outdated
else | ||
{ | ||
attr.value.u32 = m_nat_zone[port.m_alias]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can revisit the check as if natzone is present in configuration, set the attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, modified the code to set the attribute if it is present in cache.
sai_samplepacket_api_t* sai_samplepacket_api; | ||
sai_debug_counter_api_t* sai_debug_counter_api; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this extra line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The current nexthop tracking in OA works on different SAI implementations (those which track and those which do not track the nexthop changes) though it will be sub-optimal on the SAI that tracks too. We can proceed as-is with the current OA changes till the SAI implementations converge. |
Retest this please. |
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
- Added OA and zone related changes. Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
e5acb9b
to
286ca21
Compare
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>
@AkhileshSamineni, @kirankella, |
5ad47af
@arlakshm Resolved merge conflicts. |
retest this please. |
Retest this please. |
2 similar comments
Retest this please. |
Retest this please. |
retest vs please |
retest vs please |
Add more dependencies to setup.py. Now that we're building as a wheel, we can add all dependent packages to make building and installation more robust, as there will no longer be a need to install them explicitly. Move sonic-config-engine from `install_requires` to `tests_require`, as it is only needed by the unit tests. Remove code to install fastentrypoints package if not installed. When building a Python 3 wheel using python3 setup.py bdist_wheel, we would receive a permissions error when calling easy_install.main(['fastentrypoints']), as with Python 3 it appears elevated permissions are required to install the package. This appears to be a behavior change between the Python 2 and Python 3 versions of setuptools. This needs to be explicitly installed before building the package. We are already installing in the SONiC slave container. Will update the sonic-utilities README to specify, also.
Added orchagent and zone related changes.
Link to NAT HLD : https://github.com/Azure/SONiC/blob/master/doc/nat/nat_design_spec.md
Depends on:
sonic-swss : #1059 and #1126
sonic-swss-common : sonic-net/sonic-swss-common#304
sonic-linux-kernel : sonic-net/sonic-linux-kernel#100
sonic-sairedis : sonic-net/sonic-sairedis#519 and sonic-net/sonic-sairedis#546
Signed-off-by: Akhilesh Samineni akhilesh.samineni@broadcom.com