forked from sonic-net/sonic-swss
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizing teamd cleanup #6
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@dgsudharsan please have a look at these PRs:
Also, the original code is working and very likely that you are using old codebase:
P.S: the suggested fix doesn't assume warm/fast boot scenario |
dgsudharsan
pushed a commit
that referenced
this pull request
Aug 9, 2023
**What I did** Fix the Mem Leak by moving the raw pointers in type_maps to use smart pointers **Why I did it** ``` Indirect leak of 83776 byte(s) in 476 object(s) allocated from: #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99 #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115 #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460 #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584 #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634 #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461 #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501 #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450 #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763 #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179 #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241 #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238 #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235 #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755 #14 0x555558e9be25 in main orchagent/main.cpp:766 #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09) ```
dgsudharsan
pushed a commit
that referenced
this pull request
Aug 18, 2023
**What I did** Fix the Mem Leak by moving the raw pointers in type_maps to use smart pointers **Why I did it** ``` Indirect leak of 83776 byte(s) in 476 object(s) allocated from: #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99 #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115 #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460 #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584 #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634 #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461 #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501 #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450 #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763 #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179 #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241 #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238 #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235 #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755 #14 0x555558e9be25 in main orchagent/main.cpp:766 #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09) ```
dgsudharsan
pushed a commit
that referenced
this pull request
Sep 22, 2023
**What I did** Fix the Mem Leak by moving the raw pointers in type_maps to use smart pointers **Why I did it** ``` Indirect leak of 83776 byte(s) in 476 object(s) allocated from: #0 0x7f0a2a414647 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99 #1 0x5555590cc923 in __gnu_cxx::new_allocator, std::allocator > const, referenced_object> > >::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115 #2 0x5555590cc923 in std::allocator_traits, std::allocator > const, referenced_object> > > >::allocate(std::allocator, std::allocator > const, referenced_object> > >&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460 #3 0x5555590cc923 in std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_get_node() /usr/include/c++/10/bits/stl_tree.h:584 #4 0x5555590cc923 in std::_Rb_tree_node, std::allocator > const, referenced_object> >* std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_create_node, std::allocator > const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:634 #5 0x5555590cc923 in std::_Rb_tree_iterator, std::allocator > const, referenced_object> > std::_Rb_tree, std::allocator >, std::pair, std::allocator > const, referenced_object>, std::_Select1st, std::allocator > const, referenced_object> >, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::_M_emplace_hint_unique, std::allocator > const&>, std::tuple<> >(std::_Rb_tree_const_iterator, std::allocator > const, referenced_object> >, std::piecewise_construct_t const&, std::tuple, std::allocator > const&>&&, std::tuple<>&&) /usr/include/c++/10/bits/stl_tree.h:2461 #6 0x5555590e8757 in std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >::operator[](std::__cxx11::basic_string, std::allocator > const&) /usr/include/c++/10/bits/stl_map.h:501 #7 0x5555590d48b0 in Orch::setObjectReference(std::map, std::allocator >, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*, std::less, std::allocator > >, std::allocator, std::allocator > const, std::map, std::allocator >, referenced_object, std::less, std::allocator > >, std::allocator, std::allocator > const, referenced_object> > >*> > >&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) orchagent/orch.cpp:450 #8 0x5555594ff66b in QosOrch::handleQueueTable(Consumer&, std::tuple, std::allocator >, std::__cxx11::basic_string, std::allocator >, std::vector, std::allocator >, std::__cxx11::basic_string, std::allocator > >, std::allocator, std::allocator >, std::__cxx11::basic_string, std::allocator > > > > >&) orchagent/qosorch.cpp:1763 #9 0x5555594edbd6 in QosOrch::doTask(Consumer&) orchagent/qosorch.cpp:2179 #10 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:241 #11 0x5555590c8743 in Consumer::drain() orchagent/orch.cpp:238 #12 0x5555590c8743 in Consumer::execute() orchagent/orch.cpp:235 #13 0x555559090dad in OrchDaemon::start() orchagent/orchdaemon.cpp:755 #14 0x555558e9be25 in main orchagent/main.cpp:766 #15 0x7f0a299b6d09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09) ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What I did
Optimized teamd cleanup to handle scaled portchannel configuration scenarios.
Why I did it
A process per portchannel exists in teamd which needs to be cleaned up on exit so remove port channel flow is handled. In the current logic this is done one by one. So in case of scaled configurations not all port channels get cleaned up resulting in some portchannel entries left stale.
Sep 24 14:37:18.181202 r-panther-23 NOTICE teamd#teamsyncd: :- cleanTeamSync: Cleaning up LAG teamd resources ...
Sep 24 14:37:20.448177 r-panther-23 NOTICE teamd#teammgrd: :- cleanTeamProcesses: Cleaning up LAGs during shutdown...
Sep 24 14:38:14.612553 r-panther-23 NOTICE teamd#teammgrd: :- removeLag: Stop port channel PortChannel58
Sep 24 14:38:15.140316 r-panther-23 INFO dockerd[782]: time="2021-09-24T14:38:15.138013633Z" level=info msg="Container 59cf5a968e3f38f0cfdf9b9e4799c707fb2421554fe54bc0f98703c1fdc5f724 failed to exit within 60 seconds of signal 15 - using the force"
Sep 24 14:38:15.374049 r-panther-23 INFO /container: docker cmd: wait for teamd
Sep 24 14:38:15.375029 r-panther-23 INFO /container: docker cmd: stop for teamd
Sep 24 14:38:15.375203 r-panther-23 DEBUG /container: container_stop: END
Sep 24 14:38:15.419638 r-panther-23 NOTICE admin: Stopped teamd service...
Sep 24 14:38:15.423805 r-panther-23 INFO systemd[1]: teamd.service: Succeeded.
In the above example port channels after 58 (from the map) are not cleaned up even after 60 seconds. Each system command called from the process is costly and it takes a lot of time when the scale is high.
The netlink messages for stale portchannel come up during the initialization which if ordered differently might result in STATE_DB entry being removed
Sep 24 14:39:07.533092 r-panther-23 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:PortChannel7 admin:0 oper:0 addr:98:03:9b:f6:00:00 ifindex:85 master:0 type:team
Sep 24 14:39:07.559340 r-panther-23 NOTICE swss#portsyncd: :- onMsg: nlmsg type:17 key:PortChannel7 admin:0 oper:0 addr:98:03:9b:f6:00:00 ifindex:85 master:0 type:te
Sep 24 14:39:10.350253 r-panther-23 NOTICE teamd#tlm_teamd: :- remove_lag: The LAG 'PortChannel7' has been removed.
Added logic to reduce the number of calls executed via system command.
How I verified it
Oct 8 05:13:22.074152 mts-sonic-dut NOTICE teamd#teammgrd: :- cleanTeamProcesses: Cleaning up LAGs during shutdown...
Oct 8 05:13:22.088962 mts-sonic-dut NOTICE teamd#teammgrd: :- cleanTeamProcesses: Executed the cleanup command. Waiting for cleanup complete
Oct 8 05:13:22.131145 mts-sonic-dut NOTICE teamd#teammgrd: :- cleanTeamProcesses: Waiting for 64 processes to be cleaned up
Oct 8 05:13:23.141405 mts-sonic-dut NOTICE teamd#teammgrd: :- cleanTeamProcesses: LAGs cleanup is done
Oct 8 05:13:23.142193 mts-sonic-dut NOTICE teamd#teammgrd: :- main: Exiting
Oct 8 05:13:23.666751 mts-sonic-dut INFO teamd#supervisord 2021-10-08 05:13:23,665 INFO stopped: teammgrd (exit status 0)
The entire teamd cleanup with this optimization takes just one second instead of unable to clean up even after 1 minute.
Details if related