[chassis] Too many open files error and unable to connect to redis socket error #10870
Comments
@abdosi fyi
Seen on other platforms also.
Got the object list with multi-asic VS: {"list": 550,
There are also 1000+ 'anon_inode:[eventpoll]' and 600+ 'socket' descriptors open. The epoll descriptors are related to PubSub and the sockets to DBConnector; however, the Python object count is normal. The following test code also shows a similar issue:
So I suspect the C++ objects are not being destroyed by the SWIG-generated code. Will add logging to confirm this.
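The descriptor counts quoted above can be gathered from inside the Python process itself. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the helper name `count_fd_types` is hypothetical and not part of swsscommon.

```python
import os
from collections import Counter


def count_fd_types(pid="self"):
    """Count open file descriptors of a process, grouped by kind of target."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            # Sockets show up as "socket:[<inode>]".
            counts["socket"] += 1
        elif target.startswith("anon_inode:"):
            # epoll instances show up as "anon_inode:[eventpoll]".
            counts[target] += 1
        else:
            counts["other"] += 1
    return counts
```

Calling `count_fd_types()` before and after a suspect code path shows whether the `socket` or `anon_inode:[eventpoll]` entries keep growing.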
This issue occurs because methods that return a new object need to be declared with %newobject in SWIG:
Verified on a local devbox. Will submit a PR and validate whether the fix works on the chassis.
The issue can be reproduced with the following code:

```
from swsscommon.swsscommon import SonicDBConfig
import gc

SonicDBConfig.load_sonic_global_db_config()
gc.collect()
```

Afterwards, 3 epoll descriptors and 4 sockets are open in the Python process.
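A check like this can be automated by measuring how many descriptors remain open after running a piece of code and forcing garbage collection. The following is a rough sketch, assuming Linux `/proc`; `open_fd_count` and `fd_leak` are hypothetical helper names, not swsscommon APIs.

```python
import gc
import os


def open_fd_count():
    """Number of file descriptors currently open in this process."""
    return len(os.listdir("/proc/self/fd"))


def fd_leak(action, repeat=3):
    """Run action() `repeat` times, collect garbage, and return the fd growth."""
    gc.collect()
    before = open_fd_count()
    for _ in range(repeat):
        action()
    # Unreferenced wrapper objects should be released here; any descriptors
    # that survive are still held somewhere (or were leaked natively).
    gc.collect()
    return open_fd_count() - before
```

With swsscommon installed, passing a callable that invokes one of the suspect methods and discards the result would be expected to return 0 once the wrappers release their C++ objects, and a positive number while the leak is present.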
#### Why I did it
Fix epoll and socket resource leak issue: [chassis] Too many open files error and unable to connect to redis socket error sonic-net/sonic-buildimage#10870

The cause of this issue is that in SWIG, any method that returns a new object needs to be decorated with %newobject, so that SWIG generates code to release the C++ object when the Python wrapper object is released: https://www.swig.org/Doc4.0/SWIGDocumentation.html#Customization_ownership

#### How I did it
Update swsscommon.i to decorate the methods that return a new object with %newobject.

#### How to verify it
Pass all test cases.
Run the following code in Python and validate that there is no epoll or socket leak:

```
from swsscommon.swsscommon import SonicDBConfig
from swsscommon.swsscommon import SonicV2Connector
import gc

SonicDBConfig.load_sonic_global_db_config()
SonicDBConfig.get_ns_list()
db = SonicV2Connector(use_unix_socket_path=True, namespace='')
db.connect("CONFIG_DB")
db.get_redis_client("CONFIG_DB")
client = db.get_redis_client("CONFIG_DB")
client.pubsub()
client.pubsub()
client.pubsub()
client.newConnector(0)
client.newConnector(0)
client.newConnector(0)
gc.collect()
```

#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

#### Description for the changelog
Fix epoll and socket resource leak issue: [chassis] Too many open files error and unable to connect to redis socket error sonic-net/sonic-buildimage#10870

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)
Found another memory leak issue in swsscommon, in `void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)`.
#### Why I did it
Fix memory leak issue in ConfigDBConnector: [chassis] Too many open files error and unable to connect to redis socket error sonic-net/sonic-buildimage#10870

The cause of this issue is that DBConnector::pubsub() returns a raw pointer, and the following code calls this method but never releases the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); // <== this pointer is never deleted later.
```

Also mark DBConnector::pubsub() as deprecated for non-SWIG scenarios.

#### How I did it
Change DBConnector::pubsub() to return a smart pointer.

#### How to verify it
Pass all test cases.
Run the following code in Python and validate that there is no epoll or socket leak:

```
import gc
from swsscommon import swsscommon

config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()
gc.collect()
```

#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

#### Description for the changelog
Fix epoll and socket resource leak issue: [chassis] Too many open files error and unable to connect to redis socket error sonic-net/sonic-buildimage#10870

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
* Fix memory leak issue in ConfigDBConnector. (sonic-net#655)
* Transfer organization from Azure to sonic-net (sonic-net#656)
* Add docker-mux related table names (sonic-net#627)

  This PR is to add table name definitions for database entries used by docker mux processes.
  Sign-off: Jing Zhang zhangjing@microsoft.com
* Add libzmq dependency
* Add test logs
* Add libzmq3-dev dependency
* Add libboost-serialization and uuid-dev dependencies
* Add boost and uuid
* Add installation before building docker

Co-authored-by: Hua Liu <58683130+liuh-80@users.noreply.github.com>
Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
Co-authored-by: Liu Shilong <shilongliu@microsoft.com>
Co-authored-by: Jing Zhang <zjsw1206@gmail.com>
Co-authored-by: Ubuntu <zain@zb-dev-vm.022x1jpnpm4u1iy2d325acts3c.yx.internal.cloudapp.net>
sonic-net/sonic-swss-common#655
root@vlab-08:/# ls -l /proc/24/fd | wc -l
Description
Steps to reproduce the issue:
Describe the results you received:
Error logs seen
Describe the results you expected:
Should not see any errors in syslog
Output of `show version`: With master branch.
Output of `show techsupport`:
Additional information you deem important (e.g. issue happens only occasionally):