Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chassis] Too many open files error and unable to connect to redis socket error #10870

Closed
SuvarnaMeenakshi opened this issue May 18, 2022 · 7 comments
Assignees
Labels
Chassis 🤖 Modular chassis support MSFT Triaged this issue has been triaged

Comments

@SuvarnaMeenakshi
Copy link
Contributor

Description

Steps to reproduce the issue:

  1. Boot up chassis and leave SNMP service running for a few days (seeing issue after 5 days of service being up)
  2. Observe the below logs in syslog in both Supervisor and Linecards.
    • Unable to connect to redis
    • Too many files open
  3. No functionality issue seen

Describe the results you received:

Error logs seen

 snmp#snmp-subagent [ax_interface] ERROR: MIBUpdater.start() caught an unexpected exception during update_data()#012Traceback (most recent call last):#012  File "/usr/local/lib/python3.7/dist-packages/ax_interface/mib.py", line 37, in start#012    self.reinit_data()#012  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/ietf/rfc4363.py", line 51, in reinit_data#012    mibs.init_sync_d_interface_tables, self.db_conn#012  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/__init__.py", line 653, in get_sync_d_from_all_namespace#012    ns_tuple = per_namespace_func(db_conn)#012  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/__init__.py", line 274, in init_sync_d_interface_tables#012    if_name_map_util, if_id_map_util = port_util.get_interface_oid_map(db_conn, blocking=False)#012  File "/usr/local/lib/python3.7/dist-packages/swsssdk/port_util.py", line 68, in get_interface_oid_map#012    db.connect('COUNTERS_DB')#012  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1672, in connect#012    return _swsscommon.SonicV2Connector_Native_connect(self, db_name, retry_on)#012RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
 
 snmp#snmp-subagent [ax_interface] ERROR: MIBUpdater.start() caught an unexpected exception during update_data()#012Traceback (most recent call last):#012  File "/usr/local/lib/python3.7/dist-packages/ax_interface/mib.py", line 43, in start#012    self.update_data()#012  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/vendor/__init__.py", line 57, in update_data#012    cpu_util = psutil.cpu_percent()#012  File "/usr/local/lib/python3.7/dist-packages/psutil/__init__.py", line 1755, in cpu_percent#012    _last_cpu_times = cpu_times()#012  File "/usr/local/lib/python3.7/dist-packages/psutil/__init__.py", line 1616, in cpu_times#012    return _psplatform.cpu_times()#012  File "/usr/local/lib/python3.7/dist-packages/psutil/_pslinux.py", line 604, in cpu_times#012    with open_binary('%s/stat' % procfs_path) as f:#012  File "/usr/local/lib/python3.7/dist-packages/psutil/_common.py", line 711, in open_binary#012    return open(fname, "rb", **kwargs)#012OSError: [Errno 24] Too many open files: '/proc/stat'
 
 ERR snmp#snmp-subagent [ax_interface] ERROR: MIBUpdater.start() caught an unexpected exception during update_data()#012Traceback (most recent call last):#012  File "/usr/local/lib/python3.7/dist-packages/ax_interface/mib.py", line 37, in start#012    self.reinit_data()#012  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/ietf/rfc1213.py", line 715, in reinit_data#012    self.db_conn.connect(self.db_conn.CONFIG_DB)#012  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1672, in connect#012    return _swsscommon.SonicV2Connector_Native_connect(self, db_name, retry_on)#012RuntimeError: Unable to connect to redis: Cannot assign requested address


ERR snmp#snmp-subagent [ax_interface] ERROR: MIBUpdater.start() caught an unexpected exception during update_data()#012Traceback (most recent call last):#012  File "/usr/local/lib/python3.7/dist-packages/ax_interface/mib.py", line 43, in start#012    self.update_data()#012  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/ietf/rfc4292.py", line 55, in update_data#012    front_ns = multi_asic.get_all_namespaces()['front_ns']#012  File "/usr/local/lib/python3.7/dist-packages/sonic_py_common/multi_asic.py", line 239, in get_all_namespaces#012    num_asics = get_num_asics()#012  File "/usr/local/lib/python3.7/dist-packages/sonic_py_common/multi_asic.py", line 119, in get_num_asics#012    with open(asic_conf_file_path) as asic_conf_file:#012OSError: [Errno 24] Too many open files: '/usr/share/sonic/platform/asic.conf'

Describe the results you expected:

Should not see any errors in syslog

Output of show version:

With master branch.

(paste your output here)

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@SuvarnaMeenakshi SuvarnaMeenakshi added the Chassis 🤖 Modular chassis support label May 18, 2022
@SuvarnaMeenakshi
Copy link
Contributor Author

@abdosi fyi

@zhangyanzhao zhangyanzhao added Triaged this issue has been triaged MSFT labels May 25, 2022
@abdosi
Copy link
Contributor

abdosi commented May 25, 2022

seen on other platforms also

@liuh-80
Copy link
Contributor

liuh-80 commented Jul 20, 2022

Get object list with multi-asic VS:

{"list": 550,
"dict": 3186,
"tuple": 2608,
"coroutine": 46,
"frame": 45,
"Future": 22,
"TimerHandle": 21,
"Context": 92,
"FutureIter": 22,
"TaskWakeupMethWrapper": 22,
"scputimes": 12,
"svmem": 1,
"wrapper_descriptor": 1184,
"builtin_function_or_method": 1588,
"method_descriptor": 947,
"classmethod_descriptor": 27,
"getset_descriptor": 925,
"member_descriptor": 382,
"weakref": 1435,
"staticmethod": 180,
"module": 276,
"SimpleNamespace": 1,
"function": 6527,
"type": 831,
"ModuleSpec": 273,
"property": 547,
"classmethod": 193,
"cell": 529,
"frozenset": 101,
"FileFinder": 31,
"set": 169,
"SourceFileLoader": 212,
"CodecInfo": 2,
"ABCMeta": 78,
"TextIOWrapper": 3,
"IncrementalDecoder": 1,
"IncrementalEncoder": 2,
"BufferedReader": 1,
"FileIO": 3,
"BufferedWriter": 2,
"_Environ": 2,
"Quitter": 2,
"_Printer": 3,
"_Helper": 1,
"Repr": 1,
"method": 163,
"itemgetter": 37,
"mappingproxy": 1,
"WeakKeyDictionary": 2,
"count": 3,
"EnumMeta": 54,
"DynamicClassAttribute": 2,
"OrderedDict": 60,
"_NamedIntConstant": 74,
"RegexFlag": 15,
"_lru_cache_wrapper": 7,
"JSONEncoder": 1,
"JSONDecoder": 1,
"Scanner": 1,
"ExtensionFileLoader": 17,
"_TemplateMetaclass": 1,
"method-wrapper": 2,
"WeakSet": 4,
"_MainThread": 1,
"Event": 6,
"Formatter": 2,
"WeakValueDictionary": 2,
"_StderrHandler": 1,
"Condition": 2,
"PercentStyle": 2,
"deque": 8,
"RootLogger": 1,
"Manager": 1,
"AddressFamily": 32,
"SocketKind": 7,
"MsgFlag": 14,
"AddressInfo": 7,
"Logger": 5,
"NullHandler": 1,
"Signals": 32,
"Handlers": 2,
"Sigmasks": 3,
"_Feature": 10,
"_SSLMethod": 6,
"Options": 12,
"AlertDescription": 27,
"SSLErrorNumber": 9,
"VerifyFlags": 5,
"VerifyMode": 3,
"TLSVersion": 7,
"Purpose": 2,
"_ASN1Object": 2,
"StrictVersion": 1,
"UUID": 4,
"SafeUUID": 3,
"PlaceHolder": 1,
"_SendfileMode": 3,
"_ParameterKind": 5,
"_RunningLoop": 1,
"Random": 1,
"IPv4Address": 31,
"IPv4Network": 19,
"IPv6Network": 28,
"ValueType": 13,
"PduTypes": 19,
"IPv6Address": 43,
"RegisteredPDU": 13,
"Reason": 6,
"Errors": 32,
"_WindowsFlavour": 1,
"_PosixFlavour": 1,
"_NormalAccessor": 1,
"attrgetter": 2,
"_NSEnum": 1,
"methodcaller": 1,
"partial": 7,
"YAMLObjectMetaclass": 1,
"SysLogHandler": 1,
"socket": 4,
"Load": 1,
"Store": 1,
"Del": 1,
"AugLoad": 1,
"AugStore": 1,
"Param": 1,
"And": 1,
"Or": 1,
"Add": 1,
"Sub": 1,
"Mult": 1,
"MatMult": 1,
"Div": 1,
"Mod": 1,
"Pow": 1,
"LShift": 1,
"RShift": 1,
"BitOr": 1,
"BitXor": 1,
"BitAnd": 1,
"FloorDiv": 1,
"Invert": 1,
"Not": 1,
"UAdd": 1,
"USub": 1,
"Eq": 1,
"NotEq": 1,
"Lt": 1,
"LtE": 1,
"Gt": 1,
"GtE": 1,
"Is": 1,
"IsNot": 1,
"In": 1,
"NotIn": 1,
"_UnixDefaultEventLoopPolicy": 1,
"hamt": 1,
"hamt_bitmap_node": 1,
"MIBMeta": 20,
"_Local": 1,
"LLDPRemoteTables": 12,
"LLDPLocalChassis": 6,
"LLDPLocalSystemDataUpdater": 1,
"MIBEntry": 12,
"LLDPRemTableUpdater": 1,
"SubtreeMIBEntry": 92,
"_UnixSelectorEventLoop": 1,
"LocPortUpdater": 1,
"LLDPLocManAddrUpdater": 1,
"SonicV2Connector": 93,
"EpollSelector": 1,
"_SelectorMapping": 1,
"SelectorKey": 2,
"Handle": 4,
"DbTables": 24,
"IfTypes": 3,
"ArpUpdater": 1,
"NextHopUpdater": 1,
"OverlayAdpaterMIBEntry": 24,
"sysNameUpdater": 1,
"PhysicalClass": 12,
"FanInfoDB": 5,
"FanDrawerInfoDB": 4,
"PhysicalRelationInfoDB": 4,
"PsuInfoDB": 8,
"XcvrInfoDB": 6,
"ThermalInfoDB": 2,
"OidMIBEntry": 24,
"PhysicalTableMIBUpdater": 1,
"DbTables32": 4,
"DbTables64": 8,
"InterfaceMIBUpdater": 1,
"RedisOidTreeUpdater": 2,
"ThermalCacheUpdater": 1,
"FanCacheUpdater": 1,
"FanDrawerCacheUpdater": 1,
"PsuCacheUpdater": 1,
"XcvrCacheUpdater": 1,
"EntitySensorDataScale": 17,
"EntitySensorStatus": 3,
"EntitySensorValueRange": 2,
"PhysicalSensorTableMIBUpdater": 1,
"RouteUpdater": 1,
"FdbUpdater": 1,
"NicDuplex": 3,
"BatteryTime": 2,
"_WrapNumbers": 1,
"IOPriority": 4,
"LLDPRemManAddrUpdater": 1,
"InterfacesUpdater": 1,
"EntitySensorDataType": 12,
"Negsignal": 32,
"Connections": 1,
"SystemUtilizationHandler": 1,
"BgpSessionUpdater": 1,
"Agent": 1,
"MIBTable": 1,
"SocketManager": 1,
"Task": 22,
"_GatheringFuture": 1,
"ThreadPoolExecutor": 1,
"SimpleQueue": 1,
"Thread": 1,
"DirectionTypes": 2,
"QueueStatUpdater": 1,
"CHASSISInfoDB": 1,
"PSUInfoDB": 2,
"PowerStatusHandler": 1,
"PubSub": 35,
"Codec": 2,
"ConfigDBConnector": 4,
"AgentX": 1,
"_SelectorSocketTransport": 1,
"KeyedRef": 1,
"PfcUpdater": 1,
"PfcPrioUpdater": 1}

ANd there are 1000+ ''anon_inode:[eventpoll]' and 600+ 'socket' opened.

For epoll is related with PubSub, for socket it related with DBConnector, however the python object count is normal.

Also following test code also show similer issue:
from swsscommon.swsscommon import SonicDBConfig
SonicDBConfig.load_sonic_global_db_config()
SonicDBConfig.get_ns_list()
from swsscommon.swsscommon import SonicV2Connector
db = SonicV2Connector(use_unix_socket_path=True, namespace='')
db.connect("CONFIG_DB")
db.get_redis_client("CONFIG_DB")
db.connect("CONFIG_DB")
db.get_redis_client("CONFIG_DB")
client = db.get_redis_client("CONFIG_DB")
client.pubsub() <== run multiple times the epool keeps increase but PubSub object destoryed

So, consider about the C++ object not destoryed by SWIG generated code. will add log to confirm this.

@liuh-80
Copy link
Contributor

liuh-80 commented Jul 21, 2022

This issue is because some method which return new object need declare as %newobject in SWIG:
https://www.swig.org/Doc4.0/SWIGDocumentation.html#Customization_ownership

Verifyed on local devbox, will submit PR and validate if the fix can work with chassis

@liuh-80
Copy link
Contributor

liuh-80 commented Jul 21, 2022

The issue can be reproduce with following code:
from swsscommon.swsscommon import SonicDBConfig
from swsscommon.swsscommon import SonicV2Connector
import gc

SonicDBConfig.load_sonic_global_db_config()
SonicDBConfig.get_ns_list()
db = SonicV2Connector(use_unix_socket_path=True, namespace='')
db.connect("CONFIG_DB")
db.get_redis_client("CONFIG_DB")
client = db.get_redis_client("CONFIG_DB")
client.pubsub()
client.pubsub()
client.pubsub()
client.newConnector(0)
client.newConnector(0)
client.newConnector(0)

gc.collect()

And there will be 3 epoll and 4 socket opened in python.
After fix this issue, ther will only 1 socket opened.

liuh-80 added a commit to sonic-net/sonic-swss-common that referenced this issue Jul 23, 2022
#### Why I did it
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is in SWIG any method return a new object need decorate with %newobject, so SWIG will generate code to release C++ object when python wrapper object released:
https://www.swig.org/Doc4.0/SWIGDocumentation.html#Customization_ownership


#### How I did it
Update swsscommon.i to decorate return new object methods with %newobject

#### How to verify it
Pass all test case.

Run following code in python and validate there is no epoll and socket leak:

from swsscommon.swsscommon import SonicDBConfig
from swsscommon.swsscommon import SonicV2Connector
import gc

SonicDBConfig.load_sonic_global_db_config()
SonicDBConfig.get_ns_list()
db = SonicV2Connector(use_unix_socket_path=True, namespace='')
db.connect("CONFIG_DB")
db.get_redis_client("CONFIG_DB")
client = db.get_redis_client("CONFIG_DB")
client.pubsub()
client.pubsub()
client.pubsub()
client.newConnector(0)
client.newConnector(0)
client.newConnector(0)

gc.collect()

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

#### Description for the changelog
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)
@liuh-80
Copy link
Contributor

liuh-80 commented Jul 27, 2022

Found another memory leak issue in swsscommon:

void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
m_db_name = db_name;
m_key_separator = m_table_name_separator = get_db_separator(db_name);
SonicV2Connector_Native::connect(m_db_name, retry_on);

if (wait_for_init)
{
    auto& client = get_redis_client(m_db_name);
    auto pubsub = client.pubsub(); <== this is a pointer but not delete.

liuh-80 added a commit to sonic-net/sonic-swss-common that referenced this issue Aug 1, 2022
#### Why I did it
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

#### How I did it
Change DBConnector::pubsub() to return a smart pointer.

#### How to verify it
Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```
#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

#### Description for the changelog
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)


Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
zbud-msft added a commit to renukamanavalan/sonic-swss-common that referenced this issue Aug 3, 2022
* Fix memory leak issue in ConfigDBConnector. (sonic-net#655)

#### Why I did it
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

#### How I did it
Change DBConnector::pubsub() to return a smart pointer.

#### How to verify it
Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```
#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

#### Description for the changelog
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)


Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>

* Transfer organization from Azure to sonic-net (sonic-net#656)

Transfer organization from Azure to sonic-net

* Add docker-mux related table names  (sonic-net#627)

This PR is to add table name definitions for database entries used by docker mux processes.

Sign-off: Jing Zhang zhangjing@microsoft.com

* Add libzmq dependency

* Add test logs

* Add libzmq3-dev dependency

* Add libboost-serialization and uuid-dev dependencies

* Add boost and uuid

* Add installation before building docker

Co-authored-by: Hua Liu <58683130+liuh-80@users.noreply.github.com>
Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
Co-authored-by: Liu Shilong <shilongliu@microsoft.com>
Co-authored-by: Jing Zhang <zjsw1206@gmail.com>
Co-authored-by: Ubuntu <zain@zb-dev-vm.022x1jpnpm4u1iy2d325acts3c.yx.internal.cloudapp.net>
@SuvarnaMeenakshi
Copy link
Contributor Author

sonic-net/sonic-swss-common#655
checked with the sonic-buildimage multi-asic VS image with your latest swss-common PR, I see that the fd count is not going above this range; the issue that we were seeing in SNMP is fixed with the fix

root@vlab-08:/# ls -l /proc/24/fd | wc -l
650
root@vlab-08:/# ls -l /proc/24/fd | wc -l
650
root@vlab-08:/# ls -l /proc/24/fd | wc -l
626
root@vlab-08:/# ls -l /proc/24/fd | wc -l
626

liuh-80 added a commit to liuh-80/sonic-swss-common that referenced this issue Nov 8, 2022
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

Change DBConnector::pubsub() to return a smart pointer.

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
lukasstockner pushed a commit to genesiscloud/sonic-swss-common that referenced this issue Nov 10, 2022
#### Why I did it
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is in SWIG any method return a new object need decorate with %newobject, so SWIG will generate code to release C++ object when python wrapper object released:
https://www.swig.org/Doc4.0/SWIGDocumentation.html#Customization_ownership


#### How I did it
Update swsscommon.i to decorate return new object methods with %newobject

#### How to verify it
Pass all test case.

Run following code in python and validate there is no epoll and socket leak:

from swsscommon.swsscommon import SonicDBConfig
from swsscommon.swsscommon import SonicV2Connector
import gc

SonicDBConfig.load_sonic_global_db_config()
SonicDBConfig.get_ns_list()
db = SonicV2Connector(use_unix_socket_path=True, namespace='')
db.connect("CONFIG_DB")
db.get_redis_client("CONFIG_DB")
client = db.get_redis_client("CONFIG_DB")
client.pubsub()
client.pubsub()
client.pubsub()
client.newConnector(0)
client.newConnector(0)
client.newConnector(0)

gc.collect()

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

#### Description for the changelog
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)
lukasstockner pushed a commit to genesiscloud/sonic-swss-common that referenced this issue Nov 10, 2022
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

Change DBConnector::pubsub() to return a smart pointer.

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
liuh-80 added a commit to sonic-net/sonic-swss-common that referenced this issue Nov 14, 2022
Fix memory leak issue in ConfigDBConnector:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is DBConnector::pubsub() will return a pointer, and following code call this method but never release the returned pointer:

```
void ConfigDBConnector_Native::db_connect(string db_name, bool wait_for_init, bool retry_on)
{
    m_db_name = db_name;
    m_key_separator = m_table_name_separator = get_db_separator(db_name);
    SonicV2Connector_Native::connect(m_db_name, retry_on);

    if (wait_for_init)
    {
        auto& client = get_redis_client(m_db_name);
        auto pubsub = client.pubsub(); <== this pointer not delete later.
```

Also change DBConnector::pubsub() to deprecated for none SWIG scenario.

Change DBConnector::pubsub() to return a smart pointer.

Pass all test case.

Run following code in python and validate there is no epoll and socket leak:
```
import gc
from swsscommon import swsscommon
config_db = swsscommon.ConfigDBConnector_Native()
config_db.connect()
config_db.connect()
config_db.connect()

gc.collect()
```

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>

Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
kpetremann pushed a commit to criteo-forks/sonic-swss-common that referenced this issue Jan 19, 2023
#### Why I did it
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is in SWIG any method return a new object need decorate with %newobject, so SWIG will generate code to release C++ object when python wrapper object released:
https://www.swig.org/Doc4.0/SWIGDocumentation.html#Customization_ownership


#### How I did it
Update swsscommon.i to decorate return new object methods with %newobject

#### How to verify it
Pass all test case.

Run following code in python and validate there is no epoll and socket leak:

from swsscommon.swsscommon import SonicDBConfig
from swsscommon.swsscommon import SonicV2Connector
import gc

SonicDBConfig.load_sonic_global_db_config()
SonicDBConfig.get_ns_list()
db = SonicV2Connector(use_unix_socket_path=True, namespace='')
db.connect("CONFIG_DB")
db.get_redis_client("CONFIG_DB")
client = db.get_redis_client("CONFIG_DB")
client.pubsub()
client.pubsub()
client.pubsub()
client.newConnector(0)
client.newConnector(0)
client.newConnector(0)

gc.collect()

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [x] 202111

#### Description for the changelog
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)
liat-grozovik pushed a commit to sonic-net/sonic-swss-common that referenced this issue Feb 8, 2023
- Why I did it
Fix epoll and socket resurce leak issue:
[chassis] Too many open files error and unable to connect to redis socket error
sonic-net/sonic-buildimage#10870

The reason of this issue is in SWIG any method return a new object need decorate with %newobject, so SWIG will generate code to release C++ object when python wrapper object released:
https://www.swig.org/Doc4.0/SWIGDocumentation.html#Customization_ownership

- How I did it
Update swsscommon.i to decorate return new object methods with %newobject

- How to verify it
Pass all test case.

Run following code in python and validate there is no epoll and socket leak:

from swsscommon.swsscommon import SonicDBConfig
from swsscommon.swsscommon import SonicV2Connector
import gc

SonicDBConfig.load_sonic_global_db_config()
SonicDBConfig.get_ns_list()
db = SonicV2Connector(use_unix_socket_path=True, namespace='')
db.connect("CONFIG_DB")
db.get_redis_client("CONFIG_DB")
client = db.get_redis_client("CONFIG_DB")
client.pubsub()
client.pubsub()
client.pubsub()
client.newConnector(0)
client.newConnector(0)
client.newConnector(0)

gc.collect()

Co-authored-by: Hua Liu <58683130+liuh-80@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Chassis 🤖 Modular chassis support MSFT Triaged this issue has been triaged
Projects
None yet
Development

No branches or pull requests

4 participants