Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock when accessing attribute from one netref in another. #270

Closed
SergeyYanos opened this issue May 17, 2018 · 2 comments
Closed

Deadlock when accessing attribute from one netref in another. #270

SergeyYanos opened this issue May 17, 2018 · 2 comments

Comments

@SergeyYanos
Copy link

SergeyYanos commented May 17, 2018

If you have two netrefs on two connections and try to access attributes of one from the other you will get deadlock in protocol.sync_recv_and_dispatch method.

From what I was able to gather:
When we first call run_example from t1(thread 1) we enter protocol.sync_recv_and_dispatch of e1(Example obj 1) and take the _sync_lock, in addition t2 takes the _sync_lock of e2. Then t1 tries to take _sync_lock of e2 when it is accessing e2.exposed_e meanwhile t2 is doing the same with _sync_lock of e1 and e1.exposed_e. This results in deadlock because t1 holds a lock that t2 wants and won't release it until it(t1) gets the lock that t2 holds which in turn waits for t1 to release the lock it(t2) wants.

Environment
  • rpyc version=3.4.4
  • python version=2.7
  • operating system=Linux(Fedora 27).
Minimal example(sketchbook.py)
import rpyc
import logging
import threading


logging.basicConfig(level=logging.DEBUG, format='%(levelname)-8s : %(name)-10s - %(message)s')


class Example(object):
    def __init__(self, e):
        self.exposed_e = e
        self.logger = logging.getLogger('example_%s' % e)

    def run_example(self, examples):
        for _ in range(1000):
            for example in examples:
                self.logger.debug('%s accessing %s', self.exposed_e, example.exposed_e)


def main():
    logger = logging.getLogger('main')
    logger.debug('Starting test')

    host1 = '10.135.174.5'
    host2 = '10.135.175.5'
    port = '18812'

    conn1 = rpyc.classic.connect(host1, port)
    conn2 = rpyc.classic.connect(host2, port)

    examples = [conn1.modules['sketchbook'].Example(1),
                conn2.modules['sketchbook'].Example(2)]

    threads = [threading.Thread(target=e.run_example, args=(examples,)) for e in examples]

    for t in threads:
        t.start()
    logger.debug('Threads started!')

    for t in threads:
        t.join()
    logger.debug('Test finished!')


if __name__ == "__main__":
    main()
@coldfix
Copy link
Contributor

coldfix commented May 20, 2018

Hi,

thanks for reporting and analyzing the issue.

I'm not surprised that this happens. This can probably be fixed by dropping _sync_lock/_sync_event and using only _recvlock instead. However, before doing this, I will have to think about it more carefully, (because I don't fully understand the reason why sync_lock is there at all right now).

Honestly, the code base looks in many places like it was originally written with single-threaded applications in mind and the support for multithreading was only added later. This leads to some architectural deficits with many edge cases. If possible, I would recommend not sharing the same connection across multiple threads and in particular avoiding communication between two shared connections. In fact, I believe the request/reactor mechanism is broken in many places and the only real fix would be to have a reactor that runs every request in its own thread (or better greenlet for performance reasons), i.e. migrating rpyc to fully take advantage of gevent.

That being said, I will of course try to fix every issue that can be identified and easily fixed.

@coldfix
Copy link
Contributor

coldfix commented May 20, 2018

Hi, pushed a commit that fixes this particular issue (won't be released until in a few weeks though). Let me know if you encounter further problems. Best, Thomas

coldfix added a commit that referenced this issue Jun 11, 2018
This release brings a few minor backward incompatibilities, so be sure to read
on before upgrading. However, fear not: the ones that are most likely relevant
to you have a relatively simple migration path.

Backward Incompatibilities
^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``classic.teleport_function`` now executes the function in the connection's
  namespace by default. To get the old behaviour, use
  ``teleport_function(conn, func, conn.modules[func.__module__].__dict__)``
  instead.

* Changed signature of ``Service.on_connect`` and ``on_disconnect``, adding
  the connection as argument.

* Changed signature of ``Service.__init__``, removing the connection argument

* no longer store connection as ``self._conn``. (allows services that serve
  multiple clients using the same service object, see `#198`_).

* ``SlaveService`` is now split into two asymetric classes: ``SlaveService``
  and ``MasterService``. The slave exposes functionality to the master but can
  not anymore access remote objects on the master (`#232`_, `#248`_).
  If you were previously using ``SlaveService``, you may experience problems
  when feeding the slave with netrefs to objects on the master. In this case, do
  any of the following:

  * use ``ClassicService`` (acts exactly like the old ``SlaveService``)
  * use ``SlaveService`` with a ``config`` that allows attribute access etc
  * use ``rpyc.utils.deliver`` to feed copies rather than netrefs to
    the slave

* ``RegistryServer.on_service_removed`` is once again called whenever a service
  instance is removed, making it symmetric to ``on_service_added`` (`#238`_)
  This reverts PR `#173`_ on issue `#172`_.

* Removed module ``rpyc.experimental.splitbrain``. It's too confusing and
  undocumented for me and I won't be developing it, so better remove it
  altogether. (It's still available in the ``splitbrain`` branch)

* Removed module ``rpyc.experimental.retunnel``. Seemingly unused anywhere, no
  documentation, no clue what this is about.

* ``bin/rpyc_classic.py`` will bind to ``127.0.0.1`` instead of ``0.0.0.0`` by
  default

* ``SlaveService`` no longer serves exposed attributes (i.e., it now uses
  ``allow_exposed_attrs=False``)

* Exposed attributes no longer hide plain attributes if one otherwise has the
  required permissions to access the plain attribute. (`#165`_)

.. _#165: #165
.. _#172: #172
.. _#173: #173
.. _#198: #198
.. _#232: #232
.. _#238: #238
.. _#248: #248

What else is new
^^^^^^^^^^^^^^^^

* teleported functions will now be defined by default in the globals dict

* Can now explicitly specify globals for teleported functions

* Can now use streams as context manager

* keep a hard reference to connection in netrefs, may fix some ``EOFError``
  issues, in particular on Jython related (`#237`_)

* handle synchronous and asynchronous requests uniformly

* fix deadlock with connections talking to each other multithreadedly (`#270`_)

* handle timeouts cumulatively

* fix possible performance bug in ``Win32PipeStream.poll`` (oversleeping)

* use readthedocs theme for documentation (`#269`_)

* actually time out sync requests (`#264`_)

* clarify documentation concerning exceptions in ``Connection.ping`` (`#265`_)

* fix ``__hash__`` for netrefs (`#267`_, `#268`_)

* rename ``async`` module to ``async_`` for py37 compatibility (`#253`_)

* fix ``deliver()`` from IronPython to CPython2 (`#251`_)

* fix brine string handling in py2 IronPython (`#251`_)

* add gevent_ Server. For now, this requires using ``gevent.monkey.patch_all()``
  before importing for rpyc. Client connections can already be made without
  further changes to rpyc, just using gevent's monkey patching. (`#146`_)

* add function ``rpyc.lib.spawn`` to spawn daemon threads

* fix several bugs in ``bin/rpycd.py`` that crashed this script on startup
  (`#231`_)

* fix problem with MongoDB, or more generally any remote objects that have a
  *catch-all* ``__getattr__`` (`#165`_)

* fix bug when copying remote numpy arrays (`#236`_)

* added ``rpyc.utils.helpers.classpartial`` to bind arguments to services (`#244`_)

* can now pass services optionally as instance or class (could only pass as
  class, `#244`_)

* The service is now charged with setting up the connection, doing so in
  ``Service._connect``. This allows using custom protocols by e.g. subclassing
  ``Connection``.  More discussions and related features in `#239`_-`#247`_.

* service can now easily override protocol handlers, by updating
  ``conn._HANDLERS`` in ``_connect`` or ``on_connect``. For example:
  ``conn._HANDLERS[HANDLE_GETATTR] = self._handle_getattr``.

* most protocol handlers (``Connection._handle_XXX``) now directly get the
  object rather than its ID as first argument. This makes overriding
  individual handlers feel much more high-level. And by the way it turns out
  that this fixes two long-standing issues (`#137`_, `#153`_)

* fix bug with proxying context managers (`#228`_)

* expose server classes from ``rpyc`` top level module

* fix logger issue on jython

.. _#137: #137
.. _#146: #146
.. _#153: #153
.. _#165: #165
.. _#228: #228
.. _#231: #231
.. _#236: #236
.. _#237: #237
.. _#239: #239
.. _#244: #244
.. _#247: #247
.. _#251: #251
.. _#253: #253
.. _#264: #264
.. _#265: #265
.. _#267: #267
.. _#268: #268
.. _#269: #269
.. _#270: #270

.. _gevent: http://www.gevent.org/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants