Skip to content
This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

rpc: fix asio bug that causes coredump #263

Merged
merged 1 commit into from
Jun 24, 2019

Conversation

neverchanje
Copy link
Contributor

@neverchanje neverchanje commented Jun 20, 2019

Fixed apache/incubator-pegasus#307.
The core problem is when the socket is reading/writing, the other threads which encounter a network error will close that socket, and may cause its internal descriptor_data to be null, then the thread w/r using the wild pointer will coredump in this case.
The solution is to add write-lock on close and read-lock on r/w, so that will ensure thread-safety of the Boost-ASIO socket.

@qinzuoyan
Copy link
Member

asio_rpc_session::set_options()里面也要加write_lock

@neverchanje
Copy link
Contributor Author

set_options 只有初始化的时候会调,不需要加锁

@qinzuoyan
Copy link
Member

不一定,如果是client端的socket,set_options()在_socket->async_connect()的回调函数里面被调用,此时有可能在其他线程调用close(),还是有潜在风险的。
或者在asio_rpc_session::connect()里面调用set_options()是多余的?

@neverchanje
Copy link
Contributor Author

neverchanje commented Jun 20, 2019

set_options 在 server 端只有这里用到了

void asio_rpc_session::connect()
{
    if (set_connecting()) {
        boost::asio::ip::tcp::endpoint ep(boost::asio::ip::address_v4(_remote_addr.ip()),
                                          _remote_addr.port());

        add_ref();
        _socket->async_connect(ep, [this](boost::system::error_code ec) {
            if (!ec) {
                dinfo("client session %s connected", _remote_addr.to_string());

                set_options();
                set_connected();
                on_send_completed();
                start_read_next();
            } else {
                derror("client session connect to %s failed, error = %s",
                       _remote_addr.to_string(),
                       ec.message().c_str());
                on_failure(true);
            }
            release_ref();
        });
    }
}

只有 set_connecting 返回 true 才会 async_connect,才会 set_options
set_connecting 这个函数是加锁的,所以一定只有一个线程 connect。
所以 set_options 不需要考虑多线程,如果这个连接有别的线程,那 set_connecting 应该返回 false。

@qinzuoyan
Copy link
Member

set_connecting() 加锁并不影响set_options(),两者在asio_rpc_session::connect()中是被平行调用的关系,且一个在回调函数前,一个在回调函数内。
考虑如下场景:

  1. client发起连接
  2. 连接成功,在某个线程中调用_socket->async_connect()的回调函数,调用set_options()
  3. 在另外一个线程中,企图关闭这个session(先不管为什么关闭,既然close()是public函数,就有这种可能),主动调用asio_rpc_session::close()

src/core/tools/common/asio_rpc_session.cpp Outdated Show resolved Hide resolved
src/core/tools/common/asio_rpc_session.cpp Outdated Show resolved Hide resolved
src/core/tools/common/asio_rpc_session.cpp Outdated Show resolved Hide resolved
src/core/tools/common/asio_rpc_session.cpp Outdated Show resolved Hide resolved
public:
virtual void connect() override;
~asio_rpc_session() override;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加这么多空行干什么?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个应该是 convention? public functions 之间加 new line,留白给以后加注释。

@neverchanje neverchanje added the type/bug-fix This PR fixes a bug. label Jun 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
1.11.5 type/bug-fix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pegasus coredump at boost asio
3 participants