[C++] AckGroupingTrackerEnabled may cause segmentation fault #8914

BewareMyPower · 2020-12-11T06:48:35Z

Describe the bug
Sometimes the program crashes in AckGroupingTrackerEnabled#scheduleTimer. #8519 tried to solve the problem by extending the lifetime of AckGroupingTrackerEnabled so that the callback would not access a dangling this. However, the segmentation fault still happens.

A typical stack trace is:

 #6 <signal handler called>
 #7 0x00007f5aad920b60 in ?? ()
 #8 0x00007f6e9ee7d1bb in boost::asio::detail::wait_handler<pulsar::AckGroupingTrackerEnabled::scheduleTimer()::{lambda(boost::system::error_code const&)#1}>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
 from /opt/vertica/verticadb/v_verticadb_node0003_catalog/Libraries/0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c/PulsarSourceLib_0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c.so
 #9 0x00007f6e9edd78d3 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
 from /opt/vertica/verticadb/v_verticadb_node0003_catalog/Libraries/0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c/PulsarSourceLib_0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c.so
 #10 0x00007f6e9edd4aa6 in pulsar::ExecutorService::startWorker(std::shared_ptr<boost::asio::io_context>) ()
 from /opt/vertica/verticadb/v_verticadb_node0003_catalog/Libraries/0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c/PulsarSourceLib_0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c.so
 #11 0x00007f6e9edd9c82 in std::thread::_Impl<std::_Bind_simple<std::_Bind<std::_Mem_fn<void (pulsar::ExecutorService::*)(std::shared_ptr<boost::asio::io_context>)> (pulsar::ExecutorService*, std::shared_ptr<boost::asio::io_context>)> ()> >::_M_run() ()
 from /opt/vertica/verticadb/v_verticadb_node0003_catalog/Libraries/0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c/PulsarSourceLib_0213a49612da8c2ad8b19aa3bd77ddec00a000000002090c.so
 #12 0x00007f6fcb5d2070 in ?? () from /lib64/libstdc++.so.6
 #13 0x00007f6fcb006dd5 in start_thread () from /lib64/libpthread.so.0
 #14 0x00007f6fca923ead in clone () from /lib64/libc.so.6

To Reproduce
It cannot be reproduced easily. The environment in which it occurs has a long-lived Client, with many Readers periodically created and used to read some messages.

Expected behavior
The segmentation fault should not happen.

Additional context
A solution that may work is refactoring the timer design. Currently, the deadline timer is recreated each time in the callback, and there is no state check like the one that guards PartitionedConsumerImpl::partitionsUpdateTimer_:

void PartitionedConsumerImpl::runPartitionUpdateTask() {
    partitionsUpdateTimer_->expires_from_now(partitionsUpdateInterval_);
    partitionsUpdateTimer_->async_wait(
        std::bind(&PartitionedConsumerImpl::getPartitionMetadata, shared_from_this()));
}

void PartitionedConsumerImpl::getPartitionMetadata() {
    using namespace std::placeholders;
    lookupServicePtr_->getPartitionMetadataAsync(topicName_)
        .addListener(std::bind(&PartitionedConsumerImpl::handleGetPartitions, shared_from_this(), _1, _2));
}

void PartitionedConsumerImpl::handleGetPartitions(Result result,
                                                  const LookupDataResultPtr& lookupDataResult) {
    Lock stateLock(mutex_);
    if (state_ != Ready) {
        // NOTE: when consumer is not ready, the runPartitionUpdateTask won't be scheduled
        return;
    }
    /* do the real work... */
    runPartitionUpdateTask();
}

However, we still need to give a detailed explanation of the stack trace shown above.
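The state-check idea can be illustrated without Boost.Asio. The following is only a minimal sketch under stated assumptions: FakeExecutor and Tracker are hypothetical stand-ins invented for this example (they are not the real io_context or AckGroupingTrackerEnabled), and the "timer" is simply a queued callback. It shows a callback that holds a weak_ptr, checks that the owner is still alive and not closed, and only then reschedules itself.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for an io_context: a FIFO of pending callbacks.
class FakeExecutor {
   public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // Runs at most maxTasks queued callbacks and returns how many ran.
    int run(int maxTasks) {
        int executed = 0;
        while (executed < maxTasks && !tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
            ++executed;
        }
        return executed;
    }

    bool empty() const { return tasks_.empty(); }

   private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical tracker used only to illustrate the pattern.
class Tracker : public std::enable_shared_from_this<Tracker> {
   public:
    explicit Tracker(FakeExecutor& executor) : executor_(executor) {}

    void start() { scheduleTimer(); }
    void close() { closed_ = true; }
    int flushCount() const { return flushCount_; }

   private:
    void scheduleTimer() {
        // Capture a weak_ptr so the pending callback does not keep the
        // tracker alive and never dereferences a destroyed object.
        std::weak_ptr<Tracker> weakSelf = shared_from_this();
        executor_.post([weakSelf] {
            auto self = weakSelf.lock();
            if (!self || self->closed_) {
                // State check: the owner is gone or closed, so do not reschedule.
                return;
            }
            ++self->flushCount_;    // stands in for the real flush work
            self->scheduleTimer();  // reschedule only while still open
        });
    }

    FakeExecutor& executor_;
    std::atomic<bool> closed_{false};
    int flushCount_ = 0;
};
```

In this sketch, calling close() or dropping the last shared_ptr lets the one already-queued callback run as a no-op, after which nothing is rescheduled. Whether the same approach removes the crash reported here is not established by this example.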

tisonkun · 2022-12-09T12:21:37Z

Closed as stale. The development of the C++ client has been permanently moved to http://github.com/apache/pulsar-client-cpp. Please open an issue there if it's still relevant.

BewareMyPower added the type/bug label Dec 11, 2020

sijie mentioned this issue Dec 11, 2020

ISSUE-8914: [C++] AckGroupingTrackerEnabled may cause segmentation fault streamnative/pulsar-archived#1868

Open

codelipenghui assigned BewareMyPower Dec 14, 2020

codelipenghui added the lifecycle/stale label Mar 4, 2022

tisonkun closed this as not planned Dec 9, 2022

BewareMyPower mentioned this issue Feb 3, 2023

[fix] Avoid resource leakage of AckGroupingTracker apache/pulsar-client-cpp#185

Merged

5 tasks
