Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition between optimize and drop #9901

Merged
merged 1 commit into from
Mar 28, 2020

Conversation

alesapin
Copy link
Member

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix race condition between drop and optimize in ReplicatedMergeTree.

@alesapin alesapin added pr-bugfix Pull request with bugfix, not backported by default no-docs-needed labels Mar 27, 2020
@alesapin
Copy link
Member Author

Tsan report without fix:

==================
WARNING: ThreadSanitizer: data race (pid=26478)
  Read of size 8 at 0x7b8000ef7b80 by thread T227 (mutexes: write M591233240794102704, write M572374417354488224):
    #0 std::__1::shared_ptr<DB::BackgroundProcessingPoolTaskInfo>::get() const /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/memory:3821:49 (clickhouse+0x107ff03e)    #1 std::__1::shared_ptr<DB::BackgroundProcessingPoolTaskInfo>::operator bool() const /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/memory:3832:62 (clickhouse+0x107ff03e)
    #2 DB::ReplicatedMergeTreeQueue::pullLogsToQueue(std::__1::shared_ptr<zkutil::ZooKeeper>, std::__1::function<void (Coordination::WatchResponse const&)>) /home/alesap/code/cpp/ClickHouse/dbms/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp:563:13 (clickhouse+0x107ff03e)
    #3 DB::ReplicatedMergeTreeMergePredicate::ReplicatedMergeTreeMergePredicate(DB::ReplicatedMergeTreeQueue&, std::__1::shared_ptr<zkutil::ZooKeeper>&) /home/alesap/code/cpp/ClickHouse/dbms/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp:1619:12 (clickhouse+0x10815b30)
    #4 DB::ReplicatedMergeTreeQueue::getMergePredicate(std::__1::shared_ptr<zkutil::ZooKeeper>&) /home/alesap/code/cpp/ClickHouse/dbms/src/Storages/MergeTree/ReplicatedMergeTreeQueue.cpp:1308:12 (clickhouse+0x1080ed79)
    #5 DB::StorageReplicatedMergeTree::optimize(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::IAST> const&, bool, bool, DB::Context const&) /home/alesap/code/cpp/ClickHouse/dbms/src/Storages/StorageReplicatedMergeTree.cpp:3055:61 (clickhouse+0x1052eb0b)
    #6 DB::InterpreterOptimizeQuery::execute() /home/alesap/code/cpp/ClickHouse/dbms/src/Interpreters/InterpreterOptimizeQuery.cpp:29:12 (clickhouse+0xfca2dc0)
    #7 DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*, bool) /home/alesap/code/cpp/ClickHouse/dbms/src/Interpreters/executeQuery.cpp:332:32 (clickhouse+0x1029ab5c)
    #8 DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, bool) /home/alesap/code/cpp/ClickHouse/dbms/src/Interpreters/executeQuery.cpp:577:30 (clickhouse+0x10299bc1)    #9 DB::TCPHandler::runImpl() /home/alesap/code/cpp/ClickHouse/dbms/programs/server/TCPHandler.cpp:249:24 (clickhouse+0x953ca87)
    #10 DB::TCPHandler::run() /home/alesap/code/cpp/ClickHouse/dbms/programs/server/TCPHandler.cpp:1237:9 (clickhouse+0x9548b87)
    #11 Poco::Net::TCPServerConnection::start() /home/alesap/code/cpp/ClickHouse/contrib/poco/Net/src/TCPServerConnection.cpp:43:3 (clickhouse+0x10ff5802)
    #12 Poco::Net::TCPServerDispatcher::run() /home/alesap/code/cpp/ClickHouse/contrib/poco/Net/src/TCPServerDispatcher.cpp:114:20 (clickhouse+0x10ff60a6)
    #13 Poco::PooledThread::run() /home/alesap/code/cpp/ClickHouse/contrib/poco/Foundation/src/ThreadPool.cpp:199:14 (clickhouse+0x139795c6)
    #14 Poco::(anonymous namespace)::RunnableHolder::run() /home/alesap/code/cpp/ClickHouse/contrib/poco/Foundation/src/Thread.cpp:55:11 (clickhouse+0x13977c7f)
    #15 Poco::ThreadImpl::runnableEntry(void*) /home/alesap/code/cpp/ClickHouse/contrib/poco/Foundation/src/Thread_POSIX.cpp:345:27 (clickhouse+0x13976447)

  Previous write of size 8 at 0x7b8000ef7b80 by thread T225 (mutexes: write M1053414654322247440):
    #0 std::__1::enable_if<(is_move_constructible<DB::BackgroundProcessingPoolTaskInfo*>::value) && (is_move_assignable<DB::BackgroundProcessingPoolTaskInfo*>::value), void>::type std::__1::swap<DB::BackgroundProcessingPoolTaskInfo*>(DB::BackgroundProcessingPoolTaskInfo*&, DB::BackgroundProcessingPoolTaskInfo*&) /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/type_traits:3697:9 (clickhouse+0x1052c2df)
    #1 std::__1::shared_ptr<DB::BackgroundProcessingPoolTaskInfo>::swap(std::__1::shared_ptr<DB::BackgroundProcessingPoolTaskInfo>&) /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/memory:4333:5 (clickhouse+0x1052c2df)
    #2 std::__1::shared_ptr<DB::BackgroundProcessingPoolTaskInfo>::reset() /home/alesap/code/cpp/ClickHouse/contrib/libcxx/include/memory:4342:18 (clickhouse+0x1052c2df)
    #3 DB::StorageReplicatedMergeTree::shutdown() /home/alesap/code/cpp/ClickHouse/dbms/src/Storages/StorageReplicatedMergeTree.cpp:2890:27 (clickhouse+0x1052c2df)
    #4 DB::InterpreterDropQuery::executeToTable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ASTDropQuery::Kind, bool, bool, bool) /home/alesap/code/cpp/ClickHouse/dbms/src/Interpreters/InterpreterDropQuery.cpp:115:20 (clickhouse+0xfc211
e8)
    #5 DB::InterpreterDropQuery::execute() /home/alesap/code/cpp/ClickHouse/dbms/src/Interpreters/InterpreterDropQuery.cpp:46:20 (clickhouse+0xfc20190)
    #6 DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*, bool) /home/alesap/code/cpp/ClickHouse/dbms/src/Interpreters/executeQuery.cpp:332:32 (clickhouse+0x1029ab5c)
    #7 DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, bool) /home/alesap/code/cpp/ClickHouse/dbms/src/Interpreters/executeQuery.cpp:577:30 (clickhouse+0x10299bc1)
    #8 DB::TCPHandler::runImpl() /home/alesap/code/cpp/ClickHouse/dbms/programs/server/TCPHandler.cpp:249:24 (clickhouse+0x953ca87)
    #9 DB::TCPHandler::run() /home/alesap/code/cpp/ClickHouse/dbms/programs/server/TCPHandler.cpp:1237:9 (clickhouse+0x9548b87)
    #10 Poco::Net::TCPServerConnection::start() /home/alesap/code/cpp/ClickHouse/contrib/poco/Net/src/TCPServerConnection.cpp:43:3 (clickhouse+0x10ff5802)
    #11 Poco::Net::TCPServerDispatcher::run() /home/alesap/code/cpp/ClickHouse/contrib/poco/Net/src/TCPServerDispatcher.cpp:114:20 (clickhouse+0x10ff60a6)
    #12 Poco::PooledThread::run() /home/alesap/code/cpp/ClickHouse/contrib/poco/Foundation/src/ThreadPool.cpp:199:14 (clickhouse+0x139795c6)
    #13 Poco::(anonymous namespace)::RunnableHolder::run() /home/alesap/code/cpp/ClickHouse/contrib/poco/Foundation/src/Thread.cpp:55:11 (clickhouse+0x13977c7f)
    #14 Poco::ThreadImpl::runnableEntry(void*) /home/alesap/code/cpp/ClickHouse/contrib/poco/Foundation/src/Thread_POSIX.cpp:345:27 (clickhouse+0x13976447)

@alesapin alesapin requested a review from tavplubix March 27, 2020 10:57
queue_task_handle.reset();

{
/// Queue can trigger queue_task_handle itself. So we ensure that all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's also possible with atomic flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how? Operations with the flag are atomic, but t1 can check the flag and it will say that pointer is OK then sleep for some time after that shutdown thread t2 will be called and sets both flag and pointer to zero. After that t1 will wake up and call wake on empty pointer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right.

@alexey-milovidov alexey-milovidov merged commit e75cf7b into master Mar 28, 2020
@alexey-milovidov alexey-milovidov deleted the fix_optimize_drop_race branch March 28, 2020 01:32
akuzm pushed a commit that referenced this pull request Apr 10, 2020
Fix race condition between optimize and drop

(cherry picked from commit e75cf7b)
alesapin added a commit that referenced this pull request Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-docs-needed pr-bugfix Pull request with bugfix, not backported by default
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants