-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Fix for cleos and keosd race condition #8671
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better approach would be something like Matt's old PR #5705.
You can avoid the issues with threading that he mentioned with something along these lines:
- open a pipe
- fork
3.1) in the child, start appbase, then close the pipe.
3.2) in the parent, wait on the pipe for EOF and then exit.
programs/cleos/main.cpp
Outdated
try_local_port(2000); | ||
|
||
std::string keos_msg; | ||
constexpr size_t TOTOAL_RETRIES = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TOTAL.
programs/cleos/main.cpp
Outdated
for(; tries<TOTOAL_RETRIES; ++tries) | ||
{ | ||
std::this_thread::sleep_for(std::chrono::milliseconds(KEOS_WAIT_TIME_MS)); | ||
std::getline(keos_out, keos_msg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getline is blocking. There's no need to wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It behaves as syncing, if keosd returns -1 before (daemonizing) sending "keosd::config::keosd_started_msg", pstream will make getline return immediately. Then, cleos knows that retry fails, print out err msg, then move on, no wait.
if keosd is already there before cleos runs, cleos won't wait. Otherwise, cleos starts keosd, and wait for keosd finishing startup (right before daemonizing). For any reason, if keosd hangs within startup, cleos will hang there too.
If cleos should run independently from keosd, then correct, will change to non-block read instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new approach is making http_plug startup yield to wallet-api startup. So, no change here.
The problem seems to me is that the cleos does not know when keosd finishes startup (before daemonizing). not sure why using fork and pipe helps sync the state. current pstream is the wrapper around pipe between parent and child. Using async read will make cleos and keosd independent. |
Did you look at #5705? I meant to use fork at startup in keosd.
You shouldn't abuse stdout for this synchronization. keosd may write other things to stdout and will break if it's a pipe whose other end is closed instead of /dev/null. |
thanks steve, only saw matt's issue description but did not see his PR. understand what you mean now. |
chatted with @spoonincode, and suggestion is focusing on why it is broken and not before, could be related to some subtle asio priority issue. At the same time, chatted with @swatanabe-b1, and could be the issue is always there. I will do more research on this before further action. |
From code, the issue is cleos thinks keosd is ready when http port is pingable. But, very often wallet api has not finished loading yet when port is pingable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let Steven comment on approach. These are mainly style comments.
programs/cleos/CLI11.hpp
Outdated
static constexpr uint32_t WAIT_DURATION_MS = 8000; | ||
}; | ||
} // cleos | ||
} // eosio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not go in this file. This file is copied from an open source project.
programs/cleos/main.cpp
Outdated
if (!cleos::KeosdSignalHndl::wait_keosd_start()) | ||
{ | ||
std::cerr << "Failed to launch " << binPath << std::endl; | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow code style found in files you modify.
programs/cleos/main.cpp
Outdated
@@ -77,6 +77,7 @@ Usage: ./cleos create account [OPTIONS] creator name OwnerKey ActiveKey | |||
#include <fc/exception/exception.hpp> | |||
#include <fc/variant_object.hpp> | |||
#include <fc/static_variant.hpp> | |||
#include <thread> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include not needed.
programs/keosd/main.cpp
Outdated
@@ -43,12 +60,23 @@ int main(int argc, char** argv) | |||
.default_unix_socket_path = keosd::config::key_store_executable_name + ".sock", | |||
.default_http_port = 0 | |||
}); | |||
const int parent_id = get_parent_pid(argc, argv); | |||
const int argc_no_parent_id = (parent_id > 0) ? argc - 2 : argc; | |||
char** const argv_no_parent_id = (parent_id > 0) ? &argv[2] : argv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very fragile. What if it is not the first argument?
programs/keosd/main.cpp
Outdated
{ | ||
break; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use consistent code style found used in files you modify.
programs/cleos/CLI11.hpp
Outdated
|
||
namespace eosio { | ||
namespace cleos { | ||
struct KeosdSignalHndl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use snake case for all struct/classes. Follow code style found in existing code.
programs/cleos/CLI11.hpp
Outdated
return (ready_signal_received == KEOSD_READY); | ||
} | ||
static state ready_signal_received; | ||
static constexpr uint32_t WAIT_DURATION_MS = 8000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All caps is used for macros not constants.
programs/cleos/CLI11.hpp
Outdated
namespace eosio { | ||
namespace cleos { | ||
struct KeosdSignalHndl { | ||
enum state : size_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use enum class
programs/cleos/CLI11.hpp
Outdated
enum state : size_t | ||
{ | ||
KEOSD_NOT_READY = 0, | ||
KEOSD_READY = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use all caps for constants
programs/cleos/CLI11.hpp
Outdated
} | ||
return (ready_signal_received == KEOSD_READY); | ||
} | ||
static state ready_signal_received; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init variable
The current approach, as cheesy at it is, has worked correctly in the past. From a quick code review, I am a little suspicious that the http thread pool may have introduced this problem. The thread_pool starts event handling immediately, so when the HTTP server starts listening in http_pligin::plugin_startup() it is able to service incoming requests immediately. However, it's not until wallet_api_plugin::plugin_startup() that the wallet endpoints are registered. I'm pretty sure http_plugin startup will always run before wallet_api_plugin::plugin_startup(), so it's ready to serve requests before wallet endpoints are registered. Previously back when there was no http thread pool, the event loop http plugin was using wouldn't have been pumped until all the plugin startup()s had been called. I wonder if we simply moved the Though, fwiw, I'm much in favor of a better startup solution. But the approach in this PR and the previous daemonization approach I did don't support win32. That's something that in my opinion we need to be mindful about. |
Well, if the above theory is really the problem, probably having http_plugin |
@spoonincode made a quick try, your theory works. Let me see how to make the http_plugin post() |
programs/cleos/CLI11.hpp
Outdated
} | ||
|
||
static void handle_keosd_start_signal(int) { | ||
ready_signal_received = KEOSD_READY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes a data race. See [intro.races]
"Two actions are potentially concurrent if... they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation."
"The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks, it is good reading.
programs/keosd/main.cpp
Outdated
|
||
if (parent_id > 0) | ||
{ | ||
const int ret_sig_que = kill(parent_id, SIGUSR1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the parent has already exited?
What happens if keosd initialization fails (e.g. someone else launched keosd concurrently).
Added post to delay the threadpool and related port initialization in http_plugin's startup. If any exception during delayed initialization, shutdown the app. Tested on my mac desktop, seems working well. Please let me know any other tests or test cases. |
plugins/http_plugin/http_plugin.cpp
Outdated
if(my->listen_endpoint) { | ||
app().post(appbase::priority::high, [this] () | ||
{ | ||
my->thread_pool.emplace( "http", my->thread_pool_size ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved into the try-catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
The new approach is to make http start_up yield to wallet api start_up, no change on cleos side
@swatanabe-b1 could you please help to give a review of this new approach. @heifner asks your review on the approach (even it was for the previous one), |
These changes were supposed to fix the problem of keosd auto-start by cleos, which manifests itself in the following error message: $ ./programs/cleos/cleos wallet list .../programs/keosd/keosd" launched Unable to connect to keosd, if keosd is running please kill the ...process and try again. I cannot confirm that the fix works, but the patch from #8671 is applied verbatim without conflicts.
Change Description
issue: #8614
When keosd is not running, cleos will start keosd, then immediately try to connect to its socket, then fails because wallet_api has not started yet right after http port starts taking incoming traffic.
The fix is to let http_plugin startup yield to others' (including wallet api) startup finish.
Consensus Changes
API Changes
Documentation Additions