-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add config set
command to modify corresponding redis nodes dynamically
#95
Conversation
Could you add some tests for this feature please? |
Thanks for the pull request. |
config.node need by modified in config_add, update_slots and setremotes. all reading stages accquire a read lock using a rwlock, is this acceptable? |
When worker threads are processing commands, they will choose a redis connection and may access |
fine. I will make a lock-free implement then. |
now it's lock free impl. is it necessary to write a separately test? |
@@ -54,6 +55,8 @@ static const char *rep_auth_not_set = "-ERR Client sent AUTH, but no password is | |||
struct cmd_item cmds[] = {CMD_DO(CMD_BUILD_MAP)}; | |||
const size_t CMD_NUM = sizeof(cmds) / sizeof(struct cmd_item); | |||
static struct dict command_map; | |||
//need not destroy in linux | |||
static pthread_mutex_t lock_setremotes = PTHREAD_MUTEX_INITIALIZER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it's ok not to destroy mutex, I think we should do it explicitly. Not destroying it is confusing.
@@ -453,6 +456,60 @@ int cmd_proxy(struct command *cmd, struct redis_data *data) | |||
|
|||
if (strcasecmp(type, "INFO") == 0) { | |||
return cmd_proxy_info(cmd); | |||
} else if (strcasecmp(type, "SETREMOTES") == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using the CONFIG SET command. It's more intuitive to use redis command to telll the user what exactly this command do. In fact I didn't realize what SETREMOTES
is until I read the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure to change this?
because the PR in redisctl almost be merged so we must decide it quickly.
CONFIG SET NODE HOST1 PORT1 HOST2 PORT2 ... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this cmd is implemented to support operations. It's mainly for REDIS-CTL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because later we may let more config options changable in run time. I think we should keep all this to only one CONFIG SET
command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, no problem
the cmd's syntax will be "CONFIG SET NODE ip:port,ip1:port1", is that ok?
and i want to figure out if there is a method to check "cluster_ok" in proxy, need by redisctl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CONFIG SET NODE ip port ip port...
is easier to parse. And I suggest NODE
should not be case sensitive.
What does cluster_ok
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"CONFIG SET NODE ip:port,ip1:port1" is that i want to reuse config_add and easy to understand comparing to config syntax, is this ok? Maybe we will reuse more in config_add
when we add more functions.
there's a polling monitor in REDIS-CTL and it tries to fetch the status of cluster in proxy. The value will turn to false when cluster is down or update-slots-thread fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- OK.
- Corvus doesn't check the status of cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Ok. Maybe i will always set this to "ok" for corvus proxy.
//lock global | ||
pthread_mutex_lock(&lock_setremotes); | ||
if (data->elements % 2 != 0 || data->elements < 4) { | ||
cmd_mark_fail(cmd, rep_addr_err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct your indention.
@@ -453,6 +456,60 @@ int cmd_proxy(struct command *cmd, struct redis_data *data) | |||
|
|||
if (strcasecmp(type, "INFO") == 0) { | |||
return cmd_proxy_info(cmd); | |||
} else if (strcasecmp(type, "SETREMOTES") == 0) { | |||
//lock global | |||
pthread_mutex_lock(&lock_setremotes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The critical section here is too large. A smaller one provides better performance and makes it more easier to understand what it protects.
} | ||
if (pos_to_str(&data->element[i + 1].pos, port_s) == CORVUS_ERR) { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the input is illegal, reject it with error message instead of just ignoring it.
} | ||
{ | ||
struct node_conf *newnode = cv_malloc(sizeof(struct node_conf)); | ||
memset(newnode, 0, sizeof(struct node_conf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think cv_calloc
is better.
for (i = 0; i < config.node.len; i++) { | ||
server = conn_get_server_from_pool(ctx, &config.node.addr[i], false); | ||
struct node_conf *node = config.node; | ||
conf_node_inc_ref(node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is race condition here. After you get node
, other threads may call conf_node_dec_ref
and destroy it which will cause corruption in conf_node_inc_ref(node)
. We do need lock here and this is not likely to bring notable performance problem since the critical section here is so small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid locking, is it possible to increase refcount first then get pointer?
{
ATOMIC_INC(ref)
return ref
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No... Before you increase reference count, you have to get its address first. But between them, other threads may free the content of that address. ATOMIC_INC does not guarantee the address is safe to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with a mutex lock and change inc() to return a pointer
@@ -83,7 +84,9 @@ int main(int argc, const char *argv[]) | |||
build_contexts(); | |||
struct context *contexts = get_contexts(); | |||
|
|||
memcpy(&config.node, &conf, sizeof(config.node)); | |||
config.node = cv_malloc(sizeof(struct node_conf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you used valgrind
to check whether there is memory leak in test program?
Please add more complete tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, checked for newest version and change all malloc to calloc
15393e3
to
8d187eb
Compare
hi, there is a new feature is needed to support running in dockers: is that in the corvus road map? and surely some basic configurations' modifying in runtime can be made together. |
We originally don't have plans for overriding configurations on the fly. But it's indeed a must have feature for running in docker etc. It's quite honoured to have your pull request. In regard to hot configuration modifying, we currently lack enough man power to fully test this feature, so it'll be sometime before we're fully confident to merge this request. But
seems a simple feature, your pull request is quite welcome. |
@wooparadog |
when would this PR be merged ... Now corvus automatically discovers redis nodes, and when I use |
@tonicbupt what kind of cluster do you deploy? generally multi (M-S) or more slaves in a "slot group" may not cause such problem. The slave of the broken master should become master automatically. And once a request failed corvus would to update the mapping once. @wooparadog am i right? I just say it according to the debug log and i remember having done such experiment before. |
Corvus will redirect and retry when receive |
@jasonjoo2010 here's how I reproduce this issue: |
@tonicbupt In your case corvus will not redirect |
@doyoubi |
@doyoubi maybe i know what tonic is saying about. when he remote a node from cluster using ruskit, the node removed is still accessable (no conn err and no redirect error), so corvus will not retry during this request. All other request will succeed. maybe corvus only update its map with a update thread timeout(does it exists?) |
@tonicbupt I reproduce your case and get |
@doyoubi actually I mean after removing nodes, |
@tonicbupt Of course, corvus will find the new node by redirecting according to |
@jasonjoo2010 Hey, Just in case that you forgot this. This feature is officially in our schedule, there are still changes requested, if you are not active on this PR anymore, we'll continue your work by ourselves. |
@tevino Or is there any more reviews? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to your contribution. Sorry for leaving this PR for such a long time.
@@ -54,7 +59,7 @@ static const char *rep_auth_not_set = "-ERR Client sent AUTH, but no password is | |||
struct cmd_item cmds[] = {CMD_DO(CMD_BUILD_MAP)}; | |||
const size_t CMD_NUM = sizeof(cmds) / sizeof(struct cmd_item); | |||
static struct dict command_map; | |||
|
|||
static pthread_mutex_t lock_config = PTHREAD_MUTEX_INITIALIZER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this lock used to guarantee that there is always only one thread modifying the config.node? I think we can do this in another place. See the reason below.
pthread_mutex_lock(&lock_config); | ||
if (strcasecmp(option, "NODE") == 0) { | ||
// config set node host:port,host1:port1 | ||
char value[data->element[3].pos.str_len + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the length of element
may be less than 4. I think we should add check before this.
if (data->elements != 4) { | ||
cmd_mark_fail(cmd, rep_config_parse_err); | ||
} else if (data->element[3].pos.str_len | ||
< 9|| pos_to_str(&data->element[3].pos, value) != CORVUS_OK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you add some comment about that 9
?
|
||
void config_init() | ||
{ | ||
memset(config.cluster, 0, CLUSTER_NAME_SIZE + 1); | ||
strncpy(config.cluster, "default", CLUSTER_NAME_SIZE); | ||
|
||
config.bind = 12345; | ||
memset(&config.node, 0, sizeof(struct node_conf)); | ||
config.node = cv_calloc(1, sizeof(struct node_conf)); | ||
memset(config.node, 0, sizeof(struct node_conf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to memset
again.
if (socket_parse_ip(p, &config.node.addr[config.node.len]) == -1) { | ||
cv_free(config.node.addr); | ||
return -1; | ||
addr = cv_realloc(addr, sizeof(struct address) * (addr_cnt + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct your indention.
newnode->len = addr_cnt; | ||
newnode->refcount = 1; | ||
struct node_conf *oldnode = config.node; | ||
config.node = newnode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we should add lock to protect the swapping pointers here. This is corresponding to the locking in conf_node_inc_ref
;
Think about this case:
(1) oldnode
get its value.
(2) conf_node_dec_ref
is called somewhere else and destroy this oldnode
.
Well, even though once initialization, config.node will only be modified by cmd_config
protected by lock_config
, conf_node_dec_ref
also exists in conn_get_raw_server which may be confusing - Is this place will free the current config.node?
Further the the critical section protected by lock_config
is a little bit long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node_dec_ref has to be reserved to make things simple if you just take it as "i dont want to use it anymore".
lock_config has been removed and it considered to be a config cmd lock. but no logic cannot be multi-thead currently i just remove it now.
i add same lock in pointer swapping.
void conf_node_dec_ref(struct node_conf *node) | ||
{ | ||
pthread_mutex_lock(&lock_conf_node); | ||
int refcount = ATOMIC_DEC(node->refcount, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can safely remove the lock here. Thanks to the atomic operation multiple calling of conf_node_dec_ref
will end up with only one zero refcount
;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, so just remove it
return CORVUS_ERR; | ||
} | ||
if (strcasecmp(type, "SET") == 0) { | ||
// `config set` generelly need global lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is outdated now.
p = strtok(NULL, ","); | ||
} | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to correct your indention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And remember to fix conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is that new PR mainly about? Is that a new feature?
I recommend fixing the conflicts here and creating a separate PR for new features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, conflicts have been resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@doyoubi OK, now
did these slowly due to poor internet trafic
pthread_mutex_lock(&lock_conf_node); | ||
struct node_conf *oldnode = config.node; | ||
config.node = newnode; | ||
conf_node_dec_ref(oldnode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conf_node_dec_ref(oldnode);
can be safely moved outside of the critical section.
node_list.len++; | ||
} | ||
conf_node_dec_ref(node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move line 40 and 46 outside of the critical section.
} | ||
} | ||
} else { | ||
cmd_mark_fail(cmd, rep_addr_err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you want here is rep_config_err
.
8269e60
to
a1642dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think returning rep_addr_err
is a little bit confusing in line 551 of command.c.
Others LGTM.
cmd_mark_fail(cmd, rep_addr_err); | ||
} | ||
} else if (strcasecmp(type, "GET") == 0) { | ||
if (strcasecmp(option, "NODE") == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd better check data->elements
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aha, sorry about changes after merging.
I change the assert for elements judging ensure that.
proxy setremotes
commandconfig set
command to modify corresponding redis nodes dynamically
Thanks for your contribution @jasonjoo2010 . We will finally merge this PR after we finish testing. |
@jasonjoo2010 Thanks for your work! 😋 |
🍻 |
for redis-ctl monitor system
add setremotes support for op purpose