Add `config set` command to modify corresponding redis nodes dynamically #95

jasonjoo2010 · 2016-10-14T16:46:22Z

For the redis-ctl monitoring system:
add setremotes support for operations purposes.

wooparadog · 2016-10-17T02:37:49Z

Could you add some tests for this feature please?

doyoubi · 2016-10-17T02:40:01Z

Thanks for the pull request.
Note that there are some other places using config.node, and they also need the lock.
Further, using a lock here will block all worker threads while the setremotes command is being processed. Maybe you need to test whether the blocking time is acceptable. If not, consider using a pointer and a reference count to implement a lock-free method.

jasonjoo2010 · 2016-10-17T03:44:41Z

config.node needs to be modified in config_add, update_slots and setremotes.

If all reading stages acquire a read lock using an rwlock, is this acceptable?

doyoubi · 2016-10-17T07:14:05Z

When worker threads are processing commands, they choose a redis connection and may access config.node, where the whole worker thread can be blocked by setremotes and has to wait until it finishes updating config.node. I think you should test whether this will be a performance problem.

jasonjoo2010 · 2016-10-17T07:19:39Z

Fine.

I will make a lock-free implementation then.

jasonjoo2010 · 2016-10-17T11:31:51Z

It's a lock-free implementation now,
and the test's main has been updated.

Is it necessary to write a separate test?

doyoubi · 2016-10-17T15:40:33Z

src/command.c

@@ -54,6 +55,8 @@ static const char *rep_auth_not_set = "-ERR Client sent AUTH, but no password is
 struct cmd_item cmds[] = {CMD_DO(CMD_BUILD_MAP)};
 const size_t CMD_NUM = sizeof(cmds) / sizeof(struct cmd_item);
 static struct dict command_map;
+//need not destroy in linux
+static pthread_mutex_t lock_setremotes = PTHREAD_MUTEX_INITIALIZER;


Even if it's OK not to destroy the mutex, I think we should do it explicitly. Not destroying it is confusing.
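As a minimal sketch of what destroying the mutex explicitly could look like: the mutex name is taken from the diff above, while `command_locks_destroy` is a hypothetical shutdown hook that does not exist in corvus. POSIX allows `pthread_mutex_destroy` on a mutex that was initialized with `PTHREAD_MUTEX_INITIALIZER`.

```c
#include <assert.h>
#include <pthread.h>

/* Statically initialized, as in the diff above. */
static pthread_mutex_t lock_setremotes = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical shutdown hook (not part of corvus): destroy the mutex
 * explicitly once no thread can use it anymore.
 * Returns 0 on success, or the error number from pthread_mutex_destroy.
 */
int command_locks_destroy(void)
{
    return pthread_mutex_destroy(&lock_setremotes);
}
```

Such a hook would be called once, after all worker threads have been joined.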

doyoubi · 2016-10-17T15:47:24Z

src/command.c

@@ -453,6 +456,60 @@ int cmd_proxy(struct command *cmd, struct redis_data *data)

    if (strcasecmp(type, "INFO") == 0) {
        return cmd_proxy_info(cmd);
+    } else if (strcasecmp(type, "SETREMOTES") == 0) {


Consider using the CONFIG SET command. It's more intuitive to use a redis command to tell the user exactly what this command does. In fact, I didn't realize what SETREMOTES was until I read the code.

Are you sure about changing this?
The PR in redisctl is almost merged, so we must decide quickly.

CONFIG SET NODE HOST1 PORT1 HOST2 PORT2 ... ?

This command is implemented to support operations. It's mainly for REDIS-CTL.

Yes, because later we may make more config options changeable at run time. I think we should keep all of this in a single CONFIG SET command.

OK, no problem.
The command's syntax will be "CONFIG SET NODE ip:port,ip1:port1"; is that OK?

I also want to figure out whether there is a way to check "cluster_ok" in the proxy, which redisctl needs.

CONFIG SET NODE ip port ip port... is easier to parse. And I suggest NODE should not be case sensitive.
What does cluster_ok mean?

I chose "CONFIG SET NODE ip:port,ip1:port1" because I want to reuse config_add, and it is easy to understand since it matches the config file syntax. Is this OK? We may reuse more of config_add when we add more functions.

There's a polling monitor in REDIS-CTL that tries to fetch the status of the cluster from the proxy. The value turns false when the cluster is down or the update-slots thread fails.

OK.

Corvus doesn't check the status of cluster.

OK. Maybe I will always set this to "ok" for the corvus proxy.
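For illustration, the "CONFIG SET NODE ip:port,ip1:port1" value agreed on above could be parsed roughly as follows. This is a standalone sketch, not the corvus implementation: `struct addr`, `parse_addr` and `parse_node_list` are hypothetical names, and the real code reuses config_add and socket_parse_ip instead. The sketch rejects the whole list when any entry is malformed rather than skipping it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HOST_MAX 255

/* Hypothetical stand-in for corvus's `struct address`. */
struct addr {
    char host[HOST_MAX + 1];
    unsigned short port;
};

/* Parse one "host:port" token. Returns 0 on success, -1 if malformed. */
int parse_addr(const char *token, struct addr *out)
{
    const char *colon = strrchr(token, ':');
    if (colon == NULL || colon == token) return -1;    /* no port or no host */

    size_t host_len = (size_t)(colon - token);
    if (host_len > HOST_MAX) return -1;

    char *end = NULL;
    long port = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || port <= 0 || port > 65535) return -1;

    memcpy(out->host, token, host_len);
    out->host[host_len] = '\0';
    out->port = (unsigned short)port;
    return 0;
}

/*
 * Parse "host:port,host1:port1" into out[0..max-1].
 * Returns the number of addresses, or -1 if the list is empty,
 * too long, or contains any malformed entry.
 */
int parse_node_list(const char *value, struct addr *out, int max)
{
    int n = 0;
    const char *p = value;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");          /* length of the next token */
        char token[HOST_MAX + 8];
        if (len == 0 || len >= sizeof(token) || n == max) return -1;

        memcpy(token, p, len);
        token[len] = '\0';
        if (parse_addr(token, &out[n]) != 0) return -1;
        n++;

        p += len;
        if (*p == ',') {
            p++;
            if (*p == '\0') return -1;         /* trailing comma */
        }
    }
    return n > 0 ? n : -1;
}
```

Because the option name is compared separately (with strcasecmp in the diff), only the value string needs to go through a parser like this.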

doyoubi · 2016-10-17T15:48:06Z

src/command.c

+    	//lock global
+    	pthread_mutex_lock(&lock_setremotes);
+    	if (data->elements % 2 != 0 || data->elements < 4) {
+    		cmd_mark_fail(cmd, rep_addr_err);


Please correct your indentation.

doyoubi · 2016-10-17T15:59:40Z

src/command.c

@@ -453,6 +456,60 @@ int cmd_proxy(struct command *cmd, struct redis_data *data)

    if (strcasecmp(type, "INFO") == 0) {
        return cmd_proxy_info(cmd);
+    } else if (strcasecmp(type, "SETREMOTES") == 0) {
+    	//lock global
+    	pthread_mutex_lock(&lock_setremotes);


The critical section here is too large. A smaller one provides better performance and makes it easier to understand what it protects.

doyoubi · 2016-10-17T16:02:33Z

src/command.c

+				}
+				if (pos_to_str(&data->element[i + 1].pos, port_s) == CORVUS_ERR) {
+					continue;
+				}


When the input is illegal, reject it with an error message instead of just ignoring it.

doyoubi · 2016-10-17T16:07:05Z

src/command.c

+    		}
+			{
+				struct node_conf *newnode = cv_malloc(sizeof(struct node_conf));
+				memset(newnode, 0, sizeof(struct node_conf));


I think cv_calloc is better.
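The two allocation styles compare as follows. This sketch uses the standard `malloc`/`calloc` rather than corvus's `cv_malloc`/`cv_calloc` wrappers, and `struct node_conf` here is a simplified, hypothetical stand-in for the real structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for corvus's `struct node_conf`. */
struct node_conf {
    void *addr;
    int len;
    int refcount;
};

/* Two-step version, as in the diff: allocate, then zero the memory. */
struct node_conf *node_conf_new_malloc(void)
{
    struct node_conf *node = malloc(sizeof(*node));
    if (node != NULL) memset(node, 0, sizeof(*node));
    return node;
}

/* One-step version: calloc returns memory that is already zeroed. */
struct node_conf *node_conf_new_calloc(void)
{
    return calloc(1, sizeof(struct node_conf));
}
```

Both functions return an all-zero structure; the calloc form simply cannot forget the memset.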

doyoubi · 2016-10-17T16:24:56Z

src/connection.c

-    for (i = 0; i < config.node.len; i++) {
-        server = conn_get_server_from_pool(ctx, &config.node.addr[i], false);
+    struct node_conf *node = config.node;
+    conf_node_inc_ref(node);


Note that there is a race condition here. After you get node, other threads may call conf_node_dec_ref and destroy it, which will cause corruption in conf_node_inc_ref(node). We do need a lock here, and it is not likely to cause a notable performance problem since the critical section is so small.

To avoid locking, is it possible to increase the refcount first and then get the pointer?

{ ATOMIC_INC(ref) return ref }

No... Before you increase the reference count, you have to get its address first. But between those two steps, other threads may free the content at that address. ATOMIC_INC does not guarantee the address is safe to use.

Done with a mutex lock, and inc() now returns a pointer.
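A sketch of the fix described in this thread: the pointer load and the reference-count increment happen inside one small critical section, so no other thread can free the node between the two steps. The types and the `conf_node_publish` helper are assumptions made for this example, and C11 atomics stand in for corvus's ATOMIC_INC macro; only the idea of an inc() that returns the pointer comes from the discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for corvus's `struct node_conf`. */
struct node_conf {
    int len;
    atomic_int refcount;
};

static pthread_mutex_t lock_conf_node = PTHREAD_MUTEX_INITIALIZER;
static struct node_conf *shared_node;   /* plays the role of config.node */

/* Hypothetical helper: publish a node; the shared pointer owns one reference. */
void conf_node_publish(struct node_conf *node)
{
    pthread_mutex_lock(&lock_conf_node);
    atomic_store(&node->refcount, 1);
    shared_node = node;
    pthread_mutex_unlock(&lock_conf_node);
}

/*
 * Read the shared pointer and take a reference in the same critical section.
 * While the node is still published it holds at least one reference, and a
 * writer must take the same lock to unpublish it, so the node cannot be
 * freed between the load and the increment.
 */
struct node_conf *conf_node_inc_ref(void)
{
    pthread_mutex_lock(&lock_conf_node);
    struct node_conf *node = shared_node;
    if (node != NULL) {
        atomic_fetch_add(&node->refcount, 1);
    }
    pthread_mutex_unlock(&lock_conf_node);
    return node;
}
```

A caller would use the returned pointer for the duration of one operation and then drop its reference with the matching dec function.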

doyoubi · 2016-10-17T16:28:59Z

tests/corvus_test.c

@@ -83,7 +84,9 @@ int main(int argc, const char *argv[])
    build_contexts();
    struct context *contexts = get_contexts();

-    memcpy(&config.node, &conf, sizeof(config.node));
+    config.node = cv_malloc(sizeof(struct node_conf));


Have you used valgrind to check whether there is a memory leak in the test program?
Please add more complete tests.

Yeah, I checked with the newest version and changed all malloc calls to calloc.

jasonjoo2010 · 2016-10-24T07:53:36Z

Hi,
how is everything going, guys?

A new feature is needed to support running in Docker:
configuration items can be overridden by ones specified as command-line parameters.
e.g.: corvus -c corvus.conf -n 10.10.55.99:6379,10.10.55.98:6382

Is that on the corvus roadmap?
Or could I make a PR for it?

And surely runtime modification of some basic configuration options can be done together with it.

wooparadog · 2016-10-24T08:23:39Z

@jasonjoo2010

We originally had no plans for overriding configuration on the fly, but it's indeed a must-have feature for running in Docker etc. We're honoured to have your pull request. Regarding hot configuration modification, we currently lack the manpower to fully test this feature, so it'll be some time before we're confident enough to merge this request.

But

corvus -c corvus.conf -n 10.10.55.99:6379,10.10.55.98:6382

seems a simple feature, so your pull request is quite welcome.

jasonjoo2010 · 2016-10-24T08:31:30Z

@wooparadog
Fine. I will try to submit this PR soon.
Good job these days, and thanks for all your efforts in open-sourcing such a wonderful proxy.

tonicmuroq · 2016-12-05T10:29:29Z

When will this PR be merged? Currently corvus discovers redis nodes automatically, and when I run GET xxx after deleting a node that holds the value for key xxx, it returns an error, and I need to retry to trigger corvus's automatic node discovery. I think that is worse than SETREMOTES, because clients may not retry...

jasonjoo2010 · 2016-12-05T11:18:49Z

@tonicbupt What kind of cluster do you deploy? Generally, a "slot group" with multiple (M-S) or more slaves should not cause such a problem. The slave of the broken master should become master automatically. And once a request fails, corvus updates the mapping once.

@wooparadog Am I right? I say this based on the debug log, and I remember having done such an experiment before.

doyoubi · 2016-12-05T11:51:32Z

Corvus will redirect and retry when it receives MOVED or ASK, so there is no need for clients to do that.
Corvus keeps a list of nodes (16 at most) in memory and updates them when it finds the cluster has changed.
But there are still two reasons for adding SETREMOTES:
(1) When all of the 16 nodes kept in memory are moved out of the cluster at the same time, corvus can no longer find the correct nodes automatically.
(2) After cluster nodes have been deleted several times, the nodes in the config file may become outdated. Once restarted, corvus can't find the correct nodes.

tonicmuroq · 2016-12-06T03:33:12Z

@jasonjoo2010 Here's how I reproduce this issue:
I create 3 redis instances, say A, B and C, and use ruskit to make them a cluster; all of them are masters. Then I create a corvus instance to proxy redis requests to this cluster. SET a 1, SET b 1 and SET c 1 all succeed. Then I use ruskit to remove one node, say C, so the cluster has only two masters, A and B. Now SET a 1 and SET b 1 succeed, but SET c 1 returns cluster is down. Of course the cluster is down at this point, because corvus may use C as the node to proxy the request to. If I then retry SET c 1, it succeeds, which means corvus updated its node information after the cluster went down. So clients may need to retry a second time, which is not that easy for them. That's why I think SETREMOTES is really useful...

doyoubi · 2016-12-06T08:20:43Z

@tonicbupt In your case corvus will not redirect SET c 1 to node A or B. If you use ruskit delete to remove node C, all slots of C will be moved to A and B, and there shouldn't be a cluster is down error. What's the response of CLUSTER NODES after you delete node C?
Corvus is always stateless and never builds the hash table itself. It only gets the slots table from a redis node. If the cluster is really down, corvus can't redirect the slots of C to A or B.

tonicmuroq · 2016-12-06T08:25:51Z

@doyoubi INFO shows nodes A, B and C in corvus. Corvus didn't update its mapping when I used ruskit to remove one node, so it might pass a request to the deleted node and get the answer Cluster is down. Then corvus tries to update its mapping using A and B and gets the right result, so SET c 1 then succeeds. That is my guess at the likely process... I kind of hope corvus would retry automatically when it gets the answer Cluster is down, but that is not very feasible, so SETREMOTES will be really useful: I can use SETREMOTES to update corvus' mapping manually.

jasonjoo2010 · 2016-12-06T08:46:08Z

@doyoubi Maybe I know what tonic is talking about.

When he removes a node from the cluster using ruskit, the removed node is still accessible (no connection error and no redirect error), so corvus will not retry during this request.

All other requests will succeed.

Maybe corvus only updates its map when an update-thread timeout fires (does such a timeout exist?).

doyoubi · 2016-12-06T09:09:59Z

@tonicbupt I reproduced your case and got cluster is down as well. But CLUSTER NODES showed only A and B.
Actually, corvus did update the mapping in a separate thread when it found the cluster was down, but that thread may only finish after the request has exceeded the max retry count. So your first request failed while the second didn't.
But if corvus doesn't receive SETREMOTES in time, or it does but soon updates its mapping from a redis node with an outdated slots mapping, your clients will still get the down error.
I think there is no way for a stateless corvus to overcome this problem unless it keeps retrying...

tonicmuroq · 2016-12-06T09:18:42Z

@doyoubi Actually, I mean that after removing the node, INFO shows A, B, C, and after getting Cluster is down, INFO shows A, B... Anyway, I get what you mean: it makes no difference whether SETREMOTES is used or not when removing a node. But if I add a node to the cluster, will corvus find the new node automatically? If I add node D to the cluster and migrate slots to D, I guess errors will occur before the migration is done...

doyoubi · 2016-12-06T09:32:43Z

@tonicbupt Of course, corvus will find the new node by redirecting according to MOVED and ASK.

tevino · 2017-01-03T09:44:25Z

@jasonjoo2010 Hey, just in case you forgot this.

This feature is officially in our schedule, there are still changes requested, if you are not active on this PR anymore, we'll continue your work by ourselves.

jasonjoo2010 · 2017-01-03T11:48:29Z

@tevino
All requested changes were already fixed earlier and rebased into one commit as requested.
So you may not have noticed them, since only the position in the diff differs.

Or are there any more reviews?

doyoubi

Thanks for your contribution. Sorry for leaving this PR for such a long time.

doyoubi · 2017-01-03T12:11:49Z

src/command.c

@@ -54,7 +59,7 @@ static const char *rep_auth_not_set = "-ERR Client sent AUTH, but no password is
 struct cmd_item cmds[] = {CMD_DO(CMD_BUILD_MAP)};
 const size_t CMD_NUM = sizeof(cmds) / sizeof(struct cmd_item);
 static struct dict command_map;
-
+static pthread_mutex_t lock_config = PTHREAD_MUTEX_INITIALIZER;


Is this lock used to guarantee that there is always only one thread modifying the config.node? I think we can do this in another place. See the reason below.

doyoubi · 2017-01-03T12:15:51Z

src/command.c

+        pthread_mutex_lock(&lock_config);
+        if (strcasecmp(option, "NODE") == 0) {
+            // config set node host:port,host1:port1
+            char value[data->element[3].pos.str_len + 1];


Here the number of elements may be less than 4. I think we should add a check before this.
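The ordering being asked for can be sketched as follows: validate the element count first, and only then touch element[3]. The structures are simplified, hypothetical stand-ins for corvus's `struct redis_data`, and `config_set_value` is an illustrative helper rather than a function from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for corvus's parsed-command types. */
struct element { const char *str; size_t str_len; };
struct redis_data { struct element *element; size_t elements; };

enum { CONFIG_OK = 0, CONFIG_ERR = -1 };

/*
 * Copy the value of `CONFIG SET <option> <value>` into buf.
 * The element count is checked before element[3] is read, so a short
 * command such as `CONFIG SET NODE` never indexes past the array.
 */
int config_set_value(const struct redis_data *data, char *buf, size_t buf_len)
{
    if (data->elements != 4) return CONFIG_ERR;    /* CONFIG, SET, option, value */

    const struct element *value = &data->element[3];
    if (value->str_len + 1 > buf_len) return CONFIG_ERR;

    memcpy(buf, value->str, value->str_len);
    buf[value->str_len] = '\0';
    return CONFIG_OK;
}
```

Copying into a caller-supplied buffer also avoids sizing a variable-length array from unchecked input.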

doyoubi · 2017-01-03T12:20:00Z

src/command.c

+            if (data->elements != 4) {
+                cmd_mark_fail(cmd, rep_config_parse_err);
+            } else if (data->element[3].pos.str_len
+                    < 9|| pos_to_str(&data->element[3].pos, value) != CORVUS_OK) {


Would you add a comment about that 9?

doyoubi · 2017-01-03T12:24:42Z

src/corvus.c


 void config_init()
 {
    memset(config.cluster, 0, CLUSTER_NAME_SIZE + 1);
    strncpy(config.cluster, "default", CLUSTER_NAME_SIZE);

    config.bind = 12345;
-    memset(&config.node, 0, sizeof(struct node_conf));
+    config.node = cv_calloc(1, sizeof(struct node_conf));
+    memset(config.node, 0, sizeof(struct node_conf));


No need to memset again.

doyoubi · 2017-01-03T12:25:17Z

src/corvus.c

-            if (socket_parse_ip(p, &config.node.addr[config.node.len]) == -1) {
-                cv_free(config.node.addr);
-                return -1;
+        	addr = cv_realloc(addr, sizeof(struct address) * (addr_cnt + 1));


Please correct your indentation.

doyoubi · 2017-01-03T12:53:09Z

src/corvus.c

+			newnode->len = addr_cnt;
+			newnode->refcount = 1;
+			struct node_conf *oldnode = config.node;
+			config.node = newnode;


Actually we should add a lock to protect the pointer swap here. This corresponds to the locking in conf_node_inc_ref.
Think about this case:
(1) oldnode gets its value.
(2) conf_node_dec_ref is called somewhere else and destroys this oldnode.
Even though, once initialized, config.node will only be modified by cmd_config under the protection of lock_config, conf_node_dec_ref is also called in conn_get_raw_server, which may be confusing: will that call free the current config.node?
Further, the critical section protected by lock_config is a little long.

node_dec_ref has to be kept. It stays simple if you just read it as "I don't want to use it anymore".

lock_config has been removed. It was intended as a lock for the config command, but since none of that logic needs to run in multiple threads currently, I have just removed it for now.

I added the same lock around the pointer swap.

doyoubi · 2017-01-03T13:10:55Z

src/corvus.c

+void conf_node_dec_ref(struct node_conf *node)
+{
+    pthread_mutex_lock(&lock_conf_node);
+    int refcount = ATOMIC_DEC(node->refcount, 1);


I think we can safely remove the lock here. Thanks to the atomic operation, multiple calls to conf_node_dec_ref will end up with only one of them seeing a zero refcount.

Yeah, so I just removed it.
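The reasoning can be illustrated with C11 atomics. This is a sketch under assumptions: `atomic_fetch_sub` stands in for corvus's ATOMIC_DEC macro, the struct is simplified, and the int return value is added here only to make the behaviour observable (the function in the diff returns void).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for corvus's `struct node_conf`. */
struct node_conf {
    int len;
    atomic_int refcount;
};

/*
 * Drop one reference without taking a lock. atomic_fetch_sub returns the
 * value held before the subtraction, and every caller sees a distinct
 * value, so exactly one caller observes 1 (the count reaching zero) and
 * frees the node.
 * Returns 1 if this call freed the node, 0 otherwise.
 */
int conf_node_dec_ref(struct node_conf *node)
{
    assert(node != NULL);
    if (atomic_fetch_sub(&node->refcount, 1) == 1) {
        free(node);
        return 1;
    }
    return 0;
}
```

The lock is still needed on the acquire side, where the pointer has to be read and the count incremented as one step.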

doyoubi · 2017-01-04T06:11:30Z

src/command.c

+        return CORVUS_ERR;
+    }
+    if (strcasecmp(type, "SET") == 0) {
+        // `config set` generelly need global lock


This comment is outdated now.

doyoubi · 2017-01-04T06:16:30Z

src/corvus.c

            p = strtok(NULL, ",");
        }
+		{


Remember to correct your indentation.

And remember to fix conflicts.

What is that new PR mainly about? Is that a new feature?
I recommend fixing the conflicts here and creating a separate PR for new features.

Yeah, conflicts have been resolved.

@doyoubi OK now.

I did these slowly because of a poor internet connection.

doyoubi · 2017-01-04T06:19:31Z

src/corvus.c

+			pthread_mutex_lock(&lock_conf_node);
+			struct node_conf *oldnode = config.node;
+			config.node = newnode;
+			conf_node_dec_ref(oldnode);


conf_node_dec_ref(oldnode); can be safely moved outside of the critical section.
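A sketch of the narrower critical section suggested here: only the pointer swap is done under the lock, and the old node is released after unlocking. The names mirror the diff, but the struct, `node_release` and `conf_node_replace` are simplified assumptions for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for corvus's `struct node_conf`. */
struct node_conf {
    int len;
    atomic_int refcount;
};

static pthread_mutex_t lock_conf_node = PTHREAD_MUTEX_INITIALIZER;
static struct node_conf *shared_node;   /* plays the role of config.node */

/* Drop one reference; free the node when the last reference is gone. */
static void node_release(struct node_conf *node)
{
    if (node != NULL && atomic_fetch_sub(&node->refcount, 1) == 1) {
        free(node);
    }
}

/* Install newnode, whose refcount already accounts for the shared pointer. */
void conf_node_replace(struct node_conf *newnode)
{
    pthread_mutex_lock(&lock_conf_node);
    struct node_conf *oldnode = shared_node;
    shared_node = newnode;                  /* only the swap needs the lock */
    pthread_mutex_unlock(&lock_conf_node);

    /* The old node is no longer reachable through shared_node, so dropping
     * the shared pointer's reference (and possibly freeing) can happen here. */
    node_release(oldnode);
}
```

Readers that took a reference before the swap keep a valid node until they release it themselves.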

doyoubi · 2017-01-04T06:21:55Z

src/slot.c

        node_list.len++;
    }
+    conf_node_dec_ref(node);


Please move lines 40 and 46 outside of the critical section.

doyoubi · 2017-01-04T06:23:53Z

src/command.c

+                }
+            }
+        } else {
+            cmd_mark_fail(cmd, rep_addr_err);


I think what you want here is rep_config_err.

doyoubi

I think returning rep_addr_err is a little bit confusing in line 551 of command.c.
Others LGTM.

doyoubi · 2017-01-04T08:59:50Z

src/command.c

+            cmd_mark_fail(cmd, rep_addr_err);
+        }
+    } else if (strcasecmp(type, "GET") == 0) {
+        if (strcasecmp(option, "NODE") == 0) {


We'd better check data->elements here.

Aha, sorry about the changes after merging.
I changed the assert on the element count to ensure that.

doyoubi · 2017-01-04T09:39:05Z

Thanks for your contribution @jasonjoo2010 . We will finally merge this PR after we finish testing.

tevino · 2017-01-04T09:43:24Z

@jasonjoo2010 Thanks for your work! 😋

jasonjoo2010 · 2017-01-04T13:35:23Z

@doyoubi @tevino We also treat it as an important online component, so we are working on it too.

Thank you all for the great, cute work.

tonicmuroq · 2017-01-11T06:16:56Z

🍻

wooparadog changed the title ~~add setremotes support~~ Add proxy setremotes command Oct 17, 2016

jasonjoo2010 force-pushed the master branch from 31c7761 to 5f5fc17 Compare October 17, 2016 11:23

doyoubi suggested changes Oct 17, 2016


jasonjoo2010 force-pushed the master branch 4 times, most recently from 15393e3 to 8d187eb Compare October 18, 2016 03:32

add supporting for config.node modification in runtime

f083bd2

jasonjoo2010 force-pushed the master branch from 8d187eb to f083bd2 Compare October 18, 2016 07:30

jasonjoo2010 force-pushed the master branch from 406064b to f083bd2 Compare November 10, 2016 12:47

doyoubi reviewed Jan 3, 2017


doyoubi suggested changes Jan 4, 2017


remove config lock now and add 'config get node'

a1642dd

jasonjoo2010 force-pushed the master branch 2 times, most recently from 8269e60 to a1642dd Compare January 4, 2017 08:05

Merge branch 'master' into master

88f54f7

doyoubi reviewed Jan 4, 2017


doyoubi approved these changes Jan 4, 2017


change error msg when config set unknow cmd

0f42521

doyoubi changed the title ~~Add proxy setremotes command~~ Add config set command to modify corresponding redis nodes dynamically Jan 4, 2017

jasonjoo2010 force-pushed the master branch from 378b093 to 0f42521 Compare January 4, 2017 09:34

doyoubi merged commit 0318b8b into eleme:master Jan 10, 2017

doyoubi mentioned this pull request Jan 11, 2017

Add config rewrite #104

Merged

doyoubi mentioned this pull request May 9, 2017

Bump version to v0.2.6 #126

Merged
