-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cluster mode] inconsistent errors #56
Comments
Redis configuration at the point of running: {
hosts: [
{ host: 'localhost', port: 30001 },
{ host: 'localhost', port: 30002 },
{ host: 'localhost', port: 30003 },
{ host: 'localhost', port: 30004 },
{ host: 'localhost', port: 30005 },
{ host: 'localhost', port: 30006 }
],
options: {
friendlyStackTraces: true,
lazyConnect: true
}
} Connection sequence: before(function (done) {
exec(createRedisCluster, [ 'start' ], { cwd: redisClusterCwd }, done);
});
before(function (done) {
this.timeout(10000);
exec(createRedisCluster, [ 'create' ], { cwd: redisClusterCwd }, done);
});
before(function () {
this.service = require('../lib/service.js')(overwrite);
return this.service;
});
// tests...
after(function () {
return this.service.call('close');
});
after(function (done) {
exec(createRedisCluster, [ 'stop' ], { cwd: redisClusterCwd }, done);
});
after(function (done) {
exec(createRedisCluster, [ 'clean' ], { cwd: redisClusterCwd }, done);
});
}); |
I assume that it would be this line that fails, though stack traces don't really give me anything no matter how I try https://github.com/luin/ioredis/blob/master/lib/cluster.js#L296 |
Just to clarify - operations that I perform is basically setex, set, get, hmset, hsetnx, but doubt this is of any help |
Seems like you are invoking commands before cluster is connected. I saw your pr and the changes you making would cause this error. |
Well, how do I make sure that cluster is connected? I've waited for 'ready' event, but it's emitted as soon as one node connects, but not the other. In the pr what I basically do is add 'lazyConnect' to make sure that it actually connect, but it's one and the same with the 'ready' event. Assume the following code: var redis = require('ioredis');
var instance = new redis.Cluster(hosts, { lazyConnect: true });
instance.connect().then(function () {
return instance.set(Date.now().toString(), 'somerandomvalue');
}); I bet if you do this it will eventually throw the same error. I have to emphasize that I do wait for the ready event / connect resolve on my code |
In most cases, you don't need to make sure the cluster is connected since var instance = new redis.Cluster(hosts, { lazyConnect: true });
instance.set(Date.now().toString(), 'somerandomvalue'); If
|
I'll run more tests, but I still had the same results with the lazyConnect: false, its just random and seem dependant on whether it was able to connect to a node that holds needed slots or not before the event |
Let me know if there's anything I can do :-) |
I tried turning off the lazyConnect, and still it fails from time to time instance = new redis.Cluster(config.redis.hosts, config.redis.options || {}); some redis commands
|
Ok, I've added logs to Cluster.prototype.sendCommand = function (command, stream, node) {
if (this.status === 'end') {
command.reject(new Error('Connection is closed.'));
return command.promise;
}
var targetSlot = node ? node.slot : command.getSlot();
var ttl = {};
var reject = command.reject;
var _this = this;
if (!node) {
command.reject = function (err) {
_this.handleError(err, ttl, {
moved: function (node, slot, hostPort) {
debug('command %s is moved to %s:%s', command.name, hostPort[0], hostPort[1]);
_this.slots[slot] = node;
tryConnection();
_this.refreshSlotsCache();
},
ask: function (node, slot, hostPort) {
debug('command %s is required to ask %s:%s', command.name, hostPort[0], hostPort[1]);
tryConnection(false, node);
},
clusterDown: tryConnection,
connectionClosed: tryConnection,
maxRedirections: function (redirectionError) {
reject.call(command, redirectionError);
},
defaults: function () {
reject.call(command, err);
}
});
};
}
tryConnection();
function tryConnection (random, asking) {
if (_this.status === 'end') {
command.reject(new Error('Cluster is ended.'));
return;
}
if (_this.status === 'ready') {
var redis;
if (node && node.redis) {
redis = node.redis;
} else if (_.includes(Command.FLAGS.ENTER_SUBSCRIBER_MODE, command.name) ||
_.includes(Command.FLAGS.EXIT_SUBSCRIBER_MODE, command.name)) {
redis = _this.subscriber;
} else {
if (typeof targetSlot === 'number') {
redis = _this.slots[targetSlot];
}
if (asking && !random) {
redis = asking;
redis.asking();
}
if (random || !redis) {
redis = _this.selectRandomNode();
}
}
if (node && !node.redis) {
node.redis = redis;
}
// this is where redis is undefined and crashes the system
redis.sendCommand(command, stream);
} else if (_this.options.enableOfflineQueue) {
_this.offlineQueue.push({
command: command,
stream: stream,
node: node
});
} else {
command.reject(new Error('Cluster isn\'t ready and enableOfflineQueue options is false'));
}
}
return command.promise;
}; |
Also, here is the stack trace (finally managed to get it)
|
Further debugging:
Whole trace:
|
Added keys listing to random nodes call, and as I've suspected, the nodes array is empty. Question is why.
|
@luin I assume I know what happens. During the tests I create a new cluster each time with redis-trib.rb and then connect to it. It is possible that at a certain time when ioredis requests slot information - there is none, even though connection can be established. After that it deletes all nodes and crashes. Probably this case wont really happen in production unless the whole cluster dies/resets, but I think it should be handled as an error during refresh slot call |
@AVVS That makes sense. At first glance ioredis should retry refreshing the cluster info(related option: |
Should be fixed in 1.4.0 |
@luin I've actually tested it more and pinpointed that to response from cluster that is not fully initialized. During my tests I start the cluster with redis-trib util via spawn process, then allocate the slots and its completed by the timeout, which is, of course, far from ideal. Either way, when redis receives some weird / incomplete response from a cluster, which contains 0 nodes (they were not able to connect yet) and I suppose that would still be the case for completely degraded cluster - it will make the process crash with
|
Yes, When the
|
I submitted a pr to fix this issue. I'll check it again tomorrow and welcome to give it a try if you have time. |
Cluster is created and then destroyed before and after the tests. So test runs are idempotent.
For some reason, sometimes redis cluster client instance doesn't have
sendCommand
on it and sometimes it does. I assume that context can be rewritten on retry or smth and become 'undefined'The text was updated successfully, but these errors were encountered: