Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cluster mode] inconsistent errors #56

Closed
AVVS opened this issue Jun 2, 2015 · 19 comments
Closed

[cluster mode] inconsistent errors #56

AVVS opened this issue Jun 2, 2015 · 19 comments
Labels

Comments

@AVVS
Copy link
Contributor

AVVS commented Jun 2, 2015

Cluster is created and then destroyed before and after the tests. So test runs are idempotent.

For some reason, sometimes redis cluster client instance doesn't have sendCommand on it and sometimes it does. I assume that context can be rewritten on retry or smth and become 'undefined'

Vitalys-MacBook-Pro:mservice-users vitaly$ npm test

> mservice-users@1.0.0 test /Users/vitaly/projects/mservice-users
> BLUEBIRD_DEBUG=1 multi='list=- xunit=test-report.xml' ./node_modules/.bin/mocha -R mocha-multi --timeout 10000


20:06:56 users: connected to rabbit@Vitalys-MacBook-Pro v3.5.2
20:06:56 users: queue "amq.gen-HS1YKx5rwxfoVL6TOyEKiA" created
20:06:56 users: queue "amq.gen-HS1YKx5rwxfoVL6TOyEKiA" binded to exchange "test-mocha" on route "test.users.*"actions#register() should fail on incorrect payload: 5ms
  ․ actions#register() should create user, activate it, no captcha specified: 51ms
  ․ actions#register() should reject creating user, because it already exists: 59ms
  ․ actions#register() should fail when invalid captcha is passed: 36ms

  4 passing (4s)

Vitalys-MacBook-Pro:mservice-users vitaly$ npm test

> mservice-users@1.0.0 test /Users/vitaly/projects/mservice-users
> BLUEBIRD_DEBUG=1 multi='list=- xunit=test-report.xml' ./node_modules/.bin/mocha -R mocha-multi --timeout 10000


20:07:11 users: connected to rabbit@Vitalys-MacBook-Pro v3.5.2
20:07:11 users: queue "amq.gen-57pK5eobgqjeLt8s_IU7Cw" created
20:07:11 users: queue "amq.gen-57pK5eobgqjeLt8s_IU7Cw" binded to exchange "test-mocha" on route "test.users.*"actions#register() should fail on incorrect payload: 6ms
  1) actions#register() should create user, activate it, no captcha specified
  2) actions#register() should reject creating user, because it already exists
  3) actions#register() should fail when invalid captcha is passed

  1 passing (22s)
  3 failing

  1) actions#register() should create user, activate it, no captcha specified:
     Uncaught TypeError: Cannot call method 'sendCommand' of undefined
  ---------------------------------------------
  ---------------------------------------------
  ---------------------------------------------
      at module.exports (lib/connectors/redis.js:12:21)
      at module.exports (lib/service.js:106:9)
      at Context.<anonymous> (test/register.js:56:52)
      at exithandler (child_process.js:656:7)
      at maybeClose (child_process.js:766:16)
      at Socket.<anonymous> (child_process.js:979:11)
      at Pipe.close (net.js:466:12)

  2) actions#register() should reject creating user, because it already exists:
     Error: timeout of 10000ms exceeded. Ensure the done() callback is being called in this test.


  3) actions#register() should fail when invalid captcha is passed:
     Error: timeout of 10000ms exceeded. Ensure the done() callback is being called in this test.
// connectors/redis.js
'use strict';

var redis = require('ioredis');
var instance;

module.exports = function (config) {
    if (instance) {
        return instance;
    }

    instance = new redis.Cluster(config.redis.hosts, config.redis.options || {});
    // forked version used with lazyConnect enabled
    return instance.connect();
};
@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

Redis configuration at the point of running:

{
            hosts: [
                { host: 'localhost', port: 30001 },
                { host: 'localhost', port: 30002 },
                { host: 'localhost', port: 30003 },
                { host: 'localhost', port: 30004 },
                { host: 'localhost', port: 30005 },
                { host: 'localhost', port: 30006 }
            ],
            options: {
                friendlyStackTraces: true,
                lazyConnect: true
            }
        }

Connection sequence:

    before(function (done) {
        exec(createRedisCluster, [ 'start' ], { cwd: redisClusterCwd }, done);
    });

    before(function (done) {
        this.timeout(10000);
        exec(createRedisCluster, [ 'create' ], { cwd: redisClusterCwd }, done);
    });

    before(function () {
        this.service = require('../lib/service.js')(overwrite);
        return this.service;
    });


// tests...

    after(function () {
        return this.service.call('close');
    });

    after(function (done) {
        exec(createRedisCluster, [ 'stop' ], { cwd: redisClusterCwd }, done);
    });

    after(function (done) {
        exec(createRedisCluster, [ 'clean' ], { cwd: redisClusterCwd }, done);
    });
});

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

I assume that it would be this line that fails, though stack traces don't really give me anything no matter how I try

https://github.com/luin/ioredis/blob/master/lib/cluster.js#L296

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

Just to clarify - operations that I perform is basically setex, set, get, hmset, hsetnx, but doubt this is of any help

@luin
Copy link
Collaborator

luin commented Jun 2, 2015

Seems like you are invoking commands before cluster is connected. I saw your pr and the changes you making would cause this error.

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

Well, how do I make sure that cluster is connected?

I've waited for 'ready' event, but it's emitted as soon as one node connects, but not the other.

In the pr what I basically do is add 'lazyConnect' to make sure that it actually connect, but it's one and the same with the 'ready' event.

Assume the following code:

var redis = require('ioredis');
var instance = new redis.Cluster(hosts, { lazyConnect: true });

instance.connect().then(function () {
   return instance.set(Date.now().toString(), 'somerandomvalue');
});

I bet if you do this it will eventually throw the same error. I have to emphasize that I do wait for the ready event / connect resolve on my code

@luin
Copy link
Collaborator

luin commented Jun 2, 2015

In most cases, you don't need to make sure the cluster is connected since Redis.Cluster buffers all commands invoked before the cluster is connected and will execute them when cluster is ready. So the following codes works:

var instance = new redis.Cluster(hosts, { lazyConnect: true });
instance.set(Date.now().toString(), 'somerandomvalue');

If lazyConnect is enabled, Redis.Cluster wouldn't connect to the every node in the Cluster but a few(typical only one) of them just to get the cluster info. When a command is invoked and the target node hasn't been connected, ioredis will connent to it automatically.

ready event in Redis.Cluster simply means ioredis has gotten the info of the Cluster from one of the nodes.

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

I'll run more tests, but I still had the same results with the lazyConnect: false, its just random and seem dependant on whether it was able to connect to a node that holds needed slots or not before the event

@luin
Copy link
Collaborator

luin commented Jun 2, 2015

Let me know if there's anything I can do :-)

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

I tried turning off the lazyConnect, and still it fails from time to time

instance = new redis.Cluster(config.redis.hosts, config.redis.options || {});

some redis commands

 Uncaught TypeError: Cannot call method 'sendCommand' of undefined

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

Ok, I've added logs to

Cluster.prototype.sendCommand = function (command, stream, node) {
  if (this.status === 'end') {
    command.reject(new Error('Connection is closed.'));
    return command.promise;
  }

  var targetSlot = node ? node.slot : command.getSlot();
  var ttl = {};
  var reject = command.reject;
  var _this = this;
  if (!node) {
    command.reject = function (err) {
      _this.handleError(err, ttl, {
        moved: function (node, slot, hostPort) {
          debug('command %s is moved to %s:%s', command.name, hostPort[0], hostPort[1]);
          _this.slots[slot] = node;
          tryConnection();
          _this.refreshSlotsCache();
        },
        ask: function (node, slot, hostPort) {
          debug('command %s is required to ask %s:%s', command.name, hostPort[0], hostPort[1]);
          tryConnection(false, node);
        },
        clusterDown: tryConnection,
        connectionClosed: tryConnection,
        maxRedirections: function (redirectionError) {
          reject.call(command, redirectionError);
        },
        defaults: function () {
          reject.call(command, err);
        }
      });
    };
  }
  tryConnection();

  function tryConnection (random, asking) {
    if (_this.status === 'end') {
      command.reject(new Error('Cluster is ended.'));
      return;
    }
    if (_this.status === 'ready') {
      var redis;
      if (node && node.redis) {
        redis = node.redis;
      } else if (_.includes(Command.FLAGS.ENTER_SUBSCRIBER_MODE, command.name) ||
                 _.includes(Command.FLAGS.EXIT_SUBSCRIBER_MODE, command.name)) {
        redis = _this.subscriber;
      } else {
        if (typeof targetSlot === 'number') {
          redis = _this.slots[targetSlot];
        }
        if (asking && !random) {
          redis = asking;
          redis.asking();
        }
        if (random || !redis) {
          redis = _this.selectRandomNode();
        }
      }
      if (node && !node.redis) {
        node.redis = redis;
      }
      // this is where redis is undefined and crashes the system
      redis.sendCommand(command, stream);
    } else if (_this.options.enableOfflineQueue) {
      _this.offlineQueue.push({
        command: command,
        stream: stream,
        node: node
      });
    } else {
      command.reject(new Error('Cluster isn\'t ready and enableOfflineQueue options is false'));
    }
  }
  return command.promise;
};

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

Also, here is the stack trace (finally managed to get it)

  1) actions#verify() "before all" hook: add a set of jwt tokens:
     Uncaught TypeError: Cannot call method 'sendCommand' of undefined
      at tryConnection (/Users/vitaly/projects/ioredis/lib/cluster.js:324:13)
      at Cluster.sendCommand (/Users/vitaly/projects/ioredis/lib/cluster.js:294:3)
      at Cluster.executeOfflineCommands (/Users/vitaly/projects/ioredis/lib/cluster.js:255:12)
      at Cluster.refreshListener (/Users/vitaly/projects/ioredis/lib/cluster.js:82:12)
      at /Users/vitaly/projects/ioredis/lib/cluster.js:227:15
      at /Users/vitaly/projects/ioredis/lib/cluster.js:404:5
      at run (/Users/vitaly/projects/ioredis/lib/utils/index.js:146:16)

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

Further debugging:

  1. it check target slot, it's undefined

    if (typeof targetSlot === 'number') {
      redis = _this.slots[targetSlot];
      console.log('clause <target slot>');
    }
    
  2. it selects random node, apparently still undefined. Do we have empty nodes array? Probably this is the only way random node will return undefined

if (random || !redis) {
  redis = _this.selectRandomNode();
  console.log('clause <random node>');
}

Whole trace:

clause <target slot>
clause <random node>
Trace
    at tryConnection (/Users/vitaly/projects/ioredis/lib/cluster.js:330:27)
    at Cluster.sendCommand (/Users/vitaly/projects/ioredis/lib/cluster.js:294:3)
    at Cluster.executeOfflineCommands (/Users/vitaly/projects/ioredis/lib/cluster.js:255:12)
    at Cluster.refreshListener (/Users/vitaly/projects/ioredis/lib/cluster.js:82:12)
    at Cluster.g (events.js:180:16)
    at Cluster.emit (events.js:92:17)
    at /Users/vitaly/projects/ioredis/lib/cluster.js:227:15
    at /Users/vitaly/projects/ioredis/lib/cluster.js:411:5
    at run (/Users/vitaly/projects/ioredis/lib/utils/index.js:146:16)
    at tryCatcher (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/util.js:24:31)
    at Promise.successAdapter (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/nodeify.js:22:30)
    at Promise._settlePromiseAt (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/promise.js:528:21)
    at Promise._settlePromises (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/promise.js:646:14)
    at Async._drainQueue (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/async.js:177:16)
    at Async._drainQueues (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/async.js:187:10)
    at Async.drainQueues (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/async.js:15:14)
    at process._tickDomainCallback (node.js:492:13)

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

Added keys listing to random nodes call, and as I've suspected, the nodes array is empty. Question is why.

random nodes from: localhost:30001, localhost:30002, localhost:30003, localhost:30004, localhost:30005, localhost:30006
clause 3
clause <target slot>
random nodes from: 
clause <random node>
Trace
    at tryConnection (/Users/vitaly/projects/ioredis/lib/cluster.js:331:27)
    at Cluster.sendCommand (/Users/vitaly/projects/ioredis/lib/cluster.js:295:3)
    at Cluster.executeOfflineCommands (/Users/vitaly/projects/ioredis/lib/cluster.js:256:12)
    at Cluster.refreshListener (/Users/vitaly/projects/ioredis/lib/cluster.js:82:12)
    at Cluster.g (events.js:180:16)
    at Cluster.emit (events.js:92:17)
    at /Users/vitaly/projects/ioredis/lib/cluster.js:228:15
    at /Users/vitaly/projects/ioredis/lib/cluster.js:412:5
    at run (/Users/vitaly/projects/ioredis/lib/utils/index.js:146:16)
    at tryCatcher (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/util.js:24:31)
    at Promise.successAdapter (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/nodeify.js:22:30)
    at Promise._settlePromiseAt (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/promise.js:528:21)
    at Promise._settlePromises (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/promise.js:646:14)
    at Async._drainQueue (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/async.js:177:16)
    at Async._drainQueues (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/async.js:187:10)
    at Async.drainQueues (/Users/vitaly/projects/ioredis/node_modules/bluebird/js/main/async.js:15:14)
    at process._tickDomainCallback (node.js:492:13)

@AVVS
Copy link
Contributor Author

AVVS commented Jun 2, 2015

@luin I assume I know what happens. During the tests I create a new cluster each time with redis-trib.rb and then connect to it. It is possible that at a certain time when ioredis requests slot information - there is none, even though connection can be established. After that it deletes all nodes and crashes. Probably this case wont really happen in production unless the whole cluster dies/resets, but I think it should be handled as an error during refresh slot call

@luin
Copy link
Collaborator

luin commented Jun 2, 2015

@AVVS That makes sense. At first glance ioredis should retry refreshing the cluster info(related option: clusterRetryStrategy) when none of the nodes are ready. I'm going on a trip today and will look into this issue on the plane. Thank you for the information

@luin luin added the bug label Jun 2, 2015
@luin
Copy link
Collaborator

luin commented Jun 3, 2015

Should be fixed in 1.4.0

@luin luin closed this as completed Jun 4, 2015
@AVVS
Copy link
Contributor Author

AVVS commented Aug 20, 2015

@luin I've actually tested it more and pinpointed that to response from cluster that is not fully initialized. During my tests I start the cluster with redis-trib util via spawn process, then allocate the slots and its completed by the timeout, which is, of course, far from ideal. Either way, when redis receives some weird / incomplete response from a cluster, which contains 0 nodes (they were not able to connect yet) and I suppose that would still be the case for completely degraded cluster - it will make the process crash with

Uncaught TypeError: Cannot call method 'sendCommand' of undefined

@luin
Copy link
Collaborator

luin commented Aug 20, 2015

Yes, When the cluster slots command returns an empty array(see the below snippet), ioredis would raise the TypeError, which is obvious a bug.

127.0.0.1:8888> cluster slots
(empty list or set)

@luin
Copy link
Collaborator

luin commented Aug 20, 2015

I submitted a pr to fix this issue. I'll check it again tomorrow and welcome to give it a try if you have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants