
Overview

Currently, a hub cluster consists of three nodes, the minimum required to maintain redundancy during a rolling restart.

While vertical scaling of hub clusters has been viable for the last three years, AWS instance sizes increase by a factor of two, so the cost also doubles with each step up. It would be valuable to scale horizontally, allowing more incremental additions to capacity and letting the cluster's parameters be tuned to its usage.

Adding a fourth node under the existing scheme would add overhead to writes and queries without increasing the size of the cluster's Spoke cache.

Goals

  • Allow Spoke cache size to increase linearly with increasing cluster size
  • Minimize administrative overhead of a cluster's group membership
  • Minimize performance impact of node addition and removal
  • Minimize performance impact of rolling restarts
  • Maintain replication factor of 3 in most scenarios

Proposal

  • Use consistent hashing to map both channel names and hub nodes to a ring (a minimal sketch follows this list).
  • Always use the most recent group membership to determine write nodes.
  • Use historical ring maps to determine available servers for reads and queries.
  • Automatic cluster configuration will apply during rolling restarts, deploys, node additions and node removals.
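As a rough illustration of the ring mapping, the sketch below hashes nodes (with virtual points) and channel names onto a 64-bit ring and picks the next N distinct nodes clockwise for each channel. The class name, the MD5 hash, and the number of virtual points are illustrative assumptions, not the hub's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the ring: nodes are hashed (with virtual points) onto a
// 64-bit ring, and a channel name maps to the next N distinct nodes clockwise.
public class ChannelRing {

    private static final int VIRTUAL_POINTS = 100;   // assumption: smooths distribution
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public ChannelRing(List<String> nodes) {
        for (String node : nodes) {
            for (int i = 0; i < VIRTUAL_POINTS; i++) {
                ring.put(hash(node + "-" + i), node);
            }
        }
    }

    // Returns the nodes for a channel: the next `replicas` distinct nodes
    // clockwise from the channel's position on the ring.
    public List<String> nodesFor(String channel, int replicas) {
        Set<String> chosen = new LinkedHashSet<>();
        SortedMap<Long, String> tail = ring.tailMap(hash(channel));
        for (String node : tail.values()) {
            if (chosen.size() == replicas) break;
            chosen.add(node);
        }
        for (String node : ring.values()) {           // wrap around the ring
            if (chosen.size() == replicas) break;
            chosen.add(node);
        }
        return new ArrayList<>(chosen);
    }

    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a five-node cluster and a replication factor of 3, each channel lands on three of the five nodes, so adding a node spreads channels onto new capacity instead of adding overhead to every write.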

Examples

Rolling Restart for 3 node cluster

With a 3 node cluster, all writes, reads and queries go to all running nodes.
During a rolling restart, while each server is out of the cluster, requests go to only two servers.
This behavior is unchanged under the new proposal.

Rolling Restart for 4+ node cluster

As each node is restarted, it will be removed and re-added to the cluster.
If we use all the rings created during a rolling restart, we will generate 33% more reads and queries.
In the 3 server example, we already tolerate brief reductions in the replication factor from 3 to 2.
We can achieve the same tolerance by ignoring a brief outage (less than ~10 minutes) of a single server that follows the restart pattern shown below:

TimeA [1, 2, 3, 4]
TimeB [2, 3, 4]
TimeC [1, 2, 3, 4]
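One way to implement that tolerance is sketched below, assuming each membership snapshot carries a timestamp and a node set: a snapshot missing exactly one node relative to identical neighbors, and lasting less than ~10 minutes, is dropped before building the rings used for reads and queries. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: collapse membership snapshots taken during a rolling
// restart so that a single node's brief absence (< ~10 minutes) does not
// produce an extra ring for reads and queries.
public class RingCollapser {

    static final long TOLERATED_OUTAGE_MILLIS = 10 * 60 * 1000;

    public static class Snapshot {
        final long timeMillis;
        final Set<String> nodes;

        public Snapshot(long timeMillis, Set<String> nodes) {
            this.timeMillis = timeMillis;
            this.nodes = nodes;
        }
    }

    // Drops a snapshot B when it is missing exactly one node relative to its
    // neighbors A and C, A and C have identical membership, and C arrives
    // within the tolerated outage window (the TimeA/TimeB/TimeC pattern above).
    public static List<Snapshot> collapse(List<Snapshot> snapshots) {
        List<Snapshot> result = new ArrayList<>(snapshots);
        for (int i = 1; i < result.size() - 1; i++) {
            Snapshot before = result.get(i - 1);
            Snapshot during = result.get(i);
            Snapshot after = result.get(i + 1);
            boolean sameBeforeAfter = before.nodes.equals(after.nodes);
            Set<String> missing = new HashSet<>(before.nodes);
            missing.removeAll(during.nodes);
            boolean oneNodeOut = missing.size() == 1
                    && before.nodes.containsAll(during.nodes);
            boolean brief = after.timeMillis - during.timeMillis < TOLERATED_OUTAGE_MILLIS;
            if (sameBeforeAfter && oneNodeOut && brief) {
                result.remove(i);
                i--;   // re-check from the previous position
            }
        }
        return result;
    }
}
```

Applied to the pattern above, the TimeB snapshot is dropped and reads continue to use only the full four-node ring.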

Adding/Removing a node from a cluster

Any time the cluster's group membership changes, a new mapping is created.
From then on, writes will use only the new mapping.
Reads and queries will use the time-appropriate mapping(s) until the old mapping expires (becomes older than the Spoke TTL).
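A minimal sketch of that bookkeeping, reusing the ChannelRing type from the earlier sketch: writes take the latest ring, reads and queries take every ring overlapping their time window, and rings that ended more than one Spoke TTL ago are discarded. The class name and the TreeMap layout are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: writes always use the most recent ring; reads and
// queries use every ring that was active within the query window, ignoring
// rings that have aged past the Spoke TTL.
public class RingHistory {

    private final long spokeTtlMillis;
    // start time of each membership change -> the ring that took effect then
    private final TreeMap<Long, ChannelRing> ringsByStartTime = new TreeMap<>();

    public RingHistory(long spokeTtlMillis) {
        this.spokeTtlMillis = spokeTtlMillis;
    }

    public void membershipChanged(long nowMillis, ChannelRing newRing) {
        ringsByStartTime.put(nowMillis, newRing);
        // rings replaced before (now - Spoke TTL) can never serve a read again
        Long oldestUseful = ringsByStartTime.floorKey(nowMillis - spokeTtlMillis);
        if (oldestUseful != null) {
            ringsByStartTime.headMap(oldestUseful).clear();
        }
    }

    // Assumes at least one membership change has been recorded.
    public ChannelRing writeRing() {
        return ringsByStartTime.lastEntry().getValue();
    }

    // All rings whose active period overlaps [queryStartMillis, queryEndMillis].
    public List<ChannelRing> queryRings(long queryStartMillis, long queryEndMillis) {
        // the ring active at queryStartMillis is the last one starting at or before it
        Long from = ringsByStartTime.floorKey(queryStartMillis);
        NavigableMap<Long, ChannelRing> candidates =
                ringsByStartTime.headMap(queryEndMillis, true);
        List<ChannelRing> rings = new ArrayList<>();
        for (Map.Entry<Long, ChannelRing> entry : candidates.entrySet()) {
            if (from == null || entry.getKey() >= from) {
                rings.add(entry.getValue());
            }
        }
        return rings;
    }
}
```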

Clustering Mechanism

The state of the cluster (which nodes are responsible for which ranges) will be stored in ZooKeeper.
We will use each host's fully qualified domain name to identify nodes in the mapping, and the hub will internally convert that to an IP address to avoid DNS overhead for every Spoke call.

A new node will not return a /health success until it has joined the cluster and is ready to serve Hub and Spoke requests.
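The sketch below shows what registration might look like, assuming Apache Curator as the ZooKeeper client; the znode path, payload format, and class name are illustrative rather than the hub's actual layout. The /health endpoint would report success only after register(...) returns and the node is otherwise ready.

```java
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

// Hypothetical sketch: register this node in ZooKeeper under its FQDN, storing
// the resolved IP as the payload so Spoke calls can skip DNS lookups.
public class ClusterRegistration {

    public static void register(String zkConnectString) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                zkConnectString, new ExponentialBackoffRetry(1000, 3));
        client.start();

        InetAddress local = InetAddress.getLocalHost();
        String fqdn = local.getCanonicalHostName();
        byte[] ip = local.getHostAddress().getBytes(StandardCharsets.UTF_8);

        // Ephemeral znode: membership disappears automatically if the node dies,
        // which results in a new ring mapping on the remaining nodes.
        client.create()
                .creatingParentsIfNeeded()
                .withMode(CreateMode.EPHEMERAL)
                .forPath("/SpokeCluster/" + fqdn, ip);
    }
}
```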