Horizontal Scaling
Currently, a hub cluster consists of 3 nodes, the minimum required to maintain redundancy during a rolling restart.
While vertical scaling of hub clusters has been viable for the last 3 years, AWS instance sizes increase by a factor of two, so the cost also doubles with each step up. It would be valuable to scale horizontally, to allow more incremental additions to capacity and to tune each cluster's parameters according to its usage.
Adding a fourth node under the existing scheme would add more overhead to writes and queries without increasing the size of the cluster's Spoke cache.
Goals
- Allow the Spoke cache size to increase linearly with increasing cluster size
- Minimize administrative overhead of a cluster's group membership
- Minimize performance impact of node addition and subtraction
- Minimize performance impact of rolling restarts
- Maintain replication factor of 3 in most scenarios
Proposal
- Use consistent hashing to map both channel names and hub nodes to a ring (see the sketch after this list).
- Always use the most recent group membership to determine write nodes.
- Use historical ring maps to determine available servers for reads and queries.
- Automatic cluster configuration will apply during rolling restarts, deploys, node additions and node removals.
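To illustrate the consistent hashing item, here is a minimal ring sketch in Java. It is not the hub's implementation; the class name, virtual-node count, and hash choice are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

// Minimal consistent-hash ring: nodes and channel names share one hash space.
class Ring {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private static final int VIRTUAL_NODES = 100;   // smooths the distribution across nodes

    Ring(Collection<String> nodes) {
        for (String node : nodes) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(node + "-" + i), node);
            }
        }
    }

    // Returns the first `replicas` distinct nodes clockwise from the channel's hash.
    List<String> nodesFor(String channel, int replicas) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> entry : wrap(hash(channel))) {
            if (!result.contains(entry.getValue())) {
                result.add(entry.getValue());
                if (result.size() == replicas) break;
            }
        }
        return result;
    }

    // Walks the ring clockwise from `start`, wrapping around to the beginning.
    private Iterable<Map.Entry<Long, String>> wrap(long start) {
        List<Map.Entry<Long, String>> entries = new ArrayList<>(ring.tailMap(start).entrySet());
        entries.addAll(ring.headMap(start).entrySet());
        return entries;
    }

    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {      // use the first 8 digest bytes as a long
                h = (h << 8) | (digest[i] & 0xff);
            }
            return h;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

For example, `new Ring(List.of("hub-01", "hub-02", "hub-03", "hub-04")).nodesFor("some-channel", 3)` returns the 3 nodes responsible for that channel (the hostnames are hypothetical). Because only the channels that hash near a departing or arriving node move, a single membership change disturbs only a fraction of channel assignments.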
With a 3 node cluster, all writes, reads and queries go to all running nodes.
During a rolling restart, while each server is out of the cluster, requests go to only two servers.
This behavior is unchanged under the new proposal.
As each node is restarted, it will be removed and re-added to the cluster.
If we used all of the rings created during a rolling restart, we would generate 33% more reads and queries: with 4 nodes and a replication factor of 3, a channel's replicas can differ between rings, so the union of its read set covers 4 nodes instead of 3.
In the 3-server example, we already tolerate brief reductions of the replication factor from 3 to 2.
We can achieve the same tolerance by ignoring brief, single-server outages (less than ~10 minutes) that follow the restart pattern shown below:
TimeA [1, 2, 3, 4]
TimeB [2, 3, 4]
TimeC [1, 2, 3, 4]
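One way to implement this tolerance is to collapse the membership history before using it for reads: drop any snapshot that lasted less than ~10 minutes, removed exactly one node, and was bracketed by identical memberships. The sketch below is illustrative only; the `RingState` record and `RingHistory` class are assumed names, not hub APIs.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.*;

// One group-membership snapshot: the set of nodes and when it took effect.
record RingState(Instant start, Set<String> nodes) {}

class RingHistory {
    private static final Duration BRIEF = Duration.ofMinutes(10);

    // Drops snapshots that match the rolling-restart pattern: a single node missing
    // for less than ~10 minutes, with the same membership before and after.
    static List<RingState> collapse(List<RingState> history) {
        List<RingState> result = new ArrayList<>();
        for (int i = 0; i < history.size(); i++) {
            if (i > 0 && i < history.size() - 1) {
                RingState previous = history.get(i - 1);
                RingState current = history.get(i);
                RingState next = history.get(i + 1);
                boolean brief = Duration.between(current.start(), next.start()).compareTo(BRIEF) < 0;
                boolean singleNodeMissing = previous.nodes().equals(next.nodes())
                        && next.nodes().containsAll(current.nodes())
                        && current.nodes().size() == next.nodes().size() - 1;
                if (brief && singleNodeMissing) continue;   // TimeB in the example above
            }
            result.add(history.get(i));
        }
        return result;
    }
}
```

Applied to the example, TimeB is dropped; TimeA and TimeC then have identical membership and can be merged into a single mapping for reads and queries.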
Any time the cluster's group membership changes, the change results in a new mapping.
From then on, writes will use only the new mapping.
Reads and queries will use the time-appropriate mapping(s) until the old mapping expires (i.e. is older than the Spoke TTL).
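A sketch of that selection rule, reusing the hypothetical `RingState` record from the previous example: writes always target the newest mapping, while reads and queries fan out to every mapping whose active period overlaps the Spoke TTL window. The class name and the way the TTL is supplied are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.*;

class MappingSelector {
    private final Duration spokeTtl;          // cluster-specific, e.g. Duration.ofHours(1)
    private final List<RingState> history;    // oldest first, already collapsed

    MappingSelector(Duration spokeTtl, List<RingState> history) {
        this.spokeTtl = spokeTtl;
        this.history = history;
    }

    // Writes always go to the most recent mapping.
    RingState writeRing() {
        return history.get(history.size() - 1);
    }

    // Reads and queries use every mapping that was active within the Spoke TTL window.
    List<RingState> readRings(Instant now) {
        Instant cutoff = now.minus(spokeTtl);
        List<RingState> active = new ArrayList<>();
        for (int i = 0; i < history.size(); i++) {
            // A mapping is active until the next mapping starts (or until now, for the latest).
            Instant end = (i + 1 < history.size()) ? history.get(i + 1).start() : now;
            if (end.isAfter(cutoff)) {
                active.add(history.get(i));
            }
        }
        return active;
    }
}
```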
The state of the cluster (which nodes are responsible for which ranges) will be stored in ZooKeeper.
We will use each host's fully qualified domain name to identify nodes in the mapping, and the hub will internally convert the FQDN to an IP address to avoid DNS overhead on every Spoke call.
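A minimal registration sketch, assuming Apache Curator is used as the ZooKeeper client; the `/hub/cluster` path and class name are hypothetical. Each node registers under its FQDN and resolves the address once at startup so Spoke calls do not pay a DNS lookup each time.

```java
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

// Registers this node in ZooKeeper under its FQDN, storing the resolved IP as the payload.
class ClusterRegistration {
    private static final String BASE_PATH = "/hub/cluster";   // hypothetical ZK path

    static void register(String zkConnect, String fqdn) throws Exception {
        String ip = InetAddress.getByName(fqdn).getHostAddress();   // resolve once, cache

        CuratorFramework client = CuratorFrameworkFactory
                .newClient(zkConnect, new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Ephemeral node: the membership entry disappears automatically if this hub loses its session.
        client.create()
                .creatingParentsIfNeeded()
                .withMode(CreateMode.EPHEMERAL)
                .forPath(BASE_PATH + "/" + fqdn, ip.getBytes(StandardCharsets.UTF_8));
    }
}
```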
A new node will not return a /health success until it has joined the cluster and is ready to serve Hub and Spoke requests.
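A possible shape for that readiness gate, independent of any particular web framework; the class and method names are illustrative.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Backs the /health endpoint: reports failure until this node has joined the
// cluster and is ready to serve Hub and Spoke traffic.
class HealthCheck {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    // Called once cluster registration and Spoke startup have completed.
    void markReady() {
        ready.set(true);
    }

    // 200 lets deploy tooling and load balancers send traffic; 503 keeps it away.
    int healthStatusCode() {
        return ready.get() ? 200 : 503;
    }
}
```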