Merge pull request #1466 from rzetelskik/multidc-docs
Add Multi Datacenter ScyllaDB cluster deployment documentation
scylla-operator-bot[bot] authored Oct 17, 2023
2 parents 84e7422 + c5c29d8 commit c69d4f4
Showing 5 changed files with 914 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/source/index.rst
@@ -15,6 +15,7 @@ Scylla Operator Documentation
migration
nodeoperations/index
exposing
multidc/index
performance
upgrade
releases
@@ -55,6 +56,7 @@ Currently it supports:
* :doc:`Setting up Monitoring using Prometheus and Grafana <monitoring>`
* :doc:`Node operations <nodeoperations/index>`
* :doc:`Exposing ScyllaCluster to other networks <exposing>`
* :doc:`Deploying multi-datacenter ScyllaDB clusters in Kubernetes <multidc/index>`
* :doc:`Performance tuning [Experimental] <performance>`
* :doc:`Upgrade procedures <upgrade>`
* :doc:`Releases <releases>`
168 changes: 168 additions & 0 deletions docs/source/multidc/eks.md
@@ -0,0 +1,168 @@
# Build multiple Amazon EKS clusters with inter-Kubernetes networking

This document describes the process of creating multiple Amazon EKS clusters in different regions, using separate VPCs, and explains the steps necessary for configuring inter-Kubernetes networking between the clusters.
The interconnected clusters can serve as a platform for [deploying a multi-datacenter ScyllaDB cluster](multidc.md).

This guide will walk you through the process of creating and configuring EKS clusters in two distinct regions. Although it is only an example setup, it can easily be built upon to create infrastructure tailored to your specific needs.
For simplicity, several predefined values are used throughout the document. The values are only exemplary and can be adjusted to your preference.

## Prerequisites

To follow the below guide, you first need to install and configure the tools for creating and managing AWS and Kubernetes resources:
- eksctl – A command line tool for working with EKS clusters.
- kubectl – A command line tool for working with Kubernetes clusters.

For more information see [Getting started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) in AWS documentation.
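
As a quick sanity check, you can verify that both tools are installed and available in your `PATH` (any reasonably recent versions should work):
```shell
# Verify that eksctl and kubectl are installed and print their versions.
eksctl version
kubectl version --client
```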

## Create EKS clusters

### Create the first EKS cluster

Below is the required specification for the first cluster.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: scylladb-us-east-1
  region: us-east-1

availabilityZones:
- us-east-1a
- us-east-1b
- us-east-1c

vpc:
  cidr: 10.0.0.0/16

nodeGroups:
...
```
Complete the first cluster's configuration file and save it as `cluster-us-east-1.yaml`.
Refer to the [Creating an EKS cluster](../../eks#creating-an-eks-cluster) section of ScyllaDB Operator documentation for a reference configuration of the node groups.

To deploy the first cluster, use the below command:
```shell
eksctl create cluster -f=cluster-us-east-1.yaml
```

Run the following command to check the status and retrieve the VPC ID of the cluster:
```shell
eksctl get cluster --name=scylladb-us-east-1 --region=us-east-1
```

You will need to get the cluster's context for future operations. To do so, use the below command:
```shell
kubectl config current-context
```

For any `kubectl` commands that you will want to run against this cluster, use the `--context` flag with the value returned by the above command.
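
For example, assuming you store the returned context in a shell variable (the variable name below is only illustrative), commands against this cluster can be run as follows:
```shell
# Save the context of the first cluster for later use.
CONTEXT_US_EAST_1=$(kubectl config current-context)

# Run kubectl explicitly against the first cluster.
kubectl --context="${CONTEXT_US_EAST_1}" get nodes
```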

#### Deploy ScyllaDB Operator

Once the cluster is ready, refer to [Deploying Scylla on a Kubernetes Cluster](../generic.md) to deploy the ScyllaDB Operator and its prerequisites.

#### Prepare nodes for running ScyllaDB

Then, prepare the nodes for running ScyllaDB workloads and deploy a volume provisioner following the steps described in [Deploying Scylla on EKS](../../eks#prerequisites) in ScyllaDB Operator documentation.

### Create the second EKS cluster

Below is the required specification for the second cluster. As was the case with the first cluster, the provided values are only exemplary and can be adjusted according to your needs.

``` caution::
It is required that the VPCs of the two EKS clusters have non-overlapping IPv4 network ranges.
```
```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: scylladb-us-east-2
  region: us-east-2

availabilityZones:
- us-east-2a
- us-east-2b
- us-east-2c

vpc:
  cidr: 172.16.0.0/16

nodeGroups:
...
```
Follow analogous steps to create the second EKS cluster and prepare it for running ScyllaDB.

## Configure the network

The prepared Kubernetes clusters each have a dedicated VPC network.
To be able to route traffic between the two VPC networks, you need to create a networking connection between them, otherwise known as [VPC peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html).

### Create VPC peering

Refer to [Create a VPC peering connection](https://docs.aws.amazon.com/vpc/latest/peering/create-vpc-peering-connection.html#create-vpc-peering-connection-local) in AWS documentation for instructions on creating a VPC peering connection between the two VPCs created earlier.
In this example, the ID of the created VPC peering connection is `pcx-08077dcc008fbbab6`.
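
If you prefer the AWS CLI over the console, the peering connection can be created and accepted with commands similar to the following sketch. The VPC IDs are placeholders that you need to substitute with the values reported by `eksctl get cluster`; the peering connection ID is the exemplary one used throughout this guide.
```shell
# Request a peering connection from the us-east-1 VPC to the us-east-2 VPC.
aws ec2 create-vpc-peering-connection \
  --region us-east-1 \
  --vpc-id <vpc-id-of-scylladb-us-east-1> \
  --peer-vpc-id <vpc-id-of-scylladb-us-east-2> \
  --peer-region us-east-2

# Accept the peering request in the accepter's region.
aws ec2 accept-vpc-peering-connection \
  --region us-east-2 \
  --vpc-peering-connection-id pcx-08077dcc008fbbab6
```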

### Update route tables

To enable private IPv4 traffic between the instances in the peered VPCs, you need to establish a communication channel by adding a route to the route tables associated with the subnets of the instances in both VPCs.
The destination of the new route in a given route table is the CIDR of the other cluster's VPC, and the target is the ID of the VPC peering connection.

The following is an example of route tables that enable communication between instances in the two peered VPCs. Each table has a local route and the added route, which sends traffic destined for the other VPC to the peering connection. Other preconfigured routes are omitted for readability.

<table>
<thead>
<tr>
<th>Route table</th>
<th>Destination</th>
<th>Target</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">eksctl-scylladb-us-east-1-cluster/PublicRouteTable</td>
<td>10.0.0.0/16</td>
<td>local</td>
</tr>
<tr>
<td>172.16.0.0/16</td>
<td>pcx-08077dcc008fbbab6</td>
</tr>
<tr>
<td rowspan="2">eksctl-scylladb-us-east-2-cluster/PublicRouteTable</td>
<td>172.16.0.0/16</td>
<td>local</td>
</tr>
<tr>
<td>10.0.0.0/16</td>
<td>pcx-08077dcc008fbbab6</td>
</tr>
</tbody>
</table>


Refer to [Update your route tables for a VPC peering connection](https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-routing.html) in AWS documentation for more information.
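
Alternatively, the routes can be added with the AWS CLI, as in the sketch below. The route table IDs are placeholders, while the CIDRs and the peering connection ID match the exemplary values used in this guide.
```shell
# Route traffic destined for the us-east-2 VPC through the peering connection.
aws ec2 create-route \
  --region us-east-1 \
  --route-table-id <route-table-id-of-scylladb-us-east-1> \
  --destination-cidr-block 172.16.0.0/16 \
  --vpc-peering-connection-id pcx-08077dcc008fbbab6

# Route traffic destined for the us-east-1 VPC through the peering connection.
aws ec2 create-route \
  --region us-east-2 \
  --route-table-id <route-table-id-of-scylladb-us-east-2> \
  --destination-cidr-block 10.0.0.0/16 \
  --vpc-peering-connection-id pcx-08077dcc008fbbab6
```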

### Update security groups

To allow traffic to flow to and from instances associated with security groups in the peered VPC, you need to update the inbound rules of the VPCs' shared security groups.

Below is an example of the inbound rules that need to be added to the corresponding security groups of the two VPCs.

| Security group name | Type | Protocol | Port range | Source |
|--------------------------------------------------------------------------------|-------------|----------|------------|----------------------|
| eksctl-scylladb-us-east-1-cluster-ClusterSharedNodeSecurityGroup-TD05V9EVU3B8 | All traffic | All | All | Custom 172.16.0.0/16 |
| eksctl-scylladb-us-east-2-cluster-ClusterSharedNodeSecurityGroup-1FR9YDLU0VE7M | All traffic | All | All | Custom 10.0.0.0/16 |

The names of the shared security groups of your VPCs should be similar to the ones presented in the example.
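
As with the routes, the inbound rules can be added with the AWS CLI. The sketch below assumes you substitute the IDs of the shared security groups listed above; `--protocol=-1` stands for all protocols.
```shell
# Allow all inbound traffic from the us-east-2 VPC CIDR in the us-east-1 cluster's shared security group.
aws ec2 authorize-security-group-ingress \
  --region us-east-1 \
  --group-id <security-group-id-of-scylladb-us-east-1> \
  --protocol=-1 \
  --cidr 172.16.0.0/16

# Allow all inbound traffic from the us-east-1 VPC CIDR in the us-east-2 cluster's shared security group.
aws ec2 authorize-security-group-ingress \
  --region us-east-2 \
  --group-id <security-group-id-of-scylladb-us-east-2> \
  --protocol=-1 \
  --cidr 10.0.0.0/16
```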

---

Having followed the above steps, you should now have a platform prepared for deploying a multi-datacenter ScyllaDB cluster.
Refer to [Deploy a multi-datacenter ScyllaDB cluster in multiple interconnected Kubernetes clusters](multidc.md) in ScyllaDB Operator documentation for guidance.
156 changes: 156 additions & 0 deletions docs/source/multidc/gke.md
@@ -0,0 +1,156 @@
# Build multiple GKE clusters with inter-Kubernetes networking

This document describes the process of creating multiple GKE clusters in a shared VPC and explains the steps necessary for configuring inter-Kubernetes networking between clusters in different regions.
The interconnected clusters can serve as a platform for [deploying a multi-datacenter ScyllaDB cluster](multidc.md).

This guide will walk you through the process of creating and configuring GKE clusters in two distinct regions. Although it is only an example setup, it can easily be built upon to create infrastructure tailored to your specific needs.
For simplicity, several predefined values are used throughout the document. The values are only exemplary and can be adjusted to your preference.

## Prerequisites

To follow the below guide, you first need to install and configure the following tools for creating and managing GCP and Kubernetes resources:
- gcloud CLI - Google Cloud Command Line Interface, a command line tool for working with Google Cloud resources and services directly.
- kubectl – A command line tool for working with Kubernetes clusters.

See [Install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install-sdk) in GCP documentation and [Install Tools](https://kubernetes.io/docs/tasks/tools/) in Kubernetes documentation for reference.
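
Before proceeding, make sure that the gcloud CLI is authenticated and points at the project in which you intend to create the resources. The project ID below is only a placeholder:
```shell
# Authenticate the gcloud CLI and select the target project.
gcloud auth login
gcloud config set project <your-gcp-project-id>
```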

## Create and configure a VPC network

For the clusters to have inter-Kubernetes networking, you will create a virtual network shared between all the instances, with dedicated subnets for each of the clusters.
To create the subnets manually, create the network in custom subnet mode.

### Create the VPC network

Run the below command to create the network:
```shell
gcloud compute networks create scylladb --subnet-mode=custom
```

With the VPC network created, create a dedicated subnet, with secondary CIDR ranges for the Pod and Service pools, in each region in which the clusters will reside.

### Create VPC network subnets

To create a subnet for the first cluster in region `us-east1`, run the below command:
```shell
gcloud compute networks subnets create scylladb-us-east1 \
--region=us-east1 \
--network=scylladb \
--range=10.0.0.0/20 \
--secondary-range='cluster=10.1.0.0/16,services=10.2.0.0/20'
```

To create a subnet for the second cluster in region `us-west1`, run the below command:
```shell
gcloud compute networks subnets create scylladb-us-west1 \
--region=us-west1 \
--network=scylladb \
--range=172.16.0.0/20 \
--secondary-range='cluster=172.17.0.0/16,services=172.18.0.0/20'
```

``` caution::
It is required that the IPv4 address ranges of the subnets allocated for the GKE clusters do not overlap.
```

Refer to [Create a VPC-native cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips) and [Alias IP ranges](https://cloud.google.com/vpc/docs/alias-ip) in GKE documentation for more information about VPC-native clusters and alias IP ranges.
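
To verify that both subnets were created in the `scylladb` network with the expected primary ranges, you can list them with the below command. The filter only matches the exemplary names used in this guide.
```shell
# List the subnets created for the two clusters together with their primary IP ranges.
gcloud compute networks subnets list --filter='name~scylladb'
```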

## Create GKE clusters

With the VPC network created, you will now create two VPC-native GKE clusters in dedicated regions.

### Create the first GKE cluster

Run the following command to create the first GKE cluster in the `us-east1` region:
```shell
gcloud container clusters create scylladb-us-east1 \
--location=us-east1-b \
--node-locations='us-east1-b,us-east1-c' \
--machine-type=n1-standard-8 \
--num-nodes=1 \
--disk-type=pd-ssd \
--disk-size=20 \
--image-type=UBUNTU_CONTAINERD \
--no-enable-autoupgrade \
--no-enable-autorepair \
--enable-ip-alias \
--network=scylladb \
--subnetwork=scylladb-us-east1 \
--cluster-secondary-range-name=cluster \
--services-secondary-range-name=services
```

Refer to the [Creating a GKE cluster](../../gke#creating-a-gke-cluster) section of ScyllaDB Operator documentation for more information regarding the configuration and deployment of additional node pools, including the one dedicated to ScyllaDB nodes.

You will need to get the cluster's context for future operations. To do so, use the below command:
```shell
kubectl config current-context
```

For any `kubectl` commands that you will want to run against this cluster, use the `--context` flag with the value returned by the above command.

#### Deploy ScyllaDB Operator

Once the cluster is ready, refer to [Deploying Scylla on a Kubernetes Cluster](../generic.md) to deploy the ScyllaDB Operator and its prerequisites.

#### Prepare nodes for running ScyllaDB

Then, prepare the nodes for running ScyllaDB workloads and deploy a volume provisioner following the steps described on the [Deploying Scylla on GKE](../gke.md) page of the documentation.

### Create the second GKE cluster

Run the following command to create the second GKE cluster in the `us-west1` region:
```shell
gcloud container clusters create scylladb-us-west1 \
--location=us-west1-b \
--node-locations='us-west1-b,us-west1-c' \
--machine-type=n1-standard-8 \
--num-nodes=1 \
--disk-type=pd-ssd \
--disk-size=20 \
--image-type=UBUNTU_CONTAINERD \
--no-enable-autoupgrade \
--no-enable-autorepair \
--enable-ip-alias \
--network=scylladb \
--subnetwork=scylladb-us-west1 \
--cluster-secondary-range-name=cluster \
--services-secondary-range-name=services
```

As with the first cluster, deploy the ScyllaDB Operator and prepare the nodes of the second cluster for running ScyllaDB.

## Configure the firewall rules

When creating a cluster, GKE creates several ingress firewall rules that enable the instances to communicate with each other.
To establish interconnectivity between the two created Kubernetes clusters, you will now add the Pod IPv4 address ranges allocated to both clusters to the rules' corresponding source address ranges.

First, retrieve the name of the firewall rule associated with the first cluster, which permits traffic between all Pods on a cluster, as required by the Kubernetes networking model.
The rule name is in the following format: `gke-[cluster-name]-[cluster-hash]-all`.

To retrieve it, run the below command:
```shell
gcloud compute firewall-rules list --filter='name~gke-scylladb-us-east1-.*-all'
```

The output should resemble the following:
```console
NAME                                NETWORK   DIRECTION  PRIORITY  ALLOW                     DENY  DISABLED
gke-scylladb-us-east1-f17db261-all  scylladb  INGRESS    1000      udp,icmp,esp,ah,sctp,tcp        False
```

Modify the rule by updating its source ranges with the allocated Pod IPv4 address ranges of both clusters:
```shell
gcloud compute firewall-rules update gke-scylladb-us-east1-f17db261-all --source-ranges='10.1.0.0/16,172.17.0.0/16'
```

Follow the analogous steps for the other cluster. In this example, its corresponding firewall rule name is `gke-scylladb-us-west1-0bb60902-all`. To update it, you would run:
```shell
gcloud compute firewall-rules update gke-scylladb-us-west1-0bb60902-all --source-ranges='10.1.0.0/16,172.17.0.0/16'
```
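
To confirm that both rules now allow traffic from the Pod IPv4 address ranges of both clusters, you can inspect their source ranges. The rule names below are the exemplary ones used in this guide.
```shell
# Print the source ranges of the updated firewall rules.
gcloud compute firewall-rules describe gke-scylladb-us-east1-f17db261-all --format='value(sourceRanges)'
gcloud compute firewall-rules describe gke-scylladb-us-west1-0bb60902-all --format='value(sourceRanges)'
```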

Refer to [Automatically created firewall rules](https://cloud.google.com/kubernetes-engine/docs/concepts/firewall-rules) in GKE documentation for more information.

---

Having followed the above steps, you should now have a platform prepared for deploying a multi-datacenter ScyllaDB cluster.
Refer to [Deploy a multi-datacenter ScyllaDB cluster in multiple interconnected Kubernetes clusters](multidc.md) in ScyllaDB Operator documentation for guidance.
25 changes: 25 additions & 0 deletions docs/source/multidc/index.rst
@@ -0,0 +1,25 @@
==========================================================
Deploying multi-datacenter ScyllaDB clusters in Kubernetes
==========================================================

Prepare a platform for a multi-datacenter ScyllaDB cluster deployment:

.. toctree::
   :hidden:
   :maxdepth: 2

   eks
   gke

* :doc:`Build multiple Amazon EKS clusters with inter-Kubernetes networking <eks>`
* :doc:`Build multiple GKE clusters with inter-Kubernetes networking <gke>`

Deploy a multi-datacenter ScyllaDB cluster in Kubernetes:

.. toctree::
   :hidden:
   :maxdepth: 2

   multidc

* :doc:`Deploy a multi-datacenter ScyllaDB cluster in multiple interconnected Kubernetes clusters <multidc>`
