Push preliminary AWS deployment documentation #467

Merged · 19 commits · Jun 18, 2021
docs/howto/operate/add-aws-hub.md (211 additions, 0 deletions)
# Add a new hub in an AWS kops-based cluster

The idea behind this guide is to showcase the process of setting up a
[kops](https://kops.sigs.k8s.io/getting_started/aws/)-based AWS
cluster and manually deploying a new hub on top of it using our deployer tool.
This is a preliminary but fully functional manual process. Once
[#381](https://github.com/2i2c-org/pilot-hubs/issues/381) is resolved, we should be able
to automate the hub deployment process as we currently do with the GKE-based hubs.

```{note}
We will continue working toward a definitive process once we have figured out some of the
discussions outlined in [#431](https://github.com/2i2c-org/pilot-hubs/issues/431).
```

## Pre-requisites

1. Follow the instructions outlined in
[Set up and use the deployment scripts locally](./manual-deploy.md) to set up the
local environment and prepare `sops` to encrypt and decrypt files.

2. Install the awscli tool (you can use pip or conda to install it in the environment)
and configure it to use the provided AWS user credentials. Follow this
[guide](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html#cli-configure-quickstart-config)
for a quick configuration process.
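For example, a minimal sketch of installing and configuring the CLI with a dedicated profile (the profile name here is just an assumption; adjust it to your setup):

```bash
# Install the AWS CLI into the current environment (conda works as well)
pip install awscli

# Configure a profile for this cluster using the provided credentials
aws configure --profile <cluster_name>
```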

```{note}
The customer with AWS admin privileges should have created a user for you with full
privileges. We will probably explore fine-graining the permissions actually needed
in the short-term.
```

3. Export some helpful AWS environment variables (`aws configure` doesn't export
these environment variables for kops to use, so we do it manually).

```bash
export AWS_PROFILE=<cluster_name>
export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id)
export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key)
```
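As an optional sanity check (not part of the original steps), you can confirm the exported credentials are picked up correctly:

```bash
# Should print the account id and ARN of the user behind the exported credentials
aws sts get-caller-identity
```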

## Create an AWS kops-based cluster

1. From the root directory of this repo, `cd` into the `kops` directory
2. Set up a *state* bucket for kops. This bucket will store the cluster
[state](https://kops.sigs.k8s.io/state/).

```bash
export KOPS_STATE_STORE=s3://<2i2c>-<hub-name>-kops-state
aws s3 mb $KOPS_STATE_STORE --region <region>
```

```{note}
Kops [recommends](https://kops.sigs.k8s.io/getting_started/aws/#cluster-state-storage)
versioning the S3 bucket in case you ever need to revert or recover a previous state
store.
```
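If you want to follow that recommendation, enabling versioning is a single command (a sketch, assuming the bucket name chosen above):

```bash
aws s3api put-bucket-versioning \
  --bucket <2i2c>-<hub-name>-kops-state \
  --versioning-configuration Status=Enabled
```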

### Create and render a kops config file

You can use
[one](https://github.com/2i2c-org/pilot-hubs/blob/master/kops/farallon.jsonnet) of the
existing [jsonnet](https://jsonnet.org/) specifications as a "template" for your cluster.
You may need to tweak zones, names, and instances; the rest is boilerplate to create a
kops-based cluster according to the specification already outlined in
[#28](https://github.com/2i2c-org/pangeo-hubs/issues/28). Once you have your jsonnet
specification ready, you need to render it to create the config file kops understands.

1. Render the `kops` config file with

```bash
jsonnet <cluster_name>.jsonnet -y > <cluster_name>.kops.yaml
```

2. Regrettably, the rendering creates a YAML file with three dots (`...`) at the end; you can
delete that last line with

```bash
sed -i '' -e '$ d' <cluster_name>.kops.yaml
```

```{note}
On Linux you will need to use instead: `sed -i '$ d' <cluster_name>.kops.yaml`
```

### Create the cluster

1. Create the cluster configuration and push it to s3 with

```bash
kops create -f <cluster_name>.kops.yaml
```

2. You will need to create an SSH key pair before actually creating the cluster

```bash
ssh-keygen -f ssh-key
mv ssh-key <cluster_name>.key
mv ssh-key.pub <cluster_name>.key.pub
```

```{note}
You will need to relocate and encrypt the private key before pushing your changes to the
repository.
```
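One possible way to do that (a sketch; it assumes the repository's `secrets/` directory is the right destination and that your `sops` setup accepts binary input):

```bash
# Move the private key next to the other encrypted secrets
mv <cluster_name>.key secrets/<cluster_name>.key

# Encrypt it in place, treating the key as a binary blob
sops --input-type binary --output-type binary -i -e secrets/<cluster_name>.key
```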

3. Build the cluster with (notice that you are passing the SSH public key you just
created in the previous step)

```bash
kops update cluster <cluster_name>hub.k8s.local --yes --ssh-public-key <cluster_name>.key.pub --admin
```

```{note}
The `--admin` at the end will modify your `~/.kube/config` file to point to the new
cluster.
```

If everything went as expected, the cluster will be created after a few minutes, and you
should be able to validate it with

```bash
kops validate cluster --wait 10m
```

Note that validation will not pass until the workaround in the next section is applied.

### Apply workaround to run CoreDNS on the master node

More details can be found in this kops [issue](https://github.com/kubernetes/kops/issues/11199).

1. After the failed validation finishes, patch CoreDNS with the commands below (you can
run them in another terminal if you are impatient ;-)

```bash
# Each patch should report: deployment.apps/<name> patched
kubectl -n kube-system patch deployment kube-dns --type json --patch '[{"op": "add", "path": "/spec/template/spec/tolerations", "value": [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]}]'

kubectl -n kube-system patch deployment kube-dns-autoscaler --type json --patch '[{"op": "add", "path": "/spec/template/spec/tolerations", "value": [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]}]'
```
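Once both patches are applied, you can check that the DNS pods get scheduled (an optional verification; `k8s-app=kube-dns` is the label kops puts on these deployments):

```bash
kubectl -n kube-system get pods -l k8s-app=kube-dns
```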

### Create an EFS for your cluster

1. Install boto3 with pip or conda
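For example (assuming pip in the environment you have been using so far):

```bash
pip install boto3
```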

2. Create an [EFS](https://aws.amazon.com/efs/) file system for this hub with

```bash
python3 setup-efs.py <cluster_name>hub.k8s.local us-east-2
```

This will output an `fs-<xxxxxxxx>` id. You should use that value
(it should be something like `fs-<id>.efs.<region>.amazonaws.com`) as
the `basehub.nfsPVC.nfs.serverIP` in your hub config file.
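If you ever need to look that id up again, a quick way (a sketch using the AWS CLI, not part of the original workflow) is:

```bash
aws efs describe-file-systems --region us-east-2 \
  --query 'FileSystems[].{Id:FileSystemId,Name:Name}' --output table
```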

## Deploy the new hub

1. First, `cd` back to the root of the repository

2. Generate a kubeconfig that will be used by the deployer with

```bash
KUBECONFIG=secrets/<cluster_name>.yaml kops export kubecfg --admin=730h <cluster_name>hub.k8s.local
```
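Before encrypting it, you can optionally double-check that the exported kubeconfig actually works (an extra sanity check, not in the original steps):

```bash
KUBECONFIG=secrets/<cluster_name>.yaml kubectl get nodes
```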

3. Encrypt (in-place) the generated kubeconfig with sops

```bash
sops -i -e secrets/<cluster_name>.yaml
```

4. Generate a new config file for your cluster

You can use
[one](https://github.com/2i2c-org/pilot-hubs/blob/master/config/hubs/farallon.cluster.yaml)
of the existing cluster config files as a "template" for your cluster.
You may need to tweak names, `serverIP`, and the singleuser image references. Make sure
you set up the `profileList` section to be compatible with your kops cluster (i.e. match
the `node_selector` with the proper `instance-type`).
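To see which instance types your kops cluster is actually running (and therefore which values the `node_selector` should match), something like this should work; depending on the Kubernetes version the label may instead be the older `beta.kubernetes.io/instance-type`:

```bash
kubectl get nodes -L node.kubernetes.io/instance-type
```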

5. Set `proxy.https.enabled` to `false`. This creates the hubs without trying to give
them HTTPS, so we can appropriately create DNS entries for them.

6. Deploy the hub (or hubs, in case you are deploying more than one) without running the
hub health test with

```bash
python3 deployer deploy <cluster_name> --skip-hub-health-test
```

7. Get the AWS external IP for your hub with (supposing your hub is `staging`):

```bash
kubectl -n staging get svc proxy-public
```

Create a CNAME record for `staging.foo.2i2c.cloud` and point it to the AWS external IP.


```{note}
Wait about 10 minutes to make sure the DNS record actually resolves properly.
```
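A quick way to check the record (a sketch, using the example `staging.foo.2i2c.cloud` hostname from above):

```bash
dig +short staging.foo.2i2c.cloud
```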

8. Set `proxy.https.enabled` to `true` in the cluster config file so we can get HTTPS.

9. Finally run the deployer again with

```bash
python3 deployer deploy <cluster_name>
```

This last run should set up HTTPS and finally run the test suite.
docs/howto/operate/index.md (1 addition, 0 deletions)

manual-deploy.md
grafana.md
node-administration.md
move-hub.md
add-aws-hub.md
```