[docs] node affinity docs; 0.2.0 release prep
- Changelog entries for 0.2.0
- Docs on new features (tolerations, expanded node affinity)
schallert committed May 1, 2019
1 parent 795973f commit 8a0d867
Showing 10 changed files with 411 additions and 88 deletions.
55 changes: 52 additions & 3 deletions CHANGELOG.md
@@ -1,8 +1,42 @@
# Changelog

## 0.2.0

The theme of this release is usability improvements and more granular control over node placement.

Features such as specifying etcd endpoints directly on the cluster spec eliminate the need to provide a manual
configuration just to point M3DB at custom etcd endpoints. Per-cluster etcd environments allow users to colocate
multiple M3DB clusters on a single etcd cluster.
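
As a rough sketch of the first of these features (assuming the `etcdEndpoints` field added in [#99][99]; consult the
API docs for the authoritative field name and shape), pointing a cluster at an external etcd might look like:

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
  name: cluster-a
spec:
  # Endpoints of an existing etcd cluster; with these set, a hand-written
  # configmap is no longer required just to point M3DB at custom etcd.
  # Endpoint addresses below are illustrative.
  etcdEndpoints:
    - http://etcd-0.etcd:2379
    - http://etcd-1.etcd:2379
    - http://etcd-2.etcd:2379
  ...
```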

Users can now specify more complex affinity terms, as well as taints that their cluster tolerates, allowing specific
nodes to be dedicated to M3DB. See the [affinity docs][affinity-docs] for more.

* [FEATURE] Allow specifying etcd endpoints on the M3DBCluster spec ([#99][99])
* [FEATURE] Allow specifying security contexts for M3DB pods ([#107][107])
* [FEATURE] Allow specifying tolerations of m3db pods ([#111][111])
* [FEATURE] Allow specifying pod priority classes ([#119][119])
* [FEATURE] Use a dedicated etcd-environment per-cluster to support sharing etcd clusters ([#99][99])
* [FEATURE] Support more granular node affinity per-isolation group ([#106][106]) ([#131][131])
* [ENHANCEMENT] Change default M3DB bootstrapper config to recover more easily when an entire cluster is taken down
([#112][112])
* [ENHANCEMENT] Build + release with Go 1.12 ([#114][114])
* [ENHANCEMENT] Continuously reconcile configmaps ([#118][118])
* [BUGFIX] Allow unknown protobuf fields to be unmarshalled ([#117][117])
* [BUGFIX] Fix pod removal when removing more than 1 pod at a time ([#125][125])

### Breaking Changes

0.2.0 changes how M3DB stores its cluster topology in etcd to allow for multiple M3DB clusters to share an etcd cluster.
A [migration script][etcd-migrate] is provided to copy etcd data from the old format to the new format. If migrating an
operated cluster, run that script (see the script for instructions) and then perform a rolling restart of your M3DB
pods by deleting them one at a time.

If using a custom configmap, this same change will require a modification to your configmap. See the
[warning][configmap-warning] in the docs about how to ensure your configmap is compatible.

## 0.1.4

* [ENHANCEMENT] Added the ability to use a specific StorageClass per-isolation group (StatefulSet) for clusters without
* [FEATURE] Added the ability to use a specific StorageClass per-isolation group (StatefulSet) for clusters without
topology aware volume provisioning ([#98][98])
* [BUGFIX] Fixed a bug where pods were incorrectly selected if the cluster had labels ([#100][100])

@@ -18,13 +52,28 @@

## 0.1.1

* TODO
* Fix helm manifests.

## 0.1.0

* TODO
* Initial release.

[affinity-docs]: https://operator.m3db.io/configuration/node_affinity/
[etcd-migrate]: https://github.com/m3db/m3db-operator/blob/master/scripts/migrate_etcd_0.1_0.2.sh
[configmap-warning]: https://operator.m3db.io/configuration/configuring_m3db/#environment-warning

[94]: https://github.com/m3db/m3db-operator/pull/94
[97]: https://github.com/m3db/m3db-operator/pull/97
[98]: https://github.com/m3db/m3db-operator/pull/98
[100]: https://github.com/m3db/m3db-operator/pull/100
[106]: https://github.com/m3db/m3db-operator/pull/106
[107]: https://github.com/m3db/m3db-operator/pull/107
[111]: https://github.com/m3db/m3db-operator/pull/111
[112]: https://github.com/m3db/m3db-operator/pull/112
[114]: https://github.com/m3db/m3db-operator/pull/114
[117]: https://github.com/m3db/m3db-operator/pull/117
[118]: https://github.com/m3db/m3db-operator/pull/118
[119]: https://github.com/m3db/m3db-operator/pull/119
[99]: https://github.com/m3db/m3db-operator/pull/99
[125]: https://github.com/m3db/m3db-operator/pull/125
[131]: https://github.com/m3db/m3db-operator/pull/131
2 changes: 1 addition & 1 deletion README.md
@@ -65,7 +65,7 @@ kubectl apply -f https://raw.githubusercontent.com/m3db/m3db-operator/v0.1.4/exa

Apply manifest with your zones specified for isolation groups:

```
```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
5 changes: 2 additions & 3 deletions docs/Dockerfile
@@ -1,14 +1,13 @@
# Dockerfile for building docs is stored in a separate dir from the docs,
# otherwise the generated site will unnecessarily contain the Dockerfile

FROM python:3.5-alpine
FROM python:3.6-alpine3.9
LABEL maintainer="The M3DB Authors <m3db@googlegroups.com>"

WORKDIR /m3db
EXPOSE 8000

# mkdocs needs git-fast-import, which is stripped from the default git package
# to reduce image size
RUN pip install mkdocs==0.17.3 mkdocs-material==2.7.3 && \
RUN pip install mkdocs==0.17.3 mkdocs-material==2.7.3 Pygments>=2.2 pymdown-extensions>=4.11 && \
apk add --no-cache git-fast-import openssh-client
ENTRYPOINT [ "/bin/ash", "-c" ]
19 changes: 19 additions & 0 deletions docs/configuration/configuring_m3db.md
@@ -7,4 +7,23 @@ Prometheus reads/writes to the cluster. This template can be found
To apply a custom configuration for the M3DB cluster, one can set the `configMapName` parameter of the cluster [spec] to
an existing configmap.

## Environment Warning

If providing a custom config map, the `env` you specify in your [config][config] **must** be `$NAMESPACE/$NAME`, where
`$NAMESPACE` is the Kubernetes namespace your cluster is in and `$NAME` is the name of the cluster. For example, with
the following cluster:

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
metadata:
name: cluster-a
namespace: production
...
```

The value of `env` in your config **MUST** be `production/cluster-a`. This restriction allows multiple M3DB clusters to
safely share the same etcd cluster.
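
As a hypothetical excerpt, the etcd service section of a custom configmap for the cluster above might look like the
following sketch. The surrounding keys mirror the operator's default config of this release and may differ in your
version, so treat everything except the `env` value as illustrative:

```yaml
# Hypothetical excerpt of a custom M3DB configmap; only the env value is the point here.
db:
  config:
    service:
      env: production/cluster-a        # must be $NAMESPACE/$NAME
      zone: embedded
      service: m3db
      cacheDir: /var/lib/m3kv
      etcdClusters:
        - zone: embedded
          endpoints:
            # illustrative etcd endpoints; replace with your own
            - http://etcd-0.etcd:2379
            - http://etcd-1.etcd:2379
            - http://etcd-2.etcd:2379
```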

[spec]: ../api
[config]: https://github.com/m3db/m3db-operator/blob/795973f3329437ced3ac942da440810cd0865235/assets/default-config.yaml#L77
32 changes: 28 additions & 4 deletions docs/configuration/namespaces.md
@@ -12,7 +12,7 @@ Namespaces are configured as part of an `m3dbcluster` [spec][api-namespaces].

This preset will store metrics at 10 second resolution for 2 days. For example, in your cluster spec:

```
```yaml
spec:
...
namespaces:
@@ -24,7 +24,7 @@

This preset will store metrics at 1 minute resolution for 40 days.

```
```yaml
spec:
...
namespaces:
@@ -34,8 +34,32 @@

## Custom Namespaces

You can also define your own custom namespaces by setting the `NamespaceOptions` within a cluster spec. See the
[API][api-ns-options] for all the available fields.
You can also define your own custom namespaces by setting the `NamespaceOptions` within a cluster spec. The
[API][api-ns-options] lists all available fields. As an example, a namespace to store 7 days of data may look like:
```yaml
...
spec:
...
namespaces:
- name: custom-7d
options:
bootstrapEnabled: true
flushEnabled: true
writesToCommitLog: true
cleanupEnabled: true
snapshotEnabled: true
repairEnabled: false
retentionOptions:
retentionPeriodDuration: 168h
blockSizeDuration: 12h
bufferFutureDuration: 20m
bufferPastDuration: 20m
blockDataExpiry: true
blockDataExpiryAfterNotAccessPeriodDuration: 5m
indexOptions:
enabled: true
blockSizeDuration: 12h
```


[api-namespaces]: ../api#namespace
192 changes: 192 additions & 0 deletions docs/configuration/node_affinity.md
@@ -0,0 +1,192 @@
# Node Affinity & Cluster Topology

## Node Affinity

Kubernetes allows pods to be assigned to nodes based on various criteria through [node affinity][k8s-node-affinity].

M3DB was built with failure tolerance as a core feature. M3DB's [isolation groups][m3db-isogroups] allow replicas of a
shard to be placed across failure domains such that the loss of any single domain cannot cause the cluster to lose
quorum. More details on M3DB's resiliency can be found in the [deployment docs][m3db-deployment].

By leveraging Kubernetes' node affinity and M3DB's isolation groups, the operator can guarantee that M3DB pods are
distributed across failure domains. For example, in a Kubernetes cluster spread across 3 zones in a cloud region, the
`isolationGroups` config below would guarantee that no single zone failure could degrade the M3DB cluster.

M3DB is unaware of the underlying zone topology: it just views the isolation groups as `group1`, `group2`, `group3` in
its [placement][m3db-placement]. Thanks to the Kubernetes scheduler, however, these groups are actually scheduled across
separate failure domains.

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
...
spec:
replicationFactor: 3
isolationGroups:
- name: group1
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-b
- name: group2
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-c
- name: group3
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-d
```

## Tolerations

In addition to allowing pods to be assigned to certain nodes via node affinity, Kubernetes allows pods to be _repelled_
from nodes through [taints][k8s-taints] if they don't tolerate the taint. For example, the following config would ensure:

1. Pods are spread across zones.
2. Pods are only assigned to nodes in the `m3db-dedicated-pool` pool.
3. No other pods can be assigned to those nodes (assuming they were tainted with the `m3db-dedicated` taint).

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
...
spec:
replicationFactor: 3
isolationGroups:
- name: group1
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-b
- key: nodepool
values:
- m3db-dedicated-pool
- name: group2
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-c
- key: nodepool
values:
- m3db-dedicated-pool
- name: group3
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-d
- key: nodepool
values:
- m3db-dedicated-pool
tolerations:
- key: m3db-dedicated
effect: NoSchedule
operator: Exists
```
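
For completeness, nodes in the dedicated pool would carry a matching taint and the labels selected by the affinity
terms above. The following is a hypothetical node manifest excerpt (node name and labels are illustrative; in practice
the taint is usually applied by your cloud provider's node-pool tooling or `kubectl taint`):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: m3db-dedicated-node-1                      # hypothetical node name
  labels:
    failure-domain.beta.kubernetes.io/zone: us-east1-b
    nodepool: m3db-dedicated-pool                  # matched by the nodeAffinityTerms above
spec:
  taints:
    # Repels all pods that do not tolerate this taint; the toleration above uses
    # operator Exists, so it matches this key regardless of the taint's value.
    - key: m3db-dedicated
      effect: NoSchedule
```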

## Example Affinity Configurations

### Zonal Cluster

The examples so far have focused on multi-zone Kubernetes clusters. Some users may only have a cluster in a single zone
and accept the reduced fault tolerance. The following configuration shows how to configure the operator in a zonal
cluster.

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
...
spec:
replicationFactor: 3
isolationGroups:
- name: group1
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-b
- name: group2
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-b
- name: group3
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-b
```

### 6 Zone Cluster

In the above examples we created clusters with 1 isolation group in each of 3 zones. Because `values` within a single
[NodeAffinityTerm][node-affinity-term] are OR'd, we can also spread an isolation group across multiple zones. For
example, if we had 6 zones available to us:

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
...
spec:
replicationFactor: 3
isolationGroups:
- name: group1
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-a
- us-east1-b
- name: group2
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-c
- us-east1-d
- name: group3
numInstances: 3
nodeAffinityTerms:
- key: failure-domain.beta.kubernetes.io/zone
values:
- us-east1-e
- us-east1-f
```

### No Affinity

If no failure domains are available, one can create a cluster with no affinity, in which case pods will be scheduled as Kubernetes would place them by default:

```yaml
apiVersion: operator.m3db.io/v1alpha1
kind: M3DBCluster
...
spec:
replicationFactor: 3
isolationGroups:
- name: group1
numInstances: 3
- name: group2
numInstances: 3
- name: group3
numInstances: 3
```

[k8s-node-affinity]: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
[k8s-taints]: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
[m3db-deployment]: https://docs.m3db.io/operational_guide/replication_and_deployment_in_zones/
[m3db-isogroups]: https://docs.m3db.io/operational_guide/placement_configuration/#isolation-group
[m3db-placement]: https://docs.m3db.io/operational_guide/placement/
[node-affinity-term]: ../api/#nodeaffinityterm
