Introduce Multi-CassKop, the operator to manage a single Cassandra cluster spanning multiple Kubernetes clusters.
- PR #145 - Fix Issue #142: PodStatus which rarely fails in unit tests
- PR #146 - Fix Issue #143: External update of SeedList was not possible
- PR #147 - Introduce Multi-CassKop operator
- PR #149 - Get rid of env var SERVICE_NAME and keep current hostname in seedlist
- PR #151 - Fix Issue #150: Make JMX port remotely available (again)
- Uses new bootstrap image 0.1.3: orangeopensource/cassandra-bootstrap:0.1.3
Breaking Change in API: the fields `spec.baseImage` and `spec.version` have been removed in favor of `spec.cassandraImage`, which is a merge of both of them.
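A hedged before/after sketch of this migration; the image name and tag are illustrative, not taken from the changelog:

```yaml
# Before (fields now removed):
spec:
  baseImage: cassandra
  version: 3.11.4          # illustrative tag
# After (single merged field):
spec:
  cassandraImage: cassandra:3.11.4
```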
- PR #128 Fix Issue #96: cluster stay pending
- PR #127 Fix Issue #126: update racks in parallel
- PR #124: Add Support for pod & services annotations
- PR #138 Add support for Tolerations (see the sketch after the annotations example below)
Example of annotations needed in the CassandraCluster Spec:

```yaml
service:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: my.custom.domain.com.
```
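For PR #124 pod annotations and PR #138 tolerations, a hedged sketch of what the spec might look like; the exact field paths under `pod:` are assumptions, and the toleration itself is standard Kubernetes syntax:

```yaml
pod:
  annotations:
    prometheus.io/scrape: "true"    # hypothetical pod annotation
  tolerations:
    - key: dedicated                # hypothetical taint key
      operator: Equal
      value: cassandra
      effect: NoSchedule
```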
- PR #119 Refactoring Makefile
  - tests now use the default cassandra docker image
  - initContainerImage and bootstrapContainerImage are used to adapt to the official cassandra image
- ReadOnly Container: `Spec.ReadOnlyRootFilesystem` (default: true)
- upgrade to operator-sdk 0.9.0 & go modules (thanks @jsanda)
- Released version
- GitHub open source version
- Add `spec.gcStdout` (default: true): send GC logs to docker stdout
- Add `spec.topology.dc[].numTokens` (default: 256): specify a different number of vnodes for each DC
- Move RollingPartition from `spec.RollingPartition` to `spec.topology.dc[].rack[].RollingPartition`
- Add Cassandra psp config files in deploy
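A hedged sketch combining the fields above; DC/rack names are placeholders and the lowerCamelCase `rollingPartition` casing is assumed (the changelog writes RollingPartition):

```yaml
spec:
  gcStdout: true               # send GC logs to docker stdout
  topology:
    dc:
      - name: dc1              # placeholder DC name
        numTokens: 256         # vnodes for this DC
        rack:
          - name: rack1        # placeholder rack name
            rollingPartition: 0
```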
- Add `spec.maxPodUnavailable` (default: 1): if there are unavailable pods in the ring, CassKop will refuse to make changes on the statefulsets. This can be bypassed by increasing the maxPodUnavailable value.
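A minimal sketch, assuming the field sits at the top level of the spec:

```yaml
spec:
  maxPodUnavailable: 2   # raise the number of unavailable pods tolerated before CassKop refuses changes
```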
- Upgrade to Operator-sdk 0.2.0
- Upgrade to Operator-sdk 0.1.1
- Add and Remove DC
- Decommission now using JMX call instead of exec nodetool
- Configurable operator resyncPeriod via environment RESYNC_PERIOD
- No more use of the Kubernetes subdomain in Cassandra Seeds --> Needs Cassandra Docker Image > cassandra-3.11-v1.1.0
- This also fixes the first node's SeedList: a DNS request tells us whether the first node already exists; if not, this is the first creation of the cluster. On subsequent runs we can properly remove node1 from its own seedlist.
- Add new parameter `imagePullPolicy: "IfNotPresent"` to the CRD (default is `"Always"`)
- Add `securityContext: runAsUser: 1000` to allow the operator pod to launch under stricter cluster security
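A hedged sketch of both settings; the first belongs in the CassandraCluster spec, while the placement of the second in the operator's own pod spec is an assumption:

```yaml
# CassandraCluster CRD
spec:
  imagePullPolicy: "IfNotPresent"   # default is "Always"
---
# Operator pod spec (assumed placement)
securityContext:
  runAsUser: 1000
```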
- Fix Issue 60: Error when RollingUpdate on UpdateResource
- Fix Issue 59: Error on UpdateConfigMap vs UpdateStatefulset
- SeedList Management
  - New param `AutoUpdateSeedList`, which defines whether the operator should automatically compute and apply the best seedlist
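A minimal sketch, assuming the param lives at the top level of the spec with lowerCamelCase casing (the changelog writes AutoUpdateSeedList):

```yaml
spec:
  autoUpdateSeedList: true   # let the operator compute and apply the best seedlist
```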
- CRD Improvement:
  - CRD protection against forbidden changes in the CRD. The operator now refuses to change:
    - the `dataCapacity`
    - the `dataStorageClass`
  - We can now specify/override the global `nodesPerRack` in each DC section
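A hedged sketch of a per-DC override; DC names are placeholders and the plural `nodesPerRacks` spelling follows the rename noted later in this changelog:

```yaml
spec:
  nodesPerRacks: 3           # global default
  topology:
    dc:
      - name: dc1
        nodesPerRacks: 6     # override for this DC
      - name: dc2            # inherits the global value
```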
- Better Status Management
  - Add cluster-level status to get a global view of the whole cluster (composed of several statefulsets):
    - lastClusterAction
    - lastClusterActionStatus

    These statuses tell us whether there is an ongoing action at cluster level, which for instance allows a ScaleUp to finish completely on all racks before executing pod-level actions such as NodeCleanup.
  - Add new statuses:
    - `UpdateResources`: when requested pod resources are changed in the CRD
    - `UpdateSeedList`: when the operator needs to make a rolling update to apply a new seedlist. The SeedList won't be updated unless all racks are staged for this modification (no other actions ongoing on the cluster)
- Add `ImagePullSecret` parameter in the CRD to allow providing docker credentials to pull images
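A minimal sketch, assuming the secret is referenced by name and the field uses lowerCamelCase casing:

```yaml
spec:
  imagePullSecret:
    name: my-registry-secret   # hypothetical pre-created docker-registry secret
```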
- SeedList Management
  - SeedList initialisation before startup: we try (if available) to take 1 seed in each rack for each DC
  - The operator will try to apply the best SeedList in case of cluster topology evolution (scaling, adding DCs/racks...)
  - The operator will make a rolling update (see new statuses above)
  - The DFY Cassandra image, coupled with the operator, ensures that a pod in the SeedList is removed from its own seedlist.
    Limitation: the first pod of the cluster will be in its own SeedList
  - We can manually update the SeedList on the CRD object; this rolling-updates each statefulset sequentially, starting with the first
- Operator Debug
  - Allow a specific `docker-build-debug` target in the Makefile and in the pipeline to build a debug version of the operator:
    - debug version of the Go application
    - debug version of the Docker image
    - debug version of the Helm chart (see below)
- Helm Chart Improvement
  - Add the possibility to use images behind authentication (imagePullSecret):

    ```yaml
    imagePullSecrets:
      enabled: true
      name: <name of your docker registry secret>
    ```
  - New way to define the debug image and the delve API version to use:

    ```yaml
    debug:
      enabled: false
      image:
        repository: orangeopensource/casskop
        tag: 0.1.2-debug
      version: 2
    ```
- When NodeCleanup encounters some errors, we can see the status in the CassandraCluster
- Fix Bug #53: Error which prevented PVCs from being deleted when the CRD is deleted and the `deletePVC` flag is true
- Fix Bug #52: The cluster was not deploying if Topology was empty
- Rack Aware Deployment
- Add Topology section to declare Cassandra DCs and Racks and their deployment using Kubernetes node labels
- Note: rename of `nodes` to `nodesPerRacks` in the CRD yaml file
- Add `hardAntiAffinity` flag in CRD to control whether we allow only 1 Cassandra node per Kubernetes node.
  Limitation: this parameter only checks Pods in the same Kubernetes namespace!
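A minimal sketch, assuming the flag sits at the top level of the spec:

```yaml
spec:
  hardAntiAffinity: true   # at most 1 Cassandra pod per Kubernetes node (checked within the namespace only)
```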
- Add `deletePVC` flag in CRD to allow deleting all PersistentVolumeClaims when the cluster is deleted
- Uses Jolokia for the nodetool cleanup operation
- Add `autoPilot` flag in CRD to automatically execute pod operation cleanup after a ScaleUp, or to allow the operation to be done manually by editing the pod's operation label from Manual to ToDo
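A hedged sketch of these two flags together (top-level spec placement assumed):

```yaml
spec:
  deletePVC: true   # delete all PersistentVolumeClaims when the cluster is deleted
  autoPilot: true   # run pod cleanup automatically after a ScaleUp
```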
- Rack Aware Deployment
- Pod level get infos for Rack & DC. PR #33
    - Exposes the CASSANDRA_RACK env var in the Pod from the `cassandraclusters.db.orange.com.rack` Pod label
    - Exposes the CASSANDRA_DC env var in the Pod from the `cassandraclusters.db.orange.com.dc` Pod label
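These env vars are presumably wired through the Kubernetes downward API; a minimal sketch of that mechanism:

```yaml
env:
  - name: CASSANDRA_RACK
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['cassandraclusters.db.orange.com.rack']
  - name: CASSANDRA_DC
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['cassandraclusters.db.orange.com.dc']
```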
- Make use of OLM (Operator Lifecycle Management) to manage the Operator
- #25: change declaration of local-storage in PersistentVolumeClaim
- Upgrade Operator SDK version to latest master (revision=a719b04752a51e5fe723467c7e66bc35830eb179)
- Add start time and end time labels on Pods during Pod Actions
- Add a Test on Operation Name for detecting an end in Cleanup Action
- Re-Order Status in ensureDecommission
- Add test on CassandraNode status to know if decommission is ongoing or not
- Make the nodetool decommission operation asynchronous
- Add Helm charts to deploy the operator
- Add a Pod Disruption Budget which allows only 2 Cassandra nodes to be down at the same time while working on the Kubernetes cluster
- Add a Jolokia client to interact with Cassandra
- Remove old unused code
- Add a test on the Pod Readiness before saying ScaleUp is Done
- Increase HealthCheck Periods and Timeouts
- Add output messages in health checks requests for debug
- Fix GetLastPod if number of pods > 10
- Better management of decommission status (check with nodetool netstats to get node status), and adapt behaviour
- On scale down, test the Date pod label to avoid executing nodetool decommission several times until the status changes from NORMAL to LEAVING
- Add test on the `readyReplicas` field of the Statefulset to know the operation is Done
- add sample directory for demo manifests.
- Add plantuml algorithm documentation
- If no dataCapacity is specified in the CRD, then no PersistentVolumeClaim is created
  - WARNING: this is useful for dev but unsafe for production, meaning that no data will be persisted
- Increase Timeout for HealthCheck Status from 5 to 40 and set PeriodSeconds to 50 between each healthcheck
- Remove `nodetool drain` from the PreStop instruction
- Add PodDisruptionBudget with MaxUnavailable=2
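A hedged sketch of such a PodDisruptionBudget; the name, selector label, and apiVersion are assumptions (CassKop creates its own):

```yaml
apiVersion: policy/v1beta1      # era-appropriate API version, assumed
kind: PodDisruptionBudget
metadata:
  name: cassandra-demo          # hypothetical name
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: cassandracluster     # hypothetical label
```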
- Initial version ported from the cassandra-kooper-operator project