The dependency watchdog prober command takes command-line flags which are meant to fine-tune the prober. In addition, a ConfigMap is mounted to the container which provides tuning knobs for all probes that the prober starts.
Prober can be configured via the following flags:
Flag Name | Type | Required | Default Value | Description |
---|---|---|---|---|
kube-api-burst | int | No | 10 | Burst to use while talking with the Kubernetes API server. The number must be >= 0. If it is 0, the default value of 10 is used. |
kube-api-qps | float | No | 5.0 | Maximum QPS (queries per second) allowed when talking with the Kubernetes API server. The number must be >= 0. If it is 0, the default value of 5.0 is used. |
concurrent-reconciles | int | No | 1 | Maximum number of concurrent reconciles. |
config-file | string | Yes | NA | Path of the config file containing the configuration to be used for all probes. |
metrics-bind-addr | string | No | ":9643" | The TCP address that the controller should bind to for serving Prometheus metrics. |
health-bind-addr | string | No | ":9644" | The TCP address that the controller should bind to for serving health probes. |
enable-leader-election | bool | No | false | Enables leader election. If the prober deployment has more than one replica for high availability, the replicas run in active-passive mode: one replica becomes the leader and the rest remain passive followers, waiting to acquire leadership in case the leader dies. |
leader-election-namespace | string | No | "garden" | Namespace in which the leader election resource will be created. It should be the namespace in which the DWD pods are deployed. |
leader-elect-lease-duration | time.Duration | No | 15s | The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. Only applicable if leader election is enabled. |
leader-elect-renew-deadline | time.Duration | No | 10s | The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. Only applicable if leader election is enabled. |
leader-elect-retry-period | time.Duration | No | 2s | The duration the clients should wait between attempting acquisition and renewal of a leadership. Only applicable if leader election is enabled. |
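The defaulting and consistency rules in the table above can be sketched with Go's standard flag package. This is an illustration under assumptions, not the prober's actual startup code: the proberFlags type and the parseProberFlags and validate functions are hypothetical, only a subset of the flags is modelled, and the real binary may use a different flag library. The renew-deadline check encodes the rule that it must be less than or equal to the lease duration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"time"
)

// proberFlags is a hypothetical holder for a subset of the prober flags.
// Flag names and defaults are taken from the table; everything else is illustrative.
type proberFlags struct {
	kubeAPIBurst        int
	kubeAPIQPS          float64
	configFile          string
	enableLeaderElect   bool
	leaseDuration       time.Duration
	renewDeadline       time.Duration
	retryPeriod         time.Duration
}

func parseProberFlags(args []string) (*proberFlags, error) {
	fs := flag.NewFlagSet("prober", flag.ContinueOnError)
	f := &proberFlags{}
	fs.IntVar(&f.kubeAPIBurst, "kube-api-burst", 10, "burst towards the Kubernetes API server")
	fs.Float64Var(&f.kubeAPIQPS, "kube-api-qps", 5.0, "max QPS towards the Kubernetes API server")
	fs.StringVar(&f.configFile, "config-file", "", "path of the probe config file (required)")
	fs.BoolVar(&f.enableLeaderElect, "enable-leader-election", false, "enable active-passive leader election")
	fs.DurationVar(&f.leaseDuration, "leader-elect-lease-duration", 15*time.Second, "leader election lease duration")
	fs.DurationVar(&f.renewDeadline, "leader-elect-renew-deadline", 10*time.Second, "leader election renew deadline")
	fs.DurationVar(&f.retryPeriod, "leader-elect-retry-period", 2*time.Second, "leader election retry period")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if err := validate(f); err != nil {
		return nil, err
	}
	return f, nil
}

// validate applies the documented rules: required config file, non-negative
// burst/QPS where 0 falls back to the default, renew deadline <= lease duration.
func validate(f *proberFlags) error {
	if f.configFile == "" {
		return errors.New("config-file is required")
	}
	if f.kubeAPIBurst < 0 || f.kubeAPIQPS < 0 {
		return errors.New("kube-api-burst and kube-api-qps must be >= 0")
	}
	if f.kubeAPIBurst == 0 {
		f.kubeAPIBurst = 10
	}
	if f.kubeAPIQPS == 0 {
		f.kubeAPIQPS = 5.0
	}
	if f.enableLeaderElect && f.renewDeadline > f.leaseDuration {
		return fmt.Errorf("leader-elect-renew-deadline (%s) must be <= leader-elect-lease-duration (%s)",
			f.renewDeadline, f.leaseDuration)
	}
	return nil
}

func main() {
	f, err := parseProberFlags([]string{"--config-file=/etc/dwd/config.yaml", "--kube-api-burst=0"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(f.kubeAPIBurst, f.kubeAPIQPS) // 10 5
}
```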
You can view an example Kubernetes prober deployment YAML to see how these command line args are configured.
A probe configuration is mounted as a ConfigMap to the container. The path to the config file is configured via the config-file command line argument, as mentioned above. The prober starts one probe per Shoot control plane hosted within the Seed cluster. Each such probe runs asynchronously and periodically connects to the Kube ApiServer of the Shoot. The configuration below influences each such probe.
You can view an example YAML configuration provided as data in a ConfigMap here.
Name | Type | Required | Default Value | Description |
---|---|---|---|---|
kubeConfigSecretName | string | Yes | NA | Name of the Kubernetes Secret which has the encoded KubeConfig required to connect to the Shoot control plane Kube ApiServer via an internal domain. This typically uses the local cluster DNS. |
probeInterval | metav1.Duration | No | 10s | Interval with which each probe will run. |
initialDelay | metav1.Duration | No | 30s | Initial delay for the probe to become active. Only applicable when the probe is created for the first time. |
probeTimeout | metav1.Duration | No | 30s | In each run, the probe attempts to connect to the Shoot Kube ApiServer. probeTimeout defines the timeout after which a single run of the probe fails. |
backoffJitterFactor | float64 | No | 0.2 | Jitter factor applied to the interval with which a probe is run. |
dependentResourceInfos | []prober.DependentResourceInfo | Yes | NA | Detailed below. |
kcmNodeMonitorGraceDuration | metav1.Duration | Yes | NA | The node-monitor-grace-period set in the kube-controller-manager (KCM) flags. Used to determine whether a node lease can be considered expired. |
nodeLeaseFailureFraction | float64 | No | 0.6 | Used to determine the maximum fraction of node leases that can be expired for a lease probe to succeed. |
dependentResourceInfos: If a lease probe fails, the prober scales down the dependent resources defined by this property. Conversely, once the lease probe succeeds again, it scales these dependent resources back up.
Each dependent resource info has the following properties:
Name | Type | Required | Default Value | Description |
---|---|---|---|---|
ref | autoscalingv1.CrossVersionObjectReference | Yes | NA | The apiVersion, kind and name of a Kubernetes resource, which together serve as its identifier. |
optional | bool | Yes | NA | A dependent resource may be optional for a Shoot control plane. This property lets a probe determine the correct behavior in case it is unable to find the resource identified via ref. |
scaleUp | prober.ScaleInfo | No | NA | Captures the configuration to scale up this resource. Detailed below. |
scaleDown | prober.ScaleInfo | No | NA | Captures the configuration to scale down this resource. Detailed below. |
NOTE: Since each dependent resource is a target for scale up/down, the resource reference must point to a Kubernetes resource which has a scale subresource.
How to scale a DependentResourceInfo is captured in ScaleInfo. It has the following properties:
Name | Type | Required | Default Value | Description |
---|---|---|---|---|
level | int | Yes | NA | Detailed below. |
initialDelay | metav1.Duration | No | 0s (No initial delay) | Delay to wait, once the decision to scale a resource has been taken, before triggering the scale of the dependent resource. |
timeout | metav1.Duration | No | 30s | Defines the timeout for the scale operation to finish for a dependent resource. |
Determining target replicas
Prober cannot assume any target replicas during a scale-up operation for the following reasons:
- Kubernetes resources could be set up for high availability, and the number of replicas could vary from one Shoot control plane to another. In Gardener, the number of pod replicas in a Shoot namespace is controlled by the Shoot control plane configuration.
- If a Horizontal Pod Autoscaler has been configured for a Kubernetes dependent resource, it could potentially change the spec.replicas of a Deployment/StatefulSet.
Given the above constraints, let's look at how the prober determines the target replicas during scale-down or scale-up operations.
- Scale-Up: The primary responsibility of a probe while performing a scale-up is to restore the replicas of a Kubernetes dependent resource to the value they had prior to the scale-down. To do that, it updates the following for each dependent resource that requires a scale-up:
  - spec.replicas: Checks if the annotation dependency-watchdog.gardener.cloud/replicas is set. If it is, the value stored against this key is taken as the target replicas. To be valid, this value must always be greater than 0. If the annotation is not present, it falls back to the hard-coded default value for scale-up, which is 1.
  - Removes the annotation dependency-watchdog.gardener.cloud/replicas if it exists.
- Scale-Down: To scale down a dependent Kubernetes resource, it does the following:
  - Adds an annotation dependency-watchdog.gardener.cloud/replicas and sets its value to the current value of spec.replicas.
  - Updates spec.replicas to 0.
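The replica bookkeeping described above can be sketched with an in-memory stand-in for a scalable resource. The scalable type and the scaleUp/scaleDown functions are hypothetical; only the annotation key and the default of 1 come from the text. Returning an error when the annotation value is not a positive integer is an assumption, since the text only says such a value is not valid.

```go
package main

import (
	"fmt"
	"strconv"
)

const replicasAnnotation = "dependency-watchdog.gardener.cloud/replicas"

// defaultScaleUpReplicas is the hard-coded scale-up fallback.
const defaultScaleUpReplicas = 1

// scalable is a hypothetical, minimal view of a resource with a scale subresource.
type scalable struct {
	annotations map[string]string
	replicas    int32
}

// scaleDown remembers the current replicas in the annotation, then sets replicas to 0.
func scaleDown(r *scalable) {
	if r.annotations == nil {
		r.annotations = map[string]string{}
	}
	r.annotations[replicasAnnotation] = strconv.Itoa(int(r.replicas))
	r.replicas = 0
}

// scaleUp restores replicas from the annotation (or the default) and removes the annotation.
func scaleUp(r *scalable) error {
	target := int32(defaultScaleUpReplicas)
	if v, ok := r.annotations[replicasAnnotation]; ok {
		n, err := strconv.Atoi(v)
		if err != nil || n <= 0 {
			// Assumption: an invalid value is reported as an error.
			return fmt.Errorf("invalid %s annotation value %q", replicasAnnotation, v)
		}
		target = int32(n)
	}
	r.replicas = target
	delete(r.annotations, replicasAnnotation) // no-op if absent
	return nil
}

func main() {
	kcm := &scalable{replicas: 3}
	scaleDown(kcm)
	fmt.Println(kcm.replicas, kcm.annotations[replicasAnnotation]) // 0 3
	if err := scaleUp(kcm); err != nil {
		fmt.Println(err)
		return
	}
	_, stillThere := kcm.annotations[replicasAnnotation]
	fmt.Println(kcm.replicas, stillThere) // 3 false
}
```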
Level
Each dependent resource that should be scaled up or down is associated with a level. Levels are processed in ascending order, starting with 0, which has the highest priority. Consider the following configuration:
```yaml
dependentResourceInfos:
  - ref:
      kind: "Deployment"
      name: "kube-controller-manager"
      apiVersion: "apps/v1"
    scaleUp:
      level: 1
    scaleDown:
      level: 0
  - ref:
      kind: "Deployment"
      name: "machine-controller-manager"
      apiVersion: "apps/v1"
    scaleUp:
      level: 1
    scaleDown:
      level: 1
  - ref:
      kind: "Deployment"
      name: "cluster-autoscaler"
      apiVersion: "apps/v1"
    scaleUp:
      level: 0
    scaleDown:
      level: 2
```
Let us order the dependent resources by their respective levels for both scale-up and scale-down. We get the following order:
Scale Up Operation
Order of scale up will be:
- cluster-autoscaler
- kube-controller-manager and machine-controller-manager will be scaled up concurrently after cluster-autoscaler has been scaled up.
Scale Down Operation
Order of scale down will be:
- kube-controller-manager
- machine-controller-manager, after kube-controller-manager has been scaled down.
- cluster-autoscaler, after machine-controller-manager has been scaled down.
A probe can be configured to ignore scaling of configured dependent Kubernetes resources. To do that, set the dependency-watchdog.gardener.cloud/ignore-scaling annotation to true on the scalable resource for which scaling should be ignored.
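A minimal check for this annotation could look like the sketch below. shouldIgnoreScaling is a hypothetical helper, and parsing the value with strconv.ParseBool (so that anything other than a recognised true value counts as "not ignored") is an assumption; the text only specifies the value true.

```go
package main

import (
	"fmt"
	"strconv"
)

const ignoreScalingAnnotation = "dependency-watchdog.gardener.cloud/ignore-scaling"

// shouldIgnoreScaling reports whether a scalable resource has opted out of scaling.
// Assumption: the value is parsed with strconv.ParseBool; unparsable values mean "do not ignore".
func shouldIgnoreScaling(annotations map[string]string) bool {
	v, ok := annotations[ignoreScalingAnnotation]
	if !ok {
		return false
	}
	ignore, err := strconv.ParseBool(v)
	return err == nil && ignore
}

func main() {
	fmt.Println(shouldIgnoreScaling(map[string]string{ignoreScalingAnnotation: "true"})) // true
	fmt.Println(shouldIgnoreScaling(nil))                                                // false
}
```

On a live cluster the annotation can be set with kubectl annotate, e.g. kubectl annotate deployment kube-controller-manager -n <shoot-namespace> dependency-watchdog.gardener.cloud/ignore-scaling=true.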
The dependency watchdog weeder command also (just like the prober command) takes command-line flags which are meant to fine-tune the weeder. In addition, a ConfigMap is mounted to the container which helps in defining the dependency of pods on endpoints.
The weeder can be configured with the same flags as the prober, described in the command line arguments section. You can find an example weeder deployment YAML to see how these command line args are configured.
The weeder configuration is mounted as a ConfigMap to the container. The path to the config file is configured via the config-file command line argument, as mentioned above. On an endpoint event, the weeder starts one goroutine per podSelector per endpoint, as described in weeder internal concepts.
You can view the example YAML configuration provided as data in a ConfigMap here.
Name | Type | Required | Default Value | Description |
---|---|---|---|---|
watchDuration | *metav1.Duration | No | 5m0s | The duration for which the watch is kept on dependent pods to see if any of them turns to CrashLoopBackOff. |
servicesAndDependantSelectors | map[string]DependantSelectors | Yes | NA | Endpoint name and its corresponding dependent pods. More info below. |
When a service recovers from downtime, the weeder starts to watch for CrashLoopBackOff pods. These pods are identified by the information stored in servicesAndDependantSelectors, where each value (DependantSelectors) has the following properties:
Name | Type | Required | Default Value | Description |
---|---|---|---|---|
podSelectors | []*metav1.LabelSelector | Yes | NA | A list of label selectors that identify the pods depending on the endpoint. |