Refactor probes based on performance feedback #264

Merged
merged 19 commits into from
Mar 26, 2021
27 changes: 16 additions & 11 deletions charts/pega/README.md
Expand Up @@ -393,17 +393,22 @@ tier:
disktype: ssd
```

### Liveness and readiness probes

[Probes are used by Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) to determine application health. Configure a probe for *liveness* to determine if a Pod has entered a broken state; configure it for *readiness* to determine if the application is available to be exposed. You can configure probes independently for each tier. If not explicitly configured, default probes are used during the deployment. Set the following parameters as part of a `livenessProbe` or `readinessProbe` configuration.

Parameter | Description | Default value
--- | --- | ---
`initialDelaySeconds` | Number of seconds after the container has started before liveness or readiness probes are initiated. | `300`
`timeoutSeconds` | Number of seconds after which the probe times out. | `20`
`periodSeconds` | How often (in seconds) to perform the probe. Some providers such as GCP require this value to be greater than the timeout value. | `30`
`successThreshold` | Minimum consecutive successes for the probe to be considered successful after it determines a failure. | `1`
`failureThreshold` | The number of consecutive failures for the pod to be terminated by Kubernetes. | `3`
### Liveness, readiness, and startup probes

[Probes are used by Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) to determine application health. Configure a probe for *liveness* to determine if a Pod has entered a broken state; configure it for *readiness* to determine if the application is available to be exposed; configure it for *startup* to determine if a pod is ready to be checked for liveness. You can configure probes independently for each tier. If not explicitly configured, default probes are used during the deployment. Set the following parameters as part of a `livenessProbe`, `readinessProbe`, or `startupProbe` configuration.
Contributor


Pega supports using liveness, readiness, and startup probes to determine application health in your deployments. For an overview of these probes, see Configure Liveness, Readiness and Startup Probes.

Contributor Author


Done (with minor modifications)


Notes:
* `startupProbe` is only supported as of Kubernetes 1.18. If running a version older than 1.18, `startupProbe` will be ignored and different default values will be used for `livenessProbe` and `readinessProbe`.
Contributor


Kubernetes 1.18 and later supports startupProbe. If your deployment uses a Kubernetes version older than 1.18, it ignores your startupProbe settings and uses different default values for livenessProbe and readinessProbe.

Contributor Author


Done (with minor modifications)

* `timeoutSeconds` cannot be greater than `periodSeconds` in some GCP environments. See [this API library from Google](https://developers.google.com/resources/api-libraries/documentation/compute/v1/csharp/latest/classGoogle_1_1Apis_1_1Compute_1_1v1_1_1Data_1_1HttpHealthCheck.html#a027a3932f0681df5f198613701a83145).
Contributor


For details, see [this API library from Google]

Contributor Author


Done

* Default values are listed below in order of liveness, readiness, and startup.

Parameter | Description | Default - 1.18+ | Default - pre 1.18
Contributor


Default - pre-1.18

Contributor Author


Done

--- | --- | --- | ---
`initialDelaySeconds` | Number of seconds after the container has started before probes are initiated. | `0`, `0`, `10` | `200`, `30`
`timeoutSeconds` | Number of seconds after which the probe times out. | `20`, `20`, `10` | `20`, `10`
`periodSeconds` | How often (in seconds) to perform the probe. | `30`, `30`, `10` | `30`, `10`
`successThreshold` | Minimum consecutive successes for the probe to be considered successful after it determines a failure. | `1`, `1`, `1` | `1`, `2`
`failureThreshold` | The number of consecutive failures for the pod to be terminated by Kubernetes. | `3`, `3`, `20` | `3`, `6`
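
The defaults in the table above, together with the note that `timeoutSeconds` cannot exceed `periodSeconds` in some GCP environments, can be captured in a small Go sketch. The names `ProbeSettings`, `defaultsFor`, and `checkTimeout` are hypothetical and are not part of the chart; the numbers are copied from the table.

```go
package main

import (
	"errors"
	"fmt"
)

// ProbeSettings mirrors the five tunable probe parameters in the table.
// Hypothetical type for illustration only; the chart does not define it.
type ProbeSettings struct {
	InitialDelaySeconds int
	TimeoutSeconds      int
	PeriodSeconds       int
	SuccessThreshold    int
	FailureThreshold    int
}

// defaultsFor returns the documented defaults keyed by probe name.
// When startup probes are unavailable (pre-1.18), only liveness and
// readiness entries are returned.
func defaultsFor(supportsStartupProbe bool) map[string]ProbeSettings {
	if supportsStartupProbe {
		return map[string]ProbeSettings{
			"livenessProbe":  {0, 20, 30, 1, 3},
			"readinessProbe": {0, 20, 30, 1, 3},
			"startupProbe":   {10, 10, 10, 1, 20},
		}
	}
	return map[string]ProbeSettings{
		"livenessProbe":  {200, 20, 30, 1, 3},
		"readinessProbe": {30, 10, 10, 2, 6},
	}
}

var errTimeoutExceedsPeriod = errors.New("timeoutSeconds is greater than periodSeconds")

// checkTimeout flags settings that some GCP environments reject:
// a timeout longer than the probe period.
func checkTimeout(p ProbeSettings) error {
	if p.TimeoutSeconds > p.PeriodSeconds {
		return errTimeoutExceedsPeriod
	}
	return nil
}

func main() {
	for _, supported := range []bool{true, false} {
		for name, p := range defaultsFor(supported) {
			fmt.Printf("startupProbe supported=%t %s: %+v ok=%t\n",
				supported, name, p, checkTimeout(p) == nil)
		}
	}
}
```

All documented defaults pass `checkTimeout`; the check is only useful when overriding values per tier.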

Example:

Expand Down
45 changes: 42 additions & 3 deletions charts/pega/templates/_pega-deployment.tpl
Expand Up @@ -182,14 +182,15 @@ spec:
{{- end }}
- name: {{ template "pegaVolumeCredentials" }}
mountPath: "/opt/pega/secrets"
{{- if (semverCompare ">= 1.18.0-0" (trimPrefix "v" .root.Capabilities.KubeVersion.GitVersion)) }}
# LivenessProbe: indicates whether the container is live, i.e. running.
{{- $livenessProbe := .node.livenessProbe }}
livenessProbe:
httpGet:
path: "/{{ template "pega.applicationContextPath" . }}/PRRestService/monitor/pingService/ping"
port: 8080
port: 8081
scheme: HTTP
initialDelaySeconds: {{ $livenessProbe.initialDelaySeconds | default 300 }}
initialDelaySeconds: {{ $livenessProbe.initialDelaySeconds | default 0 }}
timeoutSeconds: {{ $livenessProbe.timeoutSeconds | default 20 }}
periodSeconds: {{ $livenessProbe.periodSeconds | default 30 }}
successThreshold: {{ $livenessProbe.successThreshold | default 1 }}
Expand All @@ -201,11 +202,49 @@ spec:
path: "/{{ template "pega.applicationContextPath" . }}/PRRestService/monitor/pingService/ping"
port: 8080
scheme: HTTP
initialDelaySeconds: {{ $readinessProbe.initialDelaySeconds | default 300 }}
initialDelaySeconds: {{ $readinessProbe.initialDelaySeconds | default 0 }}
timeoutSeconds: {{ $readinessProbe.timeoutSeconds | default 20 }}
periodSeconds: {{ $readinessProbe.periodSeconds | default 30 }}
successThreshold: {{ $readinessProbe.successThreshold | default 1 }}
failureThreshold: {{ $readinessProbe.failureThreshold | default 3 }}
# StartupProbe: indicates whether the container has completed its startup process, and delays the LivenessProbe
{{- $startupProbe := .node.startupProbe }}
startupProbe:
httpGet:
path: "/{{ template "pega.applicationContextPath" . }}/PRRestService/monitor/pingService/ping"
port: 8080
scheme: HTTP
initialDelaySeconds: {{ $startupProbe.initialDelaySeconds | default 10 }}
timeoutSeconds: {{ $startupProbe.timeoutSeconds | default 10 }}
periodSeconds: {{ $startupProbe.periodSeconds | default 10 }}
successThreshold: {{ $startupProbe.successThreshold | default 1 }}
failureThreshold: {{ $startupProbe.failureThreshold | default 20 }}
{{- else }}
# LivenessProbe: indicates whether the container is live, i.e. running.
{{- $livenessProbe := .node.livenessProbe }}
livenessProbe:
httpGet:
path: "/{{ template "pega.applicationContextPath" . }}/PRRestService/monitor/pingService/ping"
port: 8081
scheme: HTTP
initialDelaySeconds: {{ $livenessProbe.initialDelaySeconds | default 200 }}
timeoutSeconds: {{ $livenessProbe.timeoutSeconds | default 20 }}
periodSeconds: {{ $livenessProbe.periodSeconds | default 30 }}
successThreshold: {{ $livenessProbe.successThreshold | default 1 }}
failureThreshold: {{ $livenessProbe.failureThreshold | default 3 }}
# ReadinessProbe: indicates whether the container is ready to service requests.
{{- $readinessProbe := .node.readinessProbe }}
readinessProbe:
httpGet:
path: "/{{ template "pega.applicationContextPath" . }}/PRRestService/monitor/pingService/ping"
port: 8080
scheme: HTTP
initialDelaySeconds: {{ $readinessProbe.initialDelaySeconds | default 30 }}
timeoutSeconds: {{ $readinessProbe.timeoutSeconds | default 10 }}
periodSeconds: {{ $readinessProbe.periodSeconds | default 10 }}
successThreshold: {{ $readinessProbe.successThreshold | default 2 }}
failureThreshold: {{ $readinessProbe.failureThreshold | default 6 }}
{{- end }}
# Mentions the restart policy to be followed by the pod. 'Always' means that a new pod will always be created irrespective of type of the failure.
restartPolicy: Always
# Amount of time in which container has to gracefully shutdown.
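
The version gate in this template trims the leading `v` from `GitVersion` and compares the result against `>= 1.18.0-0`. The Go sketch below approximates that decision with the standard library only, by comparing the major and minor components. It is not the semver implementation Helm uses, and the function name `supportsStartupProbe` is hypothetical. The `-0` suffix in the template's constraint is, as far as I know, what lets vendor builds with a pre-release style suffix (for example `-gke.1200`) satisfy the comparison; the sketch gets the same effect by ignoring everything after the minor version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsStartupProbe reports whether a Kubernetes GitVersion string such
// as "v1.18.12-gke.1200" is at least 1.18. Only major and minor are
// inspected; patch and vendor suffixes are ignored.
func supportsStartupProbe(gitVersion string) (bool, error) {
	v := strings.TrimPrefix(gitVersion, "v")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version %q", gitVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", gitVersion, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", gitVersion, err)
	}
	return major > 1 || (major == 1 && minor >= 18), nil
}

func main() {
	for _, v := range []string{"v1.17.3", "v1.18.0", "v1.18.12-gke.1200", "v1.20.4"} {
		ok, err := supportsStartupProbe(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> startupProbe rendered: %t\n", v, ok)
	}
}
```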
Expand Down
16 changes: 8 additions & 8 deletions charts/pega/values-large.yaml
Expand Up @@ -111,8 +111,8 @@ global:

deploymentStrategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate

hpa:
Expand All @@ -132,8 +132,8 @@ global:

deploymentStrategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate

hpa:
Expand Down Expand Up @@ -192,8 +192,8 @@ global:

deploymentStrategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate

hpa:
Expand All @@ -211,8 +211,8 @@ global:

deploymentStrategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate

hpa:
Expand Down
8 changes: 4 additions & 4 deletions charts/pega/values.yaml
Expand Up @@ -102,8 +102,8 @@ global:

deploymentStrategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate

# Optionally override default resource specifications
Expand Down Expand Up @@ -136,8 +136,8 @@ global:

deploymentStrategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate

# To configure an alternative user for your custom image, set value for runAsUser
Expand Down
Expand Up @@ -103,8 +103,8 @@ global:

deploymentStrategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate

# Optionally override default resource specifications
Expand Down Expand Up @@ -132,8 +132,8 @@ global:

deploymentStrategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate

hpa:
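
To see what the switch from `25%` to absolute values changes, the Go sketch below resolves both settings into pod counts. It assumes the documented Kubernetes rounding rules (a percentage `maxSurge` rounds up, a percentage `maxUnavailable` rounds down); the helper names `resolvePercent` and `rolloutBounds` are hypothetical.

```go
package main

import "fmt"

// resolvePercent converts a percentage of the replica count into a pod
// count, rounding up (as for maxSurge) or down (as for maxUnavailable).
func resolvePercent(percent, replicas int, roundUp bool) int {
	if roundUp {
		return (percent*replicas + 99) / 100
	}
	return percent * replicas / 100
}

// rolloutBounds returns the minimum number of available pods and the
// maximum total number of pods allowed during a rolling update.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (minAvailable, maxTotal int) {
	return replicas - maxUnavailable, replicas + maxSurge
}

func main() {
	for _, replicas := range []int{1, 2, 4, 8} {
		// Previous values: maxSurge 25%, maxUnavailable 25%.
		oldSurge := resolvePercent(25, replicas, true)
		oldUnavailable := resolvePercent(25, replicas, false)
		oldMin, oldMax := rolloutBounds(replicas, oldSurge, oldUnavailable)

		// New values: maxSurge 1, maxUnavailable 0.
		newMin, newMax := rolloutBounds(replicas, 1, 0)

		fmt.Printf("replicas=%d 25%%/25%%: min=%d max=%d | 1/0: min=%d max=%d\n",
			replicas, oldMin, oldMax, newMin, newMax)
	}
}
```

With `maxUnavailable: 0`, the number of available pods never drops below the replica count during a rollout, and `maxSurge: 1` adds replacement pods one at a time.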
Expand Down
90 changes: 48 additions & 42 deletions terratest/src/test/pega/pega-tier-deployment_test.go
@@ -1,75 +1,71 @@
package pega

import (
"fmt"
"path/filepath"
"strings"
"testing"

"github.com/gruntwork-io/terratest/modules/helm"
"github.com/stretchr/testify/require"
appsv1 "k8s.io/api/apps/v1"
appsv1beta2 "k8s.io/api/apps/v1beta2"
k8score "k8s.io/api/core/v1"
intstr "k8s.io/apimachinery/pkg/util/intstr"
"path/filepath"
"strings"
"testing"
"fmt"
)


var initContainers = []string{"wait-for-pegasearch", "wait-for-cassandra"}

func TestPegaTierDeployment(t *testing.T){
var supportedVendors = []string{"k8s","openshift","eks","gke","aks","pks"}
var supportedOperations = []string{"deploy","install-deploy","upgrade-deploy"}
func TestPegaTierDeployment(t *testing.T) {
var supportedVendors = []string{"k8s", "openshift", "eks", "gke", "aks", "pks"}
var supportedOperations = []string{"deploy", "install-deploy", "upgrade-deploy"}

helmChartPath, err := filepath.Abs(PegaHelmChartPath)
require.NoError(t, err)

for _, vendor := range supportedVendors {

for _,vendor := range supportedVendors{

for _,operation := range supportedOperations{
for _, operation := range supportedOperations {

fmt.Println(vendor + "-" + operation)

var options = &helm.Options{
var options = &helm.Options{
SetValues: map[string]string{
"global.provider": vendor,
"global.actions.execute": operation,
},
}
},
}

yamlContent := RenderTemplate(t, options, helmChartPath, []string{"templates/pega-tier-deployment.yaml"})
yamlSplit := strings.Split(yamlContent, "---")
assertWeb(t,yamlSplit[1],options)
assertBatch(t,yamlSplit[2],options)
assertStream(t,yamlSplit[3],options)
yamlSplit := strings.Split(yamlContent, "---")
assertWeb(t, yamlSplit[1], options)
assertBatch(t, yamlSplit[2], options)
assertStream(t, yamlSplit[3], options)

}
}
}

func assertStream(t *testing.T, streamYaml string, options *helm.Options){
func assertStream(t *testing.T, streamYaml string, options *helm.Options) {
var statefulsetObj appsv1beta2.StatefulSet
UnmarshalK8SYaml(t,streamYaml,&statefulsetObj)
VerifyPegaStatefulSet(t, &statefulsetObj, pegaDeployment{"pega-stream", initContainers, "Stream", "900"},options)
UnmarshalK8SYaml(t, streamYaml, &statefulsetObj)
VerifyPegaStatefulSet(t, &statefulsetObj, pegaDeployment{"pega-stream", initContainers, "Stream", "900"}, options)
}



func assertBatch(t *testing.T, batchYaml string, options *helm.Options){
func assertBatch(t *testing.T, batchYaml string, options *helm.Options) {
var deploymentObj appsv1.Deployment
UnmarshalK8SYaml(t,batchYaml,&deploymentObj)
UnmarshalK8SYaml(t, batchYaml, &deploymentObj)
VerifyPegaDeployment(t, &deploymentObj,
pegaDeployment{"pega-batch", initContainers, "BackgroundProcessing,Search,Batch,RealTime,Custom1,Custom2,Custom3,Custom4,Custom5,BIX", ""},
options)

}

func assertWeb(t *testing.T, webYaml string, options *helm.Options){
func assertWeb(t *testing.T, webYaml string, options *helm.Options) {
var deploymentObj appsv1.Deployment
UnmarshalK8SYaml(t,webYaml,&deploymentObj)
UnmarshalK8SYaml(t, webYaml, &deploymentObj)
VerifyPegaDeployment(t, &deploymentObj, pegaDeployment{"pega-web", initContainers, "WebUser", "900"}, options)



}

// VerifyPegaStatefulSet - Performs specific Pega statefulset assertions with the values as provided in default values.yaml
Expand All @@ -85,14 +81,13 @@ func VerifyPegaStatefulSet(t *testing.T, statefulsetObj *appsv1beta2.StatefulSet
VerifyDeployment(t, &statefulsetSpec, expectedStatefulset, options)
}


// VerifyPegaDeployment - Performs specific Pega deployment assertions with the values as provided in default values.yaml
func VerifyPegaDeployment(t *testing.T, deploymentObj *appsv1.Deployment, expectedDeployment pegaDeployment, options *helm.Options) {
require.Equal(t, *deploymentObj.Spec.Replicas, int32(1))
require.Equal(t, *deploymentObj.Spec.ProgressDeadlineSeconds, int32(2147483647))
require.Equal(t, expectedDeployment.name, deploymentObj.Spec.Selector.MatchLabels["app"])
require.Equal(t, *deploymentObj.Spec.Strategy.RollingUpdate.MaxSurge, intstr.FromString("25%"))
require.Equal(t, *deploymentObj.Spec.Strategy.RollingUpdate.MaxUnavailable, intstr.FromString("25%"))
require.Equal(t, *deploymentObj.Spec.Strategy.RollingUpdate.MaxSurge, intstr.FromInt(1))
require.Equal(t, *deploymentObj.Spec.Strategy.RollingUpdate.MaxUnavailable, intstr.FromInt(0))
require.Equal(t, deploymentObj.Spec.Strategy.Type, appsv1.DeploymentStrategyType("RollingUpdate"))
require.Equal(t, expectedDeployment.name, deploymentObj.Spec.Template.Labels["app"])
require.NotEmpty(t, deploymentObj.Spec.Template.Annotations["config-check"])
Expand Down Expand Up @@ -143,8 +138,8 @@ func VerifyDeployment(t *testing.T, pod *k8score.PodSpec, expectedSpec pegaDeplo
require.Equal(t, pod.Containers[0].Env[envIndex].Value, "")
envIndex++
require.Equal(t, pod.Containers[0].Env[envIndex].Name, "CATALINA_OPTS")
require.Equal(t, pod.Containers[0].Env[envIndex].Value, "")
envIndex++
require.Equal(t, pod.Containers[0].Env[envIndex].Value, "")
envIndex++
require.Equal(t, pod.Containers[0].Env[envIndex].Name, "INITIAL_HEAP")
require.Equal(t, pod.Containers[0].Env[envIndex].Value, "4096m")
envIndex++
Expand All @@ -159,24 +154,35 @@ func VerifyDeployment(t *testing.T, pod *k8score.PodSpec, expectedSpec pegaDeplo
require.Equal(t, pod.Containers[0].VolumeMounts[0].Name, "pega-volume-config")
require.Equal(t, pod.Containers[0].VolumeMounts[0].MountPath, "/opt/pega/config")

require.Equal(t, pod.Containers[0].LivenessProbe.InitialDelaySeconds, int32(300))
//If these tests start failing, check if the K8s version has passed 1.18. If so,
//update the values to the 1.18 versions and also enable the StartupProbe test.
require.Equal(t, pod.Containers[0].LivenessProbe.InitialDelaySeconds, int32(200))
require.Equal(t, pod.Containers[0].LivenessProbe.TimeoutSeconds, int32(20))
require.Equal(t, pod.Containers[0].LivenessProbe.PeriodSeconds, int32(30))
require.Equal(t, pod.Containers[0].LivenessProbe.SuccessThreshold, int32(1))
require.Equal(t, pod.Containers[0].LivenessProbe.FailureThreshold, int32(3))
require.Equal(t, pod.Containers[0].LivenessProbe.HTTPGet.Path, "/prweb/PRRestService/monitor/pingService/ping")
require.Equal(t, pod.Containers[0].LivenessProbe.HTTPGet.Port, intstr.FromInt(8080))
require.Equal(t, pod.Containers[0].LivenessProbe.HTTPGet.Port, intstr.FromInt(8081))
require.Equal(t, pod.Containers[0].LivenessProbe.HTTPGet.Scheme, k8score.URIScheme("HTTP"))

require.Equal(t, pod.Containers[0].ReadinessProbe.InitialDelaySeconds, int32(300))
require.Equal(t, pod.Containers[0].ReadinessProbe.TimeoutSeconds, int32(20))
require.Equal(t, pod.Containers[0].ReadinessProbe.PeriodSeconds, int32(30))
require.Equal(t, pod.Containers[0].ReadinessProbe.SuccessThreshold, int32(1))
require.Equal(t, pod.Containers[0].ReadinessProbe.FailureThreshold, int32(3))
require.Equal(t, pod.Containers[0].ReadinessProbe.InitialDelaySeconds, int32(30))
require.Equal(t, pod.Containers[0].ReadinessProbe.TimeoutSeconds, int32(10))
require.Equal(t, pod.Containers[0].ReadinessProbe.PeriodSeconds, int32(10))
require.Equal(t, pod.Containers[0].ReadinessProbe.SuccessThreshold, int32(2))
require.Equal(t, pod.Containers[0].ReadinessProbe.FailureThreshold, int32(6))
require.Equal(t, pod.Containers[0].ReadinessProbe.HTTPGet.Path, "/prweb/PRRestService/monitor/pingService/ping")
require.Equal(t, pod.Containers[0].ReadinessProbe.HTTPGet.Port, intstr.FromInt(8080))
require.Equal(t, pod.Containers[0].ReadinessProbe.HTTPGet.Scheme, k8score.URIScheme("HTTP"))

//require.Equal(t, pod.Containers[0].StartupProbe.InitialDelaySeconds, int32(10))
//require.Equal(t, pod.Containers[0].StartupProbe.TimeoutSeconds, int32(10))
//require.Equal(t, pod.Containers[0].StartupProbe.PeriodSeconds, int32(10))
//require.Equal(t, pod.Containers[0].StartupProbe.SuccessThreshold, int32(1))
//require.Equal(t, pod.Containers[0].StartupProbe.FailureThreshold, int32(20))
//require.Equal(t, pod.Containers[0].StartupProbe.HTTPGet.Path, "/prweb/PRRestService/monitor/pingService/ping")
//require.Equal(t, pod.Containers[0].StartupProbe.HTTPGet.Port, intstr.FromInt(8080))
//require.Equal(t, pod.Containers[0].StartupProbe.HTTPGet.Scheme, k8score.URIScheme("HTTP"))

require.Equal(t, pod.ImagePullSecrets[0].Name, "pega-registry-secret")
require.Equal(t, pod.RestartPolicy, k8score.RestartPolicy("Always"))
require.Equal(t, *pod.TerminationGracePeriodSeconds, int64(300))
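
The probe values asserted above imply a rough startup time budget for each configuration. The Go sketch below estimates it as `initialDelaySeconds + failureThreshold * periodSeconds`, following the `failureThreshold * periodSeconds` rule of thumb from the Kubernetes probe documentation. It is an approximation that ignores probe timeouts, and the names `probe` and `failureBudgetSeconds` are hypothetical.

```go
package main

import "fmt"

// probe holds the fields that determine how long a failing probe is
// tolerated before Kubernetes acts on it.
type probe struct {
	initialDelaySeconds int
	periodSeconds       int
	failureThreshold    int
}

// failureBudgetSeconds approximates the time from container start until
// failureThreshold consecutive failures have accumulated.
func failureBudgetSeconds(p probe) int {
	return p.initialDelaySeconds + p.failureThreshold*p.periodSeconds
}

func main() {
	// Startup probe defaults on Kubernetes 1.18+: 10 + 20*10.
	startup := probe{initialDelaySeconds: 10, periodSeconds: 10, failureThreshold: 20}
	// Liveness probe defaults on pre-1.18 clusters: 200 + 3*30.
	legacyLiveness := probe{initialDelaySeconds: 200, periodSeconds: 30, failureThreshold: 3}

	fmt.Println("startup probe budget (s):", failureBudgetSeconds(startup))
	fmt.Println("pre-1.18 liveness budget (s):", failureBudgetSeconds(legacyLiveness))
}
```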
Expand Down