Bring Temporal Cloud Metrics into your Kubernetes cluster to inform autoscaling of your workers.
This project is essentially a proxy server: Kubernetes makes an HTTP call, which this service handles by pulling metrics from Temporal Cloud, converting them to the format Kubernetes expects, and returning them.
Kubernetes polls our service for metrics, which become available to HPAs living in the same Kubernetes namespace.
- A Temporal Cloud account
- A Kubernetes-compliant cluster (also tested on K3s and minikube)
- The Helm CLI
We need the client mTLS certificate for our Temporal Cloud namespace so that we can load it into our cluster for use in the metrics adapter and worker.
- Copy the certificate into ./certs/client.crt
- Copy the key into ./certs/client.key
A YAML config file is used to define the connection parameters and the specific metrics you'd like to pull into Kubernetes from Temporal Cloud.
There is an example configuration in ./sample-config.yaml. Copy it to config.yaml and make your changes to it. The Helm chart will use this path by default.
Considerations
Autoscaling in Kubernetes is triggered when a target metric value increases beyond a designated threshold, such as CPU usage, memory usage, or request count. Therefore, it is important that the metrics we calculate are positive numbers that increase when the system is under some kind of stress.
The queries in the included example configuration were derived from queries associated with Temporal best practices, but they have been modified to align with these requirements. Let's see an example.
Before
sum by(temporal_namespace) (
rate(
temporal_cloud_v0_poll_success_sync_count{}[1m]
)
)
/
sum by(temporal_namespace) (
rate(
temporal_cloud_v0_poll_success_count{}[1m]
)
)
After
We've made two important changes here: (1) we've swapped the places of the two underlying metrics to invert the resulting number, so it is now positive and increases as the Sync Match Rate falls, and (2) we default the resulting value to 1 in the event no data points are available within the specified time window.
The result is a decimal that starts at 1 when there is a perfect Sync Match Rate and rises as the Sync Match Rate declines.
(
sum by(temporal_namespace) (
rate(
temporal_cloud_v0_poll_success_count{
temporal_namespace="bitovi.x72yu"
}[1m]
)
)
/
sum by(temporal_namespace) (
rate(
temporal_cloud_v0_poll_success_sync_count{
temporal_namespace="bitovi.x72yu"
}[1m]
)
)
)
unless
(
sum by(temporal_namespace) (
rate(
temporal_cloud_v0_poll_success_sync_count{
temporal_namespace="bitovi.x72yu"
}[1m]
)
) == 0
)
or label_replace(vector(1), "temporal_namespace", "bitovi.x72yu", "", "")
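The fallback logic in this query can be sketched in Python. This is a hypothetical helper for illustration only, not part of this repo; the inputs stand in for the `rate()` results over the 1m window.

```python
def inverted_sync_match_rate(poll_success_rate: float,
                             poll_success_sync_rate: float) -> float:
    """Return poll_success / poll_success_sync, defaulting to 1.0 when
    no sync matches were observed -- this mirrors the `unless` / `or`
    fallback in the PromQL query above."""
    if poll_success_sync_rate == 0:
        return 1.0
    return poll_success_rate / poll_success_sync_rate

# Perfect Sync Match Rate: every successful poll was a sync match.
print(inverted_sync_match_rate(10.0, 10.0))  # → 1.0
# Degraded: only half of successful polls were sync matches; the metric rises.
print(inverted_sync_match_rate(10.0, 5.0))   # → 2.0
# No data points in the window: fall back to the neutral value.
print(inverted_sync_match_rate(0.0, 0.0))    # → 1.0
```

Because the value starts at 1 and rises under stress, it satisfies the HPA requirement that the metric be a positive number that grows as the system degrades.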
The HPA (Horizontal Pod Autoscaler) defines the desired scaling behavior and bounds, and manages our deployment replicas accordingly.
There is a complete example HPA in ./chart/templates/hpa.yaml. You may use it as is or adjust it to fit your needs before installing the Helm chart.
Install with Existing worker
This allows you to set up autoscaling on an existing deployment.
helm install temporal-cloud-metrics-adapter ./chart --wait \
--namespace staging \
--set-file=temporal.tls.cert=certs/client.crt \
--set-file=temporal.tls.key=certs/client.key \
--set-file=adapter.config=config.yaml \
--set temporal.namespace=xyz.123 \
--set worker.deployment=temporal-workers
Install with Demo worker
This is intended for testing and demos and should never be used in a production environment.
helm install temporal-cloud-metrics-adapter ./chart --wait \
--namespace staging --create-namespace \
--set-file=temporal.tls.cert=certs/client.crt \
--set-file=temporal.tls.key=certs/client.key \
--set-file=adapter.config=config.yaml \
--set temporal.namespace=xyz.123 \
--set temporal.address=xyz.123.tmprl.cloud:7233 \
--set worker.demo=true
Uninstall
helm uninstall -n staging temporal-cloud-metrics-adapter
Helm Values
Option | Type | Example Value | Description |
---|---|---|---|
temporal.tls.cert | File | certs/client.crt | Path to the client certificate file |
temporal.tls.key | File | certs/client.key | Path to the client key file |
temporal.namespace | String | xyz.123 | The target Temporal Cloud namespace |
temporal.address | String | xyz.123.tmprl.cloud:7233 | Address of the Temporal Cloud instance |
adapter.config | String | ./config.yaml | The file path for the adapter configuration |
worker.deployment | String | temporal-worker | Name of an existing Temporal worker deployment |
worker.demo | Boolean | true or false | Flag to determine whether to deploy a demo worker |
This repo includes a script to create a burst of workflows to simulate load.
# Startup 50 demo workflows
TEMPORAL_ADDRESS=xyz.123.tmprl.cloud:7233 \
TEMPORAL_NAMESPACE=xyz.123 \
./scripts/execute-demo-workflows 50
Temporal Cloud metrics do not include labels that indicate which Workflow they are associated with. Depending on your architecture, you might need to divide your workers across unique namespaces to obtain metrics for specific Workflows.
HPA Polling Interval
By default, the HorizontalPodAutoscaler fetches metrics every 15 seconds. This can be configured by setting the --horizontal-pod-autoscaler-sync-period flag on the kube-controller-manager.
Note: The --horizontal-pod-autoscaler-sync-period flag is not currently supported in K3s.
Adjust Metrics Time Window
You can also adjust the timescale used in the query for the Temporal Cloud metrics. To do this, change the time window specified in the queries in the adapter configuration file.
Currently, the time window is set to 1m (1 minute). This can be reduced to slightly improve the responsiveness of the scaling behavior. Be cautious about going below 45s (45 seconds) for systems with relatively low throughput, as it can result in dead zones in the resulting metrics.
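For example, tightening the window on one of the example queries is just a matter of changing the range selector (the 45s value here is illustrative):

```
rate(
  temporal_cloud_v0_poll_success_count{
    temporal_namespace="bitovi.x72yu"
  }[45s]
)
```

Shorter windows react faster but smooth over fewer samples, which is why low-throughput systems can end up with empty (dead-zone) results.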
Adjust HPA Behavior
You can adjust how quickly the cluster scales our workers up and down.
metrics:
- type: External
external:
metric:
# The name of the metrics to watch
name: temporal_cloud_sync_match_rate
selector:
matchLabels:
# Match a particular Temporal Cloud namespace
temporal_namespace: xyz.123
target:
type: Value
# Scale up when the target metric exceeds 1500 milli values (1.5)
value: 1500m
behavior:
scaleUp:
# The highest value in the last 10 seconds will be used to determine the need to scale up
stabilizationWindowSeconds: 10
selectPolicy: Max
policies:
# Scale up by 5 pods every 10 seconds while the target threshold is exceeded
- type: Pods
value: 5
periodSeconds: 10
scaleDown:
# The highest value in the last 60 seconds will be used to determine the need to scale down
stabilizationWindowSeconds: 60
selectPolicy: Max
policies:
# Scale down by 3 pods every 30 seconds while the metric is below the target threshold
- type: Pods
value: 3
periodSeconds: 30
You can find a complete example in this manifest. For more detailed information on the HorizontalPodAutoscaler, refer to the official HPA documentation.
In some use cases, you might want your application to scale completely down to zero. This can be achieved by configuring the HorizontalPodAutoscaler.
To scale to zero, set minReplicas to 0. The cluster will then scale down to zero when the targeted metrics fall below the defined threshold.
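A minimal sketch of the relevant part of the HPA spec (the name here is hypothetical; depending on your Kubernetes version, the alpha HPAScaleToZero feature gate may need to be enabled for minReplicas: 0 to be accepted):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: temporal-workers   # hypothetical name
spec:
  minReplicas: 0           # allow scaling all worker pods away
  maxReplicas: 10
  # metrics and behavior sections as shown earlier
```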
Note: Scaling to zero may cause a delay in processing new tasks, as it can take time for metrics to propagate to the cluster.