Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dependency for kube-rbac-proxy and add secure-metrics feature #3833

Merged
merged 12 commits into from
Mar 8, 2024
4 changes: 1 addition & 3 deletions Taskfile.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,6 @@ vars:
CROSSPLANE_OUTPUT:
sh: 'realpath hack/crossplane/config'

KUBE_RBAC_PROXY: gcr.io/kubebuilder/kube-rbac-proxy

# how long to let tests against live resources run for
LIVE_TEST_TIMEOUT: 3h

Expand Down Expand Up @@ -694,7 +692,7 @@ tasks:
deps:
- controller:generate-kustomize
cmds:
- "{{.SCRIPTS_ROOT}}/generate-helm-manifest.sh {{.KUBE_RBAC_PROXY}} {{.LOCAL_REGISTRY_CONTROLLER_DOCKER_IMAGE}} {{.PUBLIC_REGISTRY}} {{.LATEST_VERSION_TAG}} `pwd`/"
- "{{.SCRIPTS_ROOT}}/generate-helm-manifest.sh {{.LOCAL_REGISTRY_CONTROLLER_DOCKER_IMAGE}} {{.PUBLIC_REGISTRY}} {{.LATEST_VERSION_TAG}} `pwd`/"

controller:install-helm:
desc: Generate and install helm chart on cluster
Expand Down
79 changes: 74 additions & 5 deletions docs/hugo/content/guide/metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,19 +7,21 @@ The metrics exposed fall into two groups: Azure based metrics, and reconciler me

## Toggling the metrics

By default, metrics for ASOv2 are turned on and can be toggled by the following options:
By default, secure metrics for ASOv2 are turned on and can be toggled by the following options:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they really turned on, given the option below says default: false ?

matthchr marked this conversation as resolved.
Show resolved Hide resolved

- ### ASOv2 Helm Chart
### ASOv2 Helm Chart

While installing the Helm chart, we can turn the metrics _**on**_ and _**off**_ and set the metrics expose address using the
below settings. Also, we can change the settings inside `values.yaml` file for ASOv2 Helm chart.

```
--set metrics.enable=true/false (default: true)
--set metrics.address=0.0.0.0:8080 (default)
--set metrics.secure=true/false (default: true)
--set metrics.profiling=true/false (default: false)
--set metrics.address=0.0.0.0:8443 (default)
```

- ### Deployment YAML
### Deployment YAML

In the deployment yaml, we can turn _**off**_ the metrics by omitting the `metrics-addr` flag. We can also change to use
a different metrics-addr by changing the default value of that same flag.
Expand All @@ -29,8 +31,75 @@ By default, metrics for ASOv2 are turned on and can be toggled by the following
containers:
- args:
- --metrics-addr=0.0.0.0:8080 (default)
- --secure-metrics=true/false (default: true)
- --profiling-metrics=true/false (default: false)
```


## Scraping Metrics Securely via HTTPs using RBAC

A ServiceAccount token is required to scrape metrics securely. The corresponding ServiceAccount needs permissions on the "/metrics" and "debug/pprof" paths.
This can be achieved e.g. by following the [Kubernetes documentation](https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/).

Follow the steps below to scrape metrics securely.

### ASOv2 Helm Chart
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor: Could use one of the table-selector thingies here, like we do for Windows versus Linux in the installation instructions?

```
--set metrics.enable=true
--set metrics.secure=true
--set metrics.profiling=true
--set metrics.address=0.0.0.0:8443
```

### Deployment YAML
```
spec:
containers:
- args:
- --metrics-addr=0.0.0.0:8443
- --secure-metrics=true
- --profiling-metrics=true
```

Deploy the following RBAC configuration. This creates a role that can scrape metrics.
```
cat << EOT | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: default-metrics
rules:
- nonResourceURLs:
- "/metrics"
- "/debug/pprof/*"
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: default-metrics
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: default-metrics
subjects:
- kind: ServiceAccount
name: default
namespace: default
EOT
```
Test locally:
- Open a port-forward

```
kubectl port-forward deployments/azureserviceoperator-controller-manager -n azureserviceoperator-system 8443
```
- Create a ServiceAccount token and scrape metrics
```
TOKEN=$(kubectl create token default)
curl https://localhost:8443/metrics --header "Authorization: Bearer $TOKEN" -k
```

## Understanding the ASOv2 Metrics

| Metric | Description | Label 1 | Label 2 | Label 3 |
Expand Down
23 changes: 11 additions & 12 deletions scripts/v2/generate-helm-manifest.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,10 @@

set -e

KUBE_RBAC_PROXY=$1
LOCAL_REGISTRY_CONTROLLER_DOCKER_IMAGE=$2
PUBLIC_REGISTRY=$3
VERSION=$4
DIR=$5
LOCAL_REGISTRY_CONTROLLER_DOCKER_IMAGE=$1
PUBLIC_REGISTRY=$2
VERSION=$3
DIR=$4

ASO_CHART="$DIR"charts/azure-service-operator
GEN_FILES_DIR="$ASO_CHART"/templates/generated
Expand Down Expand Up @@ -43,8 +42,6 @@ rm "$GEN_FILES_DIR"/*_namespace_* # remove namespace as we will let Helm manage
sed -i "s/\(version: \)\(.*\)/\1${VERSION//v}/g" "$ASO_CHART"/Chart.yaml # find version key and update the value with the current version

# Deployment replacements
grep -E $KUBE_RBAC_PROXY "$GEN_FILES_DIR"/*_deployment_* > /dev/null # Ensure that what we're about to try to replace actually exists (if it doesn't we want to fail)
sed -i "s@$KUBE_RBAC_PROXY.*@{{.Values.image.kubeRBACProxy}}@g" "$GEN_FILES_DIR"/*_deployment_*
sed -i "s@$LOCAL_REGISTRY_CONTROLLER_DOCKER_IMAGE@{{.Values.image.repository}}@g" "$GEN_FILES_DIR"/*_deployment_* # Replace hardcoded ASO image
# Perl multiline replacements - using this because it's tricky to do these sorts of multiline replacements with sed
perl -0777 -i -pe 's/(template:\n.*metadata:\n.*annotations:\n(\s*))/$1\{\{- if .Values.podAnnotations \}\}\n$2\{\{ toYaml .Values.podAnnotations \}\}\n$2\{\{- end \}\}\n$2/igs' "$GEN_FILES_DIR"/*_deployment_* # Add pod annotations
Expand All @@ -54,11 +51,13 @@ perl -0777 -i -pe 's/(spec:\n.*template:\n.*spec:\n(\s*))/$1\{\{- with .Values.a
perl -0777 -i -pe 's/(spec:\n.*template:\n.*spec:\n(\s*))/$1\{\{- with .Values.tolerations \}\}\n$2tolerations:\n$2\{\{- toYaml . | nindent 8 \}\}\n$2\{\{- end \}\}\n$2/igs' "$GEN_FILES_DIR"/*_deployment_* # Add pod annotations

# Metrics Configuration
flow_control "metrics-addr" "metrics-addr" "{{- if .Values.metrics.enable}}" "$GEN_FILES_DIR"/*_deployment_*
flow_control "metrics-addr" "profiling-metrics" "{{- if .Values.metrics.enable}}" "$GEN_FILES_DIR"/*_deployment_*
sed -i "1,/secure-metrics=.*/s/\(secure-metrics=\)\(.*\)/\1{{ .Values.metrics.secure }}/g" "$GEN_FILES_DIR"/*_deployment_*
sed -i "1,/profiling-metrics=.*/s/\(profiling-metrics=\)\(.*\)/\1{{ .Values.metrics.profiling }}/g" "$GEN_FILES_DIR"/*_deployment_*
sed -i "1,/metrics-addr=.*/s/\(metrics-addr=\)\(.*\)/\1{{ tpl .Values.metrics.address . }}/g" "$GEN_FILES_DIR"/*_deployment_*
sed -i 's/containerPort: 8080/containerPort: {{ .Values.metrics.port | default 8080 }}/g' "$GEN_FILES_DIR"/*_deployment_*
sed -i 's/containerPort: 8080/containerPort: {{ .Values.metrics.port | default 8443 }}/g' "$GEN_FILES_DIR"/*_deployment_*
sed -i '1 i {{- if .Values.metrics.enable -}}' "$GEN_FILES_DIR"/*controller-manager-metrics-service*
sed -i 's/port: 8080/port: {{ .Values.metrics.port | default 8080 }}/g' "$GEN_FILES_DIR"/*controller-manager-metrics-service*
sed -i 's/port: 8080/port: {{ .Values.metrics.port | default 8443 }}/g' "$GEN_FILES_DIR"/*controller-manager-metrics-service*
sed -i -e '$a{{- end }}' "$GEN_FILES_DIR"/*controller-manager-metrics-service*
find "$GEN_FILES_DIR" -type f -exec sed -i 's/azureserviceoperator-system/{{ .Release.Namespace }}/g' {} \;

Expand Down Expand Up @@ -87,8 +86,8 @@ flow_control "aadpodidbinding" "aadpodidbinding" "$IF_TENANT" "$GEN_FILES_DIR"/*

flow_control "--enable-leader-election" "--enable-leader-election" "$IF_TENANT" "$GEN_FILES_DIR"/*_deployment_*

# TODO: This bit is tricky to exclude kube-rbac-proxy and webhook stuff.
flow_control "mountPath: \/tmp\/k8s-webhook-server\/serving-certs" "name: https" "$IF_CLUSTER" "$GEN_FILES_DIR"/*_deployment_*
sed -i "/mountPath: \/tmp\/k8s-webhook-server\/serving-certs/i \ \ $IF_CLUSTER" "$GEN_FILES_DIR"/*_deployment_*
sed -i "/nodeSelector:/i \ \ {{- end }}" "$GEN_FILES_DIR"/*_deployment_*
flow_control "- name: cert" "secretName" "$IF_CLUSTER" "$GEN_FILES_DIR"/*_deployment_*
flow_control "--webhook-cert-dir=" "--webhook-cert-dir=" "$IF_CLUSTER" "$GEN_FILES_DIR"/*_deployment_*
sed -i 's/\/tmp\/k8s-webhook-server\/serving-certs/{{ .Values.webhook.certDir }}/g' "$GEN_FILES_DIR"/*_deployment_*
Expand Down
9 changes: 7 additions & 2 deletions v2/charts/azure-service-operator/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -86,14 +86,19 @@ webhook:
# repository).
image:
repository: mcr.microsoft.com/k8s/azureserviceoperator:v2.6.0
kubeRBACProxy: gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1

# 'metrics' define settings for the metrics from controller.
# 'address' field defines the metrics binding address on which metrics
metrics:
enable: true
# secure controls whether metrics should be served via 'http' or 'https'.
# Flagging secure as 'true' would use https
# Refer to https://azure.github.io/azure-service-operator/guide/metrics/ for more information
secure: true
# profiling endpoints are only enabled when serving metrics securely
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Say what profiling is (it enables /pprof endpoints), in addition to its restrictions. Though, see below about this restriction because I am not sure it is needed?

profiling: false
address: 0.0.0.0:{{ .Values.metrics.port }}
port: 8080
port: 8443

# installCRDs configures if the operator attempts to install and manage the CRDs associated with ASO.
# If the operator does not install and manage the CRDs on its own, you must manually install the appropriate
Expand Down
14 changes: 12 additions & 2 deletions v2/cmd/controller/app/flags.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,19 +16,23 @@ import (

type Flags struct {
MetricsAddr string
ProfilingMetrics bool
SecureMetrics bool
HealthAddr string
WebhookPort int
WebhookCertDir string
EnableLeaderElection bool
CRDManagementMode string
CRDPatterns string // This is a ; delimited string containing a collection of patterns
CRDPatterns string // This is a ';' delimited string containing a collection of patterns
PreUpgradeCheck bool
}

func (f Flags) String() string {
return fmt.Sprintf(
"MetricsAddr: %s, HealthAddr: %s, WebhookPort: %d, WebhookCertDir: %s, EnableLeaderElection: %t, CRDManagementMode: %s, CRDPatterns: %s, PreUpgradeCheck: %t",
"MetricsAddr: %s, SecureMetrics: %t, ProfilingMetrics: %t, HealthAddr: %s, WebhookPort: %d, WebhookCertDir: %s, EnableLeaderElection: %t, CRDManagementMode: %s, CRDPatterns: %s, PreUpgradeCheck: %t",
f.MetricsAddr,
f.SecureMetrics,
f.ProfilingMetrics,
f.HealthAddr,
f.WebhookPort,
f.WebhookCertDir,
Expand All @@ -44,6 +48,8 @@ func ParseFlags(args []string) (Flags, error) {
klog.InitFlags(flagSet)

var metricsAddr string
var profilingMetrics bool
var secureMetrics bool
var healthAddr string
var webhookPort int
var webhookCertDir string
Expand All @@ -54,6 +60,9 @@ func ParseFlags(args []string) (Flags, error) {

// default here for 'MetricsAddr' is set to "0", which sets metrics to be disabled if 'metrics-addr' flag is omitted.
flagSet.StringVar(&metricsAddr, "metrics-addr", "0", "The address the metric endpoint binds to.")
flagSet.BoolVar(&secureMetrics, "secure-metrics", true, "Enable secure metrics. This secures the pprof and metrics endpoints via Kubernetes RBAC and HTTPS")
flagSet.BoolVar(&profilingMetrics, "profiling-metrics", true, "Enable pprof metrics, only enabled in conjunction with secure-metrics. This will enable serving pprof metrics endpoints")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default should probably be false


flagSet.StringVar(&healthAddr, "health-addr", "", "The address the healthz endpoint binds to.")
flagSet.IntVar(&webhookPort, "webhook-port", 9443, "The port the webhook endpoint binds to.")
flagSet.StringVar(&webhookCertDir, "webhook-cert-dir", "", "The directory the webhook server's certs are stored.")
Expand All @@ -69,6 +78,7 @@ func ParseFlags(args []string) (Flags, error) {

return Flags{
MetricsAddr: metricsAddr,
SecureMetrics: secureMetrics,
HealthAddr: healthAddr,
WebhookPort: webhookPort,
WebhookCertDir: webhookCertDir,
Expand Down
38 changes: 35 additions & 3 deletions v2/cmd/controller/app/setup.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ import (
"context"
"fmt"
"math/rand"
"net/http"
"net/http/pprof"
"os"
"regexp"
"time"
Expand All @@ -31,6 +33,7 @@ import (
"sigs.k8s.io/controller-runtime/pkg/controller"
"sigs.k8s.io/controller-runtime/pkg/healthz"
"sigs.k8s.io/controller-runtime/pkg/manager"
"sigs.k8s.io/controller-runtime/pkg/metrics/filters"
"sigs.k8s.io/controller-runtime/pkg/metrics/server"
"sigs.k8s.io/controller-runtime/pkg/reconcile"
"sigs.k8s.io/controller-runtime/pkg/webhook"
Expand Down Expand Up @@ -131,9 +134,7 @@ func SetupControllerManager(ctx context.Context, setupLog logr.Logger, flgs Flag
LeaderElection: flgs.EnableLeaderElection,
LeaderElectionID: "controllers-leader-election-azinfra-generated",
HealthProbeBindAddress: flgs.HealthAddr,
Metrics: server.Options{
BindAddress: flgs.MetricsAddr,
},
Metrics: getMetricsOpts(flgs),
WebhookServer: webhook.NewServer(webhook.Options{
Port: flgs.WebhookPort,
CertDir: flgs.WebhookCertDir,
Expand Down Expand Up @@ -253,6 +254,37 @@ func SetupControllerManager(ctx context.Context, setupLog logr.Logger, flgs Flag
return mgr
}

func getMetricsOpts(flags Flags) server.Options {
var metricsOptions server.Options

if !flags.SecureMetrics {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code doesn't seem right, because it doesn't return here, it falls through and then overwrites the metricsOptions below?

I think you want something like:

var metricsOptions
if secure {
    // secure
} else {
    // insecure
}

if profiling {
    // profiling
}

return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if you want to only enable profiling if secure, put that bit into the secure bit, and/or (maybe better) add a check that secure is not off with profiling on and if so return an error and crash the pod.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch. I missed return there. I'll update

metricsOptions = server.Options{
BindAddress: flags.MetricsAddr,
}
}

var pprofHandlers map[string]http.Handler
if flags.ProfilingMetrics {
pprofHandlers = map[string]http.Handler{
"/debug/pprof/": http.HandlerFunc(pprof.Index),
"/debug/pprof/cmdline": http.HandlerFunc(pprof.Cmdline),
"/debug/pprof/profile": http.HandlerFunc(pprof.Profile),
"/debug/pprof/symbol": http.HandlerFunc(pprof.Symbol),
"/debug/pprof/trace": http.HandlerFunc(pprof.Trace),
}
}

metricsOptions = server.Options{
BindAddress: flags.MetricsAddr,
SecureServing: true,
FilterProvider: filters.WithAuthenticationAndAuthorization,
// Note that pprof endpoints are meant to be sensitive and shouldn't be exposed publicly.
ExtraHandlers: pprofHandlers,
}

return metricsOptions
}

func getDefaultAzureCredential(cfg config.Values, setupLog logr.Logger) (*identity.Credential, error) {
tokenCred, err := getDefaultAzureTokenCredential(cfg, setupLog)
if err != nil {
Expand Down
1 change: 0 additions & 1 deletion v2/config/manager/kustomization.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,5 @@ resources:
- manager_metrics_service.yaml

patchesStrategicMerge:
- manager_auth_proxy_patch.yaml
- manager_image_patch.yaml
- manager_pull_policy.yaml
6 changes: 4 additions & 2 deletions v2/config/manager/manager.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,9 @@ spec:
serviceAccountName: default
containers:
- args:
- --metrics-addr=:8080
- --metrics-addr=:8443
- --secure-metrics=true
- --profiling-metrics=false
- --health-addr=:8081
- --enable-leader-election
- --v=2
Expand All @@ -47,7 +49,7 @@ spec:
- containerPort: 8081
name: health-port
protocol: TCP
- containerPort: 8080
- containerPort: 8443
name: metrics-port
protocol: TCP
livenessProbe:
Expand Down
24 changes: 0 additions & 24 deletions v2/config/manager/manager_auth_proxy_patch.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion v2/config/manager/manager_metrics_service.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,6 @@ metadata:
spec:
ports:
- name: metrics
port: 8080
port: 8443
matthchr marked this conversation as resolved.
Show resolved Hide resolved
selector:
control-plane: controller-manager
13 changes: 0 additions & 13 deletions v2/config/rbac/auth_proxy_role.yaml

This file was deleted.

Loading
Loading