feat: Instrument Feast using Prometheus and OpenTelemetry (feast-dev#4366)

feat: instrument feature store

This commit adds opentelemetry to monitor Feast

Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
tsisodia10 authored Aug 6, 2024
1 parent 8eceff2 commit a571e08
Showing 26 changed files with 928 additions and 75 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -17,7 +17,6 @@
[![GitHub Release](https://img.shields.io/github/v/release/feast-dev/feast.svg?style=flat&sort=semver&color=blue)](https://github.com/feast-dev/feast/releases)

## Join us on Slack!

👋👋👋 [Come say hi on Slack!](https://join.slack.com/t/feastopensource/signup)

## Overview
@@ -231,4 +230,4 @@ Thanks goes to these incredible people:

<a href="https://github.com/feast-dev/feast/graphs/contributors">
<img src="https://contrib.rocks/image?repo=feast-dev/feast" />
</a>
4 changes: 4 additions & 0 deletions infra/charts/feast-feature-server/README.md
@@ -44,8 +44,12 @@ See [here](https://github.com/feast-dev/feast/tree/master/examples/python-helm-d
| imagePullSecrets | list | `[]` | |
| livenessProbe.initialDelaySeconds | int | `30` | |
| livenessProbe.periodSeconds | int | `30` | |
| metrics.enabled | bool | `false` | |
| metrics.otelCollector.endpoint | string | `""` | |
| metrics.otelCollector.port | int | `4317` | |
| nameOverride | string | `""` | |
| nodeSelector | object | `{}` | |
| otel_service.name | string | `"otelcol"` | |
| podAnnotations | object | `{}` | |
| podSecurityContext | object | `{}` | |
| readinessProbe.initialDelaySeconds | int | `20` | |
108 changes: 108 additions & 0 deletions infra/charts/feast-feature-server/opentelemetry.md
@@ -0,0 +1,108 @@
## Adding Monitoring
To add monitoring to the Feast Feature Server, follow these steps:

### Workflow

Feast instrumentation using OpenTelemetry and Prometheus:
![Workflow](samples/workflow.png)

### Deploy Prometheus Operator
Follow the Prometheus Operator documentation to install the operator:
https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md

### Deploy OpenTelemetry Operator
Before installing the OpenTelemetry Operator, install `cert-manager` and verify that its pods are running. Then install the operator:
```
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
```

Follow the documentation for further installation steps:
https://github.com/open-telemetry/opentelemetry-operator

### Configure OpenTelemetry Collector
Add the OpenTelemetry Collector configuration under the metrics section in your values.yaml file.

Example values.yaml:

```
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317" # sample
    headers:
      api-key: "your-api-key"
```

### Add instrumentation annotation and environment variables in the deployment.yaml

```
template:
  metadata:
    {{- with .Values.podAnnotations }}
    annotations:
      {{- toYaml . | nindent 8 }}
      instrumentation.opentelemetry.io/inject-python: "true"
    {{- end }}
```

```
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.otelCollector.port }}
- name: OTEL_EXPORTER_OTLP_INSECURE
  value: "true"
```
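
To make the templated endpoint easier to read, here is a minimal Python sketch of the string the template is expected to produce once Helm substitutes its values. The helper name `collector_endpoint` is hypothetical and not part of the chart. The inputs are taken from this change only as illustrations: `otelcol` is the sample collector name, `feast-val` is the namespace used in the sample RBAC annotations, and `4317` is the default collector port in `values.yaml`.

```python
def collector_endpoint(service_name: str, namespace: str, port: int) -> str:
    """Build the in-cluster collector URL in the same shape as the Helm template:
    http://<service>-collector.<namespace>.svc.cluster.local:<port>
    (hypothetical helper, for illustration only)."""
    return f"http://{service_name}-collector.{namespace}.svc.cluster.local:{port}"


# Illustrative inputs; substitute your own release values.
endpoint = collector_endpoint("otelcol", "feast-val", 4317)
print(endpoint)
```

If the rendered `OTEL_EXPORTER_OTLP_ENDPOINT` in your pod does not have this shape, check the service name, namespace, and port values passed to the chart.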

### Add checks
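Wrap each monitoring manifest, and the monitoring-specific parts of the deployment file, in a `metrics.enabled` check so that they are rendered only when metrics are enabled: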

```
{{ if .Values.metrics.enabled }}
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: feast-instrumentation
spec:
  exporter:
    endpoint: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318 # 4318 is the default OTLP/HTTP port of the OpenTelemetry Collector
  env:
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      - name: OTEL_METRICS_EXPORTER
        value: console,otlp_proto_http
      - name: OTEL_LOGS_EXPORTER
        value: otlp_proto_http
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"
{{ end }}
```

### Add manifests to the chart
Add the Instrumentation, OpenTelemetryCollector, ServiceMonitor, Prometheus instance, and RBAC manifests provided in the [samples/](https://github.com/feast-dev/feast/tree/91540703c483f1cd03b534a1a45bc4ccdcf79f81/infra/charts/feast-feature-server/samples) directory.

For the latest updates, refer to the official repository: https://github.com/open-telemetry/opentelemetry-operator

### Deploy Feast
Deploy Feast with `metrics.enabled` set to `true`.

Example:
```
helm install feast-release infra/charts/feast-feature-server --set metrics.enabled=true --set feature_store_yaml_base64=""
```
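
The chart's `values.yaml` describes `feature_store_yaml_base64` as a required, base64-encoded copy of `feature_store.yaml`, so the empty string in the command above needs to be replaced with a real value. The following is a minimal sketch of producing that value in Python. The helper name `encode_feature_store_yaml` and the sample YAML content are assumptions for illustration, not part of Feast or the chart.

```python
import base64


def encode_feature_store_yaml(yaml_text: str) -> str:
    """Return the base64 form of a feature_store.yaml document
    (hypothetical helper, for illustration only)."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


# Placeholder content; in practice, read your real feature_store.yaml from disk.
sample_yaml = "project: demo\nregistry: data/registry.db\nprovider: local\n"
encoded = encode_feature_store_yaml(sample_yaml)
print(encoded)
```

The printed string can then be passed as `--set feature_store_yaml_base64=<encoded value>`.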

## See logs
Once the OpenTelemetry Collector is deployed, search its logs for the Feast metrics:

```
oc logs otelcol-collector-0 | grep "Name: feast_feature_server_memory_usage\|Value: 0.*"
oc logs otelcol-collector-0 | grep "Name: feast_feature_server_cpu_usage\|Value: 0.*"
```
```
-> Name: feast_feature_server_memory_usage
Value: 0.579426
```
```
-> Name: feast_feature_server_cpu_usage
Value: 0.000000
```
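
If you want to check these values programmatically rather than by eye, the following is a minimal Python sketch that pairs each `Name:` line with the next `Value:` line in text shaped like the output above. The function name `parse_metrics` is hypothetical, and the sketch assumes the two-line layout shown here; the collector's full detailed output contains additional fields that this parser simply skips.

```python
from typing import Dict, Optional


def parse_metrics(log_text: str) -> Dict[str, float]:
    """Map each metric name to the first value that follows it
    (hypothetical helper; assumes 'Name:' / 'Value:' lines as shown above)."""
    metrics: Dict[str, float] = {}
    current: Optional[str] = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if "Name:" in stripped:
            # Remember the most recent metric name.
            current = stripped.split("Name:", 1)[1].strip()
        elif stripped.startswith("Value:") and current is not None:
            metrics[current] = float(stripped.split("Value:", 1)[1].strip())
            current = None
    return metrics


# Sample text copied from the log excerpts above.
sample = """-> Name: feast_feature_server_memory_usage
Value: 0.579426
-> Name: feast_feature_server_cpu_usage
Value: 0.000000
"""
metrics = parse_metrics(sample)
print(metrics)
```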
19 changes: 19 additions & 0 deletions infra/charts/feast-feature-server/samples/instrumentation.yaml
@@ -0,0 +1,19 @@
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: feast-instrumentation
spec:
  exporter:
    endpoint: <endpoint> # eg: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318
  env:
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      - name: OTEL_METRICS_EXPORTER
        value: console,otlp_proto_http
      - name: OTEL_LOGS_EXPORTER
        value: otlp_proto_http
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"
53 changes: 53 additions & 0 deletions infra/charts/feast-feature-server/samples/otel-collector.yaml
@@ -0,0 +1,53 @@
# API reference https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md
# Refs for v1beta1 config: https://github.com/open-telemetry/opentelemetry-operator/issues/3011#issuecomment-2154118998
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: statefulset
  image: otel/opentelemetry-collector-contrib:0.102.1
  targetAllocator:
    enabled: true
    serviceAccount: opentelemetry-targetallocator-sa
    prometheusCR:
      enabled: true
      podMonitorSelector: {}
      serviceMonitorSelector: {}
      ## If uncommented, only service monitors with this label will get picked up
      #   app: feast
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otelcol-collector'
              scrape_interval: 10s
              static_configs:
                - targets: [ '0.0.0.0:8888' ]

    processors:
      batch: {}

    exporters:
      logging:
        verbosity: detailed

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
        metrics:
          receivers: [otlp, prometheus]
          processors: []
          exporters: [logging]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [logging]
16 changes: 16 additions & 0 deletions infra/charts/feast-feature-server/samples/otel-sm.yaml
@@ -0,0 +1,16 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: feast
  name: otel-sm-1
spec:
  endpoints:
    - port: metrics
  namespaceSelector:
    matchNames:
      - <namespace> # helm value - {{ .Release.Namespace }}
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector
      app.kubernetes.io/managed-by: opentelemetry-operator
15 changes: 15 additions & 0 deletions infra/charts/feast-feature-server/samples/prometheus.yaml
@@ -0,0 +1,15 @@
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  evaluationInterval: 30s
  podMonitorSelector:
    matchLabels:
      app: feast
  portName: web
  replicas: 1
  scrapeInterval: 30s
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector:
    matchLabels:
      app: feast
68 changes: 68 additions & 0 deletions infra/charts/feast-feature-server/samples/rbac.yaml
@@ -0,0 +1,68 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opentelemetry-targetallocator-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opentelemetry-targetallocator-role-1
  annotations:
    meta.helm.sh/release-name: "feast-release"
    meta.helm.sh/release-namespace: "feast-val"
  labels:
    app.kubernetes.io/managed-by: "Helm"
rules:
  - apiGroups:
      - monitoring.coreos.com
    resources:
      - servicemonitors
      - podmonitors
    verbs:
      - '*'
  - apiGroups: [""]
    resources:
      - namespaces
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - configmaps
    verbs: ["get"]
  - apiGroups:
      - discovery.k8s.io
    resources:
      - endpointslices
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - networking.k8s.io
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: opentelemetry-targetallocator-rb-1
  annotations:
    meta.helm.sh/release-name: "feast-release"
    meta.helm.sh/release-namespace: "feast-val"
  labels:
    app.kubernetes.io/managed-by: "Helm"
subjects:
  - kind: ServiceAccount
    name: opentelemetry-targetallocator-sa
    namespace: <namespace> # helm value - {{ .Release.Namespace }}
roleRef:
  kind: ClusterRole
  name: opentelemetry-targetallocator-role-1
  apiGroup: rbac.authorization.k8s.io
16 changes: 16 additions & 0 deletions infra/charts/feast-feature-server/samples/service-monitor.yaml
@@ -0,0 +1,16 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: feast
  name: otel-sm
spec:
  endpoints:
    - port: metrics
  namespaceSelector:
    matchNames:
      - <namespace> # helm value - {{ .Release.Namespace }}
  selector:
    matchLabels:
      app.kubernetes.io/component: opentelemetry-collector
      app.kubernetes.io/managed-by: opentelemetry-operator
13 changes: 12 additions & 1 deletion infra/charts/feast-feature-server/templates/deployment.yaml
@@ -14,6 +14,9 @@ spec:
      {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
        {{- /* "$" is required here: inside "with", "." is rebound to podAnnotations */}}
        {{- if $.Values.metrics.enabled }}
        instrumentation.opentelemetry.io/inject-python: "true"
        {{- end }}
      {{- end }}
      labels:
        {{- include "feast-feature-server.selectorLabels" . | nindent 8 }}
@@ -48,10 +51,18 @@ spec:
            - "feast"
            - "serve_registry"
            {{- else }}
            {{- if .Values.metrics.enabled }}
            - "feast"
            - "serve"
            - "--metrics"
            - "-h"
            - "0.0.0.0"
            {{- else }}
            - "feast"
            - "serve"
            - "-h"
            - "0.0.0.0"
            {{- end }}
            {{- end }}
          ports:
            - name: {{ .Values.feast_mode }}
@@ -88,4 +99,4 @@ spec:
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
6 changes: 6 additions & 0 deletions infra/charts/feast-feature-server/templates/service.yaml
@@ -11,5 +11,11 @@ spec:
      targetPort: {{ .Values.feast_mode }}
      protocol: TCP
      name: http
    {{- if .Values.metrics.enabled }}
    - name: metrics
      port: 8000
      protocol: TCP
      targetPort: 8000 # metrics port
    {{- end }}
  selector:
    {{- include "feast-feature-server.selectorLabels" . | nindent 4 }}
6 changes: 6 additions & 0 deletions infra/charts/feast-feature-server/values.yaml
@@ -15,6 +15,12 @@ imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

metrics:
  enabled: false
  otelCollector:
    endpoint: "" # sample endpoint: "otel-collector.default.svc.cluster.local:4317"
    port: 4317

# feature_store_yaml_base64 -- [required] a base64 encoded version of feature_store.yaml
feature_store_yaml_base64: ""
