This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

feat: Upgrade to Keptn 0.17 #345

Merged · merged 2 commits on Jul 18, 2022
Changes from 1 commit
9 changes: 8 additions & 1 deletion .github/workflows/integration-tests.yaml
@@ -17,7 +17,7 @@ jobs:
    strategy:
      fail-fast: false
      matrix:
-       keptn-version: ["0.14.2", "0.15.1", "0.16.0"] # https://github.com/keptn/keptn/releases
+       keptn-version: ["0.14.2", "0.15.1", "0.16.0", "0.17.0"] # https://github.com/keptn/keptn/releases
        prometheus-version: ["15.10.1"]
    env:
      GO_VERSION: 1.17
@@ -123,6 +123,13 @@ jobs:
        with:
          KEPTN_VERSION: ${{ matrix.keptn-version }}
          HELM_VALUES: |
+           # Keptn 0.17 and newer
+           apiGatewayNginx:
+             type: LoadBalancer
+           features:
+             automaticProvisioning:
+               serviceURL: http://keptn-gitea-provisioner-service.default
+           # Keptn 0.16 compatibility
            control-plane:
              apiGatewayNginx:
                type: LoadBalancer
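
The added `HELM_VALUES` cover both chart layouts: Keptn 0.17 moves `apiGatewayNginx` and the `features.automaticProvisioning` settings to the top level, while 0.16 still nests the gateway under `control-plane`. Outside of CI, an equivalent manual install of Keptn 0.17 with these values could look roughly like the sketch below; the chart repository and chart name are assumptions based on Keptn's standard Helm setup, not taken from this PR:

```bash
# Sketch: install Keptn 0.17 with the same gateway/provisioning values as the CI job.
# Chart repo and chart name are assumed -- double-check against the Keptn 0.17 docs.
helm repo add keptn https://charts.keptn.sh
helm repo update

helm upgrade --install keptn keptn/keptn \
  --namespace keptn --create-namespace \
  --version 0.17.0 \
  --set apiGatewayNginx.type=LoadBalancer \
  --set features.automaticProvisioning.serviceURL=http://keptn-gitea-provisioner-service.default
```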
104 changes: 67 additions & 37 deletions README.md
@@ -1,16 +1,19 @@
# Prometheus Service

![GitHub release (latest by date)](https://img.shields.io/github/v/release/keptn-contrib/prometheus-service)
[![Go Report Card](https://goreportcard.com/badge/github.com/keptn-contrib/prometheus-service)](https://goreportcard.com/report/github.com/keptn-contrib/prometheus-service)

The *prometheus-service* is a [Keptn](https://keptn.sh) integration responsible for:

1. configuring Prometheus for monitoring services managed by Keptn,
2. receiving alerts (on port 8080) from Prometheus Alertmanager and translating the alert payload to a cloud event (remediation.triggered) that is sent to the Keptn API,
3. retrieving Service Level Indicators (SLIs) from a Prometheus API endpoint.
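
For the alert-forwarding flow in the second item, Prometheus Alertmanager needs a webhook receiver that delivers alerts to prometheus-service on port 8080. A minimal receiver definition could look like the following sketch; the in-cluster service URL (name and `keptn` namespace) is an assumption and depends on where the service is installed:

```bash
# Sketch: a minimal Alertmanager config with a webhook receiver for prometheus-service.
# The service URL below is an assumption -- adjust it to your installation.
cat > alertmanager.yml <<'EOF'
route:
  receiver: keptn
receivers:
  - name: keptn
    webhook_configs:
      - url: http://prometheus-service.keptn.svc.cluster.local:8080
EOF
```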

## Compatibility Matrix

Please always double-check the version of Keptn you are using compared to the version of this service, and follow the compatibility matrix below.

| Keptn Version\* | [Prometheus Service Image](https://hub.docker.com/r/keptncontrib/prometheus-service/tags) |
|:---------------:|:-----------------------------------------------------------------------------------------:|
@@ -23,22 +26,27 @@
| 0.15.1 | keptncontrib/prometheus-service:0.8.1\*** |
| 0.16.0 | keptncontrib/prometheus-service:0.8.2\*** |
| 0.16.0 | keptncontrib/prometheus-service:0.8.3 |
+| 0.17.0 | keptncontrib/prometheus-service:0.8.4 |

\* This is the Keptn version we aim to be compatible with. Other versions should work too, but there is no guarantee.

\** This version is only compatible with Keptn 0.14.2 and potentially newer releases of Keptn 0.14.x due to a breaking change in NATS cluster name.

\*** These versions are not compatible with Prometheus Alertmanager <= 0.24

You can find more information and older releases on the [Releases](https://github.com/keptn-contrib/prometheus-service/releases) page.

## Installation instructions

### Setup Prometheus Monitoring

Keptn does not install or manage Prometheus and its components. Users need to install Prometheus and Prometheus Alertmanager as a prerequisite.

The easiest way would be to set up Prometheus using helm, e.g.:

```console
kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
```

@@ -47,39 +55,43 @@ helm install prometheus prometheus-community/prometheus --namespace monitoring

### Optional: Verify Prometheus in your Kubernetes cluster

* To verify that the Prometheus scrape jobs are correctly set up, you can access Prometheus by enabling port-forwarding for the prometheus-server:

```bash
kubectl port-forward svc/prometheus-server 8080:80 -n monitoring
```

Prometheus is then available on [localhost:8080/targets](http://localhost:8080/targets) where you can see the targets for the service.
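
While the port-forward is active, the same target information is also available through the Prometheus HTTP API, which is handy for scripting the check, e.g.:

```bash
# Query scrape-target health via the Prometheus HTTP API (requires the port-forward above).
# jq is optional and only used here to trim the output.
curl -s http://localhost:8080/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
```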

### Install prometheus-service

Please replace the placeholders in the commands below. Examples are provided.

* `<VERSION>`: prometheus-service version, e.g., `0.8.3`
* `<PROMETHEUS_NS>`: If prometheus is installed in the same Kubernetes cluster, the namespace needs to be provided, e.g., `monitoring`
* `<PROMETHEUS_ENDPOINT>`: Endpoint for prometheus (primarily used for fetching metrics), e.g., `http://prometheus-server.monitoring.svc.cluster.local:80`
* `<ALERT_MANAGER_NS>`: If prometheus alert manager is installed in the same Kubernetes cluster, the namespace needs to be provided, e.g., `monitoring`

Once this is done, you can go ahead and install prometheus-service:

*Note*: Make sure to replace `<VERSION>` with the version you want to install.

* Install Keptn prometheus-service in Kubernetes using the following command. This will install the prometheus-service into the `keptn` namespace and will assume that prometheus and the alertmanager are installed in the `monitoring` namespace.

```bash
helm upgrade --install -n keptn prometheus-service \
https://github.com/keptn-contrib/prometheus-service/releases/download/<VERSION>/prometheus-service-<VERSION>.tgz \
--reuse-values
```
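
  For example, with the placeholders filled in for the `0.8.4` release that the compatibility matrix above pairs with Keptn 0.17.0, the command reads:

  ```bash
  # The install command from above with <VERSION> filled in (0.8.4 pairs with Keptn 0.17.0).
  helm upgrade --install -n keptn prometheus-service \
    https://github.com/keptn-contrib/prometheus-service/releases/download/0.8.4/prometheus-service-0.8.4.tgz \
    --reuse-values
  ```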

* (Optional) If you want to customize the namespaces of Keptn or the Prometheus installation, replace the environment
variable values according to the use case and apply the manifest:

@@ -102,7 +114,6 @@ Once this is done, you can go ahead and install prometheus-service:

```console
keptn configure monitoring prometheus --project=sockshop --service=carts
```


### Advanced Options

You can customize prometheus-service with the following environment variables:
@@ -141,32 +152,42 @@ You can customize prometheus-service with the following environment variables:

By default, the service works with the following assumptions regarding the setup of the Prometheus instance:

- Each **service** within a **stage** of a **project** has a Prometheus scrape job definition with the name: `<service>-<project>-<stage>`

For example, if `project=sockshop`, `stage=production` and `service=carts`, the scrape job name would have to be `carts-sockshop-production`.

- Every service provides the following metrics for its corresponding scrape job:
- http_response_time_milliseconds (Histogram)
- http_requests_total (Counter)

This metric has to contain the `status` label, indicating the HTTP response code of the requests handled by the service.
It is highly recommended that this metric also provides a label to query metric values for specific endpoints, e.g. `handler`.

An example of an entry would look like this: `http_requests_total{method="GET",handler="VersionController.getInformation",status="200",} 4.0`

- Based on those metrics, the queries for the SLIs are built as follows:

  - **throughput**: `sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary"}[<test_duration_in_seconds>s]))`
  - **error_rate**: `sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary",status!~'2..'}[<test_duration_in_seconds>s]))/sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary"}[<test_duration_in_seconds>s]))`
  - **response_time_p50**: `histogram_quantile(0.50, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))`
  - **response_time_p90**: `histogram_quantile(0.90, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))`
  - **response_time_p95**: `histogram_quantile(0.95, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))`

## Advanced Usage

### Using an external Prometheus instance

To use an external Prometheus instance for a certain project, a secret containing the URL and the access credentials has to be created using the `keptn` cli (don't forget to replace the `<project>` placeholder with the name of your project):

```console
PROMETHEUS_USER=test
```

@@ -176,11 +197,13 @@ PROMETHEUS_URL=http://prometheus-server.monitoring.svc.cluster.local

```console
keptn create secret prometheus-credentials-<project> --scope="keptn-prometheus-service" --from-literal="PROMETHEUS_USER=$PROMETHEUS_USER" --from-literal="PROMETHEUS_PASSWORD=$PROMETHEUS_PASSWORD" --from-literal="PROMETHEUS_URL=$PROMETHEUS_URL"
```

Note: This creates an actual Kubernetes secret, with some Kubernetes labels (`app.kubernetes.io/managed-by=keptn-secret-service`, `app.kubernetes.io/scope=prometheus-service`), and is bound to the correct role (`keptn-prometheus-svc-read`), which allows prometheus-service to access it.
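
To confirm the secret was created with the labels mentioned above, it can be inspected directly. A sketch; the `keptn` namespace is an assumption and depends on where Keptn is installed:

```bash
# Sketch: verify the generated secret and its Keptn-managed labels.
# The "keptn" namespace is an assumption -- use the namespace of your Keptn install.
kubectl -n keptn get secret prometheus-credentials-<project> --show-labels
```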

### User-defined Service Level Indicators (SLIs)

Users can override the predefined queries, as well as add custom queries, by creating an SLI configuration.

* An SLI configuration is a yaml file as shown below:

@@ -191,33 +214,39 @@

```yaml
cpu_usage: avg(rate(container_cpu_usage_seconds_total{namespace="$PROJECT-$STAGE",pod_name=~"$SERVICE-primary-.*"}[5m]))
response_time_p95: histogram_quantile(0.95, sum by(le) (rate(http_response_time_milliseconds_bucket{handler="ItemsController.addToCart",job="$SERVICE-$PROJECT-$STAGE-canary"}[$DURATION_SECONDS])))
```
This file contains a list of keys (e.g., `cpu_usage`) and Prometheus metric expressions (e.g., `avg(rate(...{filters}[timeframe]))`).

* To store this configuration, you need to add this file to Keptn's configuration store, e.g., using the [keptn add-resource](https://keptn.sh/docs/0.14.x/reference/cli/commands/keptn_add-resource/) command:

```console
keptn add-resource --project <project> --service <service> --stage <stage> --resource=sli.yaml --resourceUri=prometheus/sli.yaml
```

---

Within the user-defined queries, the following variables can be used to dynamically build the query, depending on the project/stage/service, and the time frame:

- `$PROJECT`: will be replaced with the name of the project
- `$STAGE`: will be replaced with the name of the stage
- `$SERVICE`: will be replaced with the name of the service
- `$DEPLOYMENT`: type of the deployment (e.g., direct, canary, primary)
- `$DURATION_SECONDS`: will be replaced with the test run duration, e.g. 30s

For example, if an evaluation for the service **carts** in the stage **production** of the project **sockshop** is triggered, and the tests ran for 30s, these will be the resulting queries:

```
rate(my_custom_metric{job='$SERVICE-$PROJECT-$STAGE',handler=~'$handler'}[$DURATION_SECONDS]) => rate(my_custom_metric{job='carts-sockshop-production',handler=~'$handler'}[30s])
```

### Manually creating configmaps and alerts

By default, the `prometheus-service` automatically creates all the needed configmaps for targets and alerts without needing to configure anything. In some cases, the user might want to manually create the configmaps and alerts instead, which can be enabled by changing the following flags inside the `values.yaml` file:

- `prometheus.createTargets` (default: true) - Enable or disable the automatic creation of Prometheus targets
- `prometheus.createAlerts` (default: true) - Enable or disable the automatic creation of Prometheus alerts
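
If you would rather not edit `values.yaml` directly, the same flags can also be passed at install time; a sketch reusing the install command from above:

```bash
# Sketch: disable automatic creation of Prometheus targets and alerts via helm flags.
helm upgrade --install -n keptn prometheus-service \
  https://github.com/keptn-contrib/prometheus-service/releases/download/<VERSION>/prometheus-service-<VERSION>.tgz \
  --set prometheus.createTargets=false \
  --set prometheus.createAlerts=false
```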
@@ -228,4 +257,5 @@ Take a look at the [TROUBLESHOOTING](TROUBLESHOOTING.md) page for common errors

# Contributions

You are welcome to contribute using Pull Requests against the **master** branch. Before contributing, please read our [Contributing Guidelines](CONTRIBUTING.md).
2 changes: 1 addition & 1 deletion chart/values.yaml
@@ -26,7 +26,7 @@ distributor:
  image:
    repository: docker.io/keptn/distributor # Container Image Name
    pullPolicy: IfNotPresent # Kubernetes Image Pull Policy
-   tag: "0.16.0" # Container Tag
+   tag: "0.17.0" # Container Tag
  config:
    queueGroup:
      enabled: true # Enable connection via Nats queue group to support exactly-once message processing
16 changes: 8 additions & 8 deletions go.mod
@@ -10,7 +10,7 @@ require (
github.com/golang/mock v1.6.0
github.com/google/uuid v1.3.0
github.com/kelseyhightower/envconfig v1.4.0
-github.com/keptn/go-utils v0.16.1-0.20220624075633-4d49101f88b4
+github.com/keptn/go-utils v0.17.0
github.com/mitchellh/mapstructure v1.5.0
github.com/prometheus/alertmanager v0.24.0
github.com/prometheus/client_golang v1.12.2
@@ -35,7 +35,8 @@ require (
github.com/felixge/httpsnoop v1.0.2 // indirect
github.com/go-kit/log v0.2.0 // indirect
github.com/go-logfmt/logfmt v0.5.1 // indirect
-github.com/go-logr/logr v1.2.2 // indirect
+github.com/go-logr/logr v1.2.3 // indirect
+github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-openapi/jsonpointer v0.19.5 // indirect
github.com/go-openapi/jsonreference v0.19.6 // indirect
github.com/go-openapi/swag v0.21.1 // indirect
@@ -61,16 +62,15 @@ require (
github.com/prometheus/common/sigv4 v0.1.0 // indirect
github.com/prometheus/procfs v0.7.3 // indirect
github.com/spf13/pflag v1.0.5 // indirect
-go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.27.0 // indirect
-go.opentelemetry.io/otel v1.2.0 // indirect
-go.opentelemetry.io/otel/internal/metric v0.25.0 // indirect
-go.opentelemetry.io/otel/metric v0.25.0 // indirect
-go.opentelemetry.io/otel/trace v1.2.0 // indirect
+go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.32.0 // indirect
+go.opentelemetry.io/otel v1.7.0 // indirect
+go.opentelemetry.io/otel/metric v0.30.0 // indirect
+go.opentelemetry.io/otel/trace v1.7.0 // indirect
go.uber.org/atomic v1.9.0 // indirect
go.uber.org/multierr v1.6.0 // indirect
go.uber.org/zap v1.19.0 // indirect
golang.org/x/net v0.0.0-20220225172249-27dd8689420f // indirect
-golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8 // indirect
+golang.org/x/oauth2 v0.0.0-20220608161450-d0670ef3b1eb // indirect
golang.org/x/sys v0.0.0-20220209214540-3681064d5158 // indirect
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 // indirect
golang.org/x/text v0.3.7 // indirect