
feat: add support to collect metrics using the OTEL library #927

Merged
2 commits merged into kedacore:main from the interceptor-metrics branch on Apr 24, 2024

Conversation

Contributor

@zorocloud commented Feb 20, 2024

Adds support to the interceptor proxy for the collection of metrics via a Prometheus-compatible endpoint or via an OTEL HTTP collector. Resolves #910


Member

@JorTurFer left a comment


Looks really nice! Could you add some documentation about how to enable these metrics? I think we could create a new section like Operate and include information about how to enable Prometheus/OTEL and which metrics are available.

Do you think adding an e2e test could be useful? In KEDA core we cover the metrics using e2e tests too, so we could just add a similar e2e test here.

@zorocloud
Contributor Author

zorocloud commented Feb 27, 2024

Looks really nice! Could you add some documentation about how to enable these metrics? I think we could create a new section like Operate and include information about how to enable Prometheus/OTEL and which metrics are available.

Thanks! Yeah, sure. Will draft some docs up and get the PR updated.

Do you think adding an e2e test could be useful? In KEDA core we cover the metrics using e2e tests too, so we could just add a similar e2e test here.

That is a good shout. Will get a test for both cases added in and update the PR.

@zorocloud
Contributor Author

I have added a brief overview of how to configure each exporter in an Operate.md doc as per your suggestion @JorTurFer, and added some e2e tests (using the core KEDA set as a reference). Let me know your thoughts :)

docs/operate.md (comment on lines 1 to 13):
# Configuring metrics for the KEDA HTTP Add-on interceptor proxy

### Exportable metrics:
* **Pending request count** - the number of pending requests for a given host.
* **Total request count** - the total number of requests for a given host with method, path and response code attributes.

There are currently two supported methods for exposing metrics from the interceptor proxy service: via a Prometheus-compatible metrics endpoint or by pushing metrics to an OTEL HTTP collector.

### Configuring the Prometheus compatible metrics endpoint
When configured, the interceptor proxy can expose metrics on a Prometheus-compatible endpoint. This endpoint can be enabled by setting the `KEDA_HTTP_OTEL_PROM_EXPORTER_ENABLED` environment variable to `true` on the interceptor deployment (`true` by default) and by setting `KEDA_HTTP_OTEL_PROM_EXPORTER_PORT` to an unused port for the endpoint to be made available on (`2223` by default).

### Configuring the OTEL HTTP exporter
When configured, the interceptor proxy can export metrics to an OTEL HTTP collector. The OTEL exporter can be enabled by setting the `KEDA_HTTP_OTEL_HTTP_EXPORTER_ENABLED` environment variable to `true` on the interceptor deployment (`false` by default). When enabled, the `KEDA_HTTP_OTEL_HTTP_COLLECTOR_ENDPOINT` environment variable must also be configured so the exporter knows which collector to send the metrics to (e.g. opentelemetry-collector.open-telemetry-system:4318). If the collector is exposed on an unsecured endpoint, you can set the `KEDA_HTTP_OTEL_HTTP_COLLECTOR_INSECURE` environment variable to `true` (`false` by default), which disables client security on the exporter. If you need to provide any headers, such as authentication details, in order to use your OTEL collector, you can add them to the `KEDA_HTTP_OTEL_HTTP_HEADERS` environment variable. The frequency at which the metrics are exported can be configured by setting `KEDA_HTTP_OTEL_METRIC_EXPORT_INTERVAL` to the number of seconds you require between each export (`30` by default).
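
For illustration only, here is a minimal sketch of how these variables might be set on the interceptor Deployment. The names and defaults are the ones documented above; the collector endpoint is just the example hostname mentioned there, and the whole block is an assumption rather than the add-on's shipped manifest:

```yaml
# Hypothetical env block for the interceptor container; values are the
# documented defaults except where noted.
env:
  - name: KEDA_HTTP_OTEL_PROM_EXPORTER_ENABLED
    value: "true"     # default: true
  - name: KEDA_HTTP_OTEL_PROM_EXPORTER_PORT
    value: "2223"     # default: 2223
  - name: KEDA_HTTP_OTEL_HTTP_EXPORTER_ENABLED
    value: "true"     # default: false
  - name: KEDA_HTTP_OTEL_HTTP_COLLECTOR_ENDPOINT
    value: "opentelemetry-collector.open-telemetry-system:4318"  # example from the docs above
  - name: KEDA_HTTP_OTEL_HTTP_COLLECTOR_INSECURE
    value: "true"     # default: false; disables client security on the exporter
  - name: KEDA_HTTP_OTEL_METRIC_EXPORT_INTERVAL
    value: "30"       # seconds between exports; default: 30
```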

@@ -122,3 +177,35 @@ func TestDeployKEDAHttpAddOn(t *testing.T) {
t.Log(string(out))
t.Log("KEDA Http Add-on deployed successfully using 'make deploy' command")
}

func TestSetupOpentelemetryComponents(t *testing.T) {
Member


Could you add the removal of the collector too, please? (during the teardown)

@zorocloud force-pushed the interceptor-metrics branch 2 times, most recently from 4a10aa4 to a8e5a05 on March 22, 2024 13:00
assert.True(t, ok, "interceptor_request_count_total is available")

requestCount := getMetricsValue(val)
assert.Equal(t, float64(1), requestCount)
Member


I'd assert this using greater-or-equal, because the Prometheus metrics are persisted over time and the 2nd and 3rd tries can fail when the metric returns 2, 3, etc.
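
To illustrate the suggestion, a hedged sketch of what the relaxed assertion could look like with testify, reusing `val`, `ok`, and `getMetricsValue` from the test fragment above (not necessarily the exact code that ended up in the PR):

```go
// Hedged fragment, not the PR's final code: reuses val, ok, and
// getMetricsValue from the surrounding test.
assert.True(t, ok, "interceptor_request_count_total is available")

requestCount := getMetricsValue(val)
// Accept any count >= 1: the counter persists across retries, so later
// attempts may observe 2, 3, etc.
assert.GreaterOrEqual(t, requestCount, float64(1))
```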

Contributor Author


Yeah, that is a good shout. When I tested, I was tearing everything down after every run, so the count was only ever 1, but that won't always be true.

- "exec tail -f /dev/null"`
)

func TestMetricGeneration(t *testing.T) {
Member


This e2e test has failed due to compilation issues.

Contributor Author


Oops, left a small typo in there. Removed.

@zorocloud
Contributor Author

Will take a look at why the e2e tests failed when I get a sec.

@zorocloud force-pushed the interceptor-metrics branch 2 times, most recently from 71f7a82 to 7dd67c3 on March 27, 2024 14:51
Member

@JorTurFer left a comment


LGTM!

@JorTurFer
Member

There are some static code errors, could you take a look please? https://github.com/kedacore/http-add-on/actions/runs/8453796744/job/23280002430?pr=927

Member

@tomkerkhove left a comment


Docs LGTM, nice work!

@zorocloud
Contributor Author

Thanks for the feedback. Remaining suggestions should be addressed and the static checks fixed. Just noticed I need to fix some conflicts too. Will sort that when I next get a sec 👍

@zorocloud
Contributor Author

zorocloud commented Apr 17, 2024

I have just rebased onto main and the unit/e2e tests look like they are passing fine so we should be good now 👍

Edit: @JorTurFer it looks like I have a Snyk failure but I can't view the results via the details link. Is there another way to view them?

@JorTurFer enabled auto-merge (squash) on April 21, 2024 11:18
@JorTurFer
Member

We can ignore Snyk, it has transient failures from time to time xD I have marked it as okay manually :)

@JorTurFer
Member

There is another PR ongoing and once that's merged, I'll update your PR and merge it. Thanks for your AWESOME contribution ❤️

Comment on lines +20 to +23
config:
exporters:
Member


Suggested change: replace

    config:
      exporters:

with

    image:
      repository: "otel/opentelemetry-collector-contrib"
    config:
      exporters:

Member


There is a potential breaking change that requires the change above; please commit it 🙏
https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/UPGRADING.md#0880-to-0890

Contributor Author


Done @JorTurFer :)

Also, I've pushed a change to move the pending request count from a Gauge to an UpDownCounter (I noticed some memory issues with the Gauge). I left it in a separate commit rather than squashing locally so it's easier for you to review.
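
For context, a rough sketch (not the PR's actual code) of how a pending-request count can be tracked with an OTEL UpDownCounter in Go; the meter, instrument, and attribute names here are illustrative assumptions:

```go
package metrics

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// PendingCounter wraps an UpDownCounter tracking in-flight requests per host.
// Illustrative only: the instrument and attribute names are assumptions, not
// necessarily those used by the interceptor.
type PendingCounter struct {
	counter metric.Int64UpDownCounter
}

func NewPendingCounter() (*PendingCounter, error) {
	c, err := otel.Meter("interceptor").
		Int64UpDownCounter("interceptor_pending_request_count")
	if err != nil {
		return nil, err
	}
	return &PendingCounter{counter: c}, nil
}

// Inc records one more in-flight request for the given host.
func (p *PendingCounter) Inc(ctx context.Context, host string) {
	p.counter.Add(ctx, 1, metric.WithAttributes(attribute.String("host", host)))
}

// Dec records that a request for the given host has completed.
func (p *PendingCounter) Dec(ctx context.Context, host string) {
	p.counter.Add(ctx, -1, metric.WithAttributes(attribute.String("host", host)))
}
```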

auto-merge was automatically disabled April 23, 2024 16:28

Head branch was pushed to by a user without write access

@JorTurFer
Member

There are merge conflicts to be solved, could you solve them? (or enable me to update your PR xD)

…ible endpoint or by sending metrics to an OTEL http endpoint

Signed-off-by: Joe Wogan <joe.wogan@10xbanking.com>
specify otel image for e2e tests

Signed-off-by: Joe Wogan <joe.wogan@10xbanking.com>
@zorocloud
Contributor Author

Should be sorted now @JorTurFer :)

Also, I've allowed maintainers to edit the PR; not sure why I didn't select that previously.

Member

@JorTurFer left a comment


LGTM! ❤️

@JorTurFer enabled auto-merge (squash) on April 24, 2024 07:39
@JorTurFer merged commit 0237611 into kedacore:main on Apr 24, 2024
19 checks passed
@zorocloud deleted the interceptor-metrics branch on April 24, 2024 08:15
@JorTurFer
Member

I didn't notice during the review that we aren't respecting the OTEL spec ENV vars: https://opentelemetry.io/docs/specs/otel/protocol/exporter/

I've created an issue to track it. As I guess you are using the metrics, I'm sharing it with you because we will have to change them in future versions: #1031
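
For anyone already consuming these metrics, a hedged sketch of what spec-compliant configuration might eventually look like; the exact variables the add-on will honour are for issue #1031 to decide, and the values below are placeholders:

```yaml
# Hedged example only: spec-defined OTLP exporter variables (see the link
# above), shown to illustrate the naming difference from KEDA_HTTP_OTEL_*.
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://opentelemetry-collector.open-telemetry-system:4318"
  - name: OTEL_EXPORTER_OTLP_HEADERS          # optional, e.g. auth headers
    value: "api-key=placeholder"
  - name: OTEL_METRIC_EXPORT_INTERVAL         # milliseconds in the OTEL SDK spec
    value: "30000"
```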

@zorocloud
Contributor Author

Ah sugar, I completely missed that too 🤦. I'm happy to get a PR raised toward the end of next week to fix this if someone isn't already working on it.

@JorTurFer
Member

I'm happy to get a PR raised toward the end of next week to fix this if someone isn't already working on it.

That'd be nice. I just pinged you to make you aware of a possible breaking change, but if you are willing to update the code, that's awesome :)

Successfully merging this pull request may close these issues.

Interceptor Proxy - OTEL Instrumentation