Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GRPC client and server metrics #5615

Merged
merged 9 commits into from
Apr 10, 2024

Conversation

aliaqel-stripe
Copy link
Contributor

@aliaqel-stripe aliaqel-stripe commented Mar 20, 2024

This PR adds GRPC Client and Server interceptors that emit GRPC request metrics from the perspective of the Client and Server. This allows developers to observe the performance of the adapter -> operator grpc connection.

To implement this, I use the upstream grpc-middleware with default prometheus metrics. I create helpers to create the metrics objects in metricscollector and then add them to the client/server if prometheus metrics are enabled.

Tested on cluster, screenshot from Grafana:
image

Checklist

Fixes #5502

Relates to kedacore/keda-docs#1340

@aliaqel-stripe aliaqel-stripe changed the title Add GRPC metrics interceptors Add GRPC client and server metrics Mar 20, 2024
@aliaqel-stripe aliaqel-stripe marked this pull request as ready for review March 21, 2024 00:03
@aliaqel-stripe aliaqel-stripe requested a review from a team as a code owner March 21, 2024 00:03
@SpiritZhou
Copy link
Contributor

SpiritZhou commented Mar 21, 2024

Could you add some unit tests/integration tests for this feature?

@aliaqel-stripe
Copy link
Contributor Author

Could you add some unit tests/integration tests for this feature?

Oh yes, for sure. I found the metrics tests, I will add checks for these there.

@aliaqel-stripe
Copy link
Contributor Author

Could you add some unit tests/integration tests for this feature?

@SpiritZhou Can the tests run in Kind? We don't use Helm on our clusters nor do we use kubernetes networking, so I have to run locally.

@JorTurFer
Copy link
Member

@SpiritZhou Can the tests run in Kind? We don't use Helm on our clusters nor do we use kubernetes networking, so I have to run locally.

Hello!
Definitively you can! You can install KEDA in your local cluster using this guide. Then, you can execute the tests directly pointing to your local cluster. (If you want to execute OTEL metrics test too, you need to deploy also the collector)

@JorTurFer
Copy link
Member

JorTurFer commented Mar 24, 2024

/run-e2e prometheus_metrics
Update: You can check the progress here

@aliaqel-stripe
Copy link
Contributor Author

@JorTurFer can you rerun the prom tests? I set the family name incorrectly in the tests and that's why the histogram tests failed.

@aliaqel-stripe
Copy link
Contributor Author

this guide

lol our security tools won't let me run mockgen outside of the dev container on my laptop lol

@JorTurFer can you rerun the prometheus e2e tests?

@JorTurFer
Copy link
Member

JorTurFer commented Mar 26, 2024

/run-e2e prometheus_metrics
Update: You can check the progress here

@aliaqel-stripe
Copy link
Contributor Author

I ended up removing the test for the histogram, there are three metrics in that family and that's causing the crash so I'm just going to assume if the other 4 metrics are there it's there as this is upstream metrics functionality and not something we're reporting to our selves.

@aliaqel-stripe
Copy link
Contributor Author

@JorTurFer lol, can you run the e2e tests one more time :)

@JorTurFer
Copy link
Member

JorTurFer commented Mar 26, 2024

/run-e2e prometheus_metrics
Update: You can check the progress here

cmd/adapter/main.go Outdated Show resolved Hide resolved
cmd/adapter/main.go Outdated Show resolved Hide resolved
@aliaqel-stripe
Copy link
Contributor Author

@JorTurFer Can you run e2e on these new changes?

@zroubalik
Copy link
Member

zroubalik commented Apr 8, 2024

/run-e2e prometheus_metrics
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition. My only concern is about naming of these metrics. The grpc there is too generic. Maybe keda_metricservice_ or something around that?

@aliaqel-stripe
Copy link
Contributor Author

This is a great addition. My only concern is about naming of these metrics. The grpc there is too generic. Maybe keda_metricservice_ or something around that?

I can make that change.

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@zroubalik
Copy link
Member

This is a great addition. My only concern is about naming of these metrics. The grpc there is too generic. Maybe keda_metricservice_ or something around that?

I can make that change.

Omg, I am very sorry, the prefix should be keda_internal_metricsservice_, sorry forgot about that.

@aliaqel-stripe
Copy link
Contributor Author

This is a great addition. My only concern is about naming of these metrics. The grpc there is too generic. Maybe keda_metricservice_ or something around that?

I can make that change.

Omg, I am very sorry, the prefix should be keda_internal_metricsservice_, sorry forgot about that.

Haha all good, will fix it quickly!

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@aliaqel-stripe
Copy link
Contributor Author

This is a great addition. My only concern is about naming of these metrics. The grpc there is too generic. Maybe keda_metricservice_ or something around that?

I can make that change.

Omg, I am very sorry, the prefix should be keda_internal_metricsservice_, sorry forgot about that.

Updated the name! Can you re-run e2e tests when they're ready?

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@aliaqel-stripe
Copy link
Contributor Author

@zroubalik I can't see the snyk test failure because I don't have access to their UI

@zroubalik
Copy link
Member

zroubalik commented Apr 9, 2024

/run-e2e prometheus_metrics
Update: You can check the progress here

@zroubalik
Copy link
Member

@zroubalik I can't see the snyk test failure because I don't have access to their UI

that's fine

@zroubalik zroubalik merged commit 08aeb57 into kedacore:main Apr 10, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metrics for gRPC requests between metrics-api-server and operator
4 participants