Metrics Service Support #220

Merged

ramaraochavali
Contributor

Signed-off-by: Rama ramaraochavali@gmail.com

Signed-off-by: Rama <ramaraochavali@gmail.com>
@ramaraochavali
Contributor Author

@mattklein123 Sorry, I had to create a new PR to get around the DCO issues. The original is PR #210; please go through it for more details.

Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Member

@htuch htuch left a comment

Great, thanks for making this happen.

visibility = ["//visibility:public"],
)

py_proto_library(
Member

I would skip the Python rules unless someone is going to use them for now. Bazel really should get native py_proto_library support that builds on proto_library to make this cleaner, see bazelbuild/bazel#2626 and bazelbuild/bazel#3935.

Contributor Author

How do I skip them? Initially I did not have them, but then I ran into a Python-related error, so I assumed the py rules were mandatory and added them.

import "google/api/annotations.proto";
import "metrics.proto";

//Service to fetch metrics from Proxy. This uses Promotheus data model to represent metrics returned.
Member

Nit: // Service.

Contributor Author

Will change

service MetricsService {
rpc FetchMetrics (MetricsRequest) returns (MetricsResponse) {
option (google.api.http) = {
post: "/v2/metrics"
Member

FYI, unlike the other services in data-plane-api, this is the first to be served by Envoy rather than have Envoy act as client. I'm wondering if we somehow want to distinguish this namespace-wise. Also, can you clarify whether this is intended to be a gRPC service, REST, or a combination?

Contributor Author

I do not have a strong opinion on putting it in a different namespace. I was originally thinking of it as a gRPC service. For REST, I think the admin endpoint with ?format=json is sufficient.

Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
Signed-off-by: Rama <ramaraochavali@gmail.com>
@mattklein123
Member

@ramaraochavali thanks for bearing with us here. :)

I still think this is a major decision point. Is this:
a) A push API that Envoy will call to stream metrics (I would actually vote for this).
b) A pull API that some metrics puller will call to get metrics (what is written).

We should be clear about what we are going after here.

If we do b), I would prefer to do this in the context of https://github.com/envoyproxy/data-plane-api/issues/158 and to define a new admin namespace per @htuch where we can start specifying protos for all the admin output.
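
To make the two options concrete, here is a minimal Python sketch (not Envoy code). `MetricsSnapshot`, `push_metrics`, and `PullEndpoint` are hypothetical names chosen for illustration only; the point is simply who initiates the call in each case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class MetricsSnapshot:
    """Hypothetical stand-in for one batch of metrics."""
    values: Dict[str, float]


def push_metrics(snapshots: Iterable[MetricsSnapshot],
                 sink: Callable[[MetricsSnapshot], None]) -> int:
    """Option a): the proxy initiates and streams each snapshot to a collector."""
    sent = 0
    for snapshot in snapshots:
        sink(snapshot)  # a real system would write to a gRPC stream here
        sent += 1
    return sent


class PullEndpoint:
    """Option b): the proxy serves; a metrics puller initiates each request."""

    def __init__(self, read_current: Callable[[], MetricsSnapshot]) -> None:
        self._read_current = read_current

    def fetch_metrics(self) -> MetricsSnapshot:
        # Each call returns whatever the proxy currently holds.
        return self._read_current()
```

In a), the collector only needs to accept a stream; in b), the puller must be able to reach every proxy, which is why it fits more naturally with the admin interface.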

Signed-off-by: Rama <ramaraochavali@gmail.com>
@ramaraochavali
Contributor Author

@htuch I made the py rules optional for metrics only, and the build passes.

@ramaraochavali
Contributor Author

@mattklein123 No issues, I see your point. I think a) is more valuable in the data-plane-api context. While b) may be good, it belongs in the context of admin. I think we should go after a). I am modifying the service definition accordingly; for now I am just setting up the Bazel build to compile the Prometheus proto with a basic service definition.

Signed-off-by: Rama <ramaraochavali@gmail.com>
@ramaraochavali
Contributor Author

@mattklein123 I have changed it to match the definition for a). Please review. I think b) might also have some value, but we can do it under #158 if needed.

Member

@mattklein123 mattklein123 left a comment

LGTM. I think this is a useful addition. @rshriram et al, does this look ok from Istio side? Anyone else have any thoughts on this?

// Service for streaming metrics to connected end point.
service MetricsService {
rpc StreamMetrics(stream StreamMetricsMessage) returns (StreamMetricsResponse) {

Member

mega nit: I would del newline.

Member

@rshriram rshriram left a comment

This is generally useful. Would like some opinion from @douglas-reid or @ZackButcher

import "metrics.proto";

// Service for streaming metrics to connected end point.
service MetricsService {
Member

Does it make sense to explicitly call this PrometheusMetricsService? After all, the format is for Prometheus. If not, at least have a type field or something in the stream message that says what type of metric is coming out of Envoy.

Contributor Author

I think we discussed in PR #210 that we do not want to call this PrometheusMetricsService. We picked the Prometheus model because it is a more comprehensive representation of metrics. We looked at OpenMetrics, but it is still far off and is in any case heavily inspired by the Prometheus model, so we decided to go with this model for now.

message StreamMetricsResponse {}

message StreamMetricsMessage {
repeated io.prometheus.client.MetricFamily proxy_metrics = 1;
Member

envoy_metrics. We don’t care if this is proxy or sidecar or LB or anything :).

Contributor Author

will change

@douglas-reid
Contributor

douglas-reid commented Oct 31, 2017 via email

Signed-off-by: Rama <ramaraochavali@gmail.com>
@ramaraochavali
Contributor Author

@mattklein123 do we need an identifier like Node to be sent along with this, so that the server can identify the node the metrics are coming from?
A couple of other points:

  1. I skipped the Python rules for this, as per @htuch's comments above.
  2. I also did not add this to go_protos and go_grpc. Do I have to? If so, I need to find a way to reference the Prometheus proto there.

@ramaraochavali
Contributor Author

ramaraochavali commented Nov 1, 2017

I enabled Go compilation and am running into the following build problem:
ERROR: /home/rama.rao/git/data-plane-api/api/BUILD:38:1: GoCompile api/normalgo_default_library~/api.a failed (Exit 1)
2017/11/01 11:29:55 missing strict dependencies:
bazel-out/local-fastbuild/bin/api/go_default_library/api/metrics_service.pb.go: import of ., which is not a direct dependency
INFO: Elapsed time: 16.925s, Critical Path: 3.11s
FAILED: Build did NOT complete successfully
and metrics_service.pb.go has the following import, which is causing the trouble:
import io_prometheus_client "."
Any ideas on how to get around this?
@htuch do you know how to get around this? It looks like a Bazel rule issue with the protoc generator for Go.

Member

@htuch htuch left a comment

Yes, I think we should add Node as an optional field to be included in the first field, for consistency with other APIs. Looks good other than remaining comments.

import "google/api/annotations.proto";
import "metrics.proto";

// Service for streaming metrics to connected end point.
Member

Can you please change the wording here to describe in more detail what the endpoint is, and also update README.md in the root directory to describe this new service that we are connecting to?

Contributor Author

Will change.

Signed-off-by: Rama <ramaraochavali@gmail.com>
Contributor

@douglas-reid douglas-reid left a comment

This looks reasonable for Istio usage.

@@ -1,4 +1,5 @@
GOOGLEAPIS_SHA = "5c6df0cd18c6a429eab739fb711c27f6e1393366" # May 14, 2017
PROMETHEUS_SHA = "6f3806018612930941127f2a7c6c453ba2c527d2"
Contributor

nit: might be worth annotating with date.

Contributor Author

will do

import "google/api/annotations.proto";
import "metrics.proto";

// Service for streaming metrics to server that consumes the metrics data. It uses Prometheus metric data model as a standard to represent metrics information.
Member

nit: please line break around 100 cols

Contributor Author

will do


message StreamMetricsMessage {
// The node sending the metric messages over the stream.
Node node = 1;
Member

Can you take a look at my access log API and copy what we did there with Identifier? We only need to send node once.
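
The "send node once" idea can be sketched in Python as follows. This is an illustrative model only, not the access log API or generated protobuf code: `Identifier`, `StreamMetricsMessage`, `build_stream`, and `consume_stream` are hypothetical stand-ins, and metrics are represented as plain strings.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Identifier:
    node_id: str  # hypothetical stand-in for the Node message


@dataclass
class StreamMetricsMessage:
    identifier: Optional[Identifier] = None  # set on the first message only
    envoy_metrics: List[str] = field(default_factory=list)


def build_stream(node_id: str,
                 batches: Iterable[List[str]]) -> Iterator[StreamMetricsMessage]:
    """Sender side: attach the identifier to the first message, omit it afterwards."""
    first = True
    for batch in batches:
        yield StreamMetricsMessage(
            identifier=Identifier(node_id) if first else None,
            envoy_metrics=list(batch),
        )
        first = False


def consume_stream(
        messages: Iterable[StreamMetricsMessage]) -> Tuple[Optional[str], List[str]]:
    """Receiver side: remember the identifier from the first message only."""
    node_id: Optional[str] = None
    metrics: List[str] = []
    for index, message in enumerate(messages):
        if index == 0 and message.identifier is not None:
            node_id = message.identifier.node_id
        metrics.extend(message.envoy_metrics)
    return node_id, metrics
```

Because the stream is a single long-lived connection, the receiver can associate every later message with the node named in the first one, so repeating the node data on each message would only add bytes.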

import "metrics.proto";

// Service for streaming metrics to server that consumes the metrics data. It uses Prometheus metric data model as a standard to represent metrics information.
service MetricsService {
Member

You will also need to provide a config struct with at least cluster in it. See access log API.

Signed-off-by: Rama <ramaraochavali@gmail.com>
@ramaraochavali
Contributor Author

@mattklein123 I think I have addressed all comments and this should be good to go. Can you please take a look?

Member

@mattklein123 mattklein123 left a comment

LGTM. Thank you! Any final comments from anyone else?

@rshriram
Member

rshriram commented Nov 2, 2017 via email

@@ -0,0 +1,39 @@
syntax = "proto3";

package envoy.api.v2;
Member

Nit: blank line after package.

}

// Identifier data that will only be sent in the first message on the stream. This is effectively
// structured metadata and is a performance optimization.
Member

Nit: this should probably be two sentences, the conjunction of the description of metadata and then the performance optimization note doesn't make sense.

Contributor Author

Changed

Signed-off-by: Rama <ramaraochavali@gmail.com>
5 participants