Add metrics to Elastic Package Registry #827
Conversation
Add metrics for every request: in-flight requests, response size, request size, and duration, with code, method, and path as labels for every request. The uptime of the application can be obtained from the "process_start_time_seconds" metric.
A new middleware has been added to gather all the metrics per request, and they are exposed at /metrics following the Prometheus standard. These first metrics are:
The Prometheus client also offers a set of default metrics. From these, at least "uptime" values can be obtained from the "process_start_time_seconds" metric:
For all the metrics except "in flight requests", the labels code, method, and path have been added. That allows us to inspect the values in more detail, e.g. to check specific endpoints. As an example:
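For illustration, a minimal sketch of such a middleware with the Prometheus Go client (metric and function names here are illustrative and do not necessarily match the final code in this PR):

package metrics

import (
	"net/http"

	"github.com/gorilla/mux"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// In-flight requests cannot carry per-request labels, hence a plain gauge.
	httpInFlightRequests = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "http_in_flight_requests",
		Help: "Number of HTTP requests currently being served.",
	})
	httpRequestsTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total number of HTTP requests.",
	}, []string{"code", "method", "path"})
)

// MetricsMiddleware instruments every request; the "path" label is curried
// with the gorilla/mux route template so metrics do not explode per URL.
func MetricsMiddleware() mux.MiddlewareFunc {
	prometheus.MustRegister(httpInFlightRequests, httpRequestsTotal)
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			path := "unknown"
			if route := mux.CurrentRoute(r); route != nil {
				if tpl, err := route.GetPathTemplate(); err == nil {
					path = tpl
				}
			}
			var handler http.Handler = promhttp.InstrumentHandlerCounter(
				httpRequestsTotal.MustCurryWith(prometheus.Labels{"path": path}), next)
			handler = promhttp.InstrumentHandlerInFlight(httpInFlightRequests, handler)
			handler.ServeHTTP(w, r)
		})
	}
}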
@elastic/ecosystem I was wondering if you could take a look at this PR before I add more metrics (I still need to look into metrics related to storage). I wanted to check whether you see any issue with the middleware or with how the metrics are being added.
main.go
Outdated
@@ -186,6 +189,7 @@ func initAPMTracer(logger *zap.Logger) *apm.Tracer {
	if err != nil {
		logger.Fatal("Failed to initialize APM agent", zap.Error(err))
	}
	tracer.RegisterMetricsGatherer(apmprometheus.Wrap(prometheus.DefaultGatherer))
I was able to send the metrics to the Elastic Stack using the APM integration. However, metrics that have labels are stored as plain JSON strings by default, and new dashboards would need to be created (still not sure how to achieve this in the APM integration).
As an alternative, I've added a new endpoint (/metrics) that exposes all the metrics using the Prometheus format. From that we could use Metricbeat to create our own integration.
WDYT? Should we discard the APM integration for the time being in favor of the Prometheus endpoint and a new integration?
I think that we can focus on the Prometheus endpoint for now. It is clearer how to build an integration around that. We can add the APM metrics gatherer later, especially seeing that it is basically a one-liner.
For users of APM, some of the information related to requests can in any case be obtained in more detail using the APM data.
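For reference, a hedged sketch of that one-liner in context, assuming the go.elastic.co/apm module layout of the agent used at the time:

package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"go.elastic.co/apm"
	"go.elastic.co/apm/module/apmprometheus"
)

// registerPrometheusGatherer ships everything registered in the default
// Prometheus registry through the APM agent as well.
func registerPrometheusGatherer(tracer *apm.Tracer) {
	tracer.RegisterMetricsGatherer(apmprometheus.Wrap(prometheus.DefaultGatherer))
}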
This is looking good; I've added some comments for now.
main.go
Outdated
@@ -281,7 +285,9 @@ func getRouter(logger *zap.Logger, config *Config, indexer Indexer) (*mux.Router
	router.HandleFunc(signaturesRouterPath, signaturesHandler)
	router.HandleFunc(packageIndexRouterPath, packageIndexHandler)
	router.HandleFunc(staticRouterPath, staticHandler)
	router.Handle("/metrics", promhttp.Handler())
For production environments we probably want to expose this endpoint on a different host/port, so the metrics are not publicly available. We may need to add a flag for this.
That's true, I'll add those flags and run this metrics endpoint in a separate server (e.g. http.Server).
Doing it this way would also allow me to remove the filtering done in the logging middleware to avoid logging requests to /metrics.
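A minimal sketch of what that could look like; the function name and the way the address is passed in are assumptions for illustration:

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.uber.org/zap"
)

// startMetricsServer serves only /metrics on its own address, keeping it
// off the public router and out of the request-logging middleware.
func startMetricsServer(logger *zap.Logger, metricsAddress string) {
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	go func() {
		if err := http.ListenAndServe(metricsAddress, mux); err != nil {
			logger.Error("metrics server stopped", zap.Error(err))
		}
	}()
}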
util/metrics.go
Outdated
}
labels := prometheus.Labels{"path": path}
handler = promhttp.InstrumentHandlerCounter(httpRequestsTotal.MustCurryWith(labels), handler)
handler = promhttp.InstrumentHandlerDuration(httpRequestDurationSeconds.MustCurryWith(labels), handler)
APM also collects this kind of duration information, though I think it is OK to also have it as a metric.
util/metrics.go
Outdated
512 * 1024,        /* 512KiB */
1024 * 1024,       /* 1MiB */
64 * 1024 * 1024,  /* 64MiB */
512 * 1024 * 1024, /* 512MiB */
Is there any reasoning behind the selection of these buckets? Maybe we can use the APM data we already collect to select values according to our use case.
No reason for those buckets; I set them somewhat randomly and was checking how they fit by executing some curls.
If APM exposes that data so the actual values can be checked, I'll look into it and update this list.
With the v2 changes, we don't expect any responses greater than a few KBs. If Fleet wants to download a ZIP file, it will be redirected (HTTP 301) to https://package-storage.elastic.co/.
So, should I plan the buckets now with storage v2 in mind? Buckets with lower values would be needed for sure.
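As a hedged example, a bucket layout skewed toward small payloads could look like this (values are illustrative, not the ones finally merged):

// responseSizeBuckets assumes storage v2 behavior: most responses are a few
// KiB and large downloads are redirected, so only a couple of coarse upper
// buckets remain as a safety net. Values are in bytes.
var responseSizeBuckets = []float64{
	1 * 1024,    // 1KiB
	4 * 1024,    // 4KiB
	16 * 1024,   // 16KiB
	64 * 1024,   // 64KiB
	256 * 1024,  // 256KiB
	1024 * 1024, // 1MiB
}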
Regarding this http_request_duration_seconds metric, it is using the default buckets definition:
var (
	DefBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}
)
I've been checking the data collected by APM, and it looks like they could fit our scenarios well.
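For reference, keeping the defaults just means passing prometheus.DefBuckets (or leaving Buckets unset) when defining the histogram; a sketch with assumed names:

var httpRequestDurationSeconds = prometheus.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "epr",
	Name:      "http_request_duration_seconds",
	Help:      "Duration of HTTP requests in seconds.",
	Buckets:   prometheus.DefBuckets, // .005s up to 10s
}, []string{"code", "method", "path"})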
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
It would be great if you posted a sample /metrics response in the issue description.
main.go
Outdated
// If -dry-run=true is set, service stops here after validation
if dryRun {
	os.Exit(0)
}
Hm, I wouldn't hide this condition in the initMetricsServer function, as it will be surprising to debug. Shall we move it to the main routine?
This function was based on the initServer one, and I must review the usage of dryRun here.
I think the call to os.Exit(0) in initServer could also be moved to main(). I'll check that too.
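A hedged sketch of what that could look like after moving the exit to main(); initServer here is a stand-in for the real function:

package main

import (
	"flag"
	"net/http"
	"os"
)

var dryRun = flag.Bool("dry-run", false, "Validate packages and exit")

func main() {
	flag.Parse()

	// initServer loads and validates the packages before returning.
	server := initServer()

	// With the exit here, nothing is hidden inside init functions:
	// -dry-run visibly stops right after validation.
	if *dryRun {
		os.Exit(0)
	}

	server.ListenAndServe()
}

// initServer is a stand-in for the real function discussed in this thread.
func initServer() *http.Server {
	return &http.Server{Addr: ":8080", Handler: http.DefaultServeMux}
}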
Sounds great!
// Do not log requests to these endpoints
switch r.RequestURI {
case "/health":
	next.ServeHTTP(w, r)
	return
default:
	logRequest(logger, next, w, r)
}
Frankly speaking, I would log all requests. At least we'd know that some script/bot/logic hits the healthcheck. WDYT?
I got tempted by this when migrating to the ECS logger, but it flooded the logs quite a bit. Let's leave it for a separate PR if we do it.
For context, I just changed the code here because I wanted to filter out logs from /metrics too. As part of one of the earlier comments, /metrics is now served on another host:port (http.ListenAndServe()), so in the end it was not needed.
No strong opinion here. I see that this filter for the /health endpoint could be advantageous when inspecting logs later, since it could otherwise flood Elastic with too many GET requests to /health (they could be filtered in a KQL query too). However, it hides from us the actual requests (in logs) to all the endpoints.
I could remove it, log all requests, and check whether it causes a problem when we try to diagnose or check logs. And if that is the case, we could bring the filter back.
WDYT?
I got tempted by this when migrating to the ECS logger, but it flooded the logs quite a bit. Let's leave it for a separate PR if we do it.
Ok, I'll keep the same logic and let's try that in a separate PR.
With the v2 changes, we don't expect any responses greater than a few KBs. If Fleet wants to download a ZIP file, it will be redirected (HTTP 301) to https://package-storage.elastic.co/.
My fault, I hadn't read the whole thread. I can see the metrics here. A few ideas to consider:
This kind of "global" metadata is usually added in Prometheus as labels of a dummy metric, to avoid increasing the size of responses a lot. For example, in Kubernetes you find "metrics" like the sketch below, instead of this metadata being added to all the metrics of a service or a node. But assigning this info to the proper events in Metricbeat is a bit tricky.
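As a hedged sketch of this "info metric" trick in Go (the metric name and labels are illustrative; the ServiceInfo metric discussed below in this thread follows the same idea):

package metrics

import "github.com/prometheus/client_golang/prometheus"

// serviceInfo is a dummy gauge: its value is always 1 and the actual
// metadata travels in the labels, rendering for example as
//   epr_service_info{version="1.9.0",instance="host-1"} 1
var serviceInfo = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Namespace: "epr",
	Name:      "service_info",
	Help:      "Information about the package registry service.",
}, []string{"version", "instance"})

// SetServiceInfo publishes the metadata once at startup.
func SetServiceInfo(version, hostname string) {
	serviceInfo.WithLabelValues(version, hostname).Set(1)
}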
In Istio you can also find some versions exposed this way: https://github.com/elastic/beats/blob/949d7ccd1f866ddbcd73edb8ba0cd7b8af7f918d/x-pack/metricbeat/module/istio/istiod/_meta/testdata/istiod.v1.7.1.plain#L357. The problem with this trick for Metricbeat is that the generic Metricbeat collector groups Prometheus metrics by labels, so it may be difficult to put the version in all documents unless we modify the Metricbeat collectors.
Sure! Added some examples of metrics to the description.
I could try to create a metric with that info, at least to be exposed in /metrics.
I wanted to at least cover the number of indexed packages in this PR. I will take a look at the other metrics you mention to check how they can be added.
main.go
Outdated
// If -dry-run=true is set, service stops here after validation
if dryRun {
	os.Exit(0)
}
Moved to main().
It looks safe to execute the following lines:
router := mustLoadRouter(logger, config, combinedIndexer)
apmgorilla.Instrument(router, apmgorilla.WithTracer(apmTracer))
return &http.Server{Addr: address, Handler: router}
and in main.go just do the exit call.
Do you think there could be any issue?
The idea for dry-run is to run the package validation, fail in case of an error, or os.Exit(0) when it's safe. To be 100% sure, you can try running it against a path with packages and check that it doesn't skip validation.
Tested in a container, and it seems it works as expected. Here is the output:
root@7f7596816c73:/package-registry# cat config.yml
package_paths:
- /packages/development
- /packages/production
- /packages/staging
- /packages/snapshot
root@7f7596816c73:/package-registry# ./package-registry --dry-run
{"log.level":"info","@timestamp":"2022-07-05T11:54:55.958Z","log.origin":{"file.name":"package-registry/main.go","file.line":95},"message":"Package registry started","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:54:55.961Z","log.origin":{"file.name":"package-registry/main.go","file.line":247},"message":"Packages paths: /packages/development, /packages/production, /packages/staging, /packages/snapshot","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:54:55.961Z","log.origin":{"file.name":"package-registry/main.go","file.line":249},"message":"Cache time for /search: 10m0s","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:54:55.961Z","log.origin":{"file.name":"package-registry/main.go","file.line":250},"message":"Cache time for /categories: 10m0s","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:54:55.961Z","log.origin":{"file.name":"package-registry/main.go","file.line":251},"message":"Cache time for all others: 10m0s","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:54:55.962Z","log.origin":{"file.name":"packages/packages.go","file.line":191},"message":"Searching packages in /packages/development","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:54:55.966Z","log.origin":{"file.name":"packages/packages.go","file.line":191},"message":"Searching packages in /packages/production","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:55:01.087Z","log.origin":{"file.name":"packages/packages.go","file.line":191},"message":"Searching packages in /packages/staging","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:55:01.309Z","log.origin":{"file.name":"packages/packages.go","file.line":191},"message":"Searching packages in /packages/snapshot","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:55:02.318Z","log.origin":{"file.name":"packages/packages.go","file.line":191},"message":"Searching packages in /packages/development","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:55:02.807Z","log.origin":{"file.name":"packages/packages.go","file.line":191},"message":"Searching packages in /packages/production","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:55:02.828Z","log.origin":{"file.name":"packages/packages.go","file.line":191},"message":"Searching packages in /packages/staging","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:55:02.932Z","log.origin":{"file.name":"packages/packages.go","file.line":191},"message":"Searching packages in /packages/snapshot","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-05T11:55:02.932Z","log.origin":{"file.name":"package-registry/main.go","file.line":269},"message":"1584 package manifests loaded","ecs.version":"1.6.0"}
root@7f7596816c73:/package-registry#
I find it confusing to now need so many if dryRun { return } checks.
Could we refactor this so it is more explicit what is run and what is not when dryRun is enabled?
If the idea is to run only the package validation, maybe we can move the initial package loading to its own function, and in main(), just after initializing the logger, do:
if dryRun {
	initIndexers(logger)
	os.Exit(0)
}
(initIndexers would contain what is done in initServer before the current if dryRun; initServer would also call initIndexers).
My intention was to address the following use case: during the rollout of a new EPR deployment, we can easily determine whether errors are coming from the freshly deployed app.
I left a few minor comments, but in general, it's close to getting merged. We can add more metrics in follow-ups.
One thing that is missing is a CHANGELOG entry.
main.go
Outdated
@@ -143,6 +152,11 @@ func initMetricsServer(logger *zap.Logger) {
		return
	}
	logger.Info("Starting http metrics in " + metricsAddress)
	hostname, found := os.LookupEnv("HOSTNAME")
I think that there is a specific function for this called os.Hostname.
Thanks! Updated to use that function
util/metrics.go
Outdated
@@ -104,6 +110,8 @@ var (
// MetricsMiddleware is a middleware used to measure every request received
func MetricsMiddleware() mux.MiddlewareFunc {
	// Register all metrics
	prometheus.MustRegister(ServiceInfo)
Should ServiceInfo be exposed, or can it be private?
util/metrics.go
Outdated
@@ -0,0 +1,140 @@
// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
I wouldn't put more stuff in util, as it will eventually become a trash can. What about another package called metrics?
Sure! I will put this code in a new metrics package.
I was thinking during the last changes that this kind of naming (utils.ServiceInfo) was not meaningful. So, better in its own package.
main.go
Outdated
hostname, found := os.LookupEnv("HOSTNAME")
if !found {
	hostname = defaultInstanceName
You can also try with os.Hostname().
Sure, changed
I meant that you could use both methods 🙂 the HOSTNAME environment variable, and os.Hostname().
Consider creating a function to obtain the hostname from one of these two methods, plus the fallback of using localhost.
search.go
Outdated
packages, err := indexer.Get(r.Context(), &opts)
if err != nil {
	notFoundError(w, errors.Wrapf(err, "fetching package failed"))
	return
}
util.SearchProcessDurationSeconds.Observe(time.Since(start).Seconds())
We are overlapping here a bit with APM. APM is intended for this kind of monitoring, and provides more contextual information, such as the request data or information about the parent or child spans.
Without some context this metric doesn't say much, but if we start adding context to it, such as the parameters used, we can end up with a metrics explosion.
I would remove it for now, as this is covered by APM; maybe we can reconsider it in the future.
util/metrics.go
Outdated
const metricsNamespace = "epr"

// info metric
Use godoc style in these kinds of comments, starting with the name of what is being described. Suggested change:
// info metric
// ServiceInfo is a metric used to report information about the service. Labels are used for that; its value is not relevant.
I added those comments just for some kind of grouping. I'll delete them; as every metric variable has a help message, I think an extra comment is not needed.
main.go
Outdated
if !found {
	hostname = defaultInstanceName
}
util.ServiceInfo.WithLabelValues(version, hostname).Set(1)
Nit: I don't find WithLabelValues very intuitive; I had to double-check where the names of the labels are defined (I see it is in the definition of ServiceInfo).
Maybe we could pass the labels explicitly with With():
util.ServiceInfo.With(prometheus.Labels{"version": version, "hostname": hostname}).Set(1)
Or, given the particularity of this "metric", consider adding a method for this, so here we can do something like:
util.SetServiceInfo(version, hostname)
Agreed, it's not very intuitive.
As new metrics using labels have been added (e.g. StorageRequestsTotal), I would keep the first approach you proposed (using With()), and set/increment all the metrics in the same way.
@@ -160,6 +160,7 @@ func (i *Indexer) updateIndex(ctx context.Context) error {
	defer i.m.Unlock()
	i.cursor = storageCursor.Current
If this is a string, we could collect it with something like the service info.
Something else that we could measure here is the count of updates; this may be an interesting metric to see how frequently we are requesting an update (see the sketch below).
(Btw, we could instrument the storage indexer more with APM, but let's leave that for other PRs.)
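A hedged sketch of counting index updates; the metric and function names are assumptions:

package storage

import "github.com/prometheus/client_golang/prometheus"

var indexUpdatesTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: "epr",
	Name:      "storage_indexer_update_index_total",
	Help:      "Total number of updates performed by the storage indexer.",
})

func init() {
	prometheus.MustRegister(indexUpdatesTotal)
}

// recordIndexUpdate would be called from updateIndex; rate() over this
// counter then shows how frequently the index is refreshed.
func recordIndexUpdate() {
	indexUpdatesTotal.Inc()
}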
* `/epr/{name}/{name}-{version}.tar.gz`: Download a package
* `/epr/{name}/{name}-{version}.zip`: Download a package
According to the artifact handler, only the zip extension is allowed in the regex.
Yes, good catch :)
Removed the epr_search_process_duration_seconds metric in favor of creating specific metrics for the storage indexer get and update functions.
README.md
Outdated
@@ -236,6 +236,19 @@ It will be listening on the given address.

You can read more about this profiler and the available endpoints in the [pprof documentation](https://pkg.go.dev/net/http/pprof).

## Metrics
Package registry is instrumented to expose Prometheus metrics. These metrics are exposed under the `/metrics` endpoint.
By default this endpoint runs on `localhost:9000`. These metrics can be scraped like:
So far APM and pprof are disabled by default; should we do the same with metrics? This would also help to avoid backwards-compatibility issues.
Ok, I'll keep the same approach to avoid any possible backwards-compatibility issue.
metrics/middleware.go
Outdated
path, _ := route.GetPathTemplate()
labels := prometheus.Labels{"path": path}
Should we handle the error of route.GetPathTemplate(), and set the path label only when available? Something like:
path, err := route.GetPathTemplate()
if err == nil {
	labels := prometheus.Labels{"path": path}
}
Good catch, I missed handling this error.
As the prometheus.Labels var must have all the values defined in the metrics (path should have some value), I think I'll go with this:
path, err := route.GetPathTemplate()
if err != nil {
	path = "unknown"
}
labels := prometheus.Labels{"path": path}
When I try to query URLs that are not defined, like https://localhost:8080/notfound, the router.NotFoundHandler is executed and there is no metric about that in the metrics endpoint. That's why I think that error did not arise in my tests.
Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
It seems to be complete; all my comments are addressed. Well done!
Once you get a "green light" from Jaime, feel free to merge this PR.
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

* Update Go version and base Ubuntu image. [#821](https://github.com/elastic/package-registry/pull/821)
* Add support for "threat_intel" category. [#841](https://github.com/elastic/package-registry/pull/841)
* Instrument package registry with Prometheus metrics. [#827](https://github.com/elastic/package-registry/pull/827)
Should I mark this as experimental here and in the README? I was thinking of the case where some metrics (e.g. namings or buckets) need to be updated later.
Nah, I would say that we can go without special labels like experimental or beta. It's just yet another feature. If it starts failing, we will bugfix it.
👍
Relates #797
Add metrics to Elastic Package Registry. These metrics are sent to the Elastic Stack through the APM agent, and they are also exposed by the application itself at the "/metrics" endpoint. These metrics are instrumented using the Prometheus client. The list of metrics exposed in this PR:
Some examples of the metrics that are going to be exposed (buckets still to be reviewed):