Prometheus metrics about amount of security issues #425
-
Great question @wuestkamp. We do not provide such metrics yet, but we'd like to do so in Starboard. As you realized, Starboard already exposes the /metrics endpoint, but those metrics relate to the Go runtime and to work queue (shared informer) processing. Code has to be written to expose the scan results that we keep in CRDs and/or shared informer caches in Prometheus format. There are tools that can expose any CRD as Prometheus metrics on a regular schedule, which is another option to consider. We also saw a project that runs just Trivy (unlike Starboard, which supports other types of scanners) and exposes vulnerability summaries as Prometheus metrics. However, with Starboard we want to take a holistic approach to K8s security, and I believe that exposing Prometheus metrics is a great piece of functionality to build in.
-
First of all, I think Starboard is a great project, but for me and my company this is one of the key features that is missing. Today we manage about 20 k8s clusters, and the number will grow. We use metrics to visualize any issues that we might have in our clusters, and I want that to include CVEs.
Taking inspiration from https://github.com/kaidotdev/kube-trivy-exporter, which provides:
trivy_vulnerabilities{image="gcr.io/spinnaker-marketplace/echo:2.5.1-20190612034009",installedVersion="0.168-1",pkgName="libelf1",severity="MEDIUM",vulnerabilityId="CVE-2018-16403"} 1
Since we already have the ReplicaSet information, I think we should also add that to the metric to make it easy to pinpoint where the issue is. I guess we can use the job label for that:
trivy_vulnerabilities{image="gcr.io/spinnaker-marketplace/echo:2.5.1-20190612034009",installedVersion="0.168-1",pkgName="libelf1",severity="MEDIUM",vulnerabilityId="CVE-2018-16403",job="replica-spinnarker1"} 1
I have done a test implementation that emits this metric when the scan job creates a new report. That works fine, but it only covers new reports, and all the metrics would be gone if the operator restarts.
This will also help us solve #537: instead of creating a CronJob to delete the existing VulnerabilityReport CRs once a night, we could add a TTL config to Starboard and use the same controller created for metrics to delete existing reports and thus trigger new ones. What do you think @danielpacak @wuestkamp
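The TTL idea above can be sketched roughly as follows. This is a minimal illustration, not Starboard code: the helper names (`parse_k8s_timestamp`, `is_expired`, `expired_reports`) are hypothetical, and it assumes each report carries the standard Kubernetes `metadata.creationTimestamp` in UTC with a trailing `Z`.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_timestamp(ts: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp such as 2021-05-01T12:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_expired(creation_timestamp: str, ttl: timedelta, now: datetime) -> bool:
    """Return True when the report is at least `ttl` old at time `now`."""
    return now - parse_k8s_timestamp(creation_timestamp) >= ttl


def expired_reports(reports, ttl: timedelta, now: datetime):
    """Names of the reports a TTL controller would delete to trigger a rescan."""
    return [
        r["metadata"]["name"]
        for r in reports
        if is_expired(r["metadata"]["creationTimestamp"], ttl, now)
    ]
```

A controller built this way would delete the returned reports on each reconcile and rely on the operator to schedule new scans for the affected workloads.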
-
This is partly related to #563, but it covers metrics for all the reports that get generated. Personally, I feel the need for it mostly for vulnerabilities, and I think it should be possible, and probably even preferable, to implement this in separate PRs. Still, it's a good thing to keep in mind when deciding on the naming convention for the metrics that Starboard would expose.
-
I really like the idea of having a separate controller to discover VulnerabilityReports and manage Prometheus metrics. For the schema and PromQL, we can get started with a proposal based on your experience managing clusters in production. Just bear in mind that Trivy is a plugin and you could have other scanners, so we should use more generic names for exported metrics and labels. This new controller should be disabled by default, with an option to turn it on, similar to the OPERATOR_CIS_KUBERNETES_BENCHMARK_ENABLED env variable used to enable or disable infrastructure scanning.
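As a rough illustration of the "disabled by default" toggle, the sketch below reads a boolean flag from the environment. The variable name `OPERATOR_METRICS_EXPORTER_ENABLED` is purely hypothetical (only `OPERATOR_CIS_KUBERNETES_BENCHMARK_ENABLED` is mentioned in this thread), and the set of accepted truthy strings is an assumption.

```python
import os

# Assumption: which strings count as "enabled"; anything else means disabled.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ=None) -> bool:
    """Read a boolean feature flag; an unset variable falls back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


# Hypothetical variable name, used for illustration only.
metrics_enabled = env_flag("OPERATOR_METRICS_EXPORTER_ENABLED", default=False)
```

Defaulting to `False` means existing installations keep their current behavior until the flag is set explicitly.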
-
Starboard is awesome, but it is not very practical if each analysis stays only in its report object in the cluster.
I am not sure that metrics should expose information about each detected vulnerability, like this:
Not only is it unclear what we should set as the value for the metric, but this could also result in a large number of distinct label values.
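To make the cardinality concern concrete, here is a small sketch that counts how many time series each design would produce. The label names mirror the `trivy_vulnerabilities` sample quoted earlier in the thread; the five severity levels and the function names are assumptions made for illustration.

```python
# Assumption: the scanner reports these five severity levels.
SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW", "UNKNOWN")


def detailed_series_count(findings) -> int:
    """One series per distinct label set, as in a per-vulnerability metric."""
    return len({
        (f["image"], f["pkgName"], f["installedVersion"],
         f["severity"], f["vulnerabilityId"])
        for f in findings
    })


def summary_series_count(findings) -> int:
    """One series per (image, severity) pair, however many CVEs match."""
    images = {f["image"] for f in findings}
    return len(images) * len(SEVERITIES)


# Toy data: 2 images with 300 distinct CVEs each.
findings = [
    {"image": f"img-{i}", "pkgName": "pkg", "installedVersion": "1.0",
     "severity": "MEDIUM", "vulnerabilityId": f"CVE-2020-{n}"}
    for i in range(2) for n in range(300)
]
print(detailed_series_count(findings), summary_series_count(findings))
```

The detailed design grows with the number of findings, while the summary design grows only with the number of images.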
-
Hi guys, I've been testing Starboard on our clusters for a while now and I really think you are doing a great job! I'm really interested in this feature and in contributing to the project, so I have made a small POC with a controller that exposes a summary of the vulnerabilities found as Prometheus metrics. I've tested it on our cluster with around 1500 VulnerabilityReports, and the exposed metrics are as follows:
As @fredgate says, I think that exposing metrics with the details of each vulnerability may cause problems due to high cardinality. I would like you to take a look at the code and tell me whether it is more or less the idea you had in mind. I'm open to any feedback and happy to make the necessary changes. Would you accept a PR with a similar implementation?
-
Very interesting projects 👍
-
Hi all, I need to export Trivy scan reports from a Kubernetes cluster to Grafana, using Prometheus as the data source for visualization. I have installed kube-trivy-exporter, but it is not picking up the cluster IP/ports after I apply the manifests. Can anyone recommend other options for my requirement?
-
Awesome project!
Is there a way to get the summaries from the CRDs like this one:
into Prometheus? I guess I could write a custom app that reads the CRD reports and converts them into Prometheus metrics. Or is there maybe already a general project like that?
I ask because the operator metrics on 8080/metrics don't include info like that.
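A minimal sketch of such a custom converter, assuming the reports have already been fetched as dictionaries: it renders per-severity counts in the Prometheus text exposition format. The metric name `vulnerability_summary_count`, the label set, and the summary field names (`criticalCount`, `highCount`, and so on under `report.summary`) are assumptions for illustration, not a confirmed Starboard schema.

```python
# Assumed mapping from summary field names to severity label values.
SUMMARY_FIELDS = {
    "criticalCount": "CRITICAL",
    "highCount": "HIGH",
    "mediumCount": "MEDIUM",
    "lowCount": "LOW",
    "unknownCount": "UNKNOWN",
}


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_summary_metrics(reports, metric_name="vulnerability_summary_count") -> str:
    """Render one gauge sample per report and severity."""
    lines = [
        f"# HELP {metric_name} Number of vulnerabilities per report and severity.",
        f"# TYPE {metric_name} gauge",
    ]
    for report in reports:
        meta = report["metadata"]
        summary = report["report"]["summary"]
        for field, severity in SUMMARY_FIELDS.items():
            labels = {
                "namespace": meta["namespace"],
                "name": meta["name"],
                "severity": severity,
            }
            rendered = ",".join(
                f'{k}="{escape_label_value(v)}"' for k, v in labels.items()
            )
            # Missing summary fields are reported as 0 rather than omitted.
            lines.append(f"{metric_name}{{{rendered}}} {summary.get(field, 0)}")
    return "\n".join(lines) + "\n"
```

Serving the returned string over HTTP on a /metrics path would let Prometheus scrape it; in a real exporter, a Prometheus client library would normally handle the formatting.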