
Adding first draft of [OTEL Kubernetes] Cluster Overview Dashboard #10443

Closed
wants to merge 6 commits

Conversation

@gizas (Contributor) commented Jul 10, 2024

  • Enhancement

Proposed commit message

Adding a new [OTEL Kubernetes] Cluster Overview dashboard in the k8s integration

Please explain:

  • WHAT: A new dashboard that visualises the metrics needed for Kubernetes Cluster observability
  • WHY: To support the user experience of Kubernetes Observability users who use the OTel Collector

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

  1. Create a kind cluster: kind create cluster
  2. Clone this repo.
  3. Navigate to elastic/integrations/packages/kubernetes.
  4. Build the package by running elastic-package build.
  5. Install the elastic-agent-otel distribution from https://github.com/elastic/k8scollector/blob/otel/helm/elastic-agent/elastic-otel-collector_all.yaml by running kubectl apply -f elastic-otel-collector_all.yaml.
  6. Spin up a new stack: elastic-package stack up -d -v --version=8.15.0-SNAPSHOT
  7. The new Kubernetes 1.64.0 integration should be available. Install it.
  8. Navigate to Dashboards and open [OTEL Kubernetes] Cluster Overview. (The commands are consolidated in the sketch below.)
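For convenience, a minimal shell sketch consolidating the commands from the steps above. It assumes a fresh checkout of elastic/integrations and that the collector manifest has been downloaded locally; adjust paths to your environment.

```sh
# Step 1: create a local kind cluster
kind create cluster

# Steps 2-4: clone the repo and build the kubernetes package
git clone https://github.com/elastic/integrations.git
cd integrations/packages/kubernetes
elastic-package build

# Step 5: deploy the elastic-agent-otel distribution (manifest from the link above)
kubectl apply -f elastic-otel-collector_all.yaml

# Step 6: spin up a local stack that serves the freshly built package
elastic-package stack up -d -v --version=8.15.0-SNAPSHOT
```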

Screenshots

[Screenshot 2024-07-10: preview of the [OTEL Kubernetes] Cluster Overview dashboard]

"68bd6f9e-6894-4991-b2ff-fac3a4461b2b": {
"dataType": "number",
"isBucketed": false,
"label": "Average of k8s.node.cpu.utilization",
A Member commented on this snippet:
The way it is reported right now by the Collector is not correct. I'd suggest keeping this out for now, or using k8s.node.cpu.usage instead.

There is a fix for that at open-telemetry/opentelemetry-collector-contrib@e248353, but I'm not sure yet whether we will land that first, or whether we should first deprecate and remove the metric and then add the correct one back. You can follow open-telemetry/opentelemetry-collector-contrib#27885 for updates.
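For context, a hedged sketch of how the swap could look on the collection side once the replacement metric is available: the kubeletstats receiver allows individual metrics to be toggled by name, so a config along these lines (metric names taken from the discussion above; the rest is an assumed minimal receiver setup) would disable the misreported metric and enable the corrected one.

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: ${env:K8S_NODE_NAME}:10250
    metrics:
      # Assumption: once the fix lands, enable the corrected metric...
      k8s.node.cpu.usage:
        enabled: true
      # ...and disable the misreported utilization metric
      k8s.node.cpu.utilization:
        enabled: false
```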

@gizas (Contributor, Author) commented Jul 18, 2024

Adding another iteration:

  • Updated the Node visualisation with k8s.node.cpu.usage
  • Added tables for memory/CPU utilisation. These have coloured cells, although there seems to be a problem with the colouring: see memory in the screenshot, where the cell is red even though the value is 15%
  • Added network errors
  • Added node filesystem usage at the bottom

[Screenshot 2024-07-18: updated dashboard iteration]

@elasticmachine commented Jul 18, 2024

💔 Build Failed


@ChrsMark (Member) left a comment:

I might be missing some context here, but something that looks unclear is which type of data this dashboard aims to consume.

Is this going to use translated metrics coming from the Infra Metrics Processor along with OTel-native metrics?

Since this is focused on OTel users, and the translation lib emits both the translated and the OTel-native metrics, why not use the native ones here directly?

So, instead of kubernetes.pod.cpu.usage.limit.pct, use k8s.pod.cpu_limit_utilization directly?

{
  "embeddableConfig": {
    "attributes": {
      "description": "Average of 100 Top Pod CPU Usage based on kubernetes.pod.usage.limit.pct",
A Member commented on this snippet:
If those dashboards target OTel users, the naming should meet semantic-convention expectations, using Utilization or Usage according to the underlying metric's name and meaning.
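For illustration, a sketch of what the Lens column above might look like if it read the OTel-native metric directly, with the label following the metric's own name. The attribute layout mirrors the excerpts quoted in this thread; the operationType and sourceField keys are an assumption about the Lens saved-object format.

```json
{
  "dataType": "number",
  "isBucketed": false,
  "label": "Average of k8s.pod.cpu_limit_utilization",
  "operationType": "average",
  "sourceField": "k8s.pod.cpu_limit_utilization"
}
```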

@andrewkroh added the Integration:kubernetes and Team:Cloudnative-Monitoring labels on Jul 19, 2024
@botelastic (bot) commented Aug 18, 2024

Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough 👍. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Aug 18, 2024
changes:
  - description: Create General Kubernetes Dashboard [Otel] to support Opentelemetry observability
    type: enhancement
    link: https://github.com/elastic/integrations/pull/10406
A Member suggested a change:

Suggested change:
- link: https://github.com/elastic/integrations/pull/10406
+ link: https://github.com/elastic/integrations/pull/10443

@botelastic botelastic bot removed the Stalled label Aug 20, 2024
@andrewkroh andrewkroh added the dashboard Relates to a Kibana dashboard bug, enhancement, or modification. label Aug 30, 2024
@gizas gizas closed this Sep 4, 2024
@gizas (Contributor, Author) commented Sep 4, 2024

Closed, as the work was done in a different PR.

Labels: dashboard, Integration:kubernetes, Team:Cloudnative-Monitoring