Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kubernetes] Add last_terminated_reason_timestamp metric #39200

Conversation

tetianakravchenko
Copy link
Contributor

@tetianakravchenko tetianakravchenko commented Apr 24, 2024

Proposed commit message

Please explain:

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

  1. create local k8s cluster: kind create cluster
  2. install stack: elastic-package-0.98.2 stack up -d -v --version=8.14.0-SNAPSHOT
  3. edit dev-tools/kubernetes/Tiltfile to run in mode="run" and run:
cd dev-tools/kubernetes
tilt up
  1. Create a pod with a low resources.limits.memory to cause OOMKilled

Related issues

Use cases

Screenshots

Screenshot 2024-04-26 at 14 30 47

Logs

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
@tetianakravchenko tetianakravchenko requested a review from a team as a code owner April 24, 2024 17:12
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 24, 2024
@tetianakravchenko tetianakravchenko added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Apr 24, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 24, 2024
Copy link
Contributor

mergify bot commented Apr 24, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @tetianakravchenko? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented Apr 24, 2024

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 100 min 45 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@gizas
Copy link
Contributor

gizas commented Apr 25, 2024

Tania minor notes:

  • Add some steps on how to test this PR.
  • Add a screenshot to prove that field is present
  • Add a link for the relevant elastic-agent story where this field needs also to be added

Do you think shall we also add:
kube_pod_container_status_last_terminated_exitcode? I think the code is useful as well

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
"kube_pod_container_status_terminated_reason": p.LabelMetric("status.reason", "reason"),
"kube_pod_container_status_waiting_reason": p.LabelMetric("status.reason", "reason"),
"kube_pod_container_status_last_terminated_reason": p.LabelMetric("status.last_terminated_reason", "reason"),
"kube_pod_container_status_last_terminated_timestamp": p.Metric("status.last_terminated_reason.sec"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you map kube_pod_container_status_last_terminated_timestamp ksm metric to elasticsearch field kubernetes.container.status.last_terminated_reason.sec. But then , the field you declare in the fields.yaml is last_terminated_reason_timestamp. Am I missing something?

Copy link
Contributor Author

@tetianakravchenko tetianakravchenko Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my mistake, reverted this - 180afdf

It is a leftover, I was considering other naming, not really sure regarding the name last_terminated_reason_timestamp. For example in state_cronjob - https://github.com/elastic/beats/blob/main/metricbeat/module/kubernetes/state_cronjob/state_cronjob.go#L33-L34 is used *.sec for time metrics, but then I figured out that there is not really a pattern for time metrics, for example:
job - time.* (type of the field: data)
statefulset - no time or sec in name (type: long)
storageclass - no time or sec in name (type: date)
namespace - created.sec, type: double

do you have any thoughts on naming?
maybe use last_terminated (similar to *.created with the type: double) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have the field status.last_terminated_reason that corresponds to kube_pod_container_status_last_terminated_reason.
If we follow the same convention for kube_pod_container_status_last_terminated_timestamp let's name the field status.last_terminated_timestamp. There is no need to add the reason in the field name as it is just the timestamp of the last termination. It is not related with the reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 90d82b6

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
@tetianakravchenko
Copy link
Contributor Author

Tania minor notes:

Add some steps on how to test this PR.
Add a screenshot to prove that field is present
Add a link for the relevant elastic-agent story where this field needs also to be added

@gizas I've added steps to test and the screenshot
regarding elastic-agent story - do you mean to add new field to the k8s integration? I've listed it elastic/elastic-agent#3802 (comment)

Do you think shall we also add:
kube_pod_container_status_last_terminated_exitcode? I think the code is useful as well

can you provide the use case? we already provide the last_terminated_reason and if this metric is present it already implies that some issue happened

@gizas
Copy link
Contributor

gizas commented Apr 26, 2024

@gizas I've added steps to test and the screenshot
Nice!

regarding elastic-agent story - do you mean to add new field to the k8s integration? I've listed it
Ok

can you provide the use case? we already provide the last_terminated_reason and if this metric is present it already implies that some issue happened

I was thinking that the code is a number and probably for those who use APIs is better to check for specific code and if is present then to return it. But not so important for now as not specifically asked. Skip it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants