Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_process_metadata: container id: Do not cache whole cgroup file for each process #17351

Merged
merged 2 commits into from
Mar 31, 2020
Merged

add_process_metadata: container id: Do not cache whole cgroup file for each process #17351

merged 2 commits into from
Mar 31, 2020

Conversation

jtinkus
Copy link
Contributor

@jtinkus jtinkus commented Mar 31, 2020

  • Enhancement

What does this PR do?

Current solution caches whole content of cgroup file for each process. Let's cache only found container ids.

Also, current solution skips kube-proxy container processes, because cgroup path has suffix /kube-proxy.

Why is it important?

Caching such a pile of useless data has performance impact.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Relates #15947

Use cases

Screenshots

Logs

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@jtinkus
Copy link
Contributor Author

jtinkus commented Mar 31, 2020

Hello! @exekias , @BrookHF, @danmx
Minor improvements for container id functionality introduced with #15947 I'd like to discuss with you guys.

Firs one is that current solution in master caches huge amount of useless data - whole cgroup file content for all processes. Also it always processes that cached cgroup file content again and again to ask cid even if data is cached. Here i changed cache just to cache container id.

Second issue i discovered was that some kubernetes cgroup paths have suffix after cid. One i found was kube-proxy. For now i hardcoded it there and it works. My question is do you happen to know any other corner cases like that? Maybe I should add suffix also configurable....

With best regards!

@exekias exekias added enhancement review Team:Platforms Label for the Integrations - Platforms team labels Mar 31, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@exekias
Copy link
Contributor

exekias commented Mar 31, 2020

Thank you for opening this!

Second issue i discovered was that some kubernetes cgroup paths have suffix after cid. One i found was kube-proxy. For now i hardcoded it there and it works. My question is do you happen to know any other corner cases like that? Maybe I should add suffix also configurable....

Do you know what's the general rule, or why that one behaves differenty? I'm guessing it's because it's an static pod, managed in a different way. We should find the rule for these and include a generic solution

@jtinkus
Copy link
Contributor Author

jtinkus commented Mar 31, 2020

Completely agree with generic solution and that's what i'm planning to do.

Finding out the rule needs further investigations, that's why I opened discussion here too, maybe some of you knows about it.

Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me, thank you for contributing! let's split concerns in different PRs so this can move forward

This won't need a changelog as it will be released with the previous PR

@exekias
Copy link
Contributor

exekias commented Mar 31, 2020

jenkins test this please

@exekias exekias added the needs_backport PR is waiting to be backported to other branches. label Mar 31, 2020
Co-Authored-By: Carlos Pérez-Aradros Herce <exekias@gmail.com>
@exekias
Copy link
Contributor

exekias commented Mar 31, 2020

ok to test

@jtinkus
Copy link
Contributor Author

jtinkus commented Mar 31, 2020

@exekias until i do not know exact rule for cgroup format (and it might not be written anywhere in balck and white, so we could search it forever :)), i see following options:

  1. assuming, that differences are in suffix part only and adding configurable suffix list too
  2. adding configurable alternative to use regex, something like in my matcher PR Adding pid matcher for add_kubernetes_metadata processor. #16549

What do you think?
Sorry for pushing it, it's kind of blocker for our team now in our project.
And starting to maintain our own version of beats sounds just stupid :)

@exekias
Copy link
Contributor

exekias commented Mar 31, 2020

I'm ok with adding custom patterns to the getCid function. If no pattern is given (none by default) it would use the current approach for automatically guessing. Patterns can be provided to override that behavior.

@jtinkus
Copy link
Contributor Author

jtinkus commented Mar 31, 2020

Ok i'm happy to add regex pattern as configurable alternative in separate PR.
But still with default we know that kube-proxy processes will have "kube-proxy" added as container id to event, and leaving it like this is behaviour already more like a bug not expected feature :)

My initial wording using "skips" was not the best. As it is returning basepath, as cid, then it returns 'kube-proxy'

@exekias
Copy link
Contributor

exekias commented Mar 31, 2020

Yes, I really expect we can investigate this a little more and come up with the generic solution 😄

@ChrsMark do you know how the kube-proxy cgroup name is generated by a chance? We can research this a little bit

@exekias exekias changed the title add_process_metadata: container id: Do not cache whole cgroup file for each process. Include kube-proxy c… add_process_metadata: container id: Do not cache whole cgroup file for each process Mar 31, 2020
@jtinkus
Copy link
Contributor Author

jtinkus commented Mar 31, 2020

@exekias is there anything else i need to do for completing this PR? Or from now on all automatic?

@exekias exekias merged commit 8a47788 into elastic:master Mar 31, 2020
exekias pushed a commit to exekias/beats that referenced this pull request Mar 31, 2020
…r each process (elastic#17351)

* Do not cache whole cgroup file for each process. Include kube-proxy cid too.

Co-authored-by: Jako Tinkus <jatinkus@microsoft.com>
(cherry picked from commit 8a47788)
@exekias exekias added v7.7.0 and removed needs_backport PR is waiting to be backported to other branches. labels Mar 31, 2020
exekias pushed a commit to exekias/beats that referenced this pull request Mar 31, 2020
…r each process (elastic#17351)

* Do not cache whole cgroup file for each process. Include kube-proxy cid too.

Co-authored-by: Jako Tinkus <jatinkus@microsoft.com>
(cherry picked from commit 8a47788)
@exekias exekias added the v7.8.0 label Mar 31, 2020
@jtinkus jtinkus deleted the container_id_cache_fix branch March 31, 2020 13:53
@ChrsMark
Copy link
Member

ChrsMark commented Mar 31, 2020

Yes, I really expect we can investigate this a little more and come up with the generic solution 😄

@ChrsMark do you know how the kube-proxy cgroup name is generated by a chance? We can research this a little bit

@exekias it is something like /sys/fs/cgroup/memory/kubepods/besteffort/pod{pod_uid}.

More details:

Getting kube-proxy uid:

~/$ kubectl -n kube-system get pod -l k8s-app=kube-proxy -o jsonpath="{.items[0].metadata.uid}"
42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de

Searching for kube-proxy containers:

$ docker ps | grep kube-proxy
4f6454506d5d        7d54289267dc                         "/usr/local/bin/kube…"   About an hour ago   Up About an hour                                                                         k8s_kube-proxy_kube-proxy-968n2_kube-system_42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de_19
e2d8064b3bc8        k8s.gcr.io/pause:3.1                 "/pause"                 About an hour ago   Up About an hour                                                                         k8s_POD_kube-proxy-968n2_kube-system_42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de_19

We see above that there are two containers one is the main container for kube-proxy and the extra one called pause (for namespace handling networkin etc. This one is appended to every single pod if I'm not mistaken)

Check cgroup resources:

#### main container

$ cat /sys/fs/cgroup/memory/kubepods/besteffort/pod42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de/4f6454506d5dc49d1a374c6e7fb39d4eea00efbe8c3a17cddb74f8341d5c667c/cgroup.procs 
5660

#### pause container

$ cat /sys/fs/cgroup/memory/kubepods/besteffort/pod42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de/e2d8064b3bc8f2d5ffa537dd6467303b39c0815765cc147fefa9292d4dfaf9ec/cgroup.procs 
5403

Note that k8s will group pods' cgroups according to the QoS policies besteffort, burstable etc

exekias pushed a commit that referenced this pull request Mar 31, 2020
…r each process (#17351) (#17362)

* Do not cache whole cgroup file for each process. Include kube-proxy cid too.

Co-authored-by: Jako Tinkus <jatinkus@microsoft.com>
(cherry picked from commit 8a47788)

Co-authored-by: jtinkus <35308202+jtinkus@users.noreply.github.com>
exekias pushed a commit that referenced this pull request Mar 31, 2020
…r each process (#17351) (#17361)

* Do not cache whole cgroup file for each process. Include kube-proxy cid too.

Co-authored-by: Jako Tinkus <jatinkus@microsoft.com>
(cherry picked from commit 8a47788)

Co-authored-by: jtinkus <35308202+jtinkus@users.noreply.github.com>
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…r each process (elastic#17351) (elastic#17361)

* Do not cache whole cgroup file for each process. Include kube-proxy cid too.

Co-authored-by: Jako Tinkus <jatinkus@microsoft.com>
(cherry picked from commit 76f4353)

Co-authored-by: jtinkus <35308202+jtinkus@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement review Team:Platforms Label for the Integrations - Platforms team v7.7.0 v7.8.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants