add_process_metadata: container id: Do not cache whole cgroup file for each process #17351

jtinkus · 2020-03-31T08:40:25Z

Enhancement

What does this PR do?

Current solution caches whole content of cgroup file for each process. Let's cache only found container ids.

Also, current solution skips kube-proxy container processes, because cgroup path has suffix /kube-proxy.

Why is it important?

Caching such a pile of useless data has performance impact.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

[ ]

How to test this PR locally

Related issues

Relates #15947

Use cases

Screenshots

Logs

…id too.

elasticmachine · 2020-03-31T08:40:27Z

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

elasticmachine · 2020-03-31T08:40:27Z

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

jtinkus · 2020-03-31T08:51:24Z

Hello! @exekias , @BrookHF, @danmx
Minor improvements for container id functionality introduced with #15947 I'd like to discuss with you guys.

Firs one is that current solution in master caches huge amount of useless data - whole cgroup file content for all processes. Also it always processes that cached cgroup file content again and again to ask cid even if data is cached. Here i changed cache just to cache container id.

Second issue i discovered was that some kubernetes cgroup paths have suffix after cid. One i found was kube-proxy. For now i hardcoded it there and it works. My question is do you happen to know any other corner cases like that? Maybe I should add suffix also configurable....

With best regards!

elasticmachine · 2020-03-31T08:53:12Z

Pinging @elastic/integrations-platforms (Team:Platforms)

exekias · 2020-03-31T08:54:27Z

Thank you for opening this!

Second issue i discovered was that some kubernetes cgroup paths have suffix after cid. One i found was kube-proxy. For now i hardcoded it there and it works. My question is do you happen to know any other corner cases like that? Maybe I should add suffix also configurable....

Do you know what's the general rule, or why that one behaves differenty? I'm guessing it's because it's an static pod, managed in a different way. We should find the rule for these and include a generic solution

jtinkus · 2020-03-31T09:01:03Z

Completely agree with generic solution and that's what i'm planning to do.

Finding out the rule needs further investigations, that's why I opened discussion here too, maybe some of you knows about it.

exekias

This is looking good to me, thank you for contributing! let's split concerns in different PRs so this can move forward

This won't need a changelog as it will be released with the previous PR

libbeat/processors/add_process_metadata/gosigar_cid_provider.go

exekias · 2020-03-31T09:23:59Z

jenkins test this please

Co-Authored-By: Carlos Pérez-Aradros Herce <exekias@gmail.com>

exekias · 2020-03-31T09:55:48Z

ok to test

jtinkus · 2020-03-31T09:57:31Z

@exekias until i do not know exact rule for cgroup format (and it might not be written anywhere in balck and white, so we could search it forever :)), i see following options:

assuming, that differences are in suffix part only and adding configurable suffix list too
adding configurable alternative to use regex, something like in my matcher PR Adding pid matcher for add_kubernetes_metadata processor. #16549

What do you think?
Sorry for pushing it, it's kind of blocker for our team now in our project.
And starting to maintain our own version of beats sounds just stupid :)

exekias · 2020-03-31T10:00:23Z

I'm ok with adding custom patterns to the getCid function. If no pattern is given (none by default) it would use the current approach for automatically guessing. Patterns can be provided to override that behavior.

jtinkus · 2020-03-31T10:09:32Z

Ok i'm happy to add regex pattern as configurable alternative in separate PR.
But still with default we know that kube-proxy processes will have "kube-proxy" added as container id to event, and leaving it like this is behaviour already more like a bug not expected feature :)

My initial wording using "skips" was not the best. As it is returning basepath, as cid, then it returns 'kube-proxy'

exekias · 2020-03-31T10:20:08Z

Yes, I really expect we can investigate this a little more and come up with the generic solution 😄

@ChrsMark do you know how the kube-proxy cgroup name is generated by a chance? We can research this a little bit

jtinkus · 2020-03-31T11:53:54Z

@exekias is there anything else i need to do for completing this PR? Or from now on all automatic?

…r each process (elastic#17351) * Do not cache whole cgroup file for each process. Include kube-proxy cid too. Co-authored-by: Jako Tinkus <jatinkus@microsoft.com> (cherry picked from commit 8a47788)

ChrsMark · 2020-03-31T14:04:45Z

Yes, I really expect we can investigate this a little more and come up with the generic solution 😄

@ChrsMark do you know how the kube-proxy cgroup name is generated by a chance? We can research this a little bit

@exekias it is something like /sys/fs/cgroup/memory/kubepods/besteffort/pod{pod_uid}.

More details:

Getting kube-proxy uid:

~/$ kubectl -n kube-system get pod -l k8s-app=kube-proxy -o jsonpath="{.items[0].metadata.uid}"
42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de

Searching for kube-proxy containers:

$ docker ps | grep kube-proxy
4f6454506d5d        7d54289267dc                         "/usr/local/bin/kube…"   About an hour ago   Up About an hour                                                                         k8s_kube-proxy_kube-proxy-968n2_kube-system_42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de_19
e2d8064b3bc8        k8s.gcr.io/pause:3.1                 "/pause"                 About an hour ago   Up About an hour                                                                         k8s_POD_kube-proxy-968n2_kube-system_42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de_19

We see above that there are two containers one is the main container for kube-proxy and the extra one called pause (for namespace handling networkin etc. This one is appended to every single pod if I'm not mistaken)

Check cgroup resources:

#### main container

$ cat /sys/fs/cgroup/memory/kubepods/besteffort/pod42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de/4f6454506d5dc49d1a374c6e7fb39d4eea00efbe8c3a17cddb74f8341d5c667c/cgroup.procs 
5660

#### pause container

$ cat /sys/fs/cgroup/memory/kubepods/besteffort/pod42ae03d6-e2b8-4eb5-8a1b-47c3c5b257de/e2d8064b3bc8f2d5ffa537dd6467303b39c0815765cc147fefa9292d4dfaf9ec/cgroup.procs 
5403

Note that k8s will group pods' cgroups according to the QoS policies besteffort, burstable etc

…r each process (#17351) (#17362) * Do not cache whole cgroup file for each process. Include kube-proxy cid too. Co-authored-by: Jako Tinkus <jatinkus@microsoft.com> (cherry picked from commit 8a47788) Co-authored-by: jtinkus <35308202+jtinkus@users.noreply.github.com>

…r each process (#17351) (#17361) * Do not cache whole cgroup file for each process. Include kube-proxy cid too. Co-authored-by: Jako Tinkus <jatinkus@microsoft.com> (cherry picked from commit 8a47788) Co-authored-by: jtinkus <35308202+jtinkus@users.noreply.github.com>

…r each process (elastic#17351) (elastic#17361) * Do not cache whole cgroup file for each process. Include kube-proxy cid too. Co-authored-by: Jako Tinkus <jatinkus@microsoft.com> (cherry picked from commit 76f4353) Co-authored-by: jtinkus <35308202+jtinkus@users.noreply.github.com>

Do not cache whole cgroup file for each process. Include kube-proxy c…

d3352f8

…id too.

exekias added enhancement review Team:Platforms Label for the Integrations - Platforms team labels Mar 31, 2020

exekias suggested changes Mar 31, 2020

View reviewed changes

libbeat/processors/add_process_metadata/gosigar_cid_provider.go Outdated Show resolved Hide resolved

exekias added the needs_backport PR is waiting to be backported to other branches. label Mar 31, 2020

Update libbeat/processors/add_process_metadata/gosigar_cid_provider.go

9d65682

Co-Authored-By: Carlos Pérez-Aradros Herce <exekias@gmail.com>

exekias approved these changes Mar 31, 2020

View reviewed changes

exekias changed the title ~~add_process_metadata: container id: Do not cache whole cgroup file for each process. Include kube-proxy c…~~ add_process_metadata: container id: Do not cache whole cgroup file for each process Mar 31, 2020

jtinkus mentioned this pull request Mar 31, 2020

Add regex based cid extractor to add_process_metadata #17360

Merged

5 tasks

exekias merged commit 8a47788 into elastic:master Mar 31, 2020

exekias mentioned this pull request Mar 31, 2020

Cherry-pick #17351 to 7.7: add_process_metadata: container id: Do not cache whole cgroup file for each process #17361

Merged

6 tasks

exekias added v7.7.0 and removed needs_backport PR is waiting to be backported to other branches. labels Mar 31, 2020

exekias mentioned this pull request Mar 31, 2020

Cherry-pick #17351 to 7.x: add_process_metadata: container id: Do not cache whole cgroup file for each process #17362

Merged

6 tasks

exekias added the v7.8.0 label Mar 31, 2020

jtinkus deleted the container_id_cache_fix branch March 31, 2020 13:53

exekias mentioned this pull request Apr 3, 2020

Cherry-pick #17360 to 7.x: Add regex based cid extractor to add_process_metadata #17462

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add_process_metadata: container id: Do not cache whole cgroup file for each process #17351

add_process_metadata: container id: Do not cache whole cgroup file for each process #17351

jtinkus commented Mar 31, 2020 •

edited by andresrc

Loading

elasticmachine commented Mar 31, 2020

elasticmachine commented Mar 31, 2020

jtinkus commented Mar 31, 2020 •

edited

Loading

elasticmachine commented Mar 31, 2020

exekias commented Mar 31, 2020

jtinkus commented Mar 31, 2020

exekias left a comment

exekias commented Mar 31, 2020

exekias commented Mar 31, 2020

jtinkus commented Mar 31, 2020 •

edited

Loading

exekias commented Mar 31, 2020

jtinkus commented Mar 31, 2020 •

edited

Loading

exekias commented Mar 31, 2020

jtinkus commented Mar 31, 2020

ChrsMark commented Mar 31, 2020 •

edited

Loading

add_process_metadata: container id: Do not cache whole cgroup file for each process #17351

add_process_metadata: container id: Do not cache whole cgroup file for each process #17351

Conversation

jtinkus commented Mar 31, 2020 • edited by andresrc Loading

What does this PR do?

Why is it important?

Checklist

Author's Checklist

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

elasticmachine commented Mar 31, 2020

elasticmachine commented Mar 31, 2020

jtinkus commented Mar 31, 2020 • edited Loading

elasticmachine commented Mar 31, 2020

exekias commented Mar 31, 2020

jtinkus commented Mar 31, 2020

exekias left a comment

Choose a reason for hiding this comment

exekias commented Mar 31, 2020

exekias commented Mar 31, 2020

jtinkus commented Mar 31, 2020 • edited Loading

exekias commented Mar 31, 2020

jtinkus commented Mar 31, 2020 • edited Loading

exekias commented Mar 31, 2020

jtinkus commented Mar 31, 2020

ChrsMark commented Mar 31, 2020 • edited Loading

Getting kube-proxy uid:

Searching for kube-proxy containers:

Check cgroup resources:

jtinkus commented Mar 31, 2020 •

edited by andresrc

Loading

jtinkus commented Mar 31, 2020 •

edited

Loading

jtinkus commented Mar 31, 2020 •

edited

Loading

jtinkus commented Mar 31, 2020 •

edited

Loading

ChrsMark commented Mar 31, 2020 •

edited

Loading