Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Measuring perf events - chapter I #2419

Merged
merged 46 commits into from
Apr 9, 2020
Merged

Conversation

iwankgb
Copy link
Collaborator

@iwankgb iwankgb commented Mar 6, 2020

Purpose of this PR is to introduce support for measuring perf events to cAdvisor. It will eventually consist of the following:

  • Configuration primitives allowing to describe what events are to be measured (user-friendly, using libpfm to translate event names to perf_event_attr
  • Perf events collector (ungrouped core events support)
  • Perf events manager responsible to container lifecycle control (ungrouped core events support)

This is part of effort discussed in #2388

@k8s-ci-robot
Copy link
Collaborator

Hi @iwankgb. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dashpole
Copy link
Collaborator

dashpole commented Mar 6, 2020

/ok-to-test

@iwankgb iwankgb force-pushed the perf_events_support branch 2 times, most recently from fc92381 to d07e06c Compare March 12, 2020 10:53
@iwankgb
Copy link
Collaborator Author

iwankgb commented Mar 13, 2020

@dashpole, I would appreciate if you looked at this very PR. At the moment it consists of documentation and proposed configuration API. Before proceeding with further changes I want to know what you think about it and make sure that we can agree on a user-friendly implementation.

Originally, we were planning to use runc to handle perf subsystem but to do so we would have to update all the projects that interact with runc including projects that interact indirectly. For instance, if we updated runc configuration and OCI runtime spec then we would have to update containerd too and Kubernetes' CRI so that it is possible to pass perf configuration from PodSpec through CRI to containerd that would pass it to runc using OCI runtime spec.

We believe that configuring perf subsystem in cAdvisor still makes sense as this subsystem is not responsible for any resource allocation but purely for monitoring. We also assume that same events will be measured for all the containers.

@iwankgb iwankgb force-pushed the perf_events_support branch 3 times, most recently from 47b3ace to daf1da4 Compare March 17, 2020 10:19
@iwankgb iwankgb force-pushed the perf_events_support branch 3 times, most recently from 8dd743e to 42600b5 Compare March 17, 2020 19:05
@dashpole
Copy link
Collaborator

Whoops, missed your comment earlier... I had been ignoring it because of the WIP tag. I'll take a look today

@dashpole
Copy link
Collaborator

Originally, we were planning to use runc to handle perf subsystem but to do so we would have to update all the projects that interact with runc including projects that interact indirectly. For instance, if we updated runc configuration and OCI runtime spec then we would have to update containerd too and Kubernetes' CRI so that it is possible to pass perf configuration from PodSpec through CRI to containerd that would pass it to runc using OCI runtime spec.

Yeah, I doubt it would be accepted in the pod spec. Do you know if perf events could be enabled using OCI hooks? Those seem to be the standard place to do container configuration.

We believe that configuring perf subsystem in cAdvisor still makes sense as this subsystem is not responsible for any resource allocation but purely for monitoring. We also assume that same events will be measured for all the containers.

I definitely agree that cAdvisor collecting these events would be useful. I have seen some pretty neat sidecar-container perf tools, so I know these metrics can be useful. I was initially surprised that we would want to measure the same events across all containers, as I assumed it would hurt application performance. But from some reading, it doesn't seem to be that bad...

We haven't done a great job in the past with having this sort of configuration in cAdvisor. If there were a way to just have cAdvisor collect the metrics, and have something else do the configuration, that would fit more cleanly within our scope. But absent that, I think this is a reasonable feature to add.

@iwankgb
Copy link
Collaborator Author

iwankgb commented Mar 26, 2020

Sorry for the delayed reply, I had to take few days off.
@dashpole, I don't think it is feasible (while it's perfectly possible) to handle perf events configuration in an OCI hook. In order to do so one would have to call perf_event_open syscall in the hook and then do one of the following:

  • pass returned file descriptor to cAdvisor - doable but imperfect (only containers launched after launching cAdvisor could be monitored or we would have to create a long-running hook process)
  • you OCI hook to launch a sidecar container responsible for perf events handling - this is scenario that we want to deprecate by adding perf support to cAdvisor.
    I can't think about any other ways of handling configuration step outside of cAdvisor.

What do you think about providing two configuration schemas:

  • advanced - as already described in the PR
  • simple - limiting number of events to number of available counters, using only named events, no grouping support?

```

In the example above number:
* `INST_RETIRED.ANY_P` (number of instructions retired on 2nd Generation Intel® Xeon® Scalable Processor) identified by
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are events specific to processors? Are there any common ones that nearly everyone would be able to use?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, events are different even between various generation of Xeons. If you take AMD's Naples and Rome CPUs you will see similar differences not mentioning other architectures such as ARM.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed one of your questions: yes, some of the events (e.g. instructions retired, cycles) are supported across all or majority of Linux architectures.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Once we have this in, it would be useful to have an example configmap in deploy/kubernetes as a follow-up, so it is easy for users to get started with the feature.

perf/config.go Outdated Show resolved Hide resolved
perf/config.go Outdated Show resolved Hide resolved
docs/runtime_options.md Outdated Show resolved Hide resolved
@dashpole
Copy link
Collaborator

As long as we have good documentation and examples, I prefer having a single configuration option.

I thought about it after suggesting it, and the file-descriptor API makes it pretty much impossible to implement cleanly.

In general, this is probably the right direction to go.

"blkio": {},
"io": {},
"devices": {},
"perf_event": {},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this exist and work similarly in cgroups v2?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does exists and I believe it is similar but I have not investigated it yet.

@iwankgb
Copy link
Collaborator Author

iwankgb commented Mar 30, 2020

A quick note after chat with @dashpole on Slack:

  • It's fine to use libpfm to make perf event configuration more user friendly
  • It must be possible to build cAdvisor without cgo and/or libpfm
  • There must be clear message (perhaps a failure to start cAdvisor) if it's built without cgo and/or libpfmsupport and user tries to launch perf event monitoring.

@iwankgb iwankgb force-pushed the perf_events_support branch 8 times, most recently from b16b9a4 to 45d4ec4 Compare April 3, 2020 17:25
Maciej "Iwan" Iwanowski added 19 commits April 9, 2020 18:41
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
manager/manager.go Outdated Show resolved Hide resolved
perf/collector_libpfm.go Outdated Show resolved Hide resolved
docs/runtime_options.md Outdated Show resolved Hide resolved
@dashpole
Copy link
Collaborator

dashpole commented Apr 9, 2020

After this is rebased, I think we can merge and iterate from there.

Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
@iwankgb iwankgb force-pushed the perf_events_support branch from bbfaafd to e1d8e05 Compare April 9, 2020 16:49
@iwankgb
Copy link
Collaborator Author

iwankgb commented Apr 9, 2020

@dashpole rebased.

Signed-off-by: Maciej "Iwan" Iwanowski <maciej.iwanowski@intel.com>
Copy link
Collaborator

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dashpole
Copy link
Collaborator

dashpole commented Apr 9, 2020

The remaining follow-ups i'm aware of are:

  1. handle groups
  2. example kubernetes configmap
  3. cgroups v2

@dashpole dashpole merged commit e7efc0a into google:master Apr 9, 2020
@iwankgb iwankgb deleted the perf_events_support branch April 9, 2020 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants