Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(inputs.linux_cpu): Add CPUFreq input plugin for Linux (v3) #8988

Merged
merged 1 commit into from
Aug 24, 2022
Merged

feat(inputs.linux_cpu): Add CPUFreq input plugin for Linux (v3) #8988

merged 1 commit into from
Aug 24, 2022

Conversation

fabianishere
Copy link
Contributor

This pull request adds an input plugin for monitoring Linux' CPUFreq subsystem. This pull request builds upon the work #4215 and #8342, but given that they haven't seen any activity in a while, I would like to complete the integration of this plugin into the Telegraf repository.

Currently, the plugin only exposes cpufreq/scaling_cur_freq, cpufreq/scaling_min_freq, cpufreq/scaling_cur_freq and thermal_throttle/core_throttle_count. If desirable, we could also add other fields exposed by the CPUFreq subsystem.
Question to the maintainers: the thermal_throttle/core_throttle_count field is a bit out of place in this plugin, since it is not part of the CPUFreq subsystem in Linux. We could limit this plugin to only expose properties from the CPUFreq subsystem and have a separate plugin for thermal_throttle or we could emit all properties that Linux exposes per CPU (and rename the plugin to cpulinux for instance)?

I have optimized the implementation to reduce the number of syscalls per gather cycle. However, a limitation of this implementation is that it does not support hotplugging CPUs, though I don't expect this to affect many people.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Resolves #2601
Resolves #4256

@telegraf-tiger telegraf-tiger bot added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Mar 15, 2021
Copy link
Contributor

@telegraf-tiger telegraf-tiger bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤝 ✅ CLA has been signed. Thank you!

Copy link
Collaborator

@zak-pawel zak-pawel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, check few minor comments.
It would be great to see a few unit tests for this plugin.

plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/README.md Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
@fabianishere
Copy link
Contributor Author

@zak-pawel Thanks for the comments. I have tried to address them.

Do you have any advice regarding the following question:

the thermal_throttle/core_throttle_count field is a bit out of place in this plugin, since it is not part of the CPUFreq subsystem in Linux. We could limit this plugin to only expose properties from the CPUFreq subsystem and have a separate plugin for thermal_throttle or we could emit all properties that Linux exposes per CPU (and rename the plugin to cpulinux for instance)?

I might be interested in tracking other CPU-related metrics exposed by Linux but that are not part of the CPUFreq subsystem (like thermal_throttle/core_throttle_count)? Do you prefer to have separate input plugins for each of the CPU subsystems or would a single input for all per-CPU metrics exposed by Linux be a better approach for Telegraf?

plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/cpufreq.go Outdated Show resolved Hide resolved
plugins/inputs/cpufreq/README.md Outdated Show resolved Hide resolved
@zak-pawel
Copy link
Collaborator

@zak-pawel Thanks for the comments. I have tried to address them.

Do you have any advice regarding the following question:

the thermal_throttle/core_throttle_count field is a bit out of place in this plugin, since it is not part of the CPUFreq subsystem in Linux. We could limit this plugin to only expose properties from the CPUFreq subsystem and have a separate plugin for thermal_throttle or we could emit all properties that Linux exposes per CPU (and rename the plugin to cpulinux for instance)?

I might be interested in tracking other CPU-related metrics exposed by Linux but that are not part of the CPUFreq subsystem (like thermal_throttle/core_throttle_count)? Do you prefer to have separate input plugins for each of the CPU subsystems or would a single input for all per-CPU metrics exposed by Linux be a better approach for Telegraf?

Hard to say... What metrics can be gathered from thermal_throttle? Is thermal_throttle available widely in Linux? Can it be enabled/disabled?
@ssoroka @reimda What do you think?

@fabianishere
Copy link
Contributor Author

fabianishere commented Mar 15, 2021

@zak-pawel thermal_throttle is exposes on Intel systems a couple metrics that might indicate that the CPU is being thermally throttled. One of the previous authors added the core_throttle_count field, which counts the amount of thermal throttle events reported by the CPU.

Perhaps to give a more concrete example of my question: Linux also exposes information about the idle states via /sys/devices/system/cpu/cpuX/cpuidle (e.g., C-state residency, similar to what's reported by intel_powerstat, but portable), which I would like to track. It would not be much work to support these metrics in this input plugin, but if we do so the name cpufreq for the plugin might not be appropriate.

@fabianishere
Copy link
Contributor Author

@zak-pawel Is there still anything I can do from my side?

@zak-pawel
Copy link
Collaborator

@zak-pawel Is there still anything I can do from my side?

I don't think so :) Just waiting for some comments from @reimda / @ssoroka about thermal_throttle and other subsytems you had mentioned before.

@reimda
Copy link
Contributor

reimda commented Mar 23, 2021

Hi Fabian, thanks for picking up #4215. It makes sense to me for this plugin to include things like /sys/devices/system/cpu/cpuX/cpuidle that are outside of the cpufreq linux subsystem. In that case I agree a more general name would be useful to quickly communicate to the user what the plugin does.

Would the name linux_cpu be more appropriate? There's already a linux_sysctl_fs plugin that matches this naming scheme.

cpu_linux is another option. It has the advantage of showing up next to the cpu plugin in sorted list, so it might be more visible. I think I prefer linux_cpu.

Both these options position this plugin as the place for linux specific metrics. That leaves the cpu plugin as the place for cross platform metrics (through gopsutil). It would be great to have only one cpu plugin but I don't think it's practical to expect gopsutil to provide the metrics this plugin does.

If this is going to be the place for all linux cpu metrics, I would expect the field names to match linux terms. For example, scaling_cur_freq instead of current_khz.

@fabianishere
Copy link
Contributor Author

fabianishere commented Mar 25, 2021

@reimda I agree with your points. My plan is to rename the plugin to linux_cpu and start adding other metrics that Linux exposes. I will work on this the coming weekend, so you can expect an update soon after.

@fabianishere fabianishere requested a review from zak-pawel March 31, 2021 19:53
@fabianishere
Copy link
Contributor Author

@zak-pawel I have re-requested your review. Although not all possible metrics are included yet, I want to get your opinion on the current design. I have two questions in particular:

  1. When should the plugin fail? It could be the case that some metrics do not exist on the system, does that mean that we fail all together?
  2. To what granularity should the user be able to control the metrics that are being collected by the plugin? I can imagine that some users are not interested in tracking all possible metrics and might want to reduce overhead and the amount of data collected.

Copy link
Collaborator

@zak-pawel zak-pawel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think that plugin should fail (during Init()) when there are no metrics to gather at all. For other cases, it would be good to gather what is possible.
  2. I think that these two flags (which should be converted to array of strings in config ;)) will control the metrics that are being collected by the plugin just fine. But I would fail as soon as possible if both of them are not enabled (== no metrics to gather).

plugins/inputs/all/all.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/README.md Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/README.md Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/README.md Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/README.md Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Show resolved Hide resolved
Copy link
Collaborator

@zak-pawel zak-pawel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still without unit tests :(

plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
@fabianishere
Copy link
Contributor Author

Thanks @zak-pawel for the review. I have been a bit busy these weeks with other things.

Given that I don't have a lot of experience with Go, I am still researching what's the best way to test the plugin. I am probably going to approach it the same way as the intel_powerstat plugin where they mocked the FS interface. Let me know if you have any suggestions yourself!

@powersj powersj added the help wanted Request for community participation, code, contribution label Oct 26, 2021
@sjwang90
Copy link
Contributor

@fabianishere Do you have a chance to continue working on this? Or else we'll see if someone else wants to take it over.

@fabianishere
Copy link
Contributor Author

I was planning on finishing this at the start of next year. Currently rather busy with wrapping up some other things.

@antst
Copy link

antst commented Feb 8, 2022

When run as unprivileged user, this would have no permissions to read /sys/devices/system/cpu/cpuXX/cpufreq/cpuinfo_cur_freq on fresh kernels , but because file exists, it will keep trying to read and pollute logs.
validatePath() should also check for permissions.
Either really trying to open file, or, at least, check "o+r" flag in FileMode ( mode&(1<<2) should be not 0 )

@fabianishere
Copy link
Contributor Author

@antst Good catch! I’ll probably go with just trying to open the file.

@sspaink sspaink changed the title Add CPUFreq input plugin for Linux (v3) feat(inputs.linux_cpu): Add CPUFreq input plugin for Linux (v3) Mar 23, 2022
@powersj
Copy link
Contributor

powersj commented Aug 4, 2022

@zak-pawel I think this is ready for another review since you last looked at it? Is that true?

@powersj powersj added waiting for response waiting for response from contributor and removed help wanted Request for community participation, code, contribution labels Aug 4, 2022
@zak-pawel
Copy link
Collaborator

@zak-pawel I think this is ready for another review since you last looked at it? Is that true?

I would love to see few unit tests :)

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 4, 2022
@zak-pawel zak-pawel added the waiting for response waiting for response from contributor label Aug 4, 2022
@fabianishere
Copy link
Contributor Author

@zak-pawel Sorry for the delay, I was quite busy with some other work last months.

I have rebased the changes onto the master branch and added a few unit tests. Let me know if you want me to cover any additional cases.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 11, 2022
plugins/inputs/linux_cpu/README.md Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu_test.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu_test.go Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
plugins/inputs/linux_cpu/linux_cpu.go Outdated Show resolved Hide resolved
This change adds to Telegraf an input plugin that tracks the per-CPU metrics
exposed by the Linux kernel.

Currently, the plugin only supports a few properties exposed by the
CPUFreq subsystem as well as by thermal_throttle.
@telegraf-tiger
Copy link
Contributor

@sspaink sspaink added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Aug 23, 2022
Copy link
Contributor

@sspaink sspaink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Thank you so much for keeping this pull request alive, I think it looks great.

@zak-pawel is out, but it looks like all the comments have been resolved. If there are any other changes I think we should handle them in a separate PR, merging.

@reimda reimda merged commit 7f3395f into influxdata:master Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add cpu_speed (in mhz) to cpu or system measurement Fetch CPU frequency states
7 participants