feat(intel_powerstat): add Max Turbo Frequency and introduce improvements (#11035)
bkotlowski authored May 23, 2022
1 parent 4f972da commit df3e9ec
Showing 11 changed files with 658 additions and 182 deletions.
147 changes: 93 additions & 54 deletions plugins/inputs/intel_powerstat/README.md
@@ -11,18 +11,25 @@ to take preventive/corrective actions based on platform busyness, CPU temperatur
```toml
# Intel PowerStat plugin enables monitoring of platform metrics (power, TDP) and per-CPU metrics like temperature, power and utilization.
[[inputs.intel_powerstat]]
## All global metrics are always collected by Intel PowerStat plugin.
## User can choose which per-CPU metrics are monitored by the plugin in cpu_metrics array.
## Empty array means no per-CPU specific metrics will be collected by the plugin - in this case only platform level
## telemetry will be exposed by Intel PowerStat plugin.
## The user can choose which package metrics are monitored by the plugin with the package_metrics setting:
## - By default, the plugin collects "current_power_consumption", "current_dram_power_consumption" and "thermal_design_power"
## - Setting this value to an empty array means no package metrics will be collected
## - Finally, a user can specify individual metrics to capture from the supported options list
## Supported options:
## "cpu_frequency", "cpu_busy_frequency", "cpu_temperature", "cpu_c1_state_residency", "cpu_c6_state_residency", "cpu_busy_cycles"
## "current_power_consumption", "current_dram_power_consumption", "thermal_design_power", "max_turbo_frequency"
# package_metrics = ["current_power_consumption", "current_dram_power_consumption", "thermal_design_power"]

## The user can choose which per-CPU metrics are monitored by the plugin in cpu_metrics array.
## Empty or missing array means no per-CPU specific metrics will be collected by the plugin.
## Supported options:
## "cpu_frequency", "cpu_c0_state_residency", "cpu_c1_state_residency", "cpu_c6_state_residency", "cpu_busy_cycles", "cpu_temperature", "cpu_busy_frequency"
## ATTENTION: cpu_busy_cycles option is DEPRECATED - superseded by cpu_c0_state_residency
# cpu_metrics = []
```

## Example: Configuration with no per-CPU telemetry

This configuration allows getting global metrics (processor package specific), no per-CPU metrics are collected:
This configuration collects the default processor package specific metrics; no per-CPU metrics are collected:

```toml
[[inputs.intel_powerstat]]
@@ -31,28 +38,39 @@ This configuration allows getting global metrics (processor package specific), n

## Example: Configuration with no per-CPU telemetry - equivalent case

This configuration allows getting global metrics (processor package specific), no per-CPU metrics are collected:
This configuration collects the default processor package specific metrics; no per-CPU metrics are collected:

```toml
[[inputs.intel_powerstat]]
```

## Example: Configuration for CPU Temperature and CPU Frequency

This configuration collects the default processor package specific metrics plus a subset of per-CPU metrics (CPU Temperature and CPU Frequency):

```toml
[[inputs.intel_powerstat]]
cpu_metrics = ["cpu_frequency", "cpu_temperature"]
```

## Example: Configuration for CPU Temperature and Frequency only
## Example: Configuration for CPU Temperature and CPU Frequency without default package metrics

This configuration allows getting global metrics plus subset of per-CPU metrics (CPU Temperature and Current Frequency):
This configuration allows getting only a subset of per-CPU metrics (CPU Temperature and CPU Frequency):

```toml
[[inputs.intel_powerstat]]
package_metrics = []
cpu_metrics = ["cpu_frequency", "cpu_temperature"]
```

## Example: Configuration with all available metrics

This configuration allows getting global metrics and all per-CPU metrics:
This configuration allows getting all processor package specific metrics and all per-CPU metrics:

```toml
[[inputs.intel_powerstat]]
cpu_metrics = ["cpu_frequency", "cpu_busy_frequency", "cpu_temperature", "cpu_c1_state_residency", "cpu_c6_state_residency", "cpu_busy_cycles"]
package_metrics = ["current_power_consumption", "current_dram_power_consumption", "thermal_design_power", "max_turbo_frequency"]
cpu_metrics = ["cpu_frequency", "cpu_busy_frequency", "cpu_temperature", "cpu_c0_state_residency", "cpu_c1_state_residency", "cpu_c6_state_residency"]
```

## SW Dependencies
@@ -66,11 +84,17 @@ The following dependencies are expected by plugin:

Minimum kernel version required is 3.13 to satisfy all requirements.

Please make sure that kernel modules are loaded and running. You might have to manually enable them by using `modprobe`.
Exact commands to be executed are:
Make sure that the required kernel modules are loaded and running (cpufreq is integrated into the kernel). Modules may have to be enabled manually with `modprobe`.
Depending on the kernel version, run the following commands:

```sh
sudo modprobe cpufreq-stats
# kernel 5.x.x:
sudo modprobe rapl
sudo modprobe msr
sudo modprobe intel_rapl_common
sudo modprobe intel_rapl_msr

# kernel 4.x.x:
sudo modprobe msr
sudo modprobe intel_rapl
```
@@ -80,9 +104,13 @@ to retrieve data for calculation of most critical per-CPU specific metrics:

- `cpu_busy_frequency_mhz`
- `cpu_temperature_celsius`
- `cpu_c0_state_residency_percent`
- `cpu_c1_state_residency_percent`
- `cpu_c6_state_residency_percent`
- `cpu_busy_cycles_percent`

and to retrieve data for calculation of the per-package specific metric:

- `max_turbo_frequency_mhz`

To expose other Intel PowerStat metrics root access may or may not be required (depending on OS type or configuration).

@@ -99,13 +127,13 @@ The following processor properties are required by the plugin:
model specific registers for all features
- The following processor flags shall be present:
- "_msr_" shall be present for plugin to read platform data from processor model specific registers and collect
the following metrics: _powerstat_core.cpu_temperature_, _powerstat_core.cpu_busy_frequency_,
_powerstat_core.cpu_busy_cycles_, _powerstat_core.cpu_c1_state_residency_, _powerstat_core._cpu_c6_state_residency_
- "_aperfmperf_" shall be present to collect the following metrics: _powerstat_core.cpu_busy_frequency_,
_powerstat_core.cpu_busy_cycles_, _powerstat_core.cpu_c1_state_residency_
- "_dts_" shall be present to collect _powerstat_core.cpu_temperature_
- Processor _Model number_ must be one of the following values for plugin to read _powerstat_core.cpu_c1_state_residency_
and _powerstat_core.cpu_c6_state_residency_ metrics:
the following metrics: _powerstat\_core.cpu\_temperature_, _powerstat\_core.cpu\_busy\_frequency_,
_powerstat\_core.cpu\_c0\_state\_residency_, _powerstat\_core.cpu\_c1\_state\_residency_, _powerstat\_core.cpu\_c6\_state\_residency_
- "_aperfmperf_" shall be present to collect the following metrics: _powerstat\_core.cpu\_busy\_frequency_,
_powerstat\_core.cpu\_c0\_state\_residency_, _powerstat\_core.cpu\_c1\_state\_residency_
- "_dts_" shall be present to collect _powerstat\_core.cpu\_temperature_
- Processor _Model number_ must be one of the following values for plugin to read _powerstat\_core.cpu\_c1\_state\_residency_
and _powerstat\_core.cpu\_c6\_state\_residency_ metrics:

| Model number | Processor name |
|-----|-------------|
@@ -168,61 +196,72 @@ When starting to measure metrics, plugin skips first iteration of metrics if the

- The following Tags are returned by plugin with powerstat_core measurements:

```text
| Tag | Description |
|-----|-------------|
| `package_id` | ID of platform package/socket |
| `core_id` | ID of physical processor core |
| `cpu_id` | ID of logical processor core |
| Tag | Description |
|--------------|-------------------------------|
| `package_id` | ID of platform package/socket |
| `core_id` | ID of physical processor core |
| `cpu_id` | ID of logical processor core |

Measurement powerstat_core metrics are collected per-CPU (cpu_id is the key)
while core_id and package_id tags are additional topology information.
```

- Available metrics for powerstat_core measurement

```text
| Metric name (field) | Description | Units |
|-----|-------------|-----|
| `cpu_frequency_mhz` | Current operational frequency of CPU Core | MHz |
| `cpu_busy_frequency_mhz` | CPU Core Busy Frequency measured as frequency adjusted to CPU Core busy cycles | MHz |
| `cpu_temperature_celsius` | Current temperature of CPU Core | Celsius degrees |
| `cpu_c1_state_residency_percent` | Percentage of time that CPU Core spent in C1 Core residency state | % |
| `cpu_c6_state_residency_percent` | Percentage of time that CPU Core spent in C6 Core residency state | % |
| `cpu_busy_cycles_percent` | CPU Core Busy cycles as a ratio of Cycles spent in C0 state residency to all cycles executed by CPU Core | % |
```
| Metric name (field) | Description | Units |
|---------------------|-------------|-------|
| `cpu_frequency_mhz` | Current operational frequency of CPU Core | MHz |
| `cpu_busy_frequency_mhz` | CPU Core Busy Frequency measured as frequency adjusted to CPU Core busy cycles | MHz |
| `cpu_temperature_celsius` | Current temperature of CPU Core | Celsius degrees |
| `cpu_c0_state_residency_percent` | Percentage of time that CPU Core spent in C0 Core residency state | % |
| `cpu_c1_state_residency_percent` | Percentage of time that CPU Core spent in C1 Core residency state | % |
| `cpu_c6_state_residency_percent` | Percentage of time that CPU Core spent in C6 Core residency state | % |
| `cpu_busy_cycles_percent` | (**DEPRECATED** - superseded by cpu_c0_state_residency_percent) CPU Core Busy cycles as a ratio of Cycles spent in C0 state residency to all cycles executed by CPU Core | % |

- powerstat_package

- The following Tags are returned by plugin with powerstat_package measurements:

```text
| Tag | Description |
|-----|-------------|
| `package_id` | ID of platform package/socket |
Measurement powerstat_package metrics are collected per processor package -_package_id_ tag indicates which
package metric refers to.
```
| Tag | Description |
|-----|-------------|
| `package_id` | ID of platform package/socket |
| `active_cores` | Specific tag for the `max_turbo_frequency_mhz` metric: the maximum number of active cores for which the turbo frequency is reachable |

Measurement powerstat_package metrics are collected per processor package; the _package_id_ tag indicates which package a metric refers to.

- Available metrics for powerstat_package measurement

```text
| Metric name (field) | Description | Units |
|-----|-------------|-----|
| `thermal_design_power_watts` | Maximum Thermal Design Power (TDP) available for processor package | Watts |
| `current_power_consumption_watts` | Current power consumption of processor package | Watts |
| `current_dram_power_consumption_watts` | Current power consumption of processor package DRAM subsystem | Watts |
```
| Metric name (field) | Description | Units |
|-----|-------------|-----|
| `thermal_design_power_watts` | Maximum Thermal Design Power (TDP) available for processor package | Watts |
| `current_power_consumption_watts` | Current power consumption of processor package | Watts |
| `current_dram_power_consumption_watts` | Current power consumption of processor package DRAM subsystem | Watts |
| `max_turbo_frequency_mhz` | Maximum reachable turbo frequency for the given number of active cores | MHz |

### Known issues

Starting with Linux kernel v5.4.77 (see [this kernel change](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.4.77&id=19f6d91bdad42200aac557a683c17b1f65ee6c94)),
resources like `/sys/class/powercap/intel-rapl*/*/energy_uj` are readable only by root for security reasons, so this plugin needs root privileges to work properly.

If such strict security restrictions are not relevant, permissions on the files in the `/sys/devices/virtual/powercap/intel-rapl/`
directory can be changed manually with the `chmod` command.
For example, to give all users read and execute permission on everything in the `intel-rapl` directory:

```bash
sudo chmod -R a+rx /sys/devices/virtual/powercap/intel-rapl/
```

### Example Output

```shell
powerstat_package,host=ubuntu,package_id=0 thermal_design_power_watts=160 1606494744000000000
powerstat_package,host=ubuntu,package_id=0 current_power_consumption_watts=35 1606494744000000000
powerstat_package,host=ubuntu,package_id=0 current_dram_power_consumption_watts=13.94 1606494744000000000
powerstat_package,host=ubuntu,package_id=0,active_cores=0 max_turbo_frequency_mhz=3000i 1606494744000000000
powerstat_package,host=ubuntu,package_id=0,active_cores=1 max_turbo_frequency_mhz=2800i 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_frequency_mhz=1200.29 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_temperature_celsius=34i 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_c6_state_residency_percent=92.52 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_busy_cycles_percent=0.8 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_c1_state_residency_percent=6.68 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_c0_state_residency_percent=0.8 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_busy_frequency_mhz=1213.24 1606494744000000000
```
4 changes: 2 additions & 2 deletions plugins/inputs/intel_powerstat/dto.go
@@ -7,8 +7,8 @@ type msrData struct {
c3 uint64
c6 uint64
c7 uint64
throttleTemp uint64
temp uint64
throttleTemp int64
temp int64
mperfDelta uint64
aperfDelta uint64
timeStampCounterDelta uint64
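
The diff does not state why `throttleTemp` and `temp` change from `uint64` to `int64`. One plausible reason, offered here as an assumption, is that subtracting two unsigned values wraps around when the second is larger, while signed values give a negative result that can be detected. A minimal Go sketch of that difference (`unsignedDiff` and `signedDiff` are hypothetical helpers, not part of the plugin):

```go
package main

import "fmt"

// unsignedDiff subtracts two uint64 values; Go wraps the result modulo 2^64
// when temp is larger than throttleTemp.
func unsignedDiff(throttleTemp, temp uint64) uint64 {
	return throttleTemp - temp
}

// signedDiff performs the same subtraction with int64 values, so a larger
// temp yields a negative number instead of wrapping.
func signedDiff(throttleTemp, temp int64) int64 {
	return throttleTemp - temp
}

func main() {
	fmt.Println(unsignedDiff(90, 95)) // prints 18446744073709551611
	fmt.Println(signedDiff(90, 95))   // prints -5
}
```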
19 changes: 19 additions & 0 deletions plugins/inputs/intel_powerstat/file.go
@@ -152,3 +152,22 @@ func (fs *fileServiceImpl) readFileAtOffsetToUint64(reader io.ReaderAt, offset i
func newFileService() *fileServiceImpl {
return &fileServiceImpl{}
}

func checkFile(path string) error {
	if path == "" {
		return fmt.Errorf("empty path given")
	}

	lInfo, err := os.Lstat(path)
	if err != nil {
		if os.IsNotExist(err) {
			return fmt.Errorf("file `%s` doesn't exist", path)
		}
		return fmt.Errorf("cannot obtain file info of `%s`: %v", path, err)
	}
	mode := lInfo.Mode()
	if mode&os.ModeSymlink != 0 {
		return fmt.Errorf("file `%s` is a symlink", path)
	}
	return nil
}
4 changes: 2 additions & 2 deletions plugins/inputs/intel_powerstat/file_mock_test.go
