Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] consider iowait as idle. #19110

Merged
merged 4 commits into from
Aug 31, 2023

Conversation

thetumbled
Copy link
Member

@thetumbled thetumbled commented Dec 30, 2022

Fixes #19109 #19109

Motivation

CPU usage calculated in pulsar is implemented by reading /proc/stat, that is sum all cpu time and substract idle time.

     *     cpu  user   nice system idle    iowait irq softirq steal guest guest_nice
     *     cpu  317808 128  58637  2503692 7634   0   13472   0     0     0

But real CPU usage should equal the sum substracting the idle cycles (idle+iowait), because %iowait represents the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
https://www.ibm.com/docs/en/linux-on-systems?topic=tests-understanding-cpu-utilization

Modifications

consider iowait as idle.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: thetumbled#11

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Dec 30, 2022
Copy link
Contributor

@Jason918 Jason918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can we add a unit test for this?

@Jason918
Copy link
Contributor

@mattisonchao Will this issue be resolved with the oshi library?

@thetumbled
Copy link
Member Author

LGTM, can we add a unit test for this?

done.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@lhotari lhotari added this to the 2.12.0 milestone Jan 2, 2023
@lhotari lhotari added the type/bug The PR fixed a bug or issue reported a bug label Jan 2, 2023
@@ -132,7 +133,7 @@ public static ResourceUsage getCpuUsageForEntireHost() {
.filter(s -> !s.contains("cpu"))
.mapToLong(Long::parseLong)
.sum();
long idle = Long.parseLong(words[4]);
long idle = Long.parseLong(words[4]) + Long.parseLong(words[5]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be a little dangerous to split values directly, can it be handled by oshi library?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we consider using oshi library, may be we should rewrite whole module using oshi?
oshi support for retrieving system information, such as memory and CPU usage, disks and partitions, etc.
To fix the iowait problem, a simple patch above may be a better solution.
@mattisonchao

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation of module is splitting values directly.

long idle = Long.parseLong(words[4]);

there is no different with the patch.

long idle = Long.parseLong(words[4]) + Long.parseLong(words[5]);

I think we could fix this problem with this patch first.
WDYT. @codelipenghui

Copy link
Contributor

@HQebupt HQebupt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HQebupt
Copy link
Contributor

HQebupt commented Jan 3, 2023

/pulsarbot rerun-failure-checks

@codecov-commenter
Copy link

codecov-commenter commented Jan 3, 2023

Codecov Report

Merging #19110 (53c0519) into master (62a2058) will decrease coverage by 9.41%.
The diff coverage is 88.88%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #19110      +/-   ##
============================================
- Coverage     45.63%   36.22%   -9.42%     
+ Complexity    11013     7906    -3107     
============================================
  Files           772      712      -60     
  Lines         74313    69598    -4715     
  Branches       7998     7470     -528     
============================================
- Hits          33915    25211    -8704     
- Misses        36604    41058    +4454     
+ Partials       3794     3329     -465     
Flag Coverage Δ
unittests 36.22% <88.88%> (-9.42%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ache/pulsar/broker/loadbalance/LinuxInfoUtils.java 34.65% <0.00%> (ø)
...ker/loadbalance/impl/LinuxBrokerHostUsageImpl.java 79.68% <ø> (ø)
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 54.56% <100.00%> (+0.52%) ⬆️
...ava/org/apache/pulsar/broker/admin/v1/Brokers.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pulsar/broker/admin/v1/Clusters.java 0.00% <0.00%> (-100.00%) ⬇️
.../org/apache/pulsar/broker/admin/v1/Properties.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pulsar/broker/admin/v2/ResourceGroups.java 0.00% <0.00%> (-100.00%) ⬇️
...ar/common/naming/PartitionedManagedLedgerInfo.java 0.00% <0.00%> (-100.00%) ⬇️
...e/pulsar/broker/admin/impl/ResourceQuotasBase.java 0.00% <0.00%> (-96.43%) ⬇️
...he/pulsar/broker/service/AnalyzeBacklogResult.java 0.00% <0.00%> (-92.31%) ⬇️
... and 201 more

@github-actions
Copy link

github-actions bot commented Feb 3, 2023

The pr had no activity for 30 days, mark with Stale label.

@Technoboy- Technoboy- modified the milestones: 3.1.0, 3.2.0 Jul 31, 2023
# Conflicts:
#	pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/LinuxInfoUtils.java
#	pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/LinuxBrokerHostUsageImpl.java
@BewareMyPower BewareMyPower merged commit f35d3e0 into apache:master Aug 31, 2023
44 of 45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-3.0 cherry-picked/branch-3.1 doc-not-needed Your PR changes do not impact docs release/3.0.2 release/3.1.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] iowait should be considered as idle time.