Skip to content
This repository has been archived by the owner on Jun 18, 2021. It is now read-only.

report: add average CPU consumption #79

Closed
wants to merge 1 commit into from

Conversation

bidipyne
Copy link
Contributor

node-report currently produces absolute CPU consumption
from the time since the process started. By making use of
the load time and the current time, convert this into an
average consumption rate which is easily consumable.

Read and understood the guidelines for contributing.
Thanks in advance!

@rnchamberlain
Copy link
Contributor

Cool. Could you add an example here - i.e. just the relevant part of an example report, with the new info?

@@ -1020,6 +1024,11 @@ static void PrintGCStatistics(std::ostream& out, Isolate* isolate) {
******************************************************************************/
static void PrintResourceUsage(std::ostream& out) {
char buf[64];
double cpu_abs;
double cpu_percentage;
long int diff_time = current_time - load_time;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use difftime().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

difftime() produces result with resolution of seconds, while the time() command gives the result in terms of millis. While in most of the production cases this is immaterial, short-running apps will loose the milli-second differences. Is that fine with you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the time() function gives the time in seconds:
http://en.cppreference.com/w/cpp/chrono/c/time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood and made the changes.

double cpu_abs;
double cpu_percentage;
long int diff_time = current_time - load_time;
if(diff_time == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please insert a space after if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardlau - done, thanks.

@bidipyne bidipyne force-pushed the averagecpu branch 2 times, most recently from 67a335e to 6c4a11c Compare March 28, 2017 12:50
@@ -106,6 +106,8 @@ static char report_directory[NR_MAXPATH + 1] = ""; // defaults to current workin
std::string version_string = UNKNOWN_NODEVERSION_STRING;
std::string commandline_string = "";
static TIME_TYPE loadtime_tm_struct; // module load time
static time_t load_time; // module load time absolute
static time_t current_time; // current time absolute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need either of these:

  • load_time can be obtained by doing: time_t load_time = mktime(&loadtime_tm_struct);
  • there's no point storing current_time in a static variable, we can't re-use it the next time node-report is invoked it'll be wrong. You can move time(&load_time) into PrintResourceUsage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

snprintf( buf, sizeof(buf), "%ld.%06d", stats.ru_stime.tv_sec, stats.ru_stime.tv_usec);
out << "\n Kernel mode CPU: " << buf << " secs";
cpu_abs += std::stod(buf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing back our floating point numbers seems like more work than neccessary. Could you just use arithmetic to add up stats.ru_utime.tv_sec, stats.ru_utime.tv_usec, stats.ru_stime.tv_sec and stats.ru_stime.tv_usec instead?
I think you can do it outside the #if defined block as well then so there's less duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -1020,6 +1024,11 @@ static void PrintGCStatistics(std::ostream& out, Isolate* isolate) {
******************************************************************************/
static void PrintResourceUsage(std::ostream& out) {
char buf[64];
double cpu_abs;
double cpu_percentage;
long int diff_time = current_time - load_time;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can diff_time be a time_t like current_time and load_time? (Or you could use auto since we've used it elsewhere.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

#endif
cpu_percentage = (cpu_abs / diff_time) * 100;
out << "\n Average Consumption : "<< cpu_percentage << "%";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably being paranoid but could you multiply by 100.0 to make sure there are no conversions back to integer arithmetic.
Could you change the label to "Average CPU Consumption"? When I wrote the report out this came just before the resident set size line and it wasn't clear whether it was average CPU or memory consumption. It might be worth specifying the format for the percentage more tightly so we don't get things like 1.9e-09 if there's a very small amount of CPU used.
It's also probably worth writing out diff_time as "Uptime" or similar as that's a) useful to know and b) means someone can see where the percentage came from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What number do we actually want to show here? Does 100% mean all the processing time across all CPUs or just 100% of one CPU?
To put it another way, does the calculation for RUSAGE_SELF need to be divided by the number of availble CPUs to get the fraction of CPU time used in the whole system? (The value for RUSAGE_THREAD won't as a thread can by definition only use one CPU.)

As it stands the code could report that the whole process used 400% of CPU time when in fact on an 8 way box that might need to only be 50% of the total available. Which one does the user expect? What other tools will this number be compared against and what do they do? (e.g. what would top show?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Division with double value rather integer - done.
Modified the header to "Average CPU Consumption"
I would think the CPU consumption across all the CPUs make sense. Having already averaged out the consumption across the whole duration of the run, the fine-grained information split by the threads may not be required, or meaningful. So in short, the current code stands. Please let me know what is your thought on this.

@bidipyne
Copy link
Contributor Author

Relevant section of node-report with the changes:

==== Resource Usage =====================================
Process total resource usage:
  User mode CPU: 0.074988 secs
  Kernel mode CPU: 0.032994 secs
  Average Consumption : 10.7982%
  Maximum resident set size: 16,183,296 bytes
  Page faults: 156 (I/O required) 4020 (no I/O required)
  Filesystem activity: 16832 reads 16 writes

Event loop thread resource usage:
  User mode CPU: 0.078987 secs
  Kernel mode CPU: 0.036994 secs
  Average Consumption : 11.5981%
  Filesystem activity: 16960 reads 16 writes

node-report currently produces absolute CPU consumption
from the time since the process started. By making use of
the load time and the current time, convert this into an
average consumption rate which is easily consumable.
@bidipyne
Copy link
Contributor Author

I think I have addressed all the comments and change requests made so far. Please review.

@hhellyer
Copy link
Contributor

The changes look good. I'm unsure about whether we need to divide down the process CPU time percentage. On the one hand reporting > 100% of anything used is obviously a bit odd. On the other if we report individual threads then their percentages will add up to the total percentage reported for the process and that makes it easier to see which one is using all the CPU time.

I did a run with a small program that just creates objects as fast as possible to use as much CPU time as I could, this was the output:

==== Resource Usage ============================================================

Process total resource usage:
  User mode CPU: 11.212000 secs
  Kernel mode CPU: 0.240000 secs
  Average CPU Consumption : 127.244%
  Maximum resident set size: 763,080,704 bytes
  Page faults: 0 (I/O required) 210509 (no I/O required)
  Filesystem activity: 0 reads 8 writes

Event loop thread resource usage:
  User mode CPU: 8.636000 secs
  Kernel mode CPU: 0.240000 secs
  Average CPU Consumption : 98.6222%
  Filesystem activity: 0 reads 8 writes

I think it looks good - I have managed to see the event loop thread use more than 100% CPU - I think this was an artifact of having a short running program and using the node-report module load time as the start time which may have happened a short time after the process started. We might be able to get a better process start time by reading /proc but that's a separate issue and is unlikely to be a problem when running node-report in production.

@bidipyne
Copy link
Contributor Author

@hhellyer , just to clarify - you are not expecting any further response / action from me right?

On the one hand reporting > 100% of anything used is obviously a bit odd.

Thinking further on this, I guess the event loop thread will consume the predominant amount of CPU in the most common scenarios. So, reporting the consumption across threads / CPUs with average method will give equal weightage for all the threads potentially, the true state of CPU consumption may be misrepresented. So I guess showing the cumulative percentage, even if it is more than 100%, makes sense to me.

@hhellyer
Copy link
Contributor

@bidipyne - I think all I wanted was for you to confirm you were happy with the CPU percentages and the output was what you expected/needed. So long as it is that's fine.

rnchamberlain pushed a commit that referenced this pull request Apr 3, 2017
node-report currently produces absolute CPU consumption
from the time since the process started. By making use of
the load time and the current time, convert this into an
average consumption rate which is easily consumable.

PR-URL: #79
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Howard Hellyer <hhellyer@uk.ibm.com>
@rnchamberlain
Copy link
Contributor

Landed as c9c1359

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants