-
Notifications
You must be signed in to change notification settings - Fork 45
Conversation
Cool. Could you add an example here - i.e. just the relevant part of an example report, with the new info? |
src/node_report.cc
Outdated
@@ -1020,6 +1024,11 @@ static void PrintGCStatistics(std::ostream& out, Isolate* isolate) { | |||
******************************************************************************/ | |||
static void PrintResourceUsage(std::ostream& out) { | |||
char buf[64]; | |||
double cpu_abs; | |||
double cpu_percentage; | |||
long int diff_time = current_time - load_time; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use difftime()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
difftime() produces result with resolution of seconds, while the time() command gives the result in terms of millis. While in most of the production cases this is immaterial, short-running apps will loose the milli-second differences. Is that fine with you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the time() function gives the time in seconds:
http://en.cppreference.com/w/cpp/chrono/c/time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood and made the changes.
src/node_report.cc
Outdated
double cpu_abs; | ||
double cpu_percentage; | ||
long int diff_time = current_time - load_time; | ||
if(diff_time == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please insert a space after if
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardlau - done, thanks.
67a335e
to
6c4a11c
Compare
src/node_report.cc
Outdated
@@ -106,6 +106,8 @@ static char report_directory[NR_MAXPATH + 1] = ""; // defaults to current workin | |||
std::string version_string = UNKNOWN_NODEVERSION_STRING; | |||
std::string commandline_string = ""; | |||
static TIME_TYPE loadtime_tm_struct; // module load time | |||
static time_t load_time; // module load time absolute | |||
static time_t current_time; // current time absolute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need either of these:
- load_time can be obtained by doing:
time_t load_time = mktime(&loadtime_tm_struct);
- there's no point storing current_time in a static variable, we can't re-use it the next time node-report is invoked it'll be wrong. You can move
time(&load_time)
intoPrintResourceUsage
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
src/node_report.cc
Outdated
snprintf( buf, sizeof(buf), "%ld.%06d", stats.ru_stime.tv_sec, stats.ru_stime.tv_usec); | ||
out << "\n Kernel mode CPU: " << buf << " secs"; | ||
cpu_abs += std::stod(buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing back our floating point numbers seems like more work than neccessary. Could you just use arithmetic to add up stats.ru_utime.tv_sec, stats.ru_utime.tv_usec, stats.ru_stime.tv_sec and stats.ru_stime.tv_usec
instead?
I think you can do it outside the #if defined
block as well then so there's less duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
src/node_report.cc
Outdated
@@ -1020,6 +1024,11 @@ static void PrintGCStatistics(std::ostream& out, Isolate* isolate) { | |||
******************************************************************************/ | |||
static void PrintResourceUsage(std::ostream& out) { | |||
char buf[64]; | |||
double cpu_abs; | |||
double cpu_percentage; | |||
long int diff_time = current_time - load_time; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can diff_time be a time_t like current_time and load_time? (Or you could use auto
since we've used it elsewhere.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
src/node_report.cc
Outdated
#endif | ||
cpu_percentage = (cpu_abs / diff_time) * 100; | ||
out << "\n Average Consumption : "<< cpu_percentage << "%"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably being paranoid but could you multiply by 100.0 to make sure there are no conversions back to integer arithmetic.
Could you change the label to "Average CPU Consumption"? When I wrote the report out this came just before the resident set size line and it wasn't clear whether it was average CPU or memory consumption. It might be worth specifying the format for the percentage more tightly so we don't get things like 1.9e-09 if there's a very small amount of CPU used.
It's also probably worth writing out diff_time as "Uptime" or similar as that's a) useful to know and b) means someone can see where the percentage came from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What number do we actually want to show here? Does 100% mean all the processing time across all CPUs or just 100% of one CPU?
To put it another way, does the calculation for RUSAGE_SELF need to be divided by the number of availble CPUs to get the fraction of CPU time used in the whole system? (The value for RUSAGE_THREAD won't as a thread can by definition only use one CPU.)
As it stands the code could report that the whole process used 400% of CPU time when in fact on an 8 way box that might need to only be 50% of the total available. Which one does the user expect? What other tools will this number be compared against and what do they do? (e.g. what would top
show?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Division with double value rather integer - done.
Modified the header to "Average CPU Consumption"
I would think the CPU consumption across all the CPUs make sense. Having already averaged out the consumption across the whole duration of the run, the fine-grained information split by the threads may not be required, or meaningful. So in short, the current code stands. Please let me know what is your thought on this.
Relevant section of node-report with the changes:
|
node-report currently produces absolute CPU consumption from the time since the process started. By making use of the load time and the current time, convert this into an average consumption rate which is easily consumable.
I think I have addressed all the comments and change requests made so far. Please review. |
The changes look good. I'm unsure about whether we need to divide down the process CPU time percentage. On the one hand reporting > 100% of anything used is obviously a bit odd. On the other if we report individual threads then their percentages will add up to the total percentage reported for the process and that makes it easier to see which one is using all the CPU time. I did a run with a small program that just creates objects as fast as possible to use as much CPU time as I could, this was the output:
I think it looks good - I have managed to see the event loop thread use more than 100% CPU - I think this was an artifact of having a short running program and using the node-report module load time as the start time which may have happened a short time after the process started. We might be able to get a better process start time by reading /proc but that's a separate issue and is unlikely to be a problem when running node-report in production. |
@hhellyer , just to clarify - you are not expecting any further response / action from me right?
Thinking further on this, I guess the event loop thread will consume the predominant amount of CPU in the most common scenarios. So, reporting the consumption across threads / CPUs with average method will give equal weightage for all the threads potentially, the true state of CPU consumption may be misrepresented. So I guess showing the cumulative percentage, even if it is more than 100%, makes sense to me. |
@bidipyne - I think all I wanted was for you to confirm you were happy with the CPU percentages and the output was what you expected/needed. So long as it is that's fine. |
node-report currently produces absolute CPU consumption from the time since the process started. By making use of the load time and the current time, convert this into an average consumption rate which is easily consumable. PR-URL: #79 Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Howard Hellyer <hhellyer@uk.ibm.com>
Landed as c9c1359 |
node-report currently produces absolute CPU consumption
from the time since the process started. By making use of
the load time and the current time, convert this into an
average consumption rate which is easily consumable.
Read and understood the guidelines for contributing.
Thanks in advance!