Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

send memory usage and cpu usage to telemetry #499

Merged
merged 3 commits into from
Aug 15, 2018

Conversation

guanqun
Copy link

@guanqun guanqun commented Aug 5, 2018

Fixes #443, however, it doesn't send IO usage, as it's not available in
this sysinfo crate.

I believe we should provide a way to opt out this feature, how do you think?

@gnunicorn gnunicorn added the A0-please_review Pull request needs code review. label Aug 6, 2018
let memory_usage = sys.get_used_memory() as f64 / sys.get_total_memory() as f64;
let procs = sys.get_processor_list();
let cpu_usage = procs[0].get_cpu_usage();
telemetry!("system.usage"; "memory" => memory_usage, "cpu" => cpu_usage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gavofyork any thoughts on whether doing this in the informant is OK?

Copy link
Author

@guanqun guanqun Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was wondering whether it's better to put the logic in polkadot instead of in the informat, then we can expose some interface to this underlying informat, just like what we did for "network", "client". I'm still learning which is the best way to split the responsibilities between these two.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fine to do in informant, but i would rather fold it into the system.interval telemetry report to minimise chatter and keep the reporting frequency down.


// get cpu usage and memory usage
sys.refresh_system();
let memory_usage = sys.get_used_memory() as f64 / sys.get_total_memory() as f64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paranoid mode on:
I wonder how robust this sysinfo crate is. Can it return 0 from get_total_memory? Or can get_processor_list return []? It would be kinda pity if it panics on telemetry data acquistion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the first question, even when get_total_memory returns 0, it's kinda OK, as it's been converted to a float, the final result would be NaN, it wouldn't panic though.

for the second question, I see your concern, but it seems the crate's document does just that. Hmm. I can add the length check before accessing the [] to prevent the possible panic here.

@tomaka
Copy link
Contributor

tomaka commented Aug 6, 2018

Ideally the sysinfo crate would be optional, as it doesn't automatically support all platforms.
However that's pretty nitpicky. The change can easily be done later if necessary.

@guanqun
Copy link
Author

guanqun commented Aug 6, 2018

@tomaka According to the crate info here: https://crates.io/crates/sysinfo , it says it supports Linux, Mac OSX, Windows and Raspberry. What's our polkadot's target platform? I thought the major three would be OK.

@guanqun
Copy link
Author

guanqun commented Aug 6, 2018

Notice this crate might be better for our use, https://crates.io/crates/systemstat, it support more platforms and it can even read the disk IO stats.

So I'll switch to this one.

@tomaka
Copy link
Contributor

tomaka commented Aug 7, 2018

I've had a discussion with the author of sysinfo a long time ago and I remember him saying that it was easy to get things wrong.

I'm on mobile right now, but IMO we should quickly check the quality of the crates before choosing one.

@guanqun
Copy link
Author

guanqun commented Aug 8, 2018

Here's an attempt at comparing these two crates: (not sure how exactly to check the code quality, so the following metrics is a bit superficial.)

items sysinfo systemstat
URL https://crates.io/crates/sysinfo https://crates.io/crates/systemstat
Supported platforms Linux/Raspberry/Mac OSX/Windows FreeBSD/Linux/OpenBSD/Windows/macOS
Features CPU/Memory CPU/Memory/IO/Network
Downloads 11568 4421
Dependent crates 14 2

@guanqun
Copy link
Author

guanqun commented Aug 11, 2018

Some updates:

  1. rebased on latest master to resolve the conflicts.
  2. use the "system.interval" instead of creating a new one.
  3. assuming we're still using sysinfo crate and make it slightly more robust.

@gavofyork
Copy link
Member

@guanqun two things would be good to get changed:

  • report statistics (CPU/RAM) only for the current process (get_process(std::process::id()))
  • for RAM, report actual value used, not a %

@guanqun
Copy link
Author

guanqun commented Aug 13, 2018

The CI fails due to the new commit 25f0c07 , @rphmeier might need to take a look, the current master fails to pass the test.

Guanqun Lu added 2 commits August 14, 2018 22:42
Fixes paritytech#443, however, it doesn't send IO usage, as it's not available in
this crate.
@guanqun
Copy link
Author

guanqun commented Aug 14, 2018

Rebased on latest master and it passes the tests now, please help review it again.

telemetry!(
"system.interval";
"status" => format!("{}{}", status, target),
"peers" => num_peers,
"height" => best_number,
"best" => ?hash,
"txcount" => txpool_status.transaction_count
"txcount" => txpool_status.transaction_count,
"cpu" => format!("{}%", cpu_usage),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

telemetry should be basic datatypes rather than string-formatted (since it's structured data)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. We need to know this cpu usage is percentage based and the memory is kB.

Fixed now.

@gavofyork gavofyork added A5-grumble and removed A0-please_review Pull request needs code review. labels Aug 14, 2018
@gavofyork
Copy link
Member

one minor grumble - looks good otherwise

@gavofyork gavofyork added A0-please_review Pull request needs code review. A8-looksgood and removed A5-grumble A0-please_review Pull request needs code review. labels Aug 15, 2018
@gavofyork gavofyork merged commit a63ed69 into paritytech:master Aug 15, 2018
@guanqun guanqun deleted the guanqun-send-usages branch August 15, 2018 10:02
lamafab pushed a commit to lamafab/substrate that referenced this pull request Jun 16, 2020
* Final tweaks for Kusama PrePos.

* Replace old code

* Extra utility function.

* Update to latest Substrate

* Update to latest again
liuchengxu pushed a commit to autonomys/substrate that referenced this pull request Jun 3, 2022
…naming

Support additional naming conventions for Gemini snapshots
helin6 pushed a commit to boolnetwork/substrate that referenced this pull request Jul 25, 2023
Signed-off-by: Gregory Hill <gregorydhill@outlook.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants