Display info when --tstop/--tstart temperatures are reached. #1159

StefanOberhumer · 2018-05-26T18:04:30Z

No description provided.

AndreaLanfranchi · 2018-05-27T09:19:38Z

Sorry for my poor understanding, but ... why keep cnote commented out?
Wouldn't it be useful for users to actually see that mining on CPU x has been suspended due to the temperature threshold?

StefanOberhumer · 2018-05-27T09:46:13Z

I didn't want to cause any performance impact with my (already commented-out) "debugging" output, since #1146 had already been accepted and merged.
As I noticed some wrong and misleading commented-out "debug" lines, I wanted to fix them.
Maybe we should make the info depend on the verbosity level? (Which one?)

AndreaLanfranchi · 2018-05-27T09:50:56Z

This is my personal opinion. The CLI args --tstop and --tstart are user-activated, so the user expects to see something happen when a temperature threshold kicks in.
As a minimal output, I would use cnote to tell the user when a GPU gets suspended due to the temperature limit and when it gets resumed.

jean-m-cyr · 2018-05-27T14:53:14Z

@StefanOberhumer I am also of the view that, at a minimum, a cnote log entry should accompany any GPU tstop/tstart state transition. I would even suggest a cwarn on stopping a GPU.

I'm also a little puzzled with this feature... my NV GPUs do a pretty good job of automatically adjusting fans and throttling work to manage temp.

StefanOberhumer · 2018-05-27T20:29:42Z

Adapted (cwarn if --tstop is reached, cinfo if --tstart is reached)

(I think I cannot rename the branch while a PR is open without closing the PR, so I left the branch name unchanged.)
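The behaviour agreed on above (a warning when --tstop is reached, an info line when --tstart is reached) can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `ThermalGuard`, `Transition`, and `update` are hypothetical names, `std::cerr` and `std::clog` stand in for the cwarn and cinfo log channels, and the `>=` / `<=` comparisons are an assumption about how "reached" is evaluated.

```cpp
#include <iostream>

// Hypothetical result type: which state transition (if any) just happened.
enum class Transition { None, Paused, Resumed };

// Hypothetical helper tracking one GPU's pause state with hysteresis:
// pause once the temperature reaches tstop, resume only after it has
// dropped back to tstart, so the GPU does not flap around one threshold.
struct ThermalGuard
{
    unsigned tstop;   // pause at or above this temperature (degrees C)
    unsigned tstart;  // resume at or below this temperature (degrees C)
    bool paused;

    ThermalGuard(unsigned stop, unsigned start)
      : tstop(stop), tstart(start), paused(false)
    {
    }

    Transition update(unsigned gpuIndex, unsigned tempC)
    {
        if (!paused && tempC >= tstop)
        {
            paused = true;
            // Stand-in for cwarn: stopping a GPU deserves a warning.
            std::cerr << "warn: GPU " << gpuIndex << " paused, temperature " << tempC
                      << " C reached --tstop " << tstop << " C\n";
            return Transition::Paused;
        }
        if (paused && tempC <= tstart)
        {
            paused = false;
            // Stand-in for cinfo: resuming is informational only.
            std::clog << "info: GPU " << gpuIndex << " resumed, temperature " << tempC
                      << " C reached --tstart " << tstart << " C\n";
            return Transition::Resumed;
        }
        return Transition::None;  // No transition, so nothing is logged.
    }
};
```

Logging only on a transition keeps the output to one line per pause and one per resume, which also addresses the earlier concern about the cost of per-iteration debug output.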

StefanOberhumer · 2018-05-27T20:52:54Z

@jean-m-cyr

> I'm also a little puzzled with this feature... my NV GPUs do a pretty good job of automatically adjusting fans and throttling work to manage temp.

We had problems with the external cooling and fan system.
I try to keep the temperature of my cards at 60 degrees to give them a (hopefully) long life ;-)
At a target temperature of 60 my fans were at 100% and the temperature still kept rising...
Summer is coming (outside temperatures near 40 degrees).
Other mining software also includes this feature.
For those reasons I decided to add this feature, and I'm very glad it was merged!

But how do your cards do the "... throttling work to manage temp" part?
I had already thought about throttling by minimizing CUDA tasks!

jean-m-cyr · 2018-05-27T21:23:26Z

Ah, ok. 60C is very low. I've seen cards that run comfortably at 80C...

In Nvidia 10x0 series GPUs, temp and power limits are handled by the GPU's hardware dispatch. The dispatcher will automatically back off the number of running work groups. Not sure where the temp. limit is, but it's above 80C. I believe the silicon is spec'd at up to 100C!!!

StefanOberhumer · 2018-05-27T22:50:47Z

Well, I've seen some settings using
nvidia-smi --query-gpu=clocks_throttle_reasons.supported --format=csv,nounits,noheader
nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,nounits,noheader
nvidia-smi --query-gpu=clocks_throttle_reasons.hw_slowdown --format=csv,nounits,noheader
==> see more of them using nvidia-smi --help-query-gpu

I think I saw info about throttling at about 90°.
I also think I saw a setting that shuts down the GPU at 100°.
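The `clocks_throttle_reasons.active` query above reports a hexadecimal bitmask rather than a readable reason. The sketch below decodes the thermal-related bits from such a string. It is only an illustration: `decodeThrottleReasons` and `ReasonBit` are hypothetical names, the bit values are quoted from memory of NVML's `nvmlClocksThrottleReason*` constants and should be checked against the `nvml.h` shipped with your driver, and the sample input format (`0x` followed by 16 hex digits) is an assumption about the nvidia-smi output.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One throttle-reason bit and a label for it.
struct ReasonBit
{
    std::uint64_t mask;
    const char* name;
};

// Assumed bit values (verify against nvml.h before relying on them).
// Only the slowdown bits relevant to temperature are listed here.
static const ReasonBit kThermalReasons[] = {
    {0x08, "hw_slowdown"},
    {0x20, "sw_thermal_slowdown"},
    {0x40, "hw_thermal_slowdown"},
};

// Parses a mask such as "0x0000000000000048" and returns the labels of
// all listed bits that are set, in table order.
std::vector<std::string> decodeThrottleReasons(const std::string& hexMask)
{
    // Base 16 accepts an optional "0x" prefix; throws on unparsable input.
    const std::uint64_t bits = std::stoull(hexMask, nullptr, 16);

    std::vector<std::string> active;
    for (const ReasonBit& reason : kThermalReasons)
    {
        if (bits & reason.mask)
            active.push_back(reason.name);
    }
    return active;
}
```

An empty result means none of the listed slowdown bits are set; it does not rule out other throttle reasons (such as a power cap) that this table leaves out.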

As I have some background knowledge in electronics, I think 90° is much too hot (even if the spec "allows this")!
I have seen cards running at 80° where something had run across the PCB (it looked like glue; I said they "sweated").
So I try to keep my cards at 60° and (possibly) lose some MH/s, hoping to keep my cards in good condition.

Let's see what summer brings - maybe I have to update my target temperature ;-)

Thanks for your info & feedback

jean-m-cyr · 2018-05-27T23:16:12Z

@StefanOberhumer One more thing... I missed it too but there's a new practice of updating the CHANGELOG.md file with functional changes such as this one. I think the intent is to include this changelog update with the PR.

This PR is already merged, so perhaps you could submit a further PR to update the changelog?

jean-m-cyr · 2018-05-27T23:20:54Z

@chfast Should the CHANGELOG update be included with the functional PR, or should it be submitted as a separate PR? Including the CHANGELOG update with the code update PR would be clean, but would also cause a lot of forced branch merges...

chfast · 2018-05-28T08:14:13Z

It's better to include it with the PR. But I don't mind updating it later on.

StefanOberhumer force-pushed the NFC-FixComments branch from 9595ef7 to 3aeb66d May 26, 2018 18:10

NFC: Update some comments.

6198328

StefanOberhumer force-pushed the NFC-FixComments branch from 3aeb66d to 6198328 May 26, 2018 18:12

Display info if a gpu is paused/restarted due temperature flags --tstart/--tstop.

d0d197a

StefanOberhumer changed the title ~~NFC: Update/Correct some comments.~~ Display info when --tstop/--tstart temperatures are reached. May 27, 2018

jean-m-cyr approved these changes May 27, 2018

jean-m-cyr merged commit c7245e7 into ethereum-mining:master May 27, 2018

StefanOberhumer deleted the NFC-FixComments branch May 27, 2018 20:54

StefanOberhumer mentioned this pull request May 28, 2018

CHANGELOG.md: Add changes of PR1146, PR1159 and PR1162 #1169

Merged
