This repository has been archived by the owner on Apr 24, 2022. It is now read-only.

Display info when --tstop/--tstart temperatures are reached. #1159

Merged: 2 commits merged on May 27, 2018


StefanOberhumer
Collaborator

No description provided.

@AndreaLanfranchi
Collaborator

Sorry if I'm missing something, but why keep the cnote commented out?
Wouldn't it be useful for users to actually see that mining on GPU x has been suspended because of the temperature threshold?

@StefanOberhumer
Collaborator Author

I didn't want my (already commented-out) "debugging" output to have any performance impact, since #1146 had already been accepted and merged.
Because I noticed some commented-out "debug" lines that were wrong or easy to misinterpret, I wanted to fix them.
Maybe we should make the message depend on the verbosity level? (Which one?)

@AndreaLanfranchi
Collaborator

This is just my personal opinion: the CLI args --tstop and --tstart are activated by the user, so the user expects to see something happen when a temperature threshold kicks in.
At a minimum, I would use cnote to tell the user when a GPU is suspended due to the temperature limit and when it is resumed.

@jean-m-cyr
Contributor

jean-m-cyr commented May 27, 2018

@StefanOberhumer I am also of the view that, at a minimum, a cnote log entry should accompany any GPU tstop/tstart state transition. I would even suggest a cwarn when stopping a GPU.

I'm also a little puzzled with this feature... my NV GPUs do a pretty good job of automatically adjusting fans and throttling work to manage temp.

@StefanOberhumer StefanOberhumer changed the title from "NFC: Update/Correct some comments." to "Display info when --tstop/--tstart temperatures are reached." May 27, 2018
@StefanOberhumer
Collaborator Author

StefanOberhumer commented May 27, 2018

Adapted (cwarn if --tstop is reached, cinfo if --tstart is reached)
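The suspend/resume behaviour agreed on here (a warning when --tstop is reached, an info message when --tstart is reached, and silence in between) amounts to a simple hysteresis. The following is a minimal, standalone C++ sketch under that assumption, not ethminer's actual code: the `ThermalGuard` class and its methods are hypothetical, and plain `std::cerr`/`std::cout` writes stand in for the cwarn/cinfo loggers.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical per-GPU guard illustrating --tstop/--tstart hysteresis.
// Not ethminer's implementation: std::cerr/std::cout stand in for the
// cwarn/cinfo loggers discussed in this PR.
class ThermalGuard {
public:
    // tstop:  pause mining at or above this temperature (degrees C).
    // tstart: resume mining at or below this temperature (degrees C).
    ThermalGuard(unsigned tstop, unsigned tstart) : m_tstop(tstop), m_tstart(tstart) {
        assert(tstart < tstop && "--tstart must be below --tstop");
    }

    // Feed the latest temperature reading for one GPU. Returns true while
    // mining should stay paused. A message is printed only when the state
    // changes, so repeated polls at a similar temperature stay quiet.
    bool update(unsigned gpuIndex, unsigned tempC) {
        if (!m_paused && tempC >= m_tstop) {
            m_paused = true;
            std::cerr << "warn: GPU" << gpuIndex << " reached " << tempC
                      << "C (tstop " << m_tstop << "C), mining suspended\n";
        } else if (m_paused && tempC <= m_tstart) {
            m_paused = false;
            std::cout << "info: GPU" << gpuIndex << " cooled to " << tempC
                      << "C (tstart " << m_tstart << "C), mining resumed\n";
        }
        return m_paused;
    }

    bool paused() const { return m_paused; }

private:
    unsigned m_tstop;
    unsigned m_tstart;
    bool m_paused = false;
};
```

For example, `ThermalGuard g(75, 60)` keeps mining at 74°C, pauses (with one warning) at 75°C, stays paused at 61°C, and resumes (with one info line) at 60°C. Logging only on state transitions keeps the output to one line per suspend or resume instead of one line per temperature poll.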

(I think I can't rename the branch while a PR is open without closing the PR, so I left my branch name unchanged.)

@jean-m-cyr jean-m-cyr merged commit c7245e7 into ethereum-mining:master May 27, 2018
@StefanOberhumer
Collaborator Author

StefanOberhumer commented May 27, 2018

@jean-m-cyr

I'm also a little puzzled with this feature... my NV GPUs do a pretty good job of automatically adjusting fans and throttling work to manage temp.

  • We had problems with our external cooling and fan system.
  • I try to keep my cards at 60°C to give them a (hopefully) long life ;-)
  • With a target temperature of 60°C my fans were at 100% and the temperature still rose...
  • Summer is coming (it's nearly 40°C outside)
  • Other mining software also includes this feature

For those reasons I decided to add this feature, and I'm very glad it was merged!

But how exactly do your cards handle the "... throttling work to manage temp"?
I had already thought about throttling by reducing the number of CUDA tasks!

@StefanOberhumer StefanOberhumer deleted the NFC-FixComments branch May 27, 2018 20:54
@jean-m-cyr
Contributor

Ah, ok. 60C is very low. I've seen cards that run comfortably at 80C...

In Nvidia 10x0 series GPUs, temperature and power limits are handled by the GPU's hardware dispatcher, which automatically backs off the number of running work groups. I'm not sure where the temperature limit is, but it's above 80C. I believe the silicon is spec'd for up to 100C!!!

@StefanOberhumer
Collaborator Author

StefanOberhumer commented May 27, 2018

Well, I've seen some settings using:
nvidia-smi --query-gpu=clocks_throttle_reasons.supported --format=csv,nounits,noheader
nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,nounits,noheader
nvidia-smi --query-gpu=clocks_throttle_reasons.hw_slowdown --format=csv,nounits,noheader
(See more of them with nvidia-smi --help-query-gpu.)
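To act on the clocks_throttle_reasons.active query programmatically, its output can be decoded into named reasons. This is a minimal C++ sketch that assumes nvidia-smi prints that field as a hexadecimal bitmask (e.g. 0x0000000000000040) whose bits match NVML's nvmlClocksThrottleReason* constants; the function names and the short labels are hypothetical, chosen only for this illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bit values mirror NVML's nvmlClocksThrottleReason* constants.
// Assumption: nvidia-smi prints clocks_throttle_reasons.active as this same
// bitmask in hex, e.g. "0x0000000000000040". Labels are illustrative only.
struct ThrottleReason {
    std::uint64_t bit;
    const char* label;
};

constexpr std::uint64_t kSwThermalSlowdown = 0x20;
constexpr std::uint64_t kHwThermalSlowdown = 0x40;

const ThrottleReason kThrottleReasons[] = {
    {0x01, "gpu_idle"},
    {0x02, "applications_clocks_setting"},
    {0x04, "sw_power_cap"},
    {0x08, "hw_slowdown"},
    {0x10, "sync_boost"},
    {kSwThermalSlowdown, "sw_thermal_slowdown"},
    {kHwThermalSlowdown, "hw_thermal_slowdown"},
    {0x80, "hw_power_brake_slowdown"},
};

// Parse the hex mask; std::stoull with base 16 accepts an optional "0x"
// prefix and throws std::invalid_argument on non-numeric input.
std::uint64_t parseThrottleMask(const std::string& hexMask) {
    return std::stoull(hexMask, nullptr, 16);
}

// Labels of all active throttle reasons, in ascending bit order.
std::vector<std::string> decodeThrottleReasons(const std::string& hexMask) {
    const std::uint64_t mask = parseThrottleMask(hexMask);
    std::vector<std::string> active;
    for (const auto& reason : kThrottleReasons) {
        if (mask & reason.bit) active.emplace_back(reason.label);
    }
    return active;
}

// True if either the software or the hardware thermal slowdown bit is set.
bool isThermallyThrottled(const std::string& hexMask) {
    return (parseThrottleMask(hexMask) & (kSwThermalSlowdown | kHwThermalSlowdown)) != 0;
}
```

For example, a mask of 0x0000000000000044 decodes to sw_power_cap and hw_thermal_slowdown. Inside a miner it would likely be cleaner to read the same bitmask directly through NVML (nvmlDeviceGetCurrentClocksThrottleReasons) rather than parsing nvidia-smi output.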

I think I saw information about throttling starting at about 90°C.
I also think I saw a setting that shuts the GPU down at 100°C.

  • Since I have some background in electronics, I think 90°C is much too hot (even if the spec "allows" it)!
  • I have seen cards running at 80°C where something ran down the PCB (it looked like glue; I said they "sweated").
  • So I try to keep my cards at 60°C and (possibly) lose some MH/s, hoping to keep them in good condition.

Let's see what summer brings - maybe I have to update my target temperature ;-)

Thanks for your info & feedback

@jean-m-cyr
Contributor

jean-m-cyr commented May 27, 2018

@StefanOberhumer One more thing... I missed it too, but there's a new practice of updating the CHANGELOG.md file with functional changes such as this one. I think the intent is to include the changelog update in the PR itself.

This PR is already merged, so perhaps you could submit a further PR to update the changelog?

@jean-m-cyr
Contributor

@chfast Should the CHANGELOG update be included with the functional PR, or should it be submitted as a separate PR? Including the CHANGELOG update with the code update PR would be clean, but it would also cause a lot of forced branch merges...

@chfast
Contributor

chfast commented May 28, 2018

It's better to include it with the PR. But I don't mind updating it later on.


4 participants