XMR-Stak stopping across multiple rigs at same time #2398
Comments
We had a similar issue a week ago with NVIDIA GPUs. Not sure if the error is the same.
Would it be possible for me to change a few lines in the source code, and for you to compile it for the broken machine and test it?
|
I run 4 Windows 10 rigs on version 1809 with the latest patches and updates paused, mining XMR. I also think the miner or a card hit something it didn't like. |
I would absolutely love to try some updated source code. I very much appreciate your help @psychocrypt. I've been stable for the last few hours after switching back to 2.10.2, if that helps you isolate the problem at all. |
Had two miners crash for me as well at the same time. The pool also showed a huge drop in hash rate at the same time. I think the pool is causing the crash somehow. I'd love to help solve this. |
I run 2.10.3 until it crashes, then start 2.10.2 until it crashes. I've been alternating like this for 48 hours, with 2-10 hours of uptime each time. I can't get the version that just crashed to start again, but the other version will, usually without even needing a reboot. I have no idea why this is, but thought it might help you debug. Edit: this workaround stopped working; I've had to reboot between failures the last 2 times. |
I've noticed the same thing: rigs with AMD cards stopping at the same time. Weirdly enough, I have a few rigs with Nvidia cards as well, and I've noticed they've also stopped together a couple of times, though not at the same times as the AMD rigs. |
Seems very similar to what happened to me in #2377, but I was on Adrenalin drivers and also had a Vega 64 mixed in. |
What is the current status of this bug? |
For NVIDIA I fixed it in #2390. For AMD GPUs it was mostly an out-of-memory bug, which was also fixed. I will close this issue. |
I suspected it was a Windows-specific driver issue. I switched the same rig over to Debian, and it has run uninterrupted on the same version of XMR-Stak for almost 8 weeks. |
2.10.3 on Windows 10, no overclock/undervolt, blockchain gpus
My setup of 4 Windows 10 and 2 Debian rigs with a mix of 480s and 580s has demonstrated something curious:
XMR-Stak seems to drop all GPUs and revert to CPU-only mining or, worse, stop entirely. Both failures occur at what seems to be nearly the same time across all the W10 rigs (within a few minutes to under an hour at least; I'm not sitting in front of them actively watching), but the Debian ones run for days unattended.
The W10 rigs all run the blockchain drivers. The Debian rigs don't. Switching entirely to Debian is tempting, but I get 10% better hash rates on the same cards with blockchain drivers on W10. I've tested this: same card, different OS, consistently 10% less. I've heard ROCm drivers on Debian might close this gap, but not all my PCIe slots are 3.0, so I'm not sure they're compatible.
Rather than give up on the blockchain drivers on W10, I'd like to understand why this is happening before I just trade reduced hash rate for stability on Debian.
Any ideas? The timing part of the stoppages always seems suspicious to me.
Also, since the Debian rigs continue running through all this, I don't think it's a network issue.
I haven't been able to correlate what type of restart is necessary yet to get up and running.
Sometimes a restart works, which is great if I'm remote. Other times I have to do a full shutdown-startup to get everything reinitialized - sometimes more than once per rig, but sometimes one shutdown-startup gets it working again.
Blaming the beta blockchain drivers on Windows seems too obvious. I stick with them over Adrenalin because they have always worked (up until the fork). The hours of downtime on Windows aren't worth a 10% boost while running, compared with running Debian at a 10% reduction with no user intervention. That all W10 rigs stop together and in the same manner (either the GPUs drop out or XMR-Stak closes entirely) does not seem to be a coincidence.
These failures seem to occur after 2 - 12 hours of running, so multiple times per day. I've noticed that after one of these failures the CPUs, if they remain hashing, seem to run at 25% of their typical rate. I wonder if that is related? What could cause the GPUs to drop off and the CPUs to be restricted? A memory issue? I've never "affined to cpu"; maybe I should.
One last thought is that it could be an unfortunate coincidence. Since the first time they all stopped together, they've all consistently been started within a minute of each other. They all have similar builds, so maybe Windows just reaches its breaking point independently on each rig, and since they start together and are similarly built it just happens to occur near-simultaneously? I doubt it, but I don't want to rule anything out while trying to resolve this.
I have reason to believe this may be happening to other users' Windows rigs simultaneously with my own.
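One way to test the coincidence theory above would be to stagger the start times on purpose and then compare two numbers per failure round: how tightly the stop times cluster on the wall clock, and how tightly the uptimes cluster. The sketch below is only an illustration of that comparison. The function names, the 60-minute tolerance, and the timestamps are all made up for the example; none of it comes from XMR-Stak or from real logs.

```python
from datetime import datetime
from statistics import pstdev


def spread_minutes(deltas):
    """Population standard deviation of a list of timedeltas, in minutes."""
    return pstdev([d.total_seconds() / 60 for d in deltas])


def classify(events, tolerance_min=60):
    """Classify one failure round.

    events: list of (start, stop) datetime pairs, one pair per rig.
    tolerance_min: hypothetical threshold for "about the same time".
    """
    first_stop = min(stop for _, stop in events)
    stop_spread = spread_minutes([stop - first_stop for _, stop in events])
    uptime_spread = spread_minutes([stop - start for start, stop in events])

    if stop_spread <= tolerance_min < uptime_spread:
        # Rigs died together despite different uptimes: points at something
        # shared (pool, network event, scheduled task).
        return "shared trigger"
    if uptime_spread <= tolerance_min < stop_spread:
        # Rigs died after similar uptimes at different wall-clock times:
        # points at each rig hitting its own limit (e.g. a leak).
        return "uptime-dependent"
    # Rigs started together and stopped together: cannot tell the two apart.
    return "inconclusive"


def t(hhmm):
    """Helper for the made-up example data: 'HH:MM' on an arbitrary day."""
    return datetime.strptime("2019-04-01 " + hhmm, "%Y-%m-%d %H:%M")


# Made-up example: starts staggered by two hours, stops within 40 minutes.
staggered = [
    (t("08:00"), t("18:02")),
    (t("10:00"), t("18:05")),
    (t("12:00"), t("18:10")),
    (t("14:00"), t("18:40")),
]
print(classify(staggered))
```

With all rigs started within a minute of each other, as they are now, both spreads are small and the result is "inconclusive", which is exactly why the start times would need to be staggered first.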