XMR-Stak stopping across multiple rigs at same time #2398

fbmoose48 · 2019-04-07T05:31:09Z

2.10.3 on Windows 10, no overclock/undervolt, blockchain gpus

My setup of 4 Windows 10 and 2 Debian rigs with a mix of 480s and 580s has demonstrated something curious:

XMR-Stak seems to drop all GPUs and revert to CPU-only mining or, worse, stop entirely. Both failures occur at what seems to be nearly the same time across all the W10 rigs (within a few minutes to under an hour at least; I'm not sitting in front of them actively watching), but the Debian ones run for days unattended.

The W10 rigs all run the blockchain drivers. The Debian rigs don't. Switching entirely to Debian is tempting, but I get 10% better hash rates on the same cards with blockchain drivers on W10. I've tested this: same card, different OS, consistently 10% less on Debian. I've heard ROCm drivers on Debian might close this gap, but not all my PCIe slots are 3.0, so I'm not sure they're compatible.

Rather than give up on the blockchain drivers on W10, I'd like to understand why this is happening before I just trade reduced hash rate for stability on Debian.

Any ideas? The timing part of the stoppages always seems suspicious to me.

Also, since the Debian rigs continue running through all this, I don't think it's a network issue.

I haven't been able to correlate what type of restart is necessary yet to get up and running.

Sometimes a restart works, which is great if I'm remote. Other times I have to do a full shutdown-startup to get everything reinitialized - sometimes more than once per rig, but sometimes one shutdown-startup gets it working again.

Blaming the Beta blockchain drivers on Windows seems too obvious. I stick with them over Adrenalin because they have always worked (up until the fork). The hours of downtime on Windows aren't worth a 10% boost while up, compared to running Debian at a 10% reduction with no user intervention. That all W10 rigs stop together and in the same manner (either the GPUs drop out or XMR-Stak closes entirely) seems not to be coincidence.

These failures seem to occur after 2 - 12 hours of running, so multiple times per day. I've noticed that after one of these failures the CPUs, if they remain hashing, seem to be at 25% of their typical rate. I wonder if that is related? What could cause the GPUs to drop off and the CPUs to be restricted? Memory issue? I've never "affined to cpu", maybe I should.

One last thought is that it could be an unfortunate coincidence. Since the first time they all stopped together, they've all consistently been started within a minute of each other. They all have similar builds, so maybe Windows just reaches its breaking point independently on each rig, and since they start together and are similarly built, it just happens to occur near-simultaneously? I doubt it, but I don't want to rule anything out while trying to resolve this.
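One way to check the coincidence theory would be to log when each rig stops and see whether the stoppages really cluster in time. Below is a minimal Python sketch of that idea. The function name `group_stoppages`, the one-hour window, and the sample entries are all hypothetical placeholders for illustration, not real logs from these rigs and not anything XMR-Stak provides.

```python
from datetime import datetime, timedelta


def group_stoppages(stops, window=timedelta(hours=1)):
    """Group (rig, timestamp) pairs into clusters.

    A stoppage joins the current cluster when it happened no more than
    `window` after the previous stoppage; otherwise it starts a new one.
    """
    ordered = sorted(stops, key=lambda s: s[1])
    clusters = []
    for rig, ts in ordered:
        if clusters and ts - clusters[-1][-1][1] <= window:
            clusters[-1].append((rig, ts))
        else:
            clusters.append([(rig, ts)])
    return clusters


# Hypothetical placeholder entries (made up, not taken from real logs).
stops = [
    ("rig1", datetime(2019, 4, 6, 21, 30)),
    ("rig2", datetime(2019, 4, 6, 21, 42)),
    ("rig3", datetime(2019, 4, 7, 9, 5)),
]

clusters = group_stoppages(stops)
for cluster in clusters:
    rigs = sorted({rig for rig, _ in cluster})
    print(cluster[0][1].isoformat(), rigs)
```

If the rigs are deliberately started at staggered times and the stoppages still land in the same cluster, start-time coincidence becomes a much less likely explanation.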

I have reason to believe this may be happening to other users' Windows rigs simultaneously with my own.

psychocrypt · 2019-04-07T08:42:59Z

We had a similar issue a week ago with NVIDIA GPUs. Not sure if the error is the same. Would it be possible for me to change a few lines in the source code, and for you to compile it for the broken machine and test it?

CryptoBroke550 · 2019-04-07T12:14:00Z

I run 4 Windows 10 rigs on version 1809 with the latest patches and updates paused, mining XMR.
Last night at 9:30pm the miners on my RX 580 and RX 470 rigs, both running XMR-Stak 2.10.3, crashed; I have just restarted both. I have an RX 550 rig still running 2.10.3 and an Nvidia 1050 Ti rig running 2.10.4, and both worked past this point with no issues.
I do overclock the memory, underclock the core and lower the voltage, but HWiNFO shows no errors when mining XMR, so they are as stable as possible. I use Seasonic power supplies with over double the voltage required, so the power supplies should be OK. I also use UPSes on all the rigs. The driver is 18.6.1.

I also think the miner or card hit something it didn't like.

fbmoose48 · 2019-04-07T16:21:52Z

I would absolutely love to try some updated source code. I very much appreciate your help @psychocrypt

I've been stable for the last few hours after switching back to 2.10.2, if that helps you isolate the problem at all.

Zarkoob · 2019-04-07T23:10:28Z

Had two miners crash for me as well at the same time. The pool also showed a huge drop in hash rate at the same time. I think the pool is causing a crash somehow. I'd love to help solve this.

fbmoose48 · 2019-04-08T19:25:50Z

I run 2.10.3 until it crashes, then run 2.10.2 until it crashes. I've been alternating like this for 48 hours, with 2-10 hours of uptime each time. I can't get a version to start again right after it has crashed, but the other version will, usually without even needing a reboot.

I have no idea why this is, but thought it might help you debug

Edit: this workaround stopped working. The last 2 times I have had to reboot between failures before restarting the miner.

StuieG · 2019-04-10T00:52:12Z

I've noticed the same thing: rigs with AMD cards stopping at the same time. Weirdly enough, I have a few rigs with Nvidia cards as well, and I've noticed they've stopped at the same time as each other a couple of times too. Not at the same times as the AMD rigs, though.

PasternakMichal · 2019-04-17T13:16:59Z

Seems very similar to what happened to me in #2377, but I was on Adrenalin drivers and also had a Vega 64 mixed in.

lordhugo7880 · 2019-06-10T19:02:21Z

What is the current status of this bug?

psychocrypt · 2019-06-10T19:08:22Z

For NVIDIA I fixed it in #2390. For AMD GPUs it was mostly an out-of-memory bug, which was also fixed.
@lordhugo7880 If you still have issues like that, please open a new PR with all the details, like system, GPUs and the exact error.

I will close this issue.

fbmoose48 · 2019-06-10T19:28:40Z

I suspected it was a Windows-specific driver issue. I switched the same rig over to Debian, and it has run uninterrupted on the same version of XMR-Stak for almost 8 weeks.

psychocrypt self-assigned this Apr 8, 2019

psychocrypt added the bug label Apr 8, 2019

psychocrypt closed this as completed Jun 10, 2019
