Illegal memory access on GTX 1060 6g #50
Comments
I get the same error using two TITAN X (Pascal) cards (28 SMX) to mine Monero. This morning I installed the newest Nvidia driver (cuda_8.0.61_375.26_linux.run) and cuDNN (cudnn-8.0-linux-x64-v6.0.tgz). I have to use that version of cuDNN because it is the only one that TensorFlow 1.3.0 accepts. I then rebuilt ccminer-cryptonight with no errors, and now I get that "illegal memory access" error. |
Do you still get this error when using --bfactor=0 ? |
Yes, I still do:
[2017-09-12 09:46:57] Starting Stratum on stratum+tcp://pool.minexmr.com:5555
GPU 0: an illegal memory access was encountered
GPU 1: OS call failed or operation not supported on this OS |
Me too
|
I do not know how to debug it, but I will if you help me. |
It looks like it's only making problems under Linux. |
This is a card-specific issue. Could you suggest how I can debug it and gather information when the miner starts and fails? |
I have created the branch "memdebug": |
OK, it starts successfully while the Ethereum miner is running.
This is a cold start with no GPU load.
Look: 8393588736 bytes, which is about 8 GB. Am I correct? |
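The byte count quoted above can be checked with plain arithmetic. The sketch below only converts the number from the comment into binary (GiB) and decimal (GB) units; the helper names are made up for illustration and are not part of ccminer.

```python
# Byte count quoted in the comment above.
reported_bytes = 8393588736

def to_gib(n_bytes: int) -> float:
    """Convert bytes to binary gibibytes (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024 ** 3

def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1000**3 bytes)."""
    return n_bytes / 1000 ** 3

print(f"{to_gib(reported_bytes):.2f} GiB")  # 7.82 GiB
print(f"{to_gb(reported_bytes):.2f} GB")    # 8.39 GB
```

So the value is slightly under 8 GiB, which is consistent with a card sold as "8 GB".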
BTW, I can see two GPUs here with the same index #1. |
I swapped the GPUs. Nothing changed except the GPU indexes.
It does not start.
As you can see, it now shows two #3 devices with the same amount of memory. But it should show devices 0, 1, and 2 as 1070s with 8 GB and device 3 as the 1060 with 6 GB. |
Looks like we have at least two different bugs. |
ok, two bugs have been fixed. |
Now it fails to compile from master :(
I cannot work out what "Error 137" stands for. |
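As general background on the number itself: make reports the exit status of the failed command, and by shell convention a status above 128 means the process was terminated by signal `status - 128`. The sketch below decodes 137 that way; `describe_exit_status` is a hypothetical helper, and the thread does not establish what actually killed the build here (on Linux, SIGKILL during compilation is often, but not always, the out-of-memory killer).

```python
import signal

def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status.

    Values above 128 conventionally mean "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"

print(describe_exit_status(137))
```

On a typical Linux system this reports termination by SIGKILL, since 137 = 128 + 9.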
It's compiling just fine for me. After pulling latest changes to master, I did:
Ah, and it now runs fine under Linux (Ubuntu Mate 16.04). |
Maybe I have an issue with my environment.
nvidia-smi outputs
So it moves the 1060 to index #1 and shifts the last GPU to #0. |
I confirm this problem for any "Intensity" for Zotac 1060 3GB on Cuda 9 RC3/Linux/Ubuntu. |
About 9 days ago I tried to fix this. How old is your version? |
Yesterday I compiled again from the latest code with CUDA 9 on Ubuntu 14.04. I was wondering why it runs with XMR and fails with SUMO... |
It failed like this on 3 GB cards: |
version 2.05 ? |
yes latest version. |
The problem only comes up with SUMO on a yiimp pool. I only briefly checked XMR, on XMR-nanopool. |
Would you please try the memdebug branch and tell me what values for the memory you see? |
So... here we go:
*** ccminer-cryptonight 2.05 (64 bit) for nVidia GPUs by tsiv and KlausT
[2017-09-22 20:05:22] Keepalive actived
GPU 1: an illegal memory access was encountered
GPU 3: an illegal memory access was encountered
And dmesg shows:
[283191.912617] NVRM: Xid (PCI:0000:03:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 0): Out Of Range Address
[283191.912627] NVRM: Xid (PCI:0000:03:00): 13, Graphics SM Global Exception on (GPC 0, TPC 0): Physical Multiple Warp Errors
[283191.912633] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ESR 0x504648=0x102000e 0x504650=0x4 0x504644=0xd3eff2 0x50464c=0x17f
[283191.912704] NVRM: Xid (PCI:0000:03:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 1): Out Of Range Address
[283191.912711] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ESR 0x504e48=0x102000e 0x504e50=0x20 0x504e44=0xd3eff2 0x504e4c=0x17f
[283191.912781] NVRM: Xid (PCI:0000:03:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 2): Out Of Range Address
[283191.912787] NVRM: Xid (PCI:0000:03:00): 13, Graphics SM Global Exception on (GPC 0, TPC 2): Physical Multiple Warp Errors
[283191.912793] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ESR 0x505648=0x102000e 0x505650=0x24 0x505644=0xd3eff2 0x50564c=0x17f
Every time I start the miner again, a different PCI ID shows up, so each time it is a different card with exactly the same error. And it's not an overclocking problem: these are four identical GTX 1060 3 GB cards. |
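To compare which card faults on each run, the NVRM Xid lines in dmesg can be tallied per PCI address. The following is a rough sketch, assuming the lines keep the shape shown in the excerpt above; the regular expression and the helper name `parse_xid_lines` are illustrative and not part of any NVIDIA or ccminer tool.

```python
import re
from collections import Counter

# Matches lines shaped like the dmesg excerpt above, e.g.
# "NVRM: Xid (PCI:0000:03:00): 13, Graphics SM Warp Exception ..."
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+), (.*)")

def parse_xid_lines(text):
    """Return (pci_address, xid_code, message) tuples for each NVRM Xid line."""
    events = []
    for line in text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2)), match.group(3).strip()))
    return events

# Two lines copied from the log quoted in the thread.
sample = """\
[283191.912617] NVRM: Xid (PCI:0000:03:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 0): Out Of Range Address
[283191.912627] NVRM: Xid (PCI:0000:03:00): 13, Graphics SM Global Exception on (GPC 0, TPC 0): Physical Multiple Warp Errors
"""

events = parse_xid_lines(sample)
per_device = Counter((pci, xid) for pci, xid, _ in events)
for (pci, xid), count in per_device.items():
    print(f"PCI {pci}: Xid {xid} x{count}")
```

Feeding it the full dmesg output from several runs would show whether the same Xid code really moves between PCI addresses, as described above.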
I'm seeing the same error on a GTX 1030. I tried building the memdebug branch but on CUDA 9.0 Ubuntu 16.04 it's throwing an error. I've attached logs/output to this comment. Any help would be greatly appreciated. Please note this worked with CUDA 8.0 Ubuntu 16.04 about 14 days ago. I ran into trouble only after upgrading to CUDA 9.0 on Ubuntu 16.04. |
CUDA 9 doesn't support sm_20 anymore. That's the reason for the compile error. |
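As an illustration of what that means for a build, the sketch below filters a list of nvcc `-gencode` flags and drops any target below compute capability 3.0, the lowest that CUDA 9 still accepts. The flag list is a made-up example, not the project's actual build configuration, and `drop_unsupported_archs` is a hypothetical helper.

```python
import re

# CUDA 9 removed the Fermi targets (compute capability 2.x), so 3.0 is the floor.
MIN_CC_CUDA9 = 30

def drop_unsupported_archs(flags, min_cc=MIN_CC_CUDA9):
    """Keep only flags whose sm_/compute_ numbers are all >= min_cc.

    Flags that name no architecture are passed through unchanged.
    """
    kept = []
    for flag in flags:
        numbers = [int(n) for n in re.findall(r"(?:sm|compute)_(\d+)", flag)]
        if numbers and min(numbers) < min_cc:
            continue
        kept.append(flag)
    return kept

# Hypothetical flag list for illustration only.
example_flags = [
    "-gencode=arch=compute_20,code=sm_20",
    "-gencode=arch=compute_50,code=sm_50",
    "-gencode=arch=compute_61,code=sm_61",
]

print(drop_unsupported_archs(example_flags))
```

With the example list, only the compute_50 and compute_61 entries remain.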
Thank you for such a quick turn around. Is there anything you need from me that might be helpful? |
Quick update: I was able to build the latest memdebug (revision a8141db) and received the following output. What's interesting is I have 2 GPUs in this rig. A GTX 1060 6Gb and a GTX 1030 2Gb. I'm currently running ccminer with just the GTX 1030 2Gb (-d 1 cli option) but the output would seem to indicate it's seeing both cards as GPU 1.
|
I have the same issue.
|
I'm still receiving this error on the latest (a8141db) memdebug branch (after CUDA 9 merge). I managed to patch the sources for additional debug output and got the following. The relevant code that isolates the "exact" crash is below the log output. Does this help you narrow it down any? Is there more detail I can provide? A quick Google search suggested it's in the kernel but I have no idea what that means or where to start poking around next in the code base. If you've got a 'max debug output' branch (or code) with more logging I can try running that to shed insight as well.
Log
Crashing Code/Method
|
This doesn't look right:
How many cards do you use? |
I am using the -d argument (full CLI options below as well as nvidia-smi output). These graphics cards do work with your standard ccminer sources, if that helps.
CLI
nvidia-smi
|
I have found a bug. I hope that was the reason for the crash. Edit: |
I was just about to leave a comment to that effect. I've attached my latest run without the -d option
|
I think this change will fix it: |
That did the trick. I just built the latest memdebug branch with the referenced commit and things are working 100% 😁 Thank you for all your hard work. |
I can confirm that 6863dbe fixed the issue for me as well.
Thanks! |
A really strange issue. The Asus DUAL GTX 1060 6G fails to start because of the following error:
But it does not fail if it is already mining Ethereum.
So I can start Ethereum mining on this GPU, start ccminer, and then stop the Ethereum miner. And, WTF, it works.