IGP causes NVMe Kernel Panic CSTS=0xffffffff #1193

0xfeedface-turbo · 2020-10-02T21:28:22Z

Let me start by saying that this is not a bug in NVMeFix or WhateverGreen, but this seems like the best place to document the issue.

I have an Intel 9600K/H370 system that experiences kernel panics in IONVMeController that manifest as a generic timeout:

void AppleNVMeRequestTimer::PrintPending()::243:QID=1 Deadline=4390442285091 DW0=00140001 DW10=00F04593 DW11=00000000 DW12=0000001F DW13=00000000 DW14=00000000 DW15=00000000
void AppleNVMeRequestTimer::PrintPending()::243:QID=1 Deadline=4390442285091 DW0=00140001 DW10=00F04593 DW11=00000000 DW12=0000001F DW13=00000000 DW14=00000000 DW15=00000000
Debugger called:
IOPlatformPanicAction -> IONVMeController
IOPlatformPanicAction -> AppleSMC
: panic(cpu 0 caller 0xffffff7f865edb30): nvme: "Fatal error occurred. CSTS=0xffffffff US[1]=0x0 US[0]=0x5a1 VID/DID=0x500215b7
. FW Revision=102000WD\n"@/BuildRoot/Library/Caches/com.apple.xbs/Sources/IONVMeFamily/IONVMeFamily-387.270.1/IONVMeController.cpp:5334
Backtrace (CPU 0), Frame : Return Address
0xffffff873a6f3a10 : 0xffffff8003fad58d mach_kernel : _handle_debugger_trap + 0x47d
0xffffff873a6f3a60 : 0xffffff80040e9145 mach_kernel : _kdp_i386_trap + 0x155
0xffffff873a6f3aa0 : 0xffffff80040da87a mach_kernel : _kernel_trap + 0x50a
0xffffff873a6f3b10 : 0xffffff8003f5a9d0 mach_kernel : _return_from_trap + 0xe0
0xffffff873a6f3b30 : 0xffffff8003facfa7 mach_kernel : _panic_trap_to_debugger + 0x197
0xffffff873a6f3c50 : 0xffffff8003facdf3 mach_kernel : _panic + 0x63
0xffffff873a6f3cc0 : 0xffffff7f865edb30 com.apple.iokit.IONVMeFamily : __ZN16IONVMeController13FatalHandlingEv + 0x10e
0xffffff873a6f3e20 : 0xffffff800465d407 mach_kernel : _ZN18IOTimerEventSource15timeoutSignaledEPvS0 + 0x87
0xffffff873a6f3e90 : 0xffffff800465d329 mach_kernel : _ZN18IOTimerEventSource17timeoutAndReleaseEPvS0 + 0x99
0xffffff873a6f3ec0 : 0xffffff8003fec7a5 mach_kernel : _thread_call_delayed_timer + 0xef5
0xffffff873a6f3f40 : 0xffffff8003fec345 mach_kernel : _thread_call_delayed_timer + 0xa95
0xffffff873a6f3fa0 : 0xffffff8003f5a0ce mach_kernel : _call_continuation + 0x2e
Kernel Extensions in backtrace:
com.apple.iokit.IONVMeFamily(2.1)[E109699D-6257-3176-B081-4CC8B1C181AB]@0xffffff7f865e0000->0xffffff7f8661ffff
dependency: com.apple.driver.AppleMobileFileIntegrity(1.0.5)[1AD7D9F4-24B5-354F-BD01-C301F58FAA52]@0xffffff7f84d8d000
dependency: com.apple.iokit.IOPCIFamily(2.9)[EF12A360-E92B-3407-8080-E4889F8AAC97]@0xffffff7f84895000
dependency: com.apple.driver.AppleEFINVRAM(2.1)[32B99D26-4CD1-3CE5-8856-D2659CCA4861]@0xffffff7f84f67000
dependency: com.apple.iokit.IOStorageFamily(2.1)[DFD9596C-E596-376A-8A00-3B74A06C2D02]@0xffffff7f84b83000
dependency: com.apple.iokit.IOReportFamily(47)[769D4408-2D1B-3B65-89D1-4C3C547099E3]@0xffffff7f85407000
BSD process name corresponding to current thread: kernel_task
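
For anyone reading the PrintPending lines in the log above, here is a minimal sketch that decodes the dwords of the pending command. It assumes the standard NVMe I/O command layout (opcode in DW0 bits 7:0, command ID in DW0 bits 31:16, starting LBA in DW11:DW10, zero-based block count in DW12 bits 15:0), and it assumes the VID/DID value in the panic string is PCI config dword 0, with the vendor ID in the low half. The function names are made up for illustration.

```python
# Decode the submission-queue dwords printed by AppleNVMeRequestTimer::PrintPending().
# Assumption: the fields follow the standard NVMe NVM command set layout.

OPCODES = {0x00: "Flush", 0x01: "Write", 0x02: "Read"}  # small subset only


def decode_command(dw0, dw10, dw11, dw12):
    opcode = dw0 & 0xFF                  # CDW0 bits 7:0
    command_id = (dw0 >> 16) & 0xFFFF    # CDW0 bits 31:16
    start_lba = (dw11 << 32) | dw10      # CDW11:CDW10 form the 64-bit starting LBA
    block_count = (dw12 & 0xFFFF) + 1    # CDW12 bits 15:0 hold a zero-based count
    return {
        "opcode_name": OPCODES.get(opcode, hex(opcode)),
        "command_id": command_id,
        "start_lba": start_lba,
        "block_count": block_count,
    }


def split_vid_did(value):
    # Assumption: vendor ID in the low 16 bits, device ID in the high 16 bits.
    return value & 0xFFFF, (value >> 16) & 0xFFFF


cmd = decode_command(0x00140001, 0x00F04593, 0x00000000, 0x0000001F)
vid, did = split_vid_did(0x500215B7)
print(cmd)
print(hex(vid), hex(did))
```

Under those assumptions the stuck request is a 32-block write, and the vendor ID comes out as 0x15b7, which is consistent with the "WD" firmware revision string in the panic message.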

I have tried to debug this timeout, which happens at random times, but there is one commonality: it only happens when the IGP is in use and the display is sleeping.

The IGP going into a low-power mode seems to disrupt power to the NVMe, causing it to crash/reset, and thus causing the timeout. The NVMe keeps SMART statistics on power-offs, and I have recorded this anomaly:

Power Cycles: 3,814
Power On Hours: 202
Unsafe Shutdowns: 3,794
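
To put the SMART counters above in proportion, here is a quick calculation using only the three values quoted. It is just arithmetic on the reported numbers, not a diagnostic tool.

```python
# SMART counters as reported in the post above.
power_cycles = 3814
unsafe_shutdowns = 3794
power_on_hours = 202

unsafe_ratio = unsafe_shutdowns / power_cycles
cycles_per_hour = power_cycles / power_on_hours

print(f"{unsafe_ratio:.1%} of power cycles were unsafe shutdowns")
print(f"{cycles_per_hour:.1f} power cycles per power-on hour")
```

Roughly 99.5% of all power cycles were recorded as unsafe, at close to 19 cycles per power-on hour, which fits the theory that the drive is losing power while the system is running.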

I have not been able to figure out exactly how the IGP is causing the NVMe to lose power, but I suspect it may be related to this issue (RC6).

I modified the CFL FB kext with these changes, which seem to completely solve the KP issue:
<key>RenderStandby</key><integer>0</integer>
<key>SetRC6Voltage</key><integer>1</integer>
<key>SupportPSRwithExternalDisplay</key><integer>0</integer>

Have you guys seen IGP power saving cause any similar problems? I'm thinking there might be a way to work around this in WhateverGreen or NVMeFix to avoid having to create a plist-only kext to change these settings.


0xfeedface-turbo · 2020-10-02T22:13:17Z

I forgot to mention that I spent a lot of time troubleshooting this before discovering the IGP connection.

I tried different NVMe cards, different motherboards, NVMe heatsinks, built-in M.2 slots vs. PCI adapter cards, UEFI PCI power settings, enabling and disabling ASPM, and so on; the kernel panic always recurred. Sometimes the VID/DID would read as 0xffff.

Onboard PCH IGE, AHCI, USB never had an issue at all, only NVMe. I'm guessing it's some kind of UEFI firmware bug?

07151129 · 2020-10-02T23:17:33Z

That's an extremely curious bug, thanks for suggesting a fix. I think force disabling RC6 by default in the FeatureControl dict of the framebuffer IORegistryEntry is a good immediate solution.

Were you able to isolate the issue just to a single key of this dictionary?

It is worth mentioning that you can also disable render standby by passing the boot argument forceRenderStandby=0.

0xfeedface-turbo · 2020-10-03T16:29:35Z

Thanks for the tip on the bootarg. I am pretty sure that's it.

It can take hours for the panic to happen, but I set RenderStandby back to 1 and I got a panic almost immediately. I have reverted the previous changes and am testing with just forceRenderStandby=0 right now, and it hasn't panicked so far.

I am not sure what the power impact of this change is. This is a desktop system, but the same problem could be happening with laptops. One of the Linux posts mentions disabling coarse power gating as the better option. There is a key CoarsePowerGatingSelect, but I haven't deduced what the values mean yet.

07151129 · 2020-10-03T17:54:19Z

RenderStandby refers to RC6, the lowest-power idle render state. It has been notoriously buggy and required workarounds, both in Linux and Windows.

Coarse power gating is another mechanism used in GEN9 to transition Render and Media engines to sleep. The two appear to be independent in principle. The CoarsePowerGatingSelect bits 0 and 1 are used to enable Render and Media CPG, respectively. An older version of i915 used to disable Render CPG https://patchwork.kernel.org/patch/6193051/, but apparently it is now enabled along with RC6.

0xfeedface-turbo · 2020-10-03T18:54:37Z

Thanks for the info, it has saved me a lot of time!

I did some testing with RenderStandby=1 and CoarsePowerGatingSelect=0 and I was actually able to get the same NVMe crash with the display ON for the first time. Do you know what bit 2 is used for? The default in the CFL FB kext is 4, and disabling that bit seems to make a difference.

Setting forceRenderStandby=0 in boot-args solves the crashes completely.
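
As a rough illustration of applying this setting, the sketch below adds forceRenderStandby=0 to a boot-args string without duplicating it. It assumes an OpenCore-style config where boot-args lives under NVRAM, Add, 7C436110-AB2A-4BBB-A880-FE41995C9F82; the thread does not say which bootloader is in use, and the helper name and the sample boot-args are hypothetical.

```python
def add_boot_arg(boot_args, arg):
    """Return boot_args with `arg` present once, replacing any older value for the same key."""
    key = arg.split("=", 1)[0]
    kept = [a for a in boot_args.split() if a.split("=", 1)[0] != key]
    return " ".join(kept + [arg])


# Hypothetical in-memory fragment of an OpenCore config.plist (assumption, not from the thread).
NVRAM_GUID = "7C436110-AB2A-4BBB-A880-FE41995C9F82"
config = {"NVRAM": {"Add": {NVRAM_GUID: {"boot-args": "-v keepsyms=1 forceRenderStandby=1"}}}}

section = config["NVRAM"]["Add"][NVRAM_GUID]
section["boot-args"] = add_boot_arg(section["boot-args"], "forceRenderStandby=0")
print(section["boot-args"])  # -v keepsyms=1 forceRenderStandby=0
```

Replacing by key rather than appending blindly avoids ending up with both forceRenderStandby=1 and forceRenderStandby=0 in the same string.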

Intel Power Gadget reports that the IGP frequency never drops below 350 MHz, and total power consumption is approximately 1 W higher than with RenderStandby enabled.

I'm still at a loss as to why RC6 on the IGP would be affecting the NVMe at all, though.

07151129 · 2020-10-03T22:20:54Z

CoarsePowerGatingSelect=4 uses the value from the platform info struct at offset 0x58 (gPlatformInformationList, see IntelFramebuffer.bt) to configure CPG:

AppleIntelFramebufferController::getCPGControl
...
    cpgsel = OSMetaClassBase::safeMetaCast(v3, OSNumber::metaClass);
    if ( cpgsel )
    {
      cpgsel = (cpgsel->vtbl->unsigned32BitValue)(cpgsel);
      if ( cpgsel != 4 )
        goto LABEL_7;
      this->CoarsePowerGatingSelect = 0;
      v4 = this->platformInfo->member22;
      cpgsel = (&dword_0 + 2);
      if ( _bittest(&v4, 0x10u) )
      {
        this->CoarsePowerGatingSelect = 1;
        cpgsel = (&dword_0 + 3);
      }
      if ( _bittest(&v4, 0x11u) )
LABEL_7:
        this->CoarsePowerGatingSelect = cpgsel;
    }
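
The decompiled control flow above is hard to follow because of the shared label, so here is a minimal re-expression of what it appears to compute. This is one reading of the pseudocode, not Apple's source: the constant names are invented, and the bit meanings (bit 0 for Render CPG, bit 1 for Media CPG) are taken from the earlier comment in this thread.

```python
RENDER_CPG = 1 << 0        # bit 0: Render coarse power gating
MEDIA_CPG = 1 << 1         # bit 1: Media coarse power gating
USE_PLATFORM_DEFAULT = 4   # value 4: derive the setting from the platform info flags


def resolve_cpg_select(configured, platform_flags):
    """Mirror of the getCPGControl pseudocode, as interpreted here."""
    if configured != USE_PLATFORM_DEFAULT:
        # Any other value is stored unchanged (the LABEL_7 path).
        return configured
    result = 0
    if platform_flags & (1 << 16):   # _bittest(&v4, 0x10u)
        result |= RENDER_CPG
    if platform_flags & (1 << 17):   # _bittest(&v4, 0x11u)
        result |= MEDIA_CPG
    return result


print(resolve_cpg_select(4, 1 << 16))  # 1
print(resolve_cpg_select(4, 3 << 16))  # 3
print(resolve_cpg_select(0, 3 << 16))  # 0
```

Read this way, the default of 4 does not set a "bit 2" feature; it tells the driver to take Render and Media CPG from bits 16 and 17 of the platform info field.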

It's a complete mystery why there is interference between GPU and PCI. If you can reproduce it on Linux with i915, then this could be reported to Intel.

07151129 · 2020-10-07T12:30:57Z

By the way, the value CSTS=0xffffffff also looks suspicious according to the spec.
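
To show why that value looks wrong, here is a small sketch that splits CSTS into the fields defined in the NVMe base specification (RDY, CFS, SHST, NSSRO, PP in bits 0 to 5). Treating an all-ones read as "device not responding" is an assumption based on how failed PCIe reads usually look, not something the panic log states. Later spec revisions may define additional bits.

```python
def decode_csts(csts):
    """Split the NVMe Controller Status register into its low-order fields."""
    return {
        "RDY": bool(csts & 0x01),     # bit 0: controller ready
        "CFS": bool(csts & 0x02),     # bit 1: controller fatal status
        "SHST": (csts >> 2) & 0x3,    # bits 3:2: shutdown status (value 3 is reserved)
        "NSSRO": bool(csts & 0x10),   # bit 4: NVM subsystem reset occurred
        "PP": bool(csts & 0x20),      # bit 5: processing paused
        "all_ones": csts == 0xFFFFFFFF,
    }


status = decode_csts(0xFFFFFFFF)
print(status)
```

With every bit set, SHST decodes to the reserved value 3, so this is not a status a working controller should report. It looks more like the register read itself failed, which would also match the VID/DID sometimes reading as 0xffff.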

A similar bug in Linux: https://bugs.freedesktop.org/show_bug.cgi?id=108546. Apparently, it is a BIOS issue, although in that case intel_idle.max_cstate=1 i915.enable_dc=0 i915.enable_fbc=0 did not help.


vit9696 · 2020-10-07T16:05:37Z

Thanks for your help! Added a comment to WhateverGreen FAQ. Other FAQs will also need to be updated.

CC @Andrey1970AppleLife @khronokernel @PMheart

Mateo1234454545 · 2020-10-08T12:02:16Z

I added the forceRenderStandby=0 boot arg as well, and the IGPU is stuck at 0.3 GHz.

malhal · 2020-11-12T22:42:41Z

Maybe TRIM runs while the system is in this state, and that is what crashes? Try sudo trimforce disable and reboot. If you re-enable it later, it is recommended to run Disk First Aid.

blodt · 2021-09-04T21:46:25Z

It's back doing it again on my machine after a month or so of no issues

Getting more consistent too

malhal · 2021-09-04T21:55:33Z

I haven't had this panic since I disabled TRIM

blodt · 2021-09-05T15:44:06Z

> I haven't had this panic since I disabled TRIM

Will try that - thank you!

Mateo1234454545 · 2021-09-07T17:47:59Z

> I haven't had this panic since I disabled TRIM

How did you disable TRIM?
I tried your command, but after a reboot NVMe TRIM is still enabled.
Maybe this command only works for SATA SSDs?

1alessandro1 · 2021-09-08T08:08:03Z

@Mateo1234454545

Ensure ThirdPartyDrives kernel patch is set to False
Try sudo trimforce disable
Set SetApfsTrimTimeout to 999 which is the minimal timeout
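
For anyone scripting the two config changes above, here is a minimal sketch using Python's plistlib. It assumes an OpenCore config.plist where both keys live under Kernel, Quirks; the sample plist is built in memory for illustration, and the function name is hypothetical. The sudo trimforce disable step still has to be run separately in Terminal.

```python
import plistlib


def apply_trim_settings(config):
    """Set the two quirks suggested above, creating the sections if they are missing."""
    quirks = config.setdefault("Kernel", {}).setdefault("Quirks", {})
    quirks["ThirdPartyDrives"] = False
    quirks["SetApfsTrimTimeout"] = 999
    return config


# Hypothetical starting config (assumption): the quirk is enabled and the timeout is -1.
sample = plistlib.dumps(
    {"Kernel": {"Quirks": {"ThirdPartyDrives": True, "SetApfsTrimTimeout": -1}}}
)

config = apply_trim_settings(plistlib.loads(sample))
updated = plistlib.dumps(config)
print(updated.decode())
```

To use it on a real file, read the bytes from your own config.plist instead of `sample`, and keep a backup before writing the result back.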

blodt · 2021-09-20T21:00:32Z

I ended up having to do a fresh Big Sur install and restore my install from Time Machine

That all went great and I'm back up and running with no freezes again. I've applied @1alessandro1's tips/settings above in the hope that they cure it long term.

I don't think I will really know for a month or so, as that's how long the freezing issue took to reappear after the last time I did all this.

I'll report back in hopes of helping anyone else down the line.

Thank you all

vit9696 added the project:green label Oct 2, 2020

vit9696 added a commit to acidanthera/WhateverGreen that referenced this issue Oct 7, 2020

Add more connector flag definitions

9dd0233

references acidanthera/bugtracker#1193

vit9696 closed this as completed in acidanthera/WhateverGreen@baa1dc0 Oct 7, 2020

vit9696 mentioned this issue Nov 13, 2020

Any fix for Intel 600p NVMe Drive? #1286

Closed

brucespang mentioned this issue Apr 8, 2021

Occasional nvme kernel panic #1598

Closed

profzei mentioned this issue Jun 11, 2021

macOS 11+ restarts unpredictably profzei/Matebook-X-Pro-2018#171

Closed

This was referenced Aug 21, 2021

Frequent and Random system crashes daliansky/XiaoMi-Pro-Hackintosh#218

Open

It was working fine before, but since last night it has been rebooting at random; I'm about to lose it, what should I do daliansky/XiaoMi-Pro-Hackintosh#238

Open

profzei mentioned this issue Sep 24, 2021

3.0.0 have some problem profzei/Matebook-X-Pro-2018#186

Closed

JeromeAstero mentioned this issue Sep 27, 2021

NVMeFix Kernel Panic #1803

Closed

MagicianLjj mentioned this issue Oct 9, 2022

Monterey 12.5.1, latest 3.0.6 EFI, huge power drain daliansky/XiaoXinPro-13-hackintosh#217

Closed

shiecldk mentioned this issue Jan 27, 2023

[Kernel/FB] ScreenPad display static on FB initialization Qonfused/ASUS-ZenBook-Duo-14-UX481-Hackintosh#4

Open
