Ulrich Stern edited this page May 8, 2023 · 16 revisions

Dell Issues

System board and BIOS issues with Dell machines.

Reason for this write-up

I used to be a fan of Dell machines: over the years, we purchased 12 Dell XPS desktops as lab servers for our neurobio lab. But the recent XPS 8940 and 8950 have been unreliable enough that I am considering switching vendors. Reliability matters a great deal for us because the machines often control experiments that run for multiple hours or even days. A crash can set a project back by multiple weeks, since it can cause data loss for many experimental animals (fruit flies in our case) that take time to grow.

My hope is that this write-up will be useful for

  • Dell engineers to help them debug their machines.
    • I tried contacting Dell Support about the issues, but the experiences felt like a waste of time. To be fair, helping with sporadic issues is difficult, but I describe one of the Dell Support experiences below, which shows that Dell Support could improve.
    • I continue to be impressed by the reliability of our older (pre-8940) XPS machines.
  • Dell customers experiencing random crashes on the XPS 8940 and 8950, or potential customers considering these machines for applications requiring high reliability.

XPS 8940

The XPS 8940 with 11th-gen Intel CPUs can sporadically freeze (crash), requiring a reboot. The freezes are likely triggered by USB activity, and downgrading to 10th-gen CPUs fixes the freezing issue! (The XPS 8940 supports both 10th- and 11th-gen CPUs.)

Our data in support of this conclusion is relatively good:

  • we have 5 of the XPS 8940 in our lab.
  • the freezes happened on all of the machines using 11th-gen CPUs and under three different versions of Ubuntu: 18.04, 20.04, and 22.04.
  • after downgrading two of the machines to 10th-gen CPUs (i9-10900K), the freezes on them never recurred.
  • maybe half of the freezes happened without user interaction during experiments, when fruit flies were tracked in real-time using multiple USB webcams causing relatively high USB load; most of the remaining freezes happened upon (USB) mouse movement, often to wake up the screen prior to attempting to log in.
    • due to the clear temporal correlation with mouse movement in a sufficiently large number of freezes, an issue with USB handling on the XPS 8940 with 11th-gen CPUs seems likely to be the root cause of the freezes. In addition, we did not downgrade one of the XPS 8940 to a 10th-gen CPU since it is not used for tracking; when operated remotely, the machine has never frozen during weeks of heavy load analyzing genomic sequencing data, but it has frozen under both minimal and heavy load with local user interaction (mouse and keyboard).
    • /var/log/syslog never contained any info suggesting that an Ubuntu kernel issue was to blame for a freeze, regardless of Ubuntu version.
  • it seems likely (my guess: >80%) Dell's BIOS is the root cause of the freezes, since this seems more likely than alternative explanations like a bug in Intel's 11th-gen CPUs or a bug in all three versions of Ubuntu we use or used on the machines.
    • the one remaining XPS 8940 we have with an 11th-gen CPU does not have as heavy a USB load, so it is harder to tell whether Dell's latest BIOS may have fixed the issue. The machine did freeze twice (both cases USB-triggered) with BIOS 2.10.0 (released 10/11/22), and judging from the Fixes & Enhancements section of the driver details page for BIOS 2.11.1, currently Dell's latest, it does not sound like the issue was addressed.
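
The syslog check mentioned above can be partly automated. The following is only a minimal sketch, not the procedure we actually used: it assumes that the kernel logs a "Linux version ..." line at the start of every boot (which is what we would expect on Ubuntu), and the function name and the `context` parameter are made up for illustration. It collects the last few lines logged before each reboot, which is where a kernel complaint preceding a freeze would most plausibly show up.

```python
import re
from typing import List

# Assumption: the kernel prints "Linux version ..." once at the start of each boot,
# so such a line in syslog marks the beginning of a new boot.
BOOT_MARKER = re.compile(r"kernel: .*Linux version")


def lines_before_boots(log_lines: List[str], context: int = 5) -> List[List[str]]:
    """For each boot marker (except one on the very first line), return the
    `context` lines that were logged immediately before it."""
    results = []
    for i, line in enumerate(log_lines):
        if i > 0 and BOOT_MARKER.search(line):
            results.append(log_lines[max(0, i - context):i])
    return results


# Made-up sample lines in the usual syslog layout, for demonstration only.
sample = [
    "May  8 09:00:00 lab1 kernel: [    0.000000] Linux version 5.15.0",
    "May  8 09:00:05 lab1 systemd[1]: Started Example Service.",
    "May  8 11:30:00 lab1 CRON[123]: (root) CMD (example)",
    "May  8 11:45:00 lab1 kernel: [    0.000000] Linux version 5.15.0",
    "May  8 11:45:03 lab1 systemd[1]: Started Example Service.",
]

for n, chunk in enumerate(lines_before_boots(sample, context=2), start=1):
    print(f"--- last lines before reboot #{n} ---")
    print("\n".join(chunk))
```

To use this on a real machine, one would read /var/log/syslog (and the rotated copies) into a list of lines instead of using `sample`. An unremarkable tail before a reboot does not prove the kernel is innocent, since a hard freeze can prevent the last messages from being written to disk.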

XPS 8950

The XPS 8950 may have a system board issue that causes sporadic memory errors, which, in turn, can cause crashes, data corruption, etc. It is unclear whether this is limited to our particular machine (we have only a single XPS 8950) or a design flaw in the system board.

Here is the supporting data:

  • I have been running MemTest86 on the machine for the majority of the time since we bought it in May 2022. MemTest86 regularly identifies memory errors, sometimes after running for less than a day, sometimes after multiple days.
    • the errors appear at random locations.
    • MemTest86 has high confidence these types of errors are real -- "intermittent errors are without exception valid" (source).
    • due to the high reliability requirements of our applications, a memory error every few days rules out the machine.
    • for a detailed log of the errors detected in 2023, see XPS 8950.
  • to rule out an issue with the original memory modules in the machine, I replaced them with an XPS 8950-compatible Crucial module (CT16G48C40U5), but this did not solve the issue.
  • the memory speed dropped from about 24.4 GB/s with earlier BIOS versions to about 17.8 GB/s starting with BIOS 1.7.0 (a drop of roughly 27%). The slower speed did not solve the issue.
  • during a call with Dell Support about the memory errors, I was asked to run their built-in (BIOS) diagnostics; the memory passed, which is not surprising for a test lasting less than 1 h. But the machine froze after the test while still inside Dell diagnostics! The Dell Support person I talked to insisted that only Dell's diagnostics count for hardware issues, despite the fact that MemTest86 seems to be the industry standard for x86 memory testing.
  • given all the above, it seems likely there is a system board issue with our XPS 8950; I do not know whether this is limited to our particular board or a design flaw, and in the latter case, whether the design flaw has been fixed with a revision of the board. The MemTest86 reports for the memory errors state that our board is version A01, see XPS 8950.
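
To make the "not surprising for a less than 1-h test" remark above more concrete, here is a rough back-of-the-envelope sketch. It assumes that memory errors arrive like a Poisson process with a constant average rate, which is my simplification and not something measured. The mean intervals of 1, 2, and 4 days are hypothetical stand-ins for "sometimes after less than a day, sometimes after multiple days", and the sketch further assumes that a different test would trigger errors at a similar rate to MemTest86.

```python
import math


def detection_probability(test_hours: float, mean_hours_between_errors: float) -> float:
    """Probability of seeing at least one error during a test of the given length,
    assuming errors follow a Poisson process with the given mean interval."""
    if test_hours < 0 or mean_hours_between_errors <= 0:
        raise ValueError("test_hours must be >= 0 and mean interval must be > 0")
    return 1.0 - math.exp(-test_hours / mean_hours_between_errors)


# Hypothetical mean intervals between errors (in days); not measured values.
for days in (1, 2, 4):
    p = detection_probability(1.0, days * 24.0)
    print(f"one error per {days} d on average: {p:.1%} chance that a 1-h test sees one")
```

Under these assumptions, a 1-h test has only a few percent chance of catching an error, so a single passing run says very little about whether the memory subsystem is healthy.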