Ubuntu Studio 22.04.2 LTS ISOs (ubuntustudio-22.04.2-dvd-amd64.iso) now use a casper file (casper/filesystem.squashfs) that is larger than 4 GB.
Because of this, we can't use FAT32 with ISO mode but instead must switch to NTFS.
Because of this, we need to chain load the Ubuntu bootloaders (shim + mokmanager + GRUB) through UEFI:NTFS.
If Secure Boot is disabled everything is fine and you get to the GRUB menu.
If Secure Boot is enabled, and the embedded UEFI:NTFS read-only NTFS driver is used, shim + mokmanager + GRUB boot freezes (most likely in mmx64.efi but needs to be validated) and you never get to the GRUB menu.
If Secure Boot is enabled and an external firmware provided read/write NTFS driver is used (tested on an Intel NUC), shim + mokmanager + GRUB works as expected (NB: We obviously tested this with UEFI:NTFS being invoked, not with direct boot of the NTFS partition).
Oh and of course, when using DD mode, and therefore when the shim resides on a FAT partition, everything works (though you do get an Error: file /boot/ not found! message, that doesn't seem to have much of an impact).
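For reference, the FAT32 constraint that forces the switch to NTFS in ISO mode comes from the 32-bit file size field in FAT32 directory entries. A minimal sketch of the check (the helper name fits_on_fat32 is made up for illustration, this is not taken from any existing code):

```c
#include <stdbool.h>
#include <stdint.h>

/* FAT32 stores a file's size in a 32-bit field, so the largest file it
 * can hold is 2^32 - 1 bytes (one byte short of 4 GiB). */
#define FAT32_MAX_FILE_SIZE UINT64_C(0xFFFFFFFF)

/* Hypothetical helper: returns true if a file of the given size
 * (in bytes) can be stored on a FAT32 file system. */
static bool fits_on_fat32(uint64_t file_size)
{
    return file_size <= FAT32_MAX_FILE_SIZE;
}
```

Any ISO whose casper/filesystem.squashfs fails this check cannot be extracted onto a FAT32 partition as is.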
So we have an incompatibility somewhere between the UEFI:NTFS bootloader or the UEFI:NTFS ntfs-3g read-only driver and one of shim/mokmanager/GRUB. The first suspect was mokmanager (which doesn't seem to be invoked when Secure Boot is disabled), but the current most likely suspect is shim (WTF?!?), initially thought to be needing/expecting r/w access to the ESP, with one of the code changes introduced in https://github.com/rhboot/shim/commits/main between 2021.07 and 2023.01.18.
Unfortunately, and unlike what UEFI:NTFS does with its very detailed and verbose output, the Red Hat/GRUB/Ubuntu folks borrowed the most patronizing page from Microsoft's rulebook that says "you should hide scary boot details from the user and give them a nice empty screen since it'll look pretty" and ran with it, with the result that they chose not to provide a single point of information about what's currently happening that could give us any clue as to where their process chokes.
Which means that we now have to find a needle in the blind haystack of shim + mokmanager + GRUB to try to figure out what is really happening.
Things we tested:
ubuntu-22.10-desktop-amd64.iso (with casper < 4 GB) written in UEFI:NTFS mode boots fine under the same conditions
ubuntu-22.04.2-desktop-amd64.iso (with casper < 4 GB) written in UEFI:NTFS mode has the same issue
A quick comparison between the above shows that all of bootx64.efi, mmx64.efi and grubx64.efi are different, so it's possible that this is a known Shim/MOK Manager/GRUB issue that has been fixed in the most up-to-date versions, and that will be picked up by LTS eventually.
Removed/renamed grubx64.efi and mmx64.efi → Still froze, which seems to indicate that the issue is with the shim (unless shim is designed to halt if it can't find MM/GRUB).
Replaced 22.04.2 LTS Shim with 22.10 Shim → This works. So, correlated with the above, it looks pretty safe to say that the issue is in the Linux Shim. However the thing that now worries me is that the Shim that doesn't work (937 KB) was signed Wednesday 18 January 2023 02:37:51 whereas the Shim that does work (933 KB) was signed Thursday 12 August 2021 22:00:22, which would tend to hint that it's the newer shims that introduced breakage and that future versions of Ubuntu will all have the issue. WTF did Red Hat do to break boot?!?
Tried a boot in DD mode to see if the ESP was altered but the ESP was identical before and after boot. This would tend to indicate, though it does not exclude it totally, that this isn't a rw vs ro issue...
Use a r/w version of our ntfs-3g driver → Same issue, so this validates that this is not a rw vs ro issue.
Use a different NTFS driver. Using the old (ro) GPLv3 NTFS driver from https://efi.akeo.ie/downloads/efifs-1.9/x64/ fixes the issue → Goddammit Microsoft, if you didn't bullshit the world and refuse to sign GPLv3 binaries, that's the driver we would use in the first place and we wouldn't be in this mess!
Use an MSVC/gnu-efi compiled version of the driver rather than a gcc/EDK2 one → Still fails. So this is not a toolchain issue...
Enable NTFS debug in our driver to try to see what is being accessed when the freeze occurs:
Hmmm, so we fail in ntfs_readdir() most likely after the ntfs_attr_open() call and in a code section triggering a goto err_out; jump but not on a goto dir_err_out; jump, since the latter would override the E2BIG errno we get to EIO instead. On that subject, I can't locate any part of the ntfs-3g code that would explicitly return E2BIG, so it looks like this error code is being returned from the if (HookData->Info->Size < ((UINT64)NameLen + 1) * sizeof(CHAR16)) check in DirHook().
Yup, that's where we choke. The RH shim is issuing a Read() of the directory with a 0 sized buffer and this is throwing our driver off:
Looking at the UEFI specs for EFI_FILE_PROTOCOL.Read() (Section 13.5 File Protocol) it seems that our issue is that we are returning 0 for the size, whereas we should return the minimum required size, per:
If This is a directory, the function reads the directory entry at the file’s current position and returns the entry in Buffer. If the Buffer is not large enough to hold the current directory entry, then EFI_BUFFER_TOO_SMALL is returned and (...) BufferSize is set to be the size of the buffer needed to read the entry.
So it looks like the problem is that the shim is expecting the driver to return the minimum required size to read the directory entry (per specs) and adjusting its request to use the returned size until it gets a successful read. But since our NTFS driver is returning 0 instead of the required size (which is not specs compliant), whatever loop the shim uses to read the directory loops forever.
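To make the failure mode concrete, here is a small stand-alone simulation in plain C (not actual UEFI, driver or shim code). Everything in it is a hypothetical stand-in: the status_t codes, the 96-byte ENTRY_SIZE, the two mock drivers and the read_entry() caller, which is only a guess at the kind of retry loop a caller might use. The cap on the number of calls exists only so that the simulation terminates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the UEFI status codes involved. */
typedef enum { ST_SUCCESS, ST_BUFFER_TOO_SMALL } status_t;

/* Hypothetical size, in bytes, of the directory entry being read. */
#define ENTRY_SIZE 96u

/* Spec-compliant behaviour: on a short buffer, report the size needed. */
static status_t read_compliant(void *buf, size_t *size)
{
    if (*size < ENTRY_SIZE) {
        *size = ENTRY_SIZE;
        return ST_BUFFER_TOO_SMALL;
    }
    memset(buf, 0, ENTRY_SIZE);   /* pretend to fill in the entry */
    *size = ENTRY_SIZE;
    return ST_SUCCESS;
}

/* Behaviour described above: on a short buffer, the size comes back as 0. */
static status_t read_buggy(void *buf, size_t *size)
{
    if (*size < ENTRY_SIZE) {
        *size = 0;
        return ST_BUFFER_TOO_SMALL;
    }
    memset(buf, 0, ENTRY_SIZE);
    *size = ENTRY_SIZE;
    return ST_SUCCESS;
}

typedef status_t (*read_fn)(void *buf, size_t *size);

/* Hypothetical caller: probes with a zero-sized request, then retries with
 * whatever size the driver reported. Returns the number of read calls
 * made, or -1 if the read still fails after max_calls attempts (without
 * the cap, this loop would never exit). */
static int read_entry(read_fn read, int max_calls)
{
    uint8_t buf[256];
    size_t size = 0;

    for (int calls = 1; calls <= max_calls; calls++) {
        size_t request = size;
        if (read(buf, &request) == ST_SUCCESS)
            return calls;
        size = request;           /* trust the size reported by the driver */
    }
    return -1;
}
```

With read_compliant, read_entry() succeeds on the second call. With read_buggy, the reported size stays at 0, so every retry repeats the same failing request.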
Aaaand, this is the same issue as the one reported in "Don't loop forever in load_certs() with buggy firmware" (rhboot/shim#547), for which Red Hat have now applied a workaround. Well, at least now we know what the root of the issue is and what's required to fix it... This could also explain some of the issues reported by folks using Dell computers with their UEFI firmware freezing when a UEFI:NTFS drive is plugged...
While we were at it, we also submitted a PR to improve the Shim code (that currently just bails out on non-compliance, but could try to allocate buffers with increased size, thus ensuring that the directory listing succeeds regardless).
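The idea of allocating buffers of increasing size can be sketched as follows. This is a plain C illustration under stated assumptions, not the code from the PR and not shim code: status_t, read_buggy(), read_entry_growing(), the 96-byte ENTRY_SIZE and the 64-byte starting capacity are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the UEFI status codes involved. */
typedef enum { ST_SUCCESS, ST_BUFFER_TOO_SMALL, ST_OUT_OF_RESOURCES } status_t;
typedef status_t (*read_fn)(void *buf, size_t *size);

/* Hypothetical size, in bytes, of the directory entry being read. */
#define ENTRY_SIZE 96u

/* Non-compliant mock driver: never reports the size it needs. */
static status_t read_buggy(void *buf, size_t *size)
{
    (void)buf;
    if (*size < ENTRY_SIZE) {
        *size = 0;
        return ST_BUFFER_TOO_SMALL;
    }
    *size = ENTRY_SIZE;
    return ST_SUCCESS;
}

/* Reads one entry, growing the buffer until the read succeeds or the
 * capacity would exceed max_size. On success, *out points to a malloc'd
 * buffer (the caller frees it) and *out_size holds the entry size. */
static status_t read_entry_growing(read_fn read, size_t max_size,
                                   void **out, size_t *out_size)
{
    size_t capacity = 64;
    void *buf = NULL;

    while (capacity <= max_size) {
        void *grown = realloc(buf, capacity);
        if (grown == NULL) {
            free(buf);
            return ST_OUT_OF_RESOURCES;
        }
        buf = grown;

        size_t size = capacity;
        if (read(buf, &size) == ST_SUCCESS) {
            *out = buf;
            *out_size = size;
            return ST_SUCCESS;
        }
        /* Use the reported size when it is usable, else double the buffer. */
        capacity = (size > capacity) ? size : capacity * 2;
    }
    free(buf);
    return ST_BUFFER_TOO_SMALL;
}
```

Because the caller no longer depends on the reported size being correct, the read succeeds even against the non-compliant mock, and the max_size cap keeps the loop bounded.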
This is basically an investigation of the issue that was reported on askubuntu.com and that I have also been able to replicate on another system.