Stability problems since updating to 10 and 10.1 Pi4 8GB NVMe SSD via USB adapter #2536
Comments
Similar issues: catastrophic problems with OS 10.1 on an RPi 4 that I had never encountered before. EDIT: This made me give up on the RPi 4. I moved to a HAOS VM on Proxmox, which has been 100% stable. It is cheaper and much more robust with built-in NVMe, while using the same power. |
Similar issues. I happened to be upgrading from SD card to SSD during this OS 10.0 release and it was a nightmare. I also saw the similar SQUASHFS error. My HA ran smoothly for about a year without any issue. It has already crashed a few times during the past week. I'm not sure whether it is an OS 10 issue or caused by the new SSD hardware. Let's see if I have more information in the coming weeks. Edit: I'm on OS 10.1 and it has still crashed once. |
Have exactly the same issue and obviously lots of people do. Since 10.0 my HA crashes at least every second day. Before the Update it ran seamlessly on a Raspberry Pi 4b with an external SSD. |
Hi, I just wanted to update on this issue. I have had no crashes since I downgraded to HA OS 9.5, uptime is now since 9 May 2023 at 18:17, which was a normal host reboot. I also updated to core 2023.5.2 again, that does not seem to cause any problems. |
I'm having similar issues. Since Home Assistant OS 10.0, it behaves like a system whose hard drive was removed while running. Things keep running, but the longer they do, the less works: the status page on port 4357 isn't available, the dashboard loads but everything on it fails to display, settings pages show errors while loading, the app fails to connect locally, automations stop running, add-ons crash; you get the picture. And since no logs are kept after a restart, it's impossible to tell why the system got into this state. I'm running the system on a Raspberry Pi 4B+ with an external SSD from the supported RPi SSD adapter list, and I've already reflashed the OS twice before restoring my backup. It still keeps happening every 48 hours. I've attached a capture card so I can see what the system prints when it crashes again. |
I have experienced the same thing with my Yellow/NVMe combo, and symptoms persisted after an in-place downgrade to 9.5 until I did a full reinstall of HAOS from scratch (i.e. wiping the NVMe and reinstalling using USB mass storage mode, as well as uploading a known stable firmware). I have some serial logs showing that journald can't access its log directory, squashfs errors etc., but they don't show the initial stages of the problem. Once the connection to the drive is lost, there appears to be no way to get a root shell or access kernel logs even with a serial connection. A reboot fixes the issue for some time (a few hours to days), but all logs from the beginning of the fault are wiped. Since this persisted after the in-place downgrade, I think that the latest firmware may be to blame, but I don't have anything approaching "proof" for this hypothesis. Hardware:
|
It happened again today, but the screen wasn't on, so I couldn't capture any screenshots etc. Will try to see if I can gather logs any other way. |
Since we're all speculating, maybe 10.1 just uses more power. I used to have this power supply: I now changed to this power supply: This seems to support this hypothesis: #2513 from here https://community.home-assistant.io/t/installing-home-assistant-on-a-rpi-4b-with-ssd-boot/230948 :
EDIT: Still rock solid, no crashes, since I downgraded to HA OS 9.5 |
I did, 3A PS for the Pi 4 PLUS externally powered SSD. Didn't make a difference. |
Would be nice to have rsyslog as an addon maybe. |
I'm using a POE adapter, that gives up to 15W in 5V mode (= 5V 3A) and I don't think the upgrade resulted in a higher power draw, especially, since the drive is still signaling some activity even after it crashed. I haven't researched on how to access the hypervisor directly via SSH without any Add-on, in order to be able to access logs directly, maybe I'll have some time next week to look into it. I've tried to connect via port 22222 before, but the sshd daemon seemed unresponsive, once the system crashes. |
The first thing I did once the system started to crash was to replace my old 15 W power supply with one that has 20 W. I think this rules out any concerns regarding the power consumption. I can also confirm that it is no longer possible to SSH into the host (admin via port 22222) once the system becomes unresponsive. I did, however, notice that recently one of my custom integrations stopped polling. I have a Riemann sum sensor based on one of the integration’s native sensors which then had really strange values. I replaced the custom integration by self-configured REST entities two days ago and since then the system seems to be stable. Not sure though if this is related in any kind… |
I am pretty confident it is not the power supply. After the first incident, I replaced the stock power supply that came with the Yellow with a 36 W battery-buffered Eaton 3S Mini and it still happened again. |
@mundschenk-at unfortunately there are several reports of Samsung 980 1TB models not behaving well with Yellow. But since that was also the case in 9.5, and on Yellow, this is not related to the topic at hand. The symptoms might look similar, but the cause is different: on Yellow the NVMe is directly attached, while on RPi 4 a USB to NVMe adapter is being used. Unfortunately the exact cause of the Samsung 980 Pro misbehaving on Yellow is unknown. At this point I can only suggest using a different NVMe (note also that the Samsung 980 Pro is a bit overkill for Yellow: the CM4/BCM2711 only uses PCIe Gen 2 x1). See also: #2235 (comment) |
This isn't related to that Samsung SSD though, I'm using an Intenso M.2 SSD TOP 128GB as well as this SSD Adapter from SSK. Also in addition to the POE power supply I'm using, my Pi also is mounted inside a PI-TOP [4], which provides battery power to the Pi 4, so undervoltage or an unreliable power supply shouldn't be any problem here. The problem is persisting with the latest version 10.2 btw and I haven't been able to extract any useful logs afterwards.. :/ |
@danir-de right, this Thread is not related to Samsung NVMe SSDs or Yellow. @mundschenk-at your case is really off-topic here. This thread is about issues with NVMe and USB adapters connected to a Raspberry Pi. |
This is most likely related to Raspberry Pi's Linux kernel and/or firmware. There hasn't been an update to them since a while, so this is kinda expected. Are you using USB boot? Can you try to use SD-card boot along with the data disk feature to see if that works better? |
After the latest 'incident' I accessed the OS via SSH (admin / port 22222) and did some research in journalctl. I found that there are no entries for almost two hours before the time I noticed that the system had become unresponsive and restarted the hardware: Jun 06 06:07:52 homeassistant addon_77113f40_powerbox-mqtt[567]: END Reading value of ... There is also nothing before that event in the logs that looks suspicious. The system just suddenly hangs and does not even write logs anymore. I updated to 10.2 a few days ago and this was the first crash since the update. I'm using this external SSD: https://www.amazon.de/dp/B085TL8W6V?psc=1&ref=ppx_yo2ov_dt_b_product_details |
I can't SSH into my pi anymore after it hangs. |
Me neither but after a reboot you can still see the old logs using journalctl. |
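For anyone who wants to try this, here is a minimal shell sketch. It assumes the host journal is persistent and that the host shell provides `journalctl`; neither is guaranteed when the disk holding the logs is the part that dropped out. The function name `show_previous_boot` is made up for illustration; `--boot`, `--lines` and `--no-pager` are standard journalctl options.

```shell
#!/bin/sh
# Hypothetical helper: print the tail of the journal from the boot before the current one.
# Assumes a persistent journal; if logs only live in RAM, there is no previous boot to show.
show_previous_boot() {
    lines=${1:-200}   # number of trailing log lines to print (default: 200)
    journalctl --boot=-1 --lines="$lines" --no-pager
}
```

Running `journalctl --list-boots` first shows which boots are recorded at all; `show_previous_boot 500` then prints the last 500 lines before the reboot.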
That was not clear before. I'll create a separate ticket. |
That unfortunately does not help much when the disk with those logs becoming offline is the issue at hand. |
Same issue here when using 10.1. I have a usb adapter to a 2.5 SSD (not an NVMe). I went back to 9.5 a month or so ago and it’s been solid. I came to check the bug reports to see if others had the same issue and it seems so. This reminds me of an issue back in late 2020 early 2021 where it came down to an issue with the Pi firmware after months of people debugging differences. |
Okay, went back to 9.5 a week ago and it's running stable ever since. Do you see a problem staying there for a longer period of time? Doesn't look like there will be a fix in the near future? |
This still is a problem with the latest 10.3 release. |
Just follow the procedure described here: https://developers.home-assistant.io/docs/operating-system/debugging/ It looks worse than it is and is very useful in many occasions... |
I'm going to stay optimistic here. How hard can it be for the @HA_dev_Team to blacklist: 0bda:9210 Realtek Semiconductor Corp. RTL9210 M.2 NVME Adapter? EDIT: Pi4 SSD USB Boot support is STILL broken with kernel 6.1 |
@markusmauch Add: |
I can confirm this. I changed the /mnt/boot/cmdline.txt accordingly and updated to 10.3. I'll keep you informed... |
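For reference, a hedged sketch of what such a change can look like. The exact line suggested above is not reproduced here, so this only assumes the usual kernel mechanism: the `usb-storage.quirks=<vid>:<pid>:u` parameter, where the `u` flag tells the kernel to ignore UAS for that device so it falls back to usb-storage. The helper name `add_uas_quirk` is made up for illustration, and the ID in the usage note is the RTL9210 adapter mentioned earlier in the thread; use the ID of your own adapter.

```shell
#!/bin/sh
# Hypothetical helper: append a usb-storage quirk that disables UAS for one adapter.
# Usage: add_uas_quirk <cmdline-file> <vid:pid>
add_uas_quirk() {
    quirk_file=$1
    quirk_param="usb-storage.quirks=$2:u"   # "u" = ignore UAS for this device

    # Refuse to touch a file that already carries a quirks entry;
    # merging several IDs is better done by hand.
    if grep -q 'usb-storage\.quirks=' "$quirk_file"; then
        echo "quirks entry already present in $quirk_file" >&2
        return 1
    fi

    # The kernel command line must stay on a single line,
    # so the parameter is appended to the first line.
    quirk_line=$(head -n 1 "$quirk_file")
    printf '%s %s\n' "$quirk_line" "$quirk_param" > "$quirk_file"
}
```

Example call, with the path taken from the comment above: `add_uas_quirk /mnt/boot/cmdline.txt 0bda:9210`. Keeping a copy of the original file before editing makes it easy to revert if the system no longer boots.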
Sad to hear that. I really hoped blacklisting the uas-mode works. Searching for...
leads to articles about problematic U1/U2 implementations, which have to do with LPM (Link Power Management) of USB 3 devices. Disabling LPM for your USB device might be worth a try. |
@Baxxy13, I think you got me wrong. I said, "if"
|
This is really frustrating. I can't use network storage while on 9.5; I need to be on 10.0+. No one is assigned. Is anyone from the development team even looking at this? |
I had the same problem, wasn't even able to startup HA on Raspi4, first it keeps checking the file system and then SquashFS errors flood the logs. |
I unplugged my SSD and ran my old SD card with Raspbian for deCONZ to check my current firmware version.
Maybe I should update the firmware ?
If so:
Or might I run into similar problems as this guy? Thx for letting me know, I will now boot back into my SSD and await your reply P.S. Might this eventually help with SSD issues for kernel 6.1 on raspberry pi ? |
This did the trick. My system ran stable on 10.3 for a whole week. I updated to 10.4 yesterday and so far it looks good. Thanks for the assistance! |
Tell me about it, it really is! Just search this bug tracker for issues with the label We've been tempted to declare USB SSD unsupported entirely. But on the other hand, there are configurations of USB SSD adapters + disk + Raspberry Pis which do work really reliable. It just seems a huuuge hit and miss. 😢
I follow it loosely. In the end, Raspberry Pi and USB SSD have had a painful journey since... forever, essentially. It was one reason we created Yellow: a proper NVMe SSD using M.2/PCIe helped to alleviate a lot of the problems. This is the technology PCs and notebooks use, and it sidesteps all the USB powering and USB UMS protocol issues. Granted, it seems that the Raspberry Pi SoC also has trouble talking to some high-end NVMe drives such as the Samsung 980 Pro and WD_BLACK NVMe SSD, unfortunately 😢, but pretty much every other NVMe works rock solid. That said, I really hope that USB SSD support gets more stable as time progresses. But we rely on the progress of the Raspberry Pi kernel. I am waiting for a new Raspberry Pi Linux kernel release, but it seems they stopped releasing regularly; their last release was in April, see https://github.com/raspberrypi/linux/tags 😩 |
Just thinking... Why not disabling usb-uas mode totally for the Pi4B HA-OS? Ok, performance throughput is a bit lower with usb-storage but this isn't really appreciable within the running HA-OS. Sidenote from my last adapter-testing: |
For the sake of reliability, stability and my sanity: is there an easy migration path to the other supported solution, with HAOS installed on the SD card and the HA data on the SSD? Would it be a case of creating a backup of my current all-on-SSD configuration, installing a new instance directly on an SD card, restoring the backup to the SD card and then migrating the data disk to the SSD? Or would it be better to migrate the data disk first and then restore the backup? The backup is universally supported as long as it's a supervised instance of HA, right, regardless of what hardware and configuration I put it on? |
I just installed my new Raspberry Pi 4 (8 GB) and wrote an SSD (a cheap brand, LITEON or something like that, data-disk: USB3.0-Super-Speed-DD564198838B0) with balenaEtcher. It would not boot; it got stuck at SquashFS errors. Then I wrote an SD card and tested: the desktop runs just fine, so the SBC is okay. After reading all of the above (to the letter) and rewriting the SSD three times, trying and trying while reading, I ended up with my own solution: I plugged the USB cable into a USB 2.0 port of my Raspberry Pi instead of the USB 3 (blue) port. I do not think I need to update more here; I did a temporary registration, and I solved the problem for myself. Note: my HA version is 10.4, downloaded and flashed today. If moving the USB plug had not worked, I would have gone for the Docker solution: boot from SD and from there use Docker on the SSD (if that would work; I have not tested it and leave it here for others as an idea). |
Just to report back, my system runs stable as well, since I blacklisted UAS in /mnt/cmdline.txt EDIT: |
I would like to report my experience:
After I found this issue on GitHub, I checked UAS. With HA OS 9.5 it was using usb-storage, and with 10.5 it does so too. |
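One way to run the same check, as a small sketch: `lsusb -t` prints the USB tree including the driver bound to each interface, so a mass-storage device shows either `Driver=uas` or `Driver=usb-storage`. This assumes `lsusb` is available on the system you are checking (it may not be in the HAOS host shell); the helper name `storage_drivers` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `lsusb -t` output on stdin and print the driver
# bound to every mass-storage interface, one per line.
storage_drivers() {
    sed -n '/Class=Mass Storage/s/.*Driver=\([^,]*\).*/\1/p'
}
```

Usage: `lsusb -t | storage_drivers`. If the output is `uas`, the adapter runs in UAS mode; `usb-storage` means the older bulk-only driver is in use.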
@DaniEll-AT I'm on 10.5 and it still happens... Maybe it's a different issue but the symptoms seem to be the same. |
Downgraded to 9.5, and so far, so good... |
This also worked for me, I've been running it rock stable for over a month now. Also not my favorite solution, it also seems slower on bigger operations like updating and creating backups but everything else seems to be running just as before. For everyone still suffering, try this: #2536 (comment) If it's not helping, maybe you have another problem? |
Anyone tried with 11.x? |
I am running
Survived several core & OS upgrades, working perfectly so far for around 3 months (touch wood 🤞 - hope I haven't cursed myself by mentioning this...). |
Does someone know if it's still necessary to blacklist the Orico SSD with Home Assistant OS 11.1? |
There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. |
@agners It crashes far too often, only I can't blacklist any USB adapter, since it's PCI Express. I'm pretty sure I should open a new thread for this; could you point me in the right direction?
|
Describe the issue you are experiencing
Hi there,
I just want to add to this. I have a:
Pi 4b 8GB
USB Boot to ORICO SSD Portable External 128GB Mini M.2 NVME
I updated from HA OS 9.5 to 10.0, the day it was released and it has been a nightmare since. I read that some people were not even able to boot when they updated with a similar NVME SSD Pi4 hardware configuration.
#2479
Luckily mine did, it just kept crashing every 5 hours or so. I connected the HDMI and saw that it was the SQUASHFS becoming read only and journald errors.
compare:
https://community.home-assistant.io/t/squashfs-error-ext4-fs-error/293167
I have since changed the power supply from a 20 W, 4 A unit to a MacBook USB-C charger and updated to HA OS 10.1, which brought some stability improvement. But it still crashed, now about every other day.
Today I rolled back to HA OS 9.5:
ha os update --version 9.5
ha core update --version=2023.1.7
and it's currently migrating my DB back,
so it's still very busy. It remains to be seen whether I get back my old regular uptime of a month or more without crashes. I really hope so.
This is not OKAY!
I suspect it has something to do with the following 'features', from release notes:
What operating system image do you use?
rpi4-64 (Raspberry Pi 4/400 64-bit OS)
What version of Home Assistant Operating System is installed?
10.1
Did you upgrade the Operating System.
Yes
Steps to reproduce the issue
1. Upgrade from 9.5 to 10.0
2. Upgrade from 10.0 to 10.1
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
System Information
Home Assistant Community Store
Home Assistant Cloud
Home Assistant Supervisor
Dashboards
Recorder
Additional information
I downgraded to 9.5 today
I also posted this on the forum:
https://community.home-assistant.io/t/home-assistant-os-10-update-has-broken-my-pi-4b-4gb/561918/24
I hope you are aware that many Pi4b users have a very unstable system at the moment.