-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MQTT not reconnecting after days of uptime vs. minutes / hours #320
Comments
This one is going to be hard to debug. I'll try to run an ESP8266 for some days, but I am afraid the issue comes from the ESPAsyncTCP library (or maybe there's a memory leak somewhere). |
Hi! I.e, not related to uptime, which is why when fully stopping mosquitto, the problem does not materialize? |
When this happens (when Homie is stuck), can you connect to the broker with another client? On your phone or computer? |
Not tried that specifically. SSH responds, but logons do not complete after entering username / password. Just hangs. After the powercycle, the Homie device on my desk remains in its stuck state, but other devices report in. I can test next time it happens, but pretty sure mosquitto is responding to the initial connect, but is not functional. Same as SSH. |
I forgot that I have a device that is running for some months now, without any issues, so there's no obvious errors in the code. Unfortunately, ESP8266 are small devices that might not be as bulletproof as other devices, so if your broker is not running rock stable, don't expect the ESP8266s to work great. |
Does the ESP8266 not have a watchdog timer? If it does, this may be the situation to enable it? |
Further testing shows that after restarting both wifi access points (same SSID, but different channels), some sensors don't fully reconnect to the MQTT server. Strangely they are marked as "$online true", but "$stats" are not updated. (Signal and Uptime). Restarted the mosquitto service and $stats began to update themselves again. Note that this is on a new box where mosquitto is running without issues, and a sensor on my desk was able to read/write to the service. |
Can you test with the AsyncMqttClient v0.7.0? |
No, its not fixed in 0.7.1 Firmware updated a number of devices to 0.7.1 and applied new config on the wifi box causing the devices to disconnect and reconnect to the wifi. A couple of devices didn't re-connect to the MQTT server correctly. They respond to ping, but no data is logged on the MQTT server, including the $stats values. Restarted mosquitto service, and data, $stats and sensor data, appeared for the MQTT server again. JJ |
@johnjore please wait for the next beta, things should stabilize. 😉 |
:) Thanks! |
I found that using QoS=2 may cause the problem, I had some such settings in nodered mqtt nodes. Do you have some like this? |
What QoS value have you had the most success with if 2 causes problems with client or server? |
Unfortunately, after a couple of tests I see that with qos 0 and 1 situation almost the same. It just stops reconnecting more rarely. In general I do not know what to do anymore. Among all devices I have the one that does not write to mqtt anything and it works surprisibgly stable, and those ones which write stop connecting more than once a day. I saw the mosquitto log and every time it happens it writes "connection time out ... disconnecting". |
I moved one of my sensor packs to a Raspberry Pi's USB port and logged the output to a text file to see what is going on. I removed the outputs from the sensors and some of the repetitive outputs, which is why the uptime jumps forward. During the night this is what happened:
While it reconnects to the WiFi, it never reconnects to the MQTT server. Restarted the MQTT Server:
Homie is now reporting data to the MQTT server as normal again. Any clues as to what is going on? |
Could it be due to the server IP changing from the crash. Need to check if this could effect homie and reconnecting to mqtt when I have a computer and look at the code |
Hm... doubt it as no ip addresses changed. Static ip on client and mqtt server. Only the WiFi Access Point changed, twice. |
Will try replicate when i have some time, but ive tested this before (while back, could of broke... ) but restarting the broker or ap, shouldn't get it stuck in a connection loop. Any success with the latest everything (mainly the esp8266 arduino core 2.4.0)? Looks like the mqtt client doesn't detect that its disconnected ..... Could use |
I upgraded all the libraries yesterday (including esp core 2.4) but during the night there was no change of AP. I've added the connected status to the serial output, and will report back when device switches AP and let you know how it goes. |
As a workaround I think we can reboot ESP on each "onDisconnect" event of AsyncMqttClient. It is not ideal but sounds as a solution in my case. How do you think? |
Its a sledgehammer approach, but better than the devices going offline, needing a hard-reset, or resetting the MQTT server. |
Ideally we should fix this issue and not work around it. So send me updates and progress @johnjore |
So, it finally happened again and multiple devices went offline at the same time. On the device with logging and updated to the latest packages at the time, "Homie.getMqttClient().connected(): 1" is reported throughout. After x hours, it never recovered and I power cycled it. Connected fine and all ok again. I believe a blip on the wifi caused the devices to disconnect, and never reconnect. |
As a workaround, in the main loop just use next code:
|
This wont help if the client doesn't recognise this it is actually disconnected. Sounds like a bug in the MQTT client. @johnjore is is possible to get the console log ? Did is show any event triggers ? |
Hope this helps. I washed the log file down from 435MB (too large for Notepad++) by removing large chunks of duplicate information like Wifi signal quality. Homie.getMqttClient().connected(): is always 1. Thousands of lines with "✖ setNodeProperty(): impossible now" were also removed. No events were deleted.
|
Intresting, will need to look into this more, but it's hard since it's not easly replicated |
Is it possible to force the MQTT client to disconnect when wifi disconnects? Worst case, a reboot command could be added when a wifi disconnect event happens, not really a solution, but better than needing to reboot the device manually, as it never recovers on its own as it is now. As an FYI, I currently use homie on all devices with a fixed 5V/3.3V supply like switches, dimmers, sockets and sensor packs. These don't have push buttons to reset the ESP, (integrated into the walls) so I need to toggle the fuse. This is anything but ideal, and preventing me deploying additional devices. |
Yes, maybe forcing mqtt lib to disconnect when wifi goes down. I'll have a think about other solutions Also just to be sure, what version of homie and version of Arduino core ? And any other details related please. |
Hi! didn't realize the post had been edited: homie-esp8266-develop at version 2.0.0 (Downloaded Jan 28th) JJ |
Same Problem on my side. WiFi disconnects and reconnects, but MQTT hangs in a reconnect loop (led blinking fast). |
The issue is odd, i need to swap over to mqtt to have a look and see if i can find the issue, but atm im pushed for time.... |
I also have this problem. Wifi reconnect works (homie clients are pingable), but MQTT won't come up again. This does not happen always. A second power cycle on the wifi AP (in my case a RPi, so a simple Best regards |
I do experience the exact same issue (led fast blinking and data no more sent to MQTT broker). Thanks |
There is an issue, where MQTT reconnects, but the LED is not switched off (so continue in "fast blinking".) So check, If there is really no MQTT traffic anymore. If there is no traffic: Do you use a hostname for MQTT or the IP address? |
Do you mean does my Homie client configuration is using an IP or a hostname to connect to MQTT? If so, my clients are using the IP. No FQDN used to connect. I have seen that I can ping my Homie clients but they were no more sending data to MQTT when their led was blinking fast. This morning, I have changed the source code of BootNormal.cpp file as follow |
See #545 : Fixed in 2.0.1-pre1 and #develop-v3 |
Hi!
My mosquitto server goes offline from time to time (separate issue to resolve), but when this happens, homie does not always reconnect after the server is rebooted and mosquitto is again running. I can only reproduce the issue after letting it run for several days, not minutes or hours.
Here I left it running for a few days, and this morning I have this message (after the server crash):
After 8h in this state, its not moved on and it never tries to reconnect, but other sensors have. I've not managed to figure out what, or why, it does not reconnect. As I can see other sensors are reporting in, the problem is not with the mosquitto server. Log file on the mosquitto server shows no traffic from this node when it is in this state.
After resetting the device, and then manually stopping the mosquitto service:
It all looks normal and when the mosquitto service is started, homie reconnects and all is good.
Versions: Using your mqtt client and homie (2) as of March 10th.
Any idea as to why it fails to reconnect after days of uptime vs minutes or hours?
Is there additional debug information I can provide?
JJ
The text was updated successfully, but these errors were encountered: