[bug] tcp://ip:port connection ETIMEDOUT error doesn't match ZWaveErrorCodes.Driver_Failed
#3509
Comments
@vladbabii thanks for your issue. Could you provide some logs from when this happens? I need both the zwavejs and zwavejs2mqtt logs. You can find instructions on how to generate them here: https://zwave-js.github.io/zwavejs2mqtt/#/troubleshooting/generating-logs
Here you go. I made a separate install with nothing private on it.
The log file has the same Docker setup as before, connected to a second Vera Edge running ser2net with only one Qubino device.
You can probably use a ser2net Docker image (many are available on Docker Hub) to map your Z-Wave /dev/... device to a network port inside Docker to reproduce this issue.
I'm not at my PC right now. I will take a look at your logs on Monday.
@AlCalzone I think that https://github.com/zwave-js/zwavejs2mqtt/blob/master/lib/ZwaveClient.ts#L2232 |
ZWaveErrorCodes.Driver_Failed
@vladbabii do you have a foolproof way to reproduce this? I've tried with ser2net on a Raspberry Pi, but as soon as I reboot it from another shell, the socket gets closed and the driver destroys itself to force a restart. Note that I'm working without Docker, but if necessary, I could try with it.
With Docker it happens every time. I can reproduce it like clockwork. Let me try again with the latest Docker image.
Seems like it's fixed. I had to pull the latest image (from 17 hours ago according to Docker Hub). It detects the issue, then tries to reconnect until it succeeds. When I kill ser2net
When I reboot, so that the connection is dropped from one side with no immediate TCP notification
going up until
then
then it stays at
If I then go to settings and click save, it connects and works again. I can't get the ETIMEDOUT error I was getting initially... Oh wait, I managed to get it again by rebooting the network device running ser2net a couple of times: Driver: read ETIMEDOUT
I'll try to get some logs now...
Here are the log files... Here is what I see in the interface. So when TCP is broken cleanly (zwavejs2mqtt is notified that the connection is closed), it reconnects fine after a 3 second delay (as it appears in the logs). But when the TCP connection times out, I get ETIMEDOUT or ECONNRESET, and the driver is not restarted and does not reconnect. If it would help, I can do a screen share session or provide a tunnel to my ser2net device so you can test it yourself...
Wait, it seems latest is from 4 days ago. I will try with a newer image.
@vladbabii Use
Testing with master from 18 hours ago (https://hub.docker.com/layers/zwavejs/zwavejs2mqtt/master/images/sha256-141126158ae796f9a8610676079f9488cc1858bec1bb8150245b575593c129fc?context=explore). Killing ser2net: works fine, reconnects after 3 seconds. Rebooting the device: timeoutACK increases, then I get an error while polling (timeout while waiting for ACK), then ECONNRESET. From the Docker logs:
After 2-3 minutes I don't see any mention of restarting or reconnecting in 3 seconds, like when TCP breaks cleanly.
I can start a Jitsi Meet or Google Hangouts session and share my screen if that helps in any way. Log files from the master test
In zwavejs2mqtt log
in zwavejs log
Is this OK, @robertsLando?
@vladbabii Like I said above, I think this should be handled on the zwave-js side. I already listen for driver errors and restart the driver when the error matches a driver failed error, so I think @AlCalzone should just add the correct error code to those errors: https://github.com/zwave-js/zwavejs2mqtt/blob/master/lib/ZwaveClient.ts#L2235
@AlCalzone can I help in some other way than providing logs?
I think that is enough, thanks. The error causes the driver to be destroyed, but zwavejs2mqtt has no means of detecting that.
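For readers following along, the restart decision discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual zwavejs2mqtt or zwave-js code: `DRIVER_FAILED` is a stand-in for `ZWaveErrorCodes.Driver_Failed` (its real value is not assumed), and `DriverErrorLike`, `shouldRestartDriver` and `restartDelayFor` are hypothetical names. The 3 second delay is the one reported in the logs earlier in this thread.

```typescript
// Hypothetical sketch: none of these names are the real zwavejs2mqtt or zwave-js API.

// Stand-in for ZWaveErrorCodes.Driver_Failed; the real value is not assumed here.
const DRIVER_FAILED = "Driver_Failed";

// Socket-level error codes reported in this thread for a silently dropped TCP link.
const FATAL_SOCKET_CODES = new Set<string>(["ETIMEDOUT", "ECONNRESET"]);

// Delay before a restart attempt, matching the 3 second reconnect seen in the logs.
const RESTART_DELAY_MS = 3000;

// Minimal shape of an error object: only the code matters for this decision.
interface DriverErrorLike {
  code?: string | number;
}

// Returns true when the error means the driver is gone and must be restarted.
function shouldRestartDriver(error: DriverErrorLike): boolean {
  if (error.code === DRIVER_FAILED) return true;
  return typeof error.code === "string" && FATAL_SOCKET_CODES.has(error.code);
}

// Returns the restart delay in milliseconds, or null when no restart is needed.
function restartDelayFor(error: DriverErrorLike): number | null {
  return shouldRestartDriver(error) ? RESTART_DELAY_MS : null;
}
```

With this shape, either side could fix the problem: zwave-js could tag the socket errors with the driver failed code, or the listener could also treat the raw socket codes as fatal.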
@AlCalzone @robertsLando please let me know when I can test with a Docker image to validate the fix. Also, if you have any other bugs that need Docker testing, please let me know. I have a couple of Z-Wave devices, Z-Wave sticks and other Z-Wave-related things.
As soon as @AlCalzone makes a new release I will update my repo.
You can do it now:
Then if it fails I see the retry
And then it seems to be working fine. I'll do some testing over the next 1-2 hours to make sure it's stable.
Thanks. Let me know if it stays that way, then I can merge.
@robertsLando I see this topic/message in MQTT
But I don't see any message without an error published when the connection works again... Is this the intended behaviour?
The events you see there are all the events coming from zwave-js. I don't emit any event on a successful connection, if that's the question. BTW, after a reconnect zwave-js will send the all nodes ready event. You can use that event to know when everything is set up correctly.
What I'm trying to do is write a script that automates breaking the Z-Wave TCP connection (by using relays on the power source of the device and of a network switch, or by running shell commands to restart ser2net), but I don't see an obvious way of detecting the state from MQTT topics right now... What should I use? I want to leave the script running for 1-2 hours without my intervention so I can get some statistics at the end... Edit 1:
All the MQTT retained topics are there
So currently I see no easy way to get a status from MQTT. If the MQTT client is not connected when the reconnection happens, it does not see it at all. I would expect to see a retained topic with a last will set up on it, so that on a bad disconnect the broker would change the value to something that indicates 'this zwavejs2mqtt is not online/working right now'...
It seems the /health endpoint gives the correct state of the service (200/500 status).
I was going to answer you, BTW. I still don't get the problem here: when the driver fails it emits the driver error, then it connects again and should emit the driver_ready event. Couldn't you listen to these events on MQTT?
The device status only gets updated when communicating with devices.
Using only retained messages, a client can connect to MQTT at any time to read data. The driver error and driver ready messages are not retained. So to use stateful information about the actual Z-Wave status (not only device status), a client would need to either poll /health, or have a second service poll /health and write the result to an MQTT topic. This way, an automation script that reacts to a motion sensor being tripped would listen to both /health updates and motion updates. It can then decide: "the Z-Wave stack is in a bad state, don't act on any information until health says it's OK".
I can script my way around the information that's available now by polling /health every 250ms and dumping it into an MQTT topic, but it would be nicer to have /health exposed as a retained MQTT message, and also unset via last will if the service goes down (Docker issue, connection issue, Z-Wave issue). This way my automations can be aware of the real state of things regardless of when they connect to MQTT (some might run briefly, some might run for months, and Kubernetes/Docker could also deploy them on different nodes, which would trigger reconnects).
Or maybe my use case is different from the intended use case of zwavejs2mqtt. I thought of it as "make all information from Z-Wave available on MQTT so any MQTT client can read it and make valid decisions from (near) real-time data".
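The workaround described above (poll /health, mirror it into a retained topic, and register a last will) could look something like the sketch below. It only builds the messages and leaves the HTTP polling and MQTT client out, since those depend on the libraries in use. The `RetainedMessage` shape, the function names and the example topic `service/zwave_01/status` are all assumptions made for illustration. Only the 200/500 behaviour of /health comes from this thread.

```typescript
// Hypothetical sketch of the /health-to-MQTT bridge idea; all names are illustrative.

// A message as it would be handed to an MQTT client's publish or last will option.
interface RetainedMessage {
  topic: string;
  payload: string;
  retain: boolean;
}

// Maps the HTTP status of /health (200 = healthy, anything else = not healthy)
// to a retained status message, so late-joining clients still see the state.
function healthToMessage(httpStatus: number, topic: string): RetainedMessage {
  const online = httpStatus === 200;
  return { topic, payload: JSON.stringify({ online }), retain: true };
}

// Message to register as the MQTT last will, so the broker marks the service
// offline when the bridge disconnects without a clean shutdown.
function lastWillFor(topic: string): RetainedMessage {
  return { topic, payload: JSON.stringify({ online: false }), retain: true };
}

// Example with an assumed topic name:
const statusTopic = "service/zwave_01/status";
const healthy = healthToMessage(200, statusTopic);
const unhealthy = healthToMessage(500, statusTopic);
```

An automation would then subscribe to the status topic alongside its sensor topics and ignore sensor data while the retained value says the service is offline.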
Try zwave-js/zwave-js-ui#1838. You should see the driverReady status now on
Regarding the initial issue in the ticket:
So build a Docker image with the "driver-status" branch for z2m?
There are 2 cases to cover
@robertsLando should we move the discussion to a different issue or to that merge request, since it's kind of off topic here?
@vladbabii We can move to zwave-js/zwave-js-ui#1840
Checklist:
Build/Run method
Zwavejs2Mqtt version: 5.8.0
Z-Wave JS version: 8.4.1
Describe the bug
After reboot, the web interface shows
Driver: read ETIMEDOUT
and in MQTT I have (topic and value)
service/zwave_01/_EVENTS/ZWAVE_GATEWAY-zwave_01/driver/driver_error {"data":[{"errno":-110,"code":"ETIMEDOUT","syscall":"read"}]}
and it never works again unless I restart the zwavejs2mqtt Docker container or save the settings again in the web interface
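As an aside for anyone consuming this topic from a script: the payload shown above can be checked for socket error codes with a few lines of code. This is a minimal sketch that assumes the payload always has the `{"data":[{...}]}` shape quoted in this report. `DriverErrorPayload` and `extractErrorCodes` are hypothetical names, not part of zwavejs2mqtt.

```typescript
// Shape of the driver_error payload quoted in this report; other fields may exist.
interface DriverErrorPayload {
  data: Array<{ errno?: number; code?: string; syscall?: string }>;
}

// Extracts the errno-style codes (for example "ETIMEDOUT") from a raw payload.
// Returns an empty array when the payload is not valid JSON or lacks a data array.
function extractErrorCodes(rawPayload: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawPayload);
  } catch {
    return [];
  }
  const data = (parsed as Partial<DriverErrorPayload> | null)?.data;
  if (!Array.isArray(data)) return [];
  return data
    .map((entry) => entry?.code)
    .filter((code): code is string => typeof code === "string");
}

// The payload from this report:
const reported = '{"data":[{"errno":-110,"code":"ETIMEDOUT","syscall":"read"}]}';
const reportedCodes = extractErrorCodes(reported);
```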
To Reproduce
Steps to reproduce the behavior:
service/zwave_01/_EVENTS/ZWAVE_GATEWAY-zwave_01/driver/driver_error {"data":[{"errno":-110,"code":"ETIMEDOUT","syscall":"read"}]}
Expected behavior
When the Z-Wave driver enters that specific error state, I expect it to be restarted after some amount of time.
Additional context
How do I clean my logs of any private data? I'd like to post them here, but I want to make sure I don't leak any keys or other information.
Thank you for your time! This is great software!