Sleeping node is marked as dead #1578

snagytx · 2021-02-03T21:47:09Z

Describe the bug

In 6.1.1(zwavejs2mqtt v1.0.4) AEON Labs - Multisensor 6 (Device ID: 134-100-258 (0x0086-0x0064-0x0102) when docker container would start and the device would be sleeping it would be loaded from cache and marked as RestartFromCache. In the latest it seems that the device is marked as dead until the device sends something.

21:07:35.825 CNTRLR   [Node 019] trying to load device config
21:07:35.850 CNTRLR   [Node 019] device config loaded
...
21:07:36.055 CNTRLR   [Node 019] Beginning interview - last completed stage: RestartFromCache
21:07:36.056 CNTRLR » [Node 019] pinging the node...
...
21:07:36.421 SERIAL » 0x010800131301002506d5                                              (10 bytes)
21:07:36.422 DRIVER » [Node 019] [REQ] [SendData]
                      │ transmit options: 0x25
                      │ callback id:      6
                      └─[NoOperationCC]
...
21:07:43.419 CNTRLR   [Node 019] Node 19 did not respond after 1 attempts, it is presumed dead
21:07:43.425 CNTRLR   [Node 019] The node is dead.
..
21:07:43.445 CNTRLR   [Node 019] ping failed: Failed to send the command after 1 attempts (Status No
                      Ack)
21:07:43.454 CNTRLR   [Node 019] ManufacturerSpecificCC: doing a partial interview...
21:07:43.455 CNTRLR   [Node 019] [~] [Manufacturer Specific] interviewComple [Endpoint 0] [internal]
                      te: true => true
...
21:07:43.482 CNTRLR   [Node 019] VersionCC: doing a partial interview...
21:07:43.483 CNTRLR   [Node 019] [~] [Version] interviewComplete: true => tr [Endpoint 0] [internal]
                      ue
21:07:43.484 CNTRLR   [Node 019] trying to load device config
21:07:43.503 CNTRLR   [Node 019] device config loaded
21:07:43.504 CNTRLR   [Node 019] ZWavePlusCC: doing a partial interview...
21:07:43.504 CNTRLR   [Node 019] [~] [Z-Wave Plus Info] interviewComplete: t [Endpoint 0] [internal]
                      rue => true
21:07:43.506 CNTRLR   [Node 019] BatteryCC: doing a partial interview...
21:07:43.511 CNTRLR » [Node 019] querying battery status...
21:07:43.512 CNTRLR » [Node 019] pinging the node...
...
21:07:46.262 DRIVER » [Node 019] [REQ] [SendData]
                      │ transmit options: 0x25
                      │ callback id:      9
                      └─[NoOperationCC]
...
21:07:53.185 CNTRLR   [Node 019] Node 19 did not respond after 1 attempts, it is presumed dead
...
21:07:53.205 CNTRLR   [Node 019] ping failed: Failed to send the command after 1 attempts (Status No
                      Ack)
21:07:53.206 CNTRLR   [Node 019] Interview attempt (1/5) failed, node is dead.
...

21:15:58.798 SERIAL « 0x010e00040013063105012a0316cd0027                                  (16 bytes)
21:15:58.805 CNTRLR   [Node 019] [Multilevel Sensor] Air temperature: metadata updated  [Endpoint 0]
21:15:58.809 CNTRLR   [Node 019] [~] [Multilevel Sensor] Air temperature: 79.3 => 79    [Endpoint 0]
21:15:58.811 SERIAL » [ACK]                                                                   (0x06)
21:15:58.817 DRIVER « [Node 019] [REQ] [ApplicationCommand]
                      └─[MultilevelSensorCCReport]
                          type:  Air temperature
                          scale: Fahrenheit
                          value: 79
21:15:58.829 CNTRLR   [Node 019] The node is now alive.
21:15:58.830 CNTRLR   [Node 019] Beginning interview - last completed stage: RestartFromCache
21:15:58.831 CNTRLR » [Node 019] pinging the node...
21:15:59.008 CNTRLR   [Node 019] The node is ready to be used

Device information

Which device(s) is/are affected (make/model)?
What are the node IDs?Node 019

Last Known Working Configuration

New device
Previously working device (node-zwave-js)
- Which library version/docker image/adapter version?
- Have you made any recent configuration changes to the device? Describe.
Previously working device (other platform)
- Which platform?
- Have you made any recent configuration changes to the device? Describe.

Installation information
How did you install node-zwave-js?

zwavejs2mqtt (latest) docker image
zwavejs2mqtt (dev) docker image
ioBroker.zwave2 adapter
Pkg
Manual Docker build
- node-zwave-js branch: fix-root-to-endpoint-mapping
- zwavejs2mqtt branch: HEAD detached at v1.0.4
Manually built (as described in the docs)
Other:

The text was updated successfully, but these errors were encountered:

AlCalzone · 2021-02-03T21:50:57Z

Before we're hunting ghosts, please try 6.1.2 too, although I guess the relevant change is in there too.

How is the multisensor powered? Battery or USB? Did you switch that since inclusion?

snagytx · 2021-02-03T22:40:37Z

I built a new image:

node-zwave-js
HEAD detached at v6.1.2
nothing to commit, working tree clean

zwavejs2mqtt:
HEAD detached at v1.0.4
nothing to commit, working tree clean

The issue persists - the sleeping node starts as dead.

AlCalzone · 2021-02-03T22:43:53Z

How is the multisensor powered? Battery or USB? Did you switch that since inclusion?

snagytx · 2021-02-03T22:45:55Z

Forgot to mention that, it's battery and always was battery.

Anything in the cache I should be looking for, for this device?

snagytx · 2021-02-03T22:52:06Z

16:05:01.065 CNTRLR « [Node 019] received response for protocol info:
basic device class: Routing Slave
generic device class: Multilevel Sensor
specific device class: Routing Multilevel Sensor
is a listening device: true
is frequent listening: false
is a routing device: true
is a secure device: unknown
is a beaming device: true
maximum baud rate: 40000 kbps
version: 4

AlCalzone · 2021-02-03T23:04:09Z

is a listening device: true

That looks different. If the node was included with battery power, it should not say that it is listening here.
Unfortunately I have no battery to test this on mine.

Please try excluding it, then reset it and include it while definitely not on USB.

snagytx · 2021-02-03T23:57:31Z

I know this is not something "supported" but I manually updated the cache json file and marked the node as "isListening: false" and restarted the container. This is my production environment and I have automations configured bases on the motion sensor of this device. It would be really impactful to exclude it and re-include it - I'll have to change all my automations to use the new node ID.

All the fields in "Control Panel" are empty, except "Status" which is "Unknown", "Interview stage" which is "Complete".

And remains like that until the end of the scan.

2021-02-03 23:47:43.196 INFO ZWAVE: Node 19 ready: AEON Labs - ZW100 (Multisensor 6)
2021-02-03 23:47:43.197 INFO ZWAVE: Controller status: Scan completed
2021-02-03 23:47:43.198 INFO ZWAVE: Network scan complete. Found: 9 nodes

I have attached the zwavejs log file
zwavejs_1.log

snagytx · 2021-02-04T01:18:11Z

I excluded and re-included the device, as node 36, same behavior as above, all the fields in "Control Panel" are empty, except "Status" which is "Unknown" and "Interview stage" which is "Complete".

For some reason it seems that the node is marked asleep at the very end of the scanning.

I have attached the log.

zwavejs_1.log

AlCalzone · 2021-02-07T16:43:19Z

I find this in both logs:

01:12:40.639 DRIVER » [Node 036] [REQ] [SendData]
                      │ transmit options: 0x25
                      │ callback id:      124
                      └─[BatteryCCGet]
...
01:12:47.544 DRIVER « [REQ] [SendData]
                        callback id:     124
                        transmit status: NoAck

which means that the node did not acknowledge the receipt of the message we sent it. If the node is mains-powered, that often means there is some connectivity issue or that the node is dead.

If it is battery-powered, this is perfectly normal happens while it is asleep.

Unfortunately, both your logs don't include the inclusion and the initial interview, so I have to assume that the device is still operating in USB mode. Did you factory-reset it after exclusion?

snagytx · 2021-02-07T17:34:58Z

Yes, I did a factory reset after excluding the node. I did NOT use any USB cables for this inclusion. I just unscrew it from the wall mount, brought it closer to the computer, from where I can access zwavejs control panel, did the exclusion, did a factory reset, did the inclusion.

AlCalzone · 2021-02-07T17:40:26Z

Okay, can you re-interview the device (refreshInfo) and send me a complete log of that?

snagytx · 2021-02-07T21:22:54Z

I just dod a node re-interview, see attached.
Node036.log

AlCalzone · 2021-02-07T21:24:18Z

There's a bit missing before the start of the file :(

snagytx · 2021-02-07T21:37:54Z

is this better?

Node036.log

AlCalzone · 2021-02-07T21:41:59Z

Interestingly, now it shows that it is not a listening device (i.e. sleeps) and it was properly detected:

18:51:33.906 CNTRLR « [Node 036] received response for protocol info:
                      basic device class:    Routing Slave
                      generic device class:  Multilevel Sensor
                      specific device class: Routing Multilevel Sensor
                      is a listening device: false <== HERE!
                      is frequent listening: false
                      is a routing device:   true
                      is a secure device:    unknown
                      is a beaming device:   true
                      maximum baud rate:     40000 kbps
                      version:               4
18:51:33.909 CNTRLR   [Node 036] The node is asleep.

I have the same device USB powered and it reads true in the same line.

About half an hour later the device actually wakes up and the interview goes through:

19:26:58.675 DRIVER « [Node 036] [REQ] [ApplicationCommand]
                      └─[WakeUpCCWakeUpNotification]
19:26:58.677 CNTRLR « [Node 036] received wakeup notification
19:26:58.683 CNTRLR   [Node 036] The node is now awake.

So... all good?

snagytx · 2021-02-07T22:24:13Z

Yes, the main issue here is that on zwavejs2mqtt container reset the behavior of the device changes in the last releases. It used to me marked at sleeping and restored from cache, right at the beginning of the start sequence. Now it’s marked and unknown, or something similar - as mentioned a few posts back, until the end the the start sequence. In large networks it can take 10-15 minutes to complete the start sequence. This means that for 10-15 minutes the device might not be usable. Since it’s not initialized form cache until the end.

AlCalzone · 2021-02-07T22:38:52Z

Please check that again. I'm guessing that was some weird interaction with the device being asleep but the controller thinking that it cannot sleep. Especially since the listening flag has changed from your first log to the last one.

Since 6.1.2, sleeping devices are almost immediately marked as asleep (and ready if they have been interviewed before).

snagytx · 2021-02-08T14:33:44Z

Pease see attached the screenshot of the Control Panel, you can see how it starts.

Node 21 and 33 take the most to complete, 21 completed first, then 33. The moment 33 completed then 36 switched to Sleep and RestartFromCache.

zwavejs_1.log

snagytx · 2021-02-08T14:41:32Z

Forgot to mention that I'm using the latest zwavejs2mqtt:dev, revision b9d066b9f069ae81b4fab37d5bdb95fde8f0e7bd

zwave-js 6.1.3

AlCalzone · 2021-02-08T14:51:05Z

Did you intentionally restart the driver at 14:20:54.151?

snagytx · 2021-02-08T14:55:12Z

Yes, I did because of:

14:20:45.246 DRIVER Restoring the network from cache failed: SyntaxError: Unexpected end of JSON input

Sometime when I stop the docker container the cache file, f4c9e959.json, get's cleared - size 0b, so I have a process to backup the json file before I stop the container, and restore the cache if f4c9e959.json is empty before I start again - sometime I forget to check.

AlCalzone · 2021-02-08T15:03:06Z

Whoa that shouldn't happen. @robertsLando any idea how we could avoid stopping the container before zwave-js has shut down and saved the cache?

That could also explain why the interview first starts with stage "None" and then suddenly after the restart it is at RestartFromCache. Your cache files (our info about the node) get out of sync and confuse the hell out of zwave-js.

I wouldn't be surprised if the f4c9e959.json says the interview was completed, but f4c9e959.values.jsonl doesn't have some of the necessary info it should have at that point.

AlCalzone · 2021-02-08T15:11:34Z

@snagytx about the delay: While sleeping nodes no longer delay the interview of non-sleeping nodes, it seems that their status is not correctly propagated to applications after a restart (which causes zwavejs2mqtt to not display the correct info). I'll raise an issue to track that part.

snagytx · 2021-02-08T15:11:55Z

This is what I backup:

cp -p store/f4c9e959.* ${BK_FINAL}
cp -p store/customDevices.json ${BK_FINAL}
cp -p store/settings.json ${BK_FINAL}
cp -p store/nodes.json ${BK_FINAL}

I do the backup, before I stop the container, but I don't have to restore the files every time, even when I don't need to restore the file the outcome in the UI is the same - this is why I didn't mention it before.

AlCalzone · 2021-02-08T15:14:34Z

cp -p store/f4c9e959.* ${BK_FINAL}

not sure - does cp overwrite by default?

Because I am seeing this in the log:

before restart:
14:20:45.671 CNTRLR   [Node 036] Interview stage completed: ProtocolInfo

after restart:
14:21:17.462 CNTRLR   [Node 036] Beginning interview - last completed stage: RestartFromCache

Which means the stage in the cache file was "Completed", not "ProtocolInfo"

snagytx · 2021-02-08T15:15:42Z

I did a restore from backup between 14:20:45.671 and 14:21:17.46

AlCalzone · 2021-02-08T15:30:19Z

Alright, then some of what I'm seeing starts to make sense. Thinking out loud here:

After the restore, the status of the node was Complete. zwave-js starts and resets that to RestartFromCache, so it can do the queries it does on startup.

14:21:17.462 CNTRLR   [Node 036] Beginning interview - last completed stage: RestartFromCache

You don't see this change, since it is not propagated to the UI. @robertsLando when/how do you update the node's interview stage in the UI?

The device is not immediately marked as ready although it should be (#1657), so zwavejs2mqtt does not show the information we have so far.

A few minutes later, the node is finally marked as asleep, causing it to also be marked as ready.

14:30:32.534 CNTRLR   [Node 036] The node is asleep.

You now see the RestartFromCache stage (which is correct) until it wakes up and can be queried for its current values.

robertsLando · 2021-02-08T16:01:29Z

Whoa that shouldn't happen. @robertsLando any idea how we could avoid stopping the container before zwave-js has shut down and saved the cache?

We should listen for the container kill command with process.on and try to gracefully close it there

AlCalzone · 2021-02-08T16:13:26Z

Should I open an issue in zjs2m?

robertsLando · 2021-02-08T16:55:20Z

Ok! Not 100% sure it wil work but we could try

snagytx added the bug Something isn't working label Feb 3, 2021

AlCalzone added waiting for info ⏳ and removed bug Something isn't working labels Feb 7, 2021

AlCalzone mentioned this issue Feb 8, 2021

The status of sleeping nodes is not correctly propagated after restarting from cache #1657

Closed

This was referenced Feb 8, 2021

Avoid stopping the container before zwave-js has shut down and saved the cache zwave-js/zwave-js-ui#506

Closed

Aeotec Multisensor 6 incorrectly becomes Asleep when USB powered #1504

Closed

AlCalzone added bug Something isn't working and removed waiting for info ⏳ labels Feb 11, 2021

AlCalzone mentioned this issue Feb 11, 2021

fix(node): mark sleeping nodes as ready immediately on cached restart #1714

Merged

AlCalzone mentioned this issue Feb 12, 2021

[bug] Battery devices shows as powered in mesh zwave-js/zwave-js-ui#546

Closed

4 tasks

AlCalzone closed this as completed in #1714 Feb 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sleeping node is marked as dead #1578

Sleeping node is marked as dead #1578

snagytx commented Feb 3, 2021

AlCalzone commented Feb 3, 2021

snagytx commented Feb 3, 2021

AlCalzone commented Feb 3, 2021

snagytx commented Feb 3, 2021

snagytx commented Feb 3, 2021

AlCalzone commented Feb 3, 2021 •

edited

Loading

snagytx commented Feb 3, 2021

snagytx commented Feb 4, 2021

AlCalzone commented Feb 7, 2021

snagytx commented Feb 7, 2021

AlCalzone commented Feb 7, 2021

snagytx commented Feb 7, 2021

AlCalzone commented Feb 7, 2021

snagytx commented Feb 7, 2021

AlCalzone commented Feb 7, 2021

snagytx commented Feb 7, 2021

AlCalzone commented Feb 7, 2021 •

edited

Loading

snagytx commented Feb 8, 2021

snagytx commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

snagytx commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

snagytx commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

snagytx commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

robertsLando commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

robertsLando commented Feb 8, 2021

Sleeping node is marked as dead #1578

Sleeping node is marked as dead #1578

Comments

snagytx commented Feb 3, 2021

AlCalzone commented Feb 3, 2021

snagytx commented Feb 3, 2021

AlCalzone commented Feb 3, 2021

snagytx commented Feb 3, 2021

snagytx commented Feb 3, 2021

AlCalzone commented Feb 3, 2021 • edited Loading

snagytx commented Feb 3, 2021

snagytx commented Feb 4, 2021

AlCalzone commented Feb 7, 2021

snagytx commented Feb 7, 2021

AlCalzone commented Feb 7, 2021

snagytx commented Feb 7, 2021

AlCalzone commented Feb 7, 2021

snagytx commented Feb 7, 2021

AlCalzone commented Feb 7, 2021

snagytx commented Feb 7, 2021

AlCalzone commented Feb 7, 2021 • edited Loading

snagytx commented Feb 8, 2021

snagytx commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

snagytx commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

snagytx commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

snagytx commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

robertsLando commented Feb 8, 2021

AlCalzone commented Feb 8, 2021

robertsLando commented Feb 8, 2021

AlCalzone commented Feb 3, 2021 •

edited

Loading

AlCalzone commented Feb 7, 2021 •

edited

Loading