Replies: 7 comments 12 replies
-
Can you run
I believe I do have a broken PLM that every once in a while sends the wrong group number, which would cause my bathroom group to turn on randomly. I replaced it years ago and haven't had any issues since. I always meant to rig up a system to sniff the packets to see if I could figure out why this happened.
-
I think I have the same issue as originally described. Sometimes when I send a command from the modem (it could be a modem virtual scene or a device control), a bunch of random lights turn on. The devices that turn on do not share the same modem group (nor the group that is created in the pair/link process). The number of impacted devices varies from instance to instance.
I am able to increase the chance of this happening by sending something from the modem at the same time as some device-induced traffic occurs. This makes me believe it is a race condition somewhere in the software/firmware/hardware stack. I did not spend the time to dissect where it happens.
My hunch is that it could be the modem, since it appears both on a device command (turn switch on/off) and when turning a modem virtual scene on/off. How the modem sends the scene command out is completely under the control of the modem's firmware. This hunch is also supported by krkeegan's observation that it went away with a new modem (but we can't buy new PLMs anymore #501).
I have decreased the chance of this error happening by adding a delay to automations that are triggered by Insteon messages. Example: bathroom light on -> 10s -> turn bath fan on.
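A minimal sketch of the delay idea above, in Python. It assumes the automation can call an arbitrary function to send the follow-up command. `schedule_followup` and the `send_command` call in the usage note are hypothetical names, not part of insteon-mqtt.

```python
import threading


def schedule_followup(action, delay_s=10.0):
    """Run `action` after `delay_s` seconds instead of immediately.

    The goal is to keep the outgoing follow-up command away from the
    burst of traffic caused by the triggering Insteon message.
    """
    timer = threading.Timer(delay_s, action)
    timer.daemon = True  # do not keep the process alive just for this timer
    timer.start()
    return timer
```

For the bathroom example this might be used as `schedule_followup(lambda: send_command("bath_fan", "on"))`, where `send_command` stands in for however your automation publishes the command.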
-
I am having the same or a similar problem. For me, the consequences are painful because, in addition to about ten lights and fans that come on, there are 4 iolincs that 1) open my garage doors, 2) trigger a home security alarm with remote monitoring and 3) trigger a remote start of my car. (1 and 2 are especially harmful to my marriage.)
I have had insteon-mqtt up and running for more than a year and did not have this problem for about the first 8 months. I'm in the process of looking for clues in 13 months of logs. One common element evident in the logs is a flood of "No read handler found" messages for 0x56 All link fail (mostly) and 0x50 (clean up), but so far I think the real tell-tale is the 0x56 messages. I'm not saying these are the cause, just the log evidence that the problem occurred. I'm looking for any common actions in the logs that lead up to this flurry of "No read handler" messages.
Two requests for help while I try to troubleshoot this: 1) can someone tell me what in the normal function of insteon-mqtt triggers "no read handler" messages for All link messages, and 2) if anyone has any other ideas on what I should be looking for as I comb through the logs preceding these events, please let me know.
I've run some statistics on the number of "no read handler" messages, and there has been a dramatic increase in the past 4 months or so of the 13+ months I've had this running.
Here's the log during a minute when one of these events occurred:
2023-06-07 01:08:38.872 INFO Broadcast: Handling all link broadcast for 27.f0.83 'ms_i_tvroom'
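The per-month statistics mentioned above could be gathered with something like the following Python sketch. It assumes every log line starts with a timestamp in the same `YYYY-MM-DD HH:MM:SS.mmm` form as the log line quoted above, and it matches the text "No read handler found" case-insensitively. `count_by_month` is an illustrative name, not an insteon-mqtt tool.

```python
import re
from collections import Counter

# Captures the "YYYY-MM" part of a leading "YYYY-MM-DD HH:MM:SS.mmm " timestamp.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2})-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} ")


def count_by_month(lines, needle="No read handler found"):
    """Count log lines containing `needle`, grouped by year-month."""
    counts = Counter()
    needle = needle.lower()
    for line in lines:
        if needle not in line.lower():
            continue
        match = TIMESTAMP_RE.match(line)
        if match:  # skip lines without a recognizable timestamp
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file works directly, e.g. `count_by_month(open("insteon_mqtt.log"))`, where the file name is a placeholder for your own log path.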
-
@jb1228 Thanks for replying. I'll check out the HomeAssistant official
integration link you provided. Right now, I'm using insteon-mqtt outside
of HomeAssistant.
…On Wed, Aug 2, 2023 at 7:11 PM jb1228 ***@***.***> wrote:
Unfortunately I cannot help much since I was never able to resolve the
issue using this integration.
Although, I did migrate to the official integration
<https://www.home-assistant.io/integrations/insteon/> *and have not seen
the problem since.*
However, shortly after migrating, I *also* decided to replace my I/O Link
(garage door opener) and a few problematic dimmer modules with non-Insteon
equivalents. So I suppose it is possible the problem still exists, and I
simply haven't noticed it.
-
I have battled with this problem for a while. I might have a workaround.
I could fairly regularly reproduce the issue by turning off a large virtual scene (I believe scene 21 with 16 members) followed by dedicated on commands to a few other devices (all as part of an automation). Then some other random devices suddenly turn on, and I get a flood of "no read handler" messages. I believe those are ACKs from each device that got a spurious message and subsequently changed state (I spot-checked a few). I observed that insteon-mqtt does not notice the unintended state change. Since insteon-mqtt does not know about the state change, I believe the outgoing spurious message is probably something like a scene command.
I believe the race condition happens when a message from a device arrives at the modem (e.g., the ACKs for the original scene command) at the same time as the modem tries to send a command out (e.g., my individual on commands).
Observation: all devices are linked as responders to virtual scene 1. This is a result of either pair or join, I forget which. Nonetheless, we do not typically use this modem virtual group for controlling devices. My hypothesis is that in the failure event the modem sends something similar to a virtual scene 1 message. The flood of ACKs coming back in would be an indication of this, as they all refer to group 1.
Workaround: Delete each device's CTRL entry from the modem's DB. E.g., by:
Note, this does not touch the device DB. The device still has a corresponding RESP entry in its table. The device still reacts to commands from the modem. I did not notice any difference in controlling the devices or getting responses.
Since I did this more than 6 months ago, I have not seen these massive random events anymore. I have even removed the delays in automations that send command(s) in reaction to a particular state update (i.e., a received message), and the system is still stable. I did not change any hardware (no new modem, no changed devices, no change in power line topology). So it is purely a SW change that did the trick for me.
Disclaimer: My workaround may not work for you, and I don't know if deleting the group 1 CTRL entries has other side effects. It is plausible that the workaround depends on how the modem stores the DB (table). If that is true, then the sequence in which the modem is programmed may make a difference. I programmed my modem many years ago when moving from one modem to another. I believe I had a factory-reset modem, first did the pair/join on all (or almost all) devices, and then programmed all the links.
Detailed background: My assumption is that a race condition (simultaneous send and receive) causes the modem FW to send message(s) to virtual scene 1 (group 1). With the group 1 CTRL entries deleted, the race condition probably still happens, but the spurious message(s) are no longer sent. Please take all this with a grain of salt; I am purely speculating from observing black-box behavior.
This seems to be a known issue that is documented on the Universal Devices (ISY) wiki and appears in blog posts, e.g., 1, 2.
@EnGamma if you want to give it a shot: you could validate that the devices with the "No read handler found" messages do indeed change state. Pick one of them and apply the workaround. Then trigger the situation and see if the device with the workaround remains unchanged while the others change in the error situation.
-
This is awesome news. Thanks so much for reporting your findings. I was so hesitant about publishing, as I thought my workaround might be too specific to my particular setup. Your (partial) success is extremely encouraging. Thanks also for the additional description and guidance!
It is curious to hear that you still have a blizzard of spurious messages without a handler. Having a few is OK: if multiple responses to one request are received, the first will be handled and all others get the warning. But this certainly would not be a "blizzard of activity".
Some questions to develop my thinking about the error behavior a little more: Do the "no message handler" messages come only from devices that still have a CTRL entry on the modem side? Or do they also come from devices that you applied the workaround to?
I removed the entries for all devices in my network (apparently in 12/2022). I just checked my current logs, which go back to 5/14/2023 (260k lines). I am surprised: they do not have any unaccounted "no message handler" messages!
To make log checking easier, here is my python script detect_spurious.txt (I renamed it to .txt to attach it to this issue). It ignores "no message handler" messages for 1 second after a command was sent to the same device.
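The filtering rule described above can be sketched as follows. This is not the attached detect_spurious script, only an illustration of the 1-second rule. It assumes the log has already been parsed into time-ordered `(timestamp, kind, device)` tuples, where the kinds `"sent"` and `"no_handler"` are made-up labels for this example.

```python
from datetime import datetime, timedelta


def find_unaccounted(events, window=timedelta(seconds=1)):
    """Return (timestamp, device) for "no_handler" events that are not
    explained by a command sent to the same device within `window`.

    `events` is an iterable of (timestamp, kind, device) tuples sorted by
    timestamp; `kind` is "sent" or "no_handler" (hypothetical labels).
    """
    last_sent = {}  # device -> time of the most recent command sent to it
    unaccounted = []
    for ts, kind, device in events:
        if kind == "sent":
            last_sent[device] = ts
        elif kind == "no_handler":
            sent = last_sent.get(device)
            if sent is None or ts - sent > window:
                unaccounted.append((ts, device))
    return unaccounted
```

Keying the lookup by device means a command sent to one device does not mask warnings from another, which matters here because the spurious events hit devices that were never addressed.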
-
Your filter script output is very different from how mine looked. I had few events, each with a very large number of different devices. You have many events with relatively few devices. My guess is that there is something else going on in your system that just happens to be flagged by my script, and I would assume it has a different underlying cause.
It is good that my workaround helped you with the spurious events. However, I don't think it will fix the secondary (remaining) problem. Can you share your original log file? Maybe I can spot something, although I am no expert.
PS: The original problem seems to be a known issue that is documented on the Universal Devices (ISY) wiki and appears in blog posts 1, 2. I also added this to my earlier post.
-
Some kind of event is triggering random Insteon devices to turn on (or off), and I have been pulling my hair out for months trying to narrow it down. This affects all my wired devices (plugin modules, wall dimmers/switches, I/O link, etc.).
It seems to happen when some other Insteon device is controlled. I originally thought my motion sensors were triggering it, but after disabling them (and the devices they controlled) it was still happening... although less so.
This is what I know so far:
This is a sample of the messages I see in the log for each device when it happens:
I can provide full logs with DEBUG enabled if it helps.
Does anyone have any suggestions on what might be causing this? It is driving me crazy! 😖