Describe the feature
I want to improve the observability of devices making their way into and out of the penalty box. Our standard info-level logs are a simple way to achieve this. The volume should be low, so I have no capacity concerns with this recommendation.
The problem I'm trying to address is automated monitoring of changes that push firmware out to a large number of devices. These logs would let us generate metrics, for example a graph over time of devices put in the box. They would also give useful debugging context to tools outside of Nerves Hub (assuming they can search the Nerves Hub logs), helping to explain why a device is not receiving its expected firmware updates.
We currently have these types of connect/disconnect logs:
We could build off of this model and introduce additional events:
nerves_hub.devices.penaltybox.in
nerves_hub.devices.penaltybox.out
Additional context
Bonus points if we can include the reason in the logs: did someone manually move the device in or out, or was it an automatic action based on thresholds?
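To make the request concrete, here is a rough sketch of what such a log line could carry and how a log-based metric could be derived from it. It is illustrative Python, not Nerves Hub's actual logging code. The JSON layout, the `device_id` and `reason` fields, and both function names are assumptions for illustration only. The only names taken from this issue are the two event names.

```python
import json
import logging
from collections import Counter
from datetime import datetime, timezone

logger = logging.getLogger("nerves_hub")

# Event names proposed in this issue; everything else below is hypothetical.
EVENT_PREFIX = "nerves_hub.devices.penaltybox"


def penalty_box_log(direction, device_id, reason, now=None):
    """Emit one info-level, JSON-formatted penalty box log line and return it."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    now = now or datetime.now(timezone.utc)
    payload = {
        "event": f"{EVENT_PREFIX}.{direction}",
        "device_id": device_id,  # hypothetical field name
        "reason": reason,        # hypothetical values: "manual" or "automatic"
        "timestamp": now.isoformat(),
    }
    line = json.dumps(payload)
    logger.info(line)
    return line


def count_per_minute(lines, event):
    """Count occurrences of one event per minute, as a graph over time would need."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        if record["event"] == event:
            # Truncate the ISO 8601 timestamp to "YYYY-MM-DDTHH:MM".
            counts[record["timestamp"][:16]] += 1
    return dict(counts)
```

With lines shaped like this, a log search tool could filter on `event` to graph devices entering the box, and on `reason` to separate manual moves from automatic ones.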
I'm sorry for my delayed reply. I was pondering over this a bit and had an idea.
I agree we could log more, and that is a quick win.
I also think we should add some more telemetry, which is another quick win.
But the bigger idea I had was adding a 'key' to the Audit Logs table, which could allow us to show metrics based on audit log events across a product.
I need to play this out more. I essentially want to see more of this data in the UI so you can see spikes quickly, without having to hunt for this info. I'd also like to see some alerting too, most likely to Slack, so you can be warned of these issues as they start to appear.