Do we alarm on all error logs? If not, let's make sure we add an alarm on this, either via Datadog parsing the error message or by raising the alarm directly (?) from code.
Good idea! I think we can do this with an app alarm that fires when a block submission is pending for more than 2 blocks (we're aiming for next-block inclusion, but I'll make the number configurable in case of too many false panics). Then Datadog can consume this app alarm, similar to :ethereum_stalled_sync and friends.
Since we're losing this visibility because this PR is suppressing the exception, I'll add :block_submission_stalled alarm to this PR.
Queue observability size. Hell yeah!
TODOs:
Add a monitor/metric to track pending chch block submissions (pending_block_submissions)
There's a similarly named metric, :pending_block_queue_length, but that is watcher_info's mechanism. Come up with better metric names to separate the two.
Raise an alarm when pending_block_submissions > 2
Clear the alarm when pending_block_submissions == 0 (we shouldn't expect the queue to stick around for too long)
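
To make the raise/clear rule in the TODOs concrete, here is a rough sketch in Python (not the project's language, just for illustration). The class name `BlockSubmissionAlarm`, the `update` method and the `"raise"`/`"clear"` return values are all hypothetical; only `pending_block_submissions`, the `> 2` threshold and the `== 0` clear condition come from the list above. The threshold is a parameter because the number is meant to be configurable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockSubmissionAlarm:
    """Hypothetical tracker for the :block_submission_stalled alarm."""

    threshold: int = 2    # assumed default, taken from "pending_block_submissions > 2"
    raised: bool = False  # whether the alarm is currently active

    def update(self, pending_block_submissions: int) -> Optional[str]:
        """Return "raise" or "clear" on a state change, otherwise None."""
        # Raise only once, when the queue grows past the threshold.
        if not self.raised and pending_block_submissions > self.threshold:
            self.raised = True
            return "raise"
        # Clear only when the queue has fully drained.
        if self.raised and pending_block_submissions == 0:
            self.raised = False
            return "clear"
        # In between (e.g. 1 or 2 pending while raised) the state is unchanged.
        return None
```

Because the alarm is raised above the threshold but cleared only at zero, it should not flap while the queue hovers around the threshold.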
Spinning off from #1617 (comment)