[node] Add additional sanitization checks to overseer #3427
Comments
This and everything under it is amazing. Looking forward to it.
Why do we try to work around the fundamental issue instead of solving it properly? The fundamental issue here is that oneshot channels are used inside these messages and that we await them, blocking the entire subsystem. IMO oneshot channels should be forbidden in these messages, or only allowed in very rare cases where we can prove that no deadlock can ever happen. Instead of oneshot channels there could just be a closure that constructs a [...] Yes, that would complicate the design of some subsystems, but it would make it much easier to reason about the implementation of a subsystem and to spot problematic pieces of code.
In principle you are correct. I also think that spawning additional tasks whenever there is a response channel could get the same effect with less impact. Note that the unbounded channel also solves potential congestion due to loops in our subsystem communication, which is something that is no easier with [...] IMO we should get the observability up for oneshots before making grand changes, hence implementing #3825, #3824, and #3648 to get a top-down view of our entire system and of which oneshots are actually delayed, and then decide what to do based on that.
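To make the "spawn an additional task per response channel" idea concrete, here is a minimal sketch using only the standard library. It is not the overseer's code: `Msg`, `Request`, `request_without_blocking`, and `run_demo` are hypothetical names, OS threads stand in for async tasks, and a plain `std::sync::mpsc` channel stands in for a oneshot.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical message type for one subsystem's inbox.
#[derive(Debug, PartialEq)]
enum Msg {
    // Regular work arriving from other subsystems.
    Work(u32),
    // A reply that a helper task forwarded into the inbox.
    Reply(u32),
}

// Hypothetical request: a payload plus a single-use response sender
// (standing in for a oneshot channel).
struct Request {
    value: u32,
    respond_to: Sender<u32>,
}

// Send a request and return immediately. A helper thread waits for the
// answer and forwards it into the subsystem's own inbox, so the main loop
// never blocks on the response channel.
fn request_without_blocking(peer: &Sender<Request>, inbox: Sender<Msg>, value: u32) {
    let (tx, rx) = channel();
    peer.send(Request { value, respond_to: tx }).expect("peer is alive");
    thread::spawn(move || {
        // Only this helper blocks on the response.
        if let Ok(answer) = rx.recv() {
            let _ = inbox.send(Msg::Reply(answer));
        }
    });
}

// Drive one request through the scheme and return what the main loop saw.
fn run_demo() -> Vec<Msg> {
    let (inbox_tx, inbox_rx) = channel::<Msg>();
    let (peer_tx, peer_rx) = channel::<Request>();

    // Issue the request; the peer has not answered yet.
    request_without_blocking(&peer_tx, inbox_tx.clone(), 21);

    // Other work still reaches the main loop while the reply is pending.
    inbox_tx.send(Msg::Work(1)).unwrap();
    let first = inbox_rx.recv().unwrap();

    // Only now does the peer get around to answering.
    let peer = thread::spawn(move || {
        let req = peer_rx.recv().unwrap();
        let _ = req.respond_to.send(req.value * 2);
    });
    let second = inbox_rx.recv().unwrap();
    peer.join().unwrap();

    vec![first, second]
}
```

Calling `run_demo()` yields `[Work(1), Reply(42)]`: the subsystem handles unrelated work first and picks up the reply later as an ordinary inbox message. The cost is one extra task per outstanding request, which is the "less impact" trade-off mentioned above.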
A low-impact change would be to just use unbounded channels for all subsystem communication. This problem has existed for more than a year now and still isn't solved. And the more we roll this out, the more reports we get of users' nodes going down.
Not sure how this solves congestion; it just trades memory usage for execution time. However, a subsystem probably doesn't send 100 messages to another subsystem at once. More likely it receives one message, processes it, and sends a new message to another subsystem, so the congestion is probably not happening.
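The memory-versus-blocking trade-off discussed here can be sketched with the standard library's bounded and unbounded queues. This is only an illustration under assumptions: the overseer uses its own channel types, and the function names below are hypothetical.

```rust
use std::sync::mpsc::{channel, sync_channel, TrySendError};

// Fill a bounded queue of `capacity` slots while nobody is receiving and
// report whether one more send would have to wait.
fn bounded_sender_would_block(capacity: usize) -> bool {
    let (tx, _rx) = sync_channel::<u32>(capacity);
    for i in 0..capacity as u32 {
        tx.try_send(i).expect("queue still has room");
    }
    // A blocking `send` would park here until the receiver frees a slot.
    // If that receiver is itself parked sending back to us (a cycle),
    // neither side can make progress.
    matches!(tx.try_send(0), Err(TrySendError::Full(_)))
}

// Push `n` messages into an unbounded queue while nobody is receiving and
// return how many ended up buffered in memory.
fn unbounded_backlog(n: u32) -> usize {
    let (tx, rx) = channel::<u32>();
    for i in 0..n {
        // Never blocks; the queue simply grows.
        tx.send(i).expect("receiver is alive");
    }
    rx.try_iter().count()
}
```

With a full bounded queue the sender has to wait, which is what can turn a communication cycle into a deadlock. The unbounded queue never makes the sender wait, but every unprocessed message stays in memory until the receiver catches up.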
#2962 lays the groundwork for adding additional logic to the overseer, such that certain properties can be detected at compile time:
Chores:
- `AllSubsystems` and proc-macro `AllSubsystemGen` #3773
- Remove the `focus()` method and streamline the incoming information flow
- `baggage` generics

Significant enhancements:
- `graphviz` chart of the subsystem connection #3826
- `unbounded` channel into each cycle, to prevent potential deadlocks until there is a more systematic approach with prioritization #5426