[node] Add additional sanitization checks to overseer #3427

drahnr · 2021-07-07T07:48:49Z · edited by sandreim

#2962 lays the groundwork for adding additional logic to the overseer, so that certain properties can be checked at compile time:

Chores:

- [overseer] remove AllSubsystems and proc-macro AllSubsystemGen #3773
- create a more rigid builder pattern, that fails at compile time, rather than runtime (current) #3772
- overseer: add closure as subsystem init method #3771
- ~~Remove the focus() method and streamline the incoming information flow~~
- Handle where clauses for baggage generics
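
To illustrate what a builder that "fails at compile time, rather than runtime" could look like, here is a minimal typestate sketch. It is not the overseer's actual builder: `OverseerBuilder`, `Missing`, `Provided`, and the two subsystem slots are hypothetical names, and plain strings stand in for real subsystem instances.

```rust
/// Type-level marker: the slot has not been provided yet.
struct Missing;
/// Type-level marker: the slot has been provided and holds its value.
struct Provided<T>(T);

/// Hypothetical overseer with two subsystem slots (strings are placeholders).
struct Overseer {
    candidate_validation: String,
    availability_store: String,
}

/// Builder whose type parameters record which slots are already filled.
struct OverseerBuilder<A, B> {
    candidate_validation: A,
    availability_store: B,
}

impl OverseerBuilder<Missing, Missing> {
    fn new() -> Self {
        OverseerBuilder { candidate_validation: Missing, availability_store: Missing }
    }
}

impl<B> OverseerBuilder<Missing, B> {
    /// Fills the first slot; only callable while that slot is still `Missing`.
    fn candidate_validation(self, name: &str) -> OverseerBuilder<Provided<String>, B> {
        OverseerBuilder {
            candidate_validation: Provided(name.to_string()),
            availability_store: self.availability_store,
        }
    }
}

impl<A> OverseerBuilder<A, Missing> {
    /// Fills the second slot; only callable while that slot is still `Missing`.
    fn availability_store(self, name: &str) -> OverseerBuilder<A, Provided<String>> {
        OverseerBuilder {
            candidate_validation: self.candidate_validation,
            availability_store: Provided(name.to_string()),
        }
    }
}

impl OverseerBuilder<Provided<String>, Provided<String>> {
    /// `build` exists only once every slot is `Provided`.
    fn build(self) -> Overseer {
        Overseer {
            candidate_validation: self.candidate_validation.0,
            availability_store: self.availability_store.0,
        }
    }
}

fn main() {
    let overseer = OverseerBuilder::new()
        .candidate_validation("candidate-validation")
        .availability_store("availability-store")
        .build();
    println!("{} + {}", overseer.candidate_validation, overseer.availability_store);

    // Rejected by the compiler, because `build` is not defined while a slot is `Missing`:
    // OverseerBuilder::new().candidate_validation("cv").build();
}
```

With this shape, forgetting a subsystem is a type error at the call site instead of a runtime check inside `build`.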

Significant enhancements:

- [overseer] annotate subsystems with outgoing messages #3774


rphmeier · 2021-07-09T22:36:48Z

annotated subsystems which messages are sent by a subsystem

This and everything under it is amazing. Looking forward to it.

bkchr · 2021-09-09T10:10:52Z

* inject an `unbounded` channel into each cycle, to prevent potential deadlocks until there is a more systematic approach with prioritization

Why do we try to work around the fundamental issue instead of solving it properly? The fundamental issue here is that oneshot channels are used inside these messages and that we await them, blocking the entire subsystem. IMO oneshot channels should be forbidden in these messages, or only allowed in very rare cases where we can prove that no deadlock can ever happen.

Instead of oneshot channels, there could just be a closure that constructs an AllMessages message, and this message would then be dispatched again.

Yes, that would complicate the design of some subsystems, but it would make it much easier to reason about the implementation of a subsystem and about problematic pieces of code.
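
A rough sketch of the closure idea, under heavy assumptions: the `AllMessages` enum below is a two-variant stand-in rather than the real type, the variant names are made up, and a single queue plays the role of the overseer's dispatch. The point it tries to show is that the requester never awaits anything; the response arrives as just another message.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the overseer's `AllMessages` enum.
enum AllMessages {
    /// Request to subsystem B. `reply` builds the follow-up message, replacing
    /// a oneshot sender that the requester would otherwise have to await.
    QueryBlockNumber { reply: Box<dyn FnOnce(u32) -> AllMessages> },
    /// Follow-up message delivered back to subsystem A.
    BlockNumberReady(u32),
}

/// Dispatches messages until the queue is empty and returns the values that
/// reached subsystem A.
fn run(mut queue: VecDeque<AllMessages>) -> Vec<u32> {
    let mut seen_by_a = Vec::new();
    while let Some(msg) = queue.pop_front() {
        match msg {
            // Subsystem B: answer by constructing a new message and
            // re-dispatching it (42 is a placeholder result).
            AllMessages::QueryBlockNumber { reply } => queue.push_back(reply(42)),
            // Subsystem A: the response is handled like any other message.
            AllMessages::BlockNumberReady(n) => seen_by_a.push(n),
        }
    }
    seen_by_a
}

fn main() {
    let mut queue = VecDeque::new();
    queue.push_back(AllMessages::QueryBlockNumber {
        reply: Box::new(|n: u32| AllMessages::BlockNumberReady(n)),
    });
    println!("{:?}", run(queue));
}
```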

drahnr · 2021-09-09T13:25:01Z

* inject an `unbounded` channel into each cycle, to prevent potential deadlocks until there is a more systematic approach with prioritization

Why do we try to work around the fundamental issue instead of solving it properly? The fundamental issue here is that oneshot channels are used inside these messages and that we await them, blocking the entire subsystem. IMO oneshot channels should be forbidden in these messages, or only allowed in very rare cases where we can prove that no deadlock can ever happen.

Instead of oneshot channels, there could just be a closure that constructs an AllMessages message, and this message would then be dispatched again.

Yes, that would complicate the design of some subsystems, but it would make it much easier to reason about the implementation of a subsystem and about problematic pieces of code.

In principle you are correct.
The above was meant as a low-impact change, and that's the precise advantage of it: not to re-architect half of our subsystem logic.

I also think that spawning additional tasks whenever there is a response channel could achieve the same effect with less impact. Note that the unbounded channel also solves potential congestion due to loops in our subsystem communication, which is something that is no easier with AllMessages-based request-response. I am happy to discuss this further.
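
For the "spawn a task per response channel" variant, a minimal sketch could look like the following. It is only an approximation: OS threads stand in for async tasks, `std::sync::mpsc` channels stand in for oneshot channels, and `Msg`, `peer_answer`, and `run_subsystem` are hypothetical names. The subsystem loop never waits on the response itself; a helper waits and forwards the result into the subsystem's own inbox.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Messages understood by the hypothetical subsystem.
enum Msg {
    /// Triggers a request to a peer that answers over a response channel.
    Fetch(u32),
    /// Response forwarded into the inbox by the helper task.
    Fetched(u32),
}

/// Stand-in for a peer subsystem: answers `request + 1` on the response channel.
fn peer_answer(request: u32, response: Sender<u32>) {
    thread::spawn(move || {
        let _ = response.send(request + 1);
    });
}

/// Runs the subsystem loop until `expected` responses have been collected.
fn run_subsystem(inbox: Receiver<Msg>, to_self: Sender<Msg>, expected: usize) -> Vec<u32> {
    let mut results = Vec::new();
    while results.len() < expected {
        match inbox.recv() {
            Ok(Msg::Fetch(n)) => {
                let (resp_tx, resp_rx) = mpsc::channel();
                peer_answer(n, resp_tx);
                // Hand the wait to a helper instead of blocking the main loop.
                let to_self = to_self.clone();
                thread::spawn(move || {
                    if let Ok(value) = resp_rx.recv() {
                        let _ = to_self.send(Msg::Fetched(value));
                    }
                });
            }
            Ok(Msg::Fetched(value)) => results.push(value),
            Err(_) => break,
        }
    }
    results
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(Msg::Fetch(1)).unwrap();
    tx.send(Msg::Fetch(10)).unwrap();
    let mut results = run_subsystem(rx, tx, 2);
    results.sort(); // arrival order of the responses is not deterministic
    println!("{:?}", results);
}
```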

IMO we should improve the observability of oneshots before making grand changes: implement #3825, #3824, and #3648 to get a top-down view of our entire system and of which oneshots are actually delayed, and then decide what to do based on that.

bkchr · 2021-09-10T10:53:31Z

The above was meant as a low-impact change, and that's the precise advantage of it: not to re-architect half of our subsystem logic.

A low-impact change would be to just use unbounded channels for all subsystem communication. This problem has existed for more than a year now and still isn't solved. And the more we roll this out, the more reports we get of users' nodes going down.

Note that the unbounded channel also solves potential congestion due to loops in our subsystem communication, which is something that is no easier with AllMessages-based request-response.

Not sure how this solves congestion; it just trades memory usage for execution time. However, one subsystem probably doesn't send 100 messages to another subsystem at once; more likely it receives one message, processes it, and sends a new message to another subsystem, so the congestion is probably not happening.
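
The memory-versus-blocking trade-off can be seen in a small sketch using the standard library's channels as stand-ins for the subsystem channels (an assumption; the node uses async channels, and both function names here are hypothetical). With no receiver draining it, a bounded channel stops accepting messages once it is full, while an unbounded one keeps buffering them in memory.

```rust
use std::sync::mpsc;

/// Returns how many of `n` messages a bounded channel with the given
/// `capacity` accepts while nobody is receiving.
fn accepted_by_bounded(capacity: usize, n: usize) -> usize {
    let (tx, _rx) = mpsc::sync_channel::<usize>(capacity);
    // `try_send` fails with `Full` instead of blocking once the buffer is full.
    (0..n).filter(|i| tx.try_send(*i).is_ok()).count()
}

/// Returns how many of `n` messages an unbounded channel accepts while
/// nobody is receiving; every message is buffered in memory.
fn accepted_by_unbounded(n: usize) -> usize {
    let (tx, _rx) = mpsc::channel::<usize>();
    (0..n).filter(|i| tx.send(*i).is_ok()).count()
}

fn main() {
    println!("bounded(4) accepted {}", accepted_by_bounded(4, 100));
    println!("unbounded accepted {}", accepted_by_unbounded(100));
}
```

A blocking `send` on the full bounded channel would stall the sender at that point; if the receiver is in turn waiting on the sender, that is the cycle discussed above.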

drahnr added the J0-enhancement An additional feature request. label Jul 7, 2021

drahnr self-assigned this Jul 7, 2021

drahnr mentioned this issue Jul 7, 2021

refactor overseer into proc-macro based pattern #2962

Merged

4 tasks

drahnr changed the title ~~Add additional sanitization checks to overseer~~ [node] Add additional sanitization checks to overseer Jul 7, 2021

drahnr mentioned this issue Jul 9, 2021

remove the overseer trait bounds #3454

Closed

drahnr mentioned this issue Jul 15, 2021

enable disputes #3478

Merged

2 tasks

drahnr mentioned this issue Oct 11, 2021

Building warnings #4042

Closed

drahnr mentioned this issue Apr 16, 2022

Overseer warning due to unused unbounded channels #5333

Closed

drahnr mentioned this issue Oct 3, 2022

stage 2 enhancements paritytech/orchestra#7

Open

3 tasks

sandreim mentioned this issue Oct 3, 2022

Handle where clauses for baggage generics paritytech/orchestra#6

Open

sandreim closed this as completed Jun 9, 2022

ordian added the T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance. label Aug 16, 2022
