Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

feat: allow subsystems to send only declared messages, generate graphviz #5314

Merged
merged 139 commits into from
May 12, 2022

Conversation

drahnr
Copy link
Contributor

@drahnr drahnr commented Apr 12, 2022

Specializes the subsystems with a generated type ${SubsystemName}OutgoingMessages which is use for the sender instead of the generated AllMessages types. This way there is a compile time guarantee per subsystem that only the intended messages can be sent

Knowing the enabled declared messages to be sent also allows to create a directed graph from the relationship between subsystems

closes #3774
closes #3826

@drahnr drahnr changed the title refactor: only allow to send declared messages refactor: allow subsystems to send only declared messages Apr 12, 2022
@github-actions github-actions bot added the A0-please_review Pull request needs code review. label Apr 12, 2022
@drahnr drahnr added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Apr 12, 2022
@drahnr drahnr marked this pull request as draft April 12, 2022 18:16
@drahnr drahnr marked this pull request as draft April 12, 2022 18:16
@drahnr drahnr force-pushed the overseer-janitor branch 2 times, most recently from 9dbbe64 to a62b113 Compare April 20, 2022 16:44
@drahnr
Copy link
Contributor Author

drahnr commented Apr 21, 2022

This already generates the desired graphs but still needs some attention to details and make it a bit more ergnomic

  • code cleanup and restructure
  • grunt work to make all subsystems work as anticipated
  • remove all convenience wrappers which actually only made it worse
  • don't leak generated data types with generated names the 🚂 has left the station
digraph {
    0 [ label="CandidateValidation"]
    1 [ label="PvfChecker"]
    2 [ label="CandidateBacking"]
    3 [ label="StatementDistribution"]
    4 [ label="AvailabilityDistribution"]
    5 [ label="AvailabilityRecovery"]
    6 [ label="BitfieldSigning"]
    7 [ label="BitfieldDistribution"]
    8 [ label="Provisioner"]
    9 [ label="RuntimeApi"]
    10 [ label="AvailabilityStore"]
    11 [ label="NetworkBridge"]
    12 [ label="ChainApi"]
    13 [ label="CollationGeneration"]
    14 [ label="CollatorProtocol"]
    15 [ label="ApprovalDistribution"]
    16 [ label="ApprovalVoting"]
    17 [ label="GossipSupport"]
    18 [ label="DisputeCoordinator"]
    19 [ label="DisputeDistribution"]
    20 [ label="ChainSelection"]
    2 -> 14 [ label="CollatorProtocolMessage"]
    2 -> 3 [ label="StatementDistributionMessage"]
    2 -> 4 [ label="AvailabilityDistributionMessage"]
    2 -> 8 [ label="ProvisionerMessage"]
    2 -> 0 [ label="CandidateValidationMessage"]
// yet to be completed, but needs some grunt work
}

Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I can't hardly wait to see the dependency graph 😀

node/overseer/src/lib.rs Show resolved Hide resolved
node/overseer/src/lib.rs Outdated Show resolved Hide resolved
@drahnr
Copy link
Contributor Author

drahnr commented Apr 21, 2022

Remaining issue:

Shared code consumes minimum viable trait bounds i.e. impl (SubsystemSender<X> + SubsystemSender<Y>) where X and Y are individual messages. For subsystems we generate wrapper types and commonly only handle them ((X|Y)⊂ enum FooOutgoingMessagesenum AllMessages) but the trait bounds are not enforced, since the type Sender : SubsystemSender<FooOutgoingMessages>; is only bounded by the generated per subsystem wrapper type.

Solution: Create per subsystem SubsystemContext traits, which would also reduce the boiler plate at the cost of one more generated type. Note that this only works as long as we don't use the SubsystemContext in the context of shared functionality that is used by multiple uses, which we shouldn't anyways.
To be investigated: Impact on the test helper code.

@drahnr drahnr changed the title refactor: allow subsystems to send only declared messages feat: allow subsystems to send only declared messages, generate graphviz Apr 21, 2022
@drahnr drahnr added the J0-enhancement An additional feature request. label Apr 21, 2022
@drahnr drahnr force-pushed the overseer-janitor branch from eb9cfe3 to db7dc7d Compare April 22, 2022 14:43
@drahnr
Copy link
Contributor Author

drahnr commented Apr 22, 2022

TODOs:

  • Job based subsystems don't quite work yet, some SubsystemSender<OutgoingM> bounds are missing to work smoothly, there could be an ok~ish workaround by using JobSender and some Into<overseer::${Generic}OutgoingMessage>.
  • Testing needs some more thought, I left this out intentionally since it wants to override the type Sender with TestSender, which is currently hardcoded by ${Generic}ContextTrait - leaving this out would cause some churn but something to consider

@rphmeier
Copy link
Contributor

rphmeier commented Apr 22, 2022

Job based subsystems don't quite work yet, some SubsystemSender bounds are missing to work smoothly, there could be an ok~ish workaround by using JobSender and some Intooverseer::${Generic}OutgoingMessage.

#4259 is needed (at least for backing, provisioner, so the rest is easy after that) for async backing. So either sidestepping or ignoring Job based subsystems would be good.

Coincidentally, I started working on #5377 today, because it blocks the statement-distribution / candidate-backing refactorings for async backing.

@drahnr drahnr mentioned this pull request Apr 25, 2022
@drahnr drahnr force-pushed the overseer-janitor branch from 5a0ae03 to 499292a Compare April 27, 2022 11:03
@drahnr
Copy link
Contributor Author

drahnr commented Apr 29, 2022

Given the annotation pain, there will be

  • #[subsystem(error=Yikes)] instead of requiring all trait bounds that are not relevant and internal to the generator to work, i.e. SubsystemContext and SubsystemSender and also simplify the trait bound annotations and their changes moving forward
  • code cleanup and restructure - a bunch a names are not used consistently and the code became a bit more fractured as more rustc trait bound resolution hurdles appeared
  • get rid of the dispatch_iter and related annotations
  • remove unused_msg in generated match arms and error at compile time sidestepped by annotating the match arm with #[allow(unreachable_patterns)] and yielding a runtime warning
  • move away from a regular ImplItem definition to something that references the generic as used in the #[overlord] tagged struct with named fields
  • make it work beyond the cargo run -p polkadot-overseer-gen --example duo, for struct Overseer
  • SubsystemContext { type Message = (); ..} is required for the test cases to work

and the user will have to use generated types and "special" names, ${Subsystem}OutgoingMessages, ${Subsystem}ContextTrait, ${Subsystem}SenderTrait - looking into creating a marker type per subsystem and a helper trait bound might proof worthwhille for the forward path.

@drahnr drahnr self-assigned this Apr 29, 2022
@drahnr drahnr force-pushed the overseer-janitor branch 2 times, most recently from 97ed4b5 to f198355 Compare April 29, 2022 15:18
@drahnr drahnr marked this pull request as ready for review April 29, 2022 15:44
@drahnr drahnr force-pushed the overseer-janitor branch 3 times, most recently from b85757c to 980dd0f Compare May 9, 2022 09:36
@drahnr drahnr force-pushed the overseer-janitor branch from 9e8853b to 4cde621 Compare May 9, 2022 11:49
@drahnr
Copy link
Contributor Author

drahnr commented May 9, 2022

Hints to reviewers:

  • the dispatch logic was previously generated, and is now hand rolled - this is our primary entry point for network messages and their dispatch to other subsystems, fn focus still exists but the overseer is now decoupled from it, which is another thing checked off the list. But with that refactor it's very important to double check all intended subsystems still receive their messages as anticipated, dispatch_iter is the thing that got removed.
  • Each subsystem has a consuming message which is often referred to as generic M (no change on that, is as before), but now we have trait AssociateOutgoing { type OutgoingMessages = ..; } which defines an outgoing helper enum that is generated with an ident constructed as ${Subsystem}OutgoingMessages where ${Subsystem} is the subsystem identifier as used in the overseer declaration. ${Subsystem}OutgoingMessages is used throughout everywhere to constrain the outgoing messages (commonly referred to as OutgoingMessage bounded by ${Subsystem}OutgoingMessages: From<OutgoingMessage> or ::OutgoingMessages: From`. This allows the construction of the graph and compile time verification.
  • ${Subsystem}SenderTrait and ${Subsystem}ContextTrait are accumulation traits or wrapper traits, that combine over all annotated M or OutgoingMessages from the overseer declaration or their respective outgoing types. This is usage convenience and assures consistency within a subsystem while also maintaining a single source of truth for which messages can be sent by a subsystem. Note that this is sidestepped for the test subsystem, which may consume AllMessages, the global message wrapper type.
  • Job-based subsystems, being on their way out, are patched, but they now are generic over the Sender, which is leaked up to the point of the OverseerGen. This is not pretty, not good, but given they are marked for erasure, good enough.

Review recommendation:

There are two examples: duo and solo (use cargo run -p polkadot-overseer --example duo) , looking into the expanded file (enabled expand feature on the overseer) and investigating that rather than the source, might yield more insight and understanding than the proc-macro source code.


Out of scope:

  • cycle detection, will come as a follow-up
  • unused outgoing messages detection
  • expand per overseer rather than per crate
  • unbounded annotation for outgoing messages, will be part of a follow up PR

@drahnr drahnr requested a review from vstakhov May 9, 2022 15:45
@drahnr drahnr force-pushed the overseer-janitor branch from c16790c to a4ab390 Compare May 12, 2022 14:48
@drahnr
Copy link
Contributor Author

drahnr commented May 12, 2022

bot merge

@paritytech-processbot
Copy link

Waiting for commit status.

@paritytech-processbot
Copy link

Merge cancelled due to error. Error: Statuses failed for a4ab390

@drahnr drahnr merged commit 2c934ed into master May 12, 2022
@drahnr drahnr deleted the overseer-janitor branch May 12, 2022 15:39
@shawntabrizi
Copy link
Member

is there a companion to this?

@drahnr
Copy link
Contributor Author

drahnr commented May 13, 2022

No. I just checked and cumulus master with an update to polkadot-service of the polkadot current master and it compiles just fine. What motivated that question?

@shawntabrizi
Copy link
Member

cumulus wasn't compiling, but it seems it just needed that bump. ty

@shawntabrizi
Copy link
Member

shawntabrizi commented May 16, 2022

@drahnr please look at the errors appearing here: https://gitlab.parity.io/parity/mirrors/substrate/-/jobs/1567984

image

@drahnr
Copy link
Contributor Author

drahnr commented May 16, 2022

Tracked in https://github.com/paritytech/ci_cd/issues/433 and workaround in #5530 . I'd have a preference to have these as issues.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit. J0-enhancement An additional feature request.
Projects
None yet
7 participants