Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op-conductor,op-node: allow system to select port, make op-node wait for conductor endpoint #12863

Merged
merged 4 commits into from
Nov 8, 2024

Conversation

protolambda
Copy link
Contributor

@protolambda protolambda commented Nov 7, 2024

Description

  • Make conductor actually listen on the host that it is configured to, rather than always 0.0.0.0
  • Make conductor optionally default to the address it binds to, rather than a preconfigured address, for advertising their self as peer
  • Add conductor CLI option to change the server-address it advertises itself as, to not only use the same address it binds to, in case you bind it to an IP that doesn't resolve otherwise. If unspecified, it defaults to what address it is listening on.
  • Make op-node wait for the conductor RPC to be available. This is instantly available in regular operation, from CLI. But may take longer in case of an e2e test, where the conductor is started after the op-node. The conductor RPC client is lazy-loaded (initialize already runs on usage of the RPC method, rather than creation of the RPC client).

Tests

  • E2E system now sets up conductors without trying to reserve ports. This prevents flakes where endpoints aren't actually reserved, or where endpoints are not up and running yet.

Note: TestSequencerFailover_ActiveSequencerDown still seems to flake due to a leadership transfer timeout. Maybe because of the 1-second timeout, which is quite low for CI on limited resources. So I bumped the conductor RPC timeout to 5 seconds.

And I am leaving the op-conductor logging on debug level, so we can look at any further flakes if they do happen.

Copy link
Collaborator

@mslipper mslipper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, but would like someone with more conductor experience to opine before merging.

@tynes
Copy link
Contributor

tynes commented Nov 7, 2024

I think this code should be moved over to the infra repo given the infra team owns the code in the infra repo and this repo is mostly owned by the protocol team

@axelKingsley
Copy link
Contributor

@axelKingsley axelKingsley reopened this Nov 7, 2024
@axelKingsley
Copy link
Contributor

(My bad, I misclicked when reading this review and closed it momentarily)

Copy link
Contributor

@zhwrd zhwrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one change suggestion, otherwise lgtm

op-conductor/consensus/raft.go Show resolved Hide resolved
op-conductor/consensus/raft.go Show resolved Hide resolved
@zhwrd
Copy link
Contributor

zhwrd commented Nov 7, 2024

I think this code should be moved over to the infra repo given the infra team owns the code in the infra repo and this repo is mostly owned by the protocol team

Not sure its about what team owns what code, imo we really need e2e tests for conductor to run when op-stack changes. We actually need to expand the e2e test suite for conductor since we've had a few regressions in op-stack introduced that could be caught earlier. I'm actually also considering versioning conductor with the rest of the op-stack since they are pretty tightly coupled, but tbd

@protolambda protolambda added this pull request to the merge queue Nov 8, 2024
Merged via the queue into develop with commit 5662448 Nov 8, 2024
49 checks passed
@protolambda protolambda deleted the conductor-network-setup-fix branch November 8, 2024 12:58
GrapeBaBa pushed a commit to optimism-java/optimism that referenced this pull request Nov 13, 2024
…for conductor endpoint (ethereum-optimism#12863)

* op-conductor,op-node: allow system to select port, make op-node wait for conductor endpoint

* op-conductor,op-node: debugging conductor test

* op-conductor: more debugging

* op-e2e: increase conductor timeout
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants