Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

Fix simultaneous dials #246

Closed
wants to merge 10 commits into from
Closed

Conversation

vyzo
Copy link
Contributor

@vyzo vyzo commented Mar 29, 2021

This implements the necessary logic for dealing with simultaneous dials which may have conflicting constraints.

TBD: tests

@vyzo vyzo requested a review from Stebalien March 29, 2021 16:13
Bugz:
- Remove CancelDial from addConn, which would short-circuit the dials waiting for its result
- Fix race in decref reaching 0

Test issues:
- Adjust for the new signature of the DialFunc
- Connections can be empty shims in tests, so Close can panic
- Comment out TestBasicDialSync which is now not applicable and needs to be rewritten
- Account for multerrors instead of flat DialErrors
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about an approach where a new dial joins existing dials?

  1. Spawn a dial worker.
  2. Each dial sends a message down a channel to the dial worker with options & address filters.
  3. When a dial joins, check all undialed addresses (addresses in the peerstore minus the set of already tried addresses) against the address filter and consider adding them to the back of the dial queue.
  4. When we form connections, check them against the address filters.
  5. When a dial "leaves" (cancels), remove addresses that no longer match any filter.

I'm open to approaches, but the important parts are:

  1. Seamlessly handle multiple dials joining/leaving without wasting work.
  2. Eventually, we want this to morph into something like the "dialer v2 work". Whatever we do here, we want to be able to "replan" dials to pending addresses while the dial is active. The current approach will force us to wait for the current dial to complete.

defer ad.decref()

dialCtx := ad.dialContext(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: by go convention, xContext means "x but with a context". Rename to deriveDialContext.

break startDial
case <-ad.waitch:
// we lost a race, we need to try again because the connection might not be what we want
ad.decref()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we dereferencing? Won't that cause the in-progress dial to be canceled if the original caller bails?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we'll just restart, but that's pretty inefficient if we were half-way through opening a bunch of connections.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out, we only invoke this code if the active dial is currently "finishing". So we're not going to cancel anything.

Comment on lines +57 to +61
// for mock tests
if c == nil || c.conn == nil {
return nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???? This is really scary. Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary after fixing a bug, will remove!

ad.addrsLk.Lock()
defer ad.addrsLk.Unlock()

for _, a := range addrs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we never bother to dial one of these addresses because we're canceled? We'll just skip that overall.

Test case:

  1. Start dial A.
  2. Start dial B.
  3. Cancel dial A.
  4. Dial B should pick up where dial A left off.

With the current code, dial B will, I believe, fail to dial any of the addresses dial A was trying to dial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the dials will still happen -- they use the background context, so there should be no interference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see.

@vyzo vyzo mentioned this pull request Mar 30, 2021
@vyzo
Copy link
Contributor Author

vyzo commented Mar 30, 2021

Closing in favor of #250 which accomplishes the same goals with much cleaner code.

@vyzo vyzo closed this Mar 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants