dex: redesign ConnectionMaster #1474

Merged 2 commits on Jun 7, 2022

Member

chappjc commented Feb 10, 2022

See the discussion at #1445 (comment) for the motivation.

This redesigns dex.ConnectionMaster, adding a Done() <-chan struct{} method upon which the On and Wait methods are now based.

This also uses the new Done method in (*Core).startWalletSyncMonitor to quit if the wallet shuts down, without needing an extra goroutine blocked on Wait.

    if !c.On() {
        return // probably a bug in the consumer
    }
    c.cancel()
Member Author

See the old Disconnect method where the cancel field has always been unguarded by the mutex.
Evidently we do have sane Connect/Disconnect access patterns, and the On method is the only one that really needs concurrency controls, which it gets via Done.

Member

> This must not be used before or concurrently with Connect.

I think this is an assumption we are under at various levels. Good to doc it, though.

Comment on lines -121 to -146
c.wg.Wait()
c.cancel() // if not called from Disconnect, would leak context
Member Author

chappjc commented Feb 10, 2022

The WaitGroup is no longer a field with the new design around the done channel, but it's worth noting that the old Wait method did not guard access to either of these fields.
Again, On appears to be the only method called without any synchronization with the other methods (see (*xcWallet).state for example).

    func (c *ConnectionMaster) On() bool {
        select {
        case <-c.Done():
            return false
Member Author

chappjc commented Feb 10, 2022

Although Wait has the same semantics as before, the On method now tracks the Wait method's timing instead of just when the inner context is canceled. This new way is actually a bit more consistent: depending on how the Connector died (context canceled vs. internally initiated shutdown), On would previously indicate slightly different stages of the Connector's lifecycle (signaled to shut down vs. actually shut down).

chappjc marked this pull request as ready for review May 18, 2022 18:01

Member Author

chappjc commented May 18, 2022

Out of draft now since we keep running into issues and I'd kinda like to eliminate the fields that can be nil. #1602 (comment), #1616, #1472 (comment), 6aaa3c0, etc.

However, I suspect we still need to rethink the behavior of the dex.Connector:

Connect(ctx context.Context) (*sync.WaitGroup, error)

Seems straightforward, but to facilitate the reconnect loop starting for DEX conns that fail their initial attempt, we have the case where wg != nil while also err != nil. I'm not sure how best to fix or rework that. This PR just continues to handle that case.

Contributor

martonp left a comment

Connect(ctx context.Context) (*sync.WaitGroup, error) could be updated to Connect(ctx context.Context) (*sync.WaitGroup, bool, error) where if the boolean is true, it means that the initial connection failed, but the reconnect loop is running.

Member Author

chappjc commented May 20, 2022

> Connect(ctx context.Context) (*sync.WaitGroup, error) could be updated to Connect(ctx context.Context) (*sync.WaitGroup, bool, error) where if the boolean is true, it means that the initial connection failed, but the reconnect loop is running.

That's a possibility. I guess the suggestion is to have err be nil in that case, where previously it was whatever error from the initial failed conn attempt? In that case I suppose the logger for whatever subsystem that is could log the error, although that could be suboptimal if we wanna convey the cause to the frontend via a notification or something.

Member

buck54321 commented

Kinda seems like we need different behavior depending on whether we are trying the connection for the first time ((*Core).Register) or we're restarting a connection that has already worked before ((*Core).initialize).

Member Author

chappjc commented May 24, 2022

Maybe Connect vs. ConnectOnce does it.
My feeling is that's a different PR, as it touches all dex.Connectors, like wallets, not just server conns. The main goal here was a Done method with a channel to wait on, and I just wanted to raise the design issue we're hitting with the WaitGroup return.

BTW I've had thoughts about changing the *WaitGroup to something more generic with a Wait method, or even a context.Context, since it has both Err and Done methods, even if that's odd to return.

None of this seems high priority though compared to the mess with LTC.

chappjc added this to the 0.5 milestone May 25, 2022
chappjc merged commit 98cf596 into decred:master Jun 7, 2022
chappjc deleted the connmaster-done branch June 7, 2022 16:34