Skip to content
This repository has been archived by the owner on Apr 21, 2022. It is now read-only.

Fix concurrency and silence period not being honoured #26

Merged
merged 2 commits into from
Dec 17, 2018

Conversation

raulk
Copy link
Member

@raulk raulk commented Dec 12, 2018

Fixes #23. When connections are added while we're above the high watermark, we trigger multiple concurrent trims at once. This patch addresses these problems:

  • The silence period was not honoured because we don't read lastTrim inside a lock; hence trims were allowed to proceed when they shouldn't have.
  • If all connections were scored the same (e.g. zero), each concurrent prune would end up trimming random connections (as a result of map iteration randomness), meaning that there's a high likelihood (captured in the test) that we end up pruning more connections than we should've (probably all of them!).
  • As multiple trims were running concurrently, they were closing overlapping subsets of connections, and the disconnect notifee ended up being called more than once per connection (visible in the test now that I've wired the disconnect notifee; run this new test against the old version of the conn manager to see the warnings).

@Stebalien
Copy link
Member

As multiple trims were running concurrently, they were closing overlapping subsets of connections, and the disconnect notifee is called more than once per connection

You mean Close was getting called multiple times, right? The swarm disconnect notifiee shouldn't get called more than once per connection no matter how many times Close is called.

@Stebalien
Copy link
Member

But yeah... this would definitely help explain some bitswap reliability issues.

@raulk
Copy link
Member Author

raulk commented Dec 12, 2018

You mean Close was getting called multiple times, right? The swarm disconnect notifiee shouldn't get called more than once per connection no matter how many times Close is called.

True, we should fix that at the origin. In this case, the dup notification has no impact other than a warning log, but for pubsub it does. @vyzo just filed this today as well: libp2p/go-libp2p-pubsub#130, stemming from vacp2p/nim-libp2p#8

But yeah... this would definitely help explain some bitswap reliability issues.

Agree. In general, bursty protocols trigger the worst case scenario.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! I'd like to consider the semantics of TrimOpenConns a bit but that shouldn't block this fix.

connmgr.go Show resolved Hide resolved
connmgr.go Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member

True, we should fix that at the origin. In this case, the dup notification has no impact other than a warning log, but for pubsub it does. @vyzo just filed this today as well: libp2p/go-libp2p-pubsub#130, stemming from vacp2p/nim-libp2p#8

I think that's a different issue. That is, we might have multiple connections for a single peer and will trigger a separate disconnect notification per connection.

That's why I asked, I'm pretty sure we're not firing multiple disconnect notifications for a single connection.

@vyzo
Copy link
Contributor

vyzo commented Dec 13, 2018

Yeah, the pubsub issue is that we have broken peer management -- it's not designed to handle multiple connections, and it breaks as soon as one of the connections is closed.

@raulk raulk merged commit ce0b463 into libp2p:master Dec 17, 2018
@ghost ghost removed the status/in-progress In progress label Dec 17, 2018
@raulk raulk deleted the fix/concurrency-and-silence-period branch December 17, 2018 14:06
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants