bitswap: clean up ledgers when disconnecting #3437

whyrusleeping · 2016-11-29T01:38:10Z

License: MIT
Signed-off-by: Jeromy <why@ipfs.io>

Kubuxu · 2016-12-06T23:16:08Z

exchange/bitswap/decision/engine.go

-	// TODO: release ledger
+	e.lock.Lock()
+	defer e.lock.Unlock()
+	l, ok := e.ledgerMap[p]


You are not locking l.lk here, and again we have a situation with two locks. To me it screams deadlock.

whyrusleeping · 2016-12-07

Added the ledger lock. Regarding the locking concerns: these locks are well scoped. The engine lock is either always taken first, or not held while taking the ledger lock. And the engine lock is never taken while holding a ledger lock.

Kubuxu · 2016-12-07T21:11:40Z

Another concern: what if PeerConnected gets the ledger instance but can't acquire the ledger lock because it is held by PeerDisconnected? Then PeerConnected will increase the value on a ledger that is no longer in the ledger map.

whyrusleeping · 2016-12-07T21:17:46Z

@Kubuxu hrm... for that to happen, PeerConnected would have to return from findOrCreate, and then PeerDisconnected would have to take the engine lock, pull the ledger out of the map, and take the ledger lock before the PeerConnected process is able to. This IS possible.

One option is to not use findOrCreate and instead take the engine lock ourselves throughout the entire call to PeerConnected, essentially reimplementing findOrCreate in that function.
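The interleaving described above can be replayed deterministically in a single goroutine. This is only a minimal sketch under stated assumptions: the types are simplified stand-ins, peers are plain strings, the `ref` field, `bump`, and `replayRace` are hypothetical names, and only `lock`, `ledgerMap`, `lk`, and `findOrCreate` are taken from the discussion.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is a simplified stand-in for the bitswap ledger.
type ledger struct {
	lk  sync.Mutex
	ref int // assumed name for the connection ref count
}

type engine struct {
	lock      sync.Mutex
	ledgerMap map[string]*ledger
}

// findOrCreate releases the engine lock before it returns,
// which is what opens the window for the race.
func (e *engine) findOrCreate(p string) *ledger {
	e.lock.Lock()
	defer e.lock.Unlock()
	l, ok := e.ledgerMap[p]
	if !ok {
		l = &ledger{}
		e.ledgerMap[p] = l
	}
	return l
}

// bump is the second half of a racy PeerConnected: it takes only
// the ledger lock, after the engine lock has been released.
func bump(l *ledger) {
	l.lk.Lock()
	defer l.lk.Unlock()
	l.ref++
}

func (e *engine) peerDisconnected(p string) {
	e.lock.Lock()
	defer e.lock.Unlock()
	l, ok := e.ledgerMap[p]
	if !ok {
		return
	}
	l.lk.Lock()
	defer l.lk.Unlock()
	l.ref--
	if l.ref <= 0 {
		delete(e.ledgerMap, p)
	}
}

// replayRace runs the problematic ordering step by step and reports
// whether the ledger is still in the map and what its ref count is.
func replayRace() (tracked bool, ref int) {
	e := &engine{ledgerMap: make(map[string]*ledger)}
	bump(e.findOrCreate("peerA")) // first connection: ref == 1

	l := e.findOrCreate("peerA") // second PeerConnected: lookup only
	e.peerDisconnected("peerA")  // disconnect slips in: ref 1 -> 0, ledger removed
	bump(l)                      // second PeerConnected resumes: ref 0 -> 1 on an orphan

	_, tracked = e.ledgerMap["peerA"]
	return tracked, l.ref
}

func main() {
	tracked, ref := replayRace()
	fmt.Printf("tracked=%v ref=%d\n", tracked, ref)
}
```

In this replay the peer ends up with a live reference on a ledger that the engine no longer tracks, which is the state Kubuxu is worried about.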

Kubuxu · 2016-12-07T21:23:49Z

I know that it is something that might happen very rarely or never, but those edge cases add up and create hard-to-track-down bugs and instability. If the chance of this bug occurring is 0.001% per run, then the chance that it will occur across 100000 runs is more than 60%, and if we don't stop introducing possible bugs like that, go-ipfs will always be unstable and unreliable.
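The arithmetic behind this kind of estimate is 1 - (1 - p)^n for n independent runs with per-run probability p. A small sketch (the function name `atLeastOnce` is illustrative, not from the PR) evaluates it. With n = 100000, a per-run chance of 0.001% (p = 1e-5) gives roughly 63%, while 0.00001% (p = 1e-7) gives only about 1%.

```go
package main

import (
	"fmt"
	"math"
)

// atLeastOnce returns the probability that an event with per-run
// probability p occurs at least once in n independent runs.
func atLeastOnce(p float64, n int) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	const runs = 100000
	fmt.Printf("p=1e-5: %.3f\n", atLeastOnce(1e-5, runs))
	fmt.Printf("p=1e-7: %.3f\n", atLeastOnce(1e-7, runs))
}
```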

whyrusleeping · 2016-12-07T21:30:22Z

@Kubuxu Right, so I think the solution is to make PeerConnected hold the engine lock through the entire method.
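A minimal sketch of that approach, assuming simplified types: peers are plain strings, the `ref` field and `numLedgers` helper are hypothetical names, and this is not the code from the PR itself. Both methods take the engine lock first and the ledger lock second, so the lookup and the ref-count update happen as one step.

```go
package main

import (
	"fmt"
	"sync"
)

type ledger struct {
	lk  sync.Mutex
	ref int // assumed name for the connection ref count
}

type engine struct {
	lock      sync.Mutex
	ledgerMap map[string]*ledger
}

func newEngine() *engine {
	return &engine{ledgerMap: make(map[string]*ledger)}
}

// peerConnected holds the engine lock for the whole call, so a
// concurrent disconnect cannot remove the ledger between the
// lookup and the increment.
func (e *engine) peerConnected(p string) {
	e.lock.Lock()
	defer e.lock.Unlock()
	l, ok := e.ledgerMap[p]
	if !ok {
		l = &ledger{}
		e.ledgerMap[p] = l
	}
	l.lk.Lock()
	defer l.lk.Unlock()
	l.ref++
}

// peerDisconnected uses the same lock order: engine, then ledger.
func (e *engine) peerDisconnected(p string) {
	e.lock.Lock()
	defer e.lock.Unlock()
	l, ok := e.ledgerMap[p]
	if !ok {
		return
	}
	l.lk.Lock()
	defer l.lk.Unlock()
	l.ref--
	if l.ref <= 0 {
		delete(e.ledgerMap, p)
	}
}

// numLedgers reports how many ledgers the engine still tracks.
func (e *engine) numLedgers() int {
	e.lock.Lock()
	defer e.lock.Unlock()
	return len(e.ledgerMap)
}

func main() {
	e := newEngine()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e.peerConnected("peerA")
			e.peerDisconnected("peerA")
		}()
	}
	wg.Wait()
	fmt.Println("ledgers left:", e.numLedgers())
}
```

Because every connect is paired with a disconnect and each call is atomic under the engine lock, no ledger should remain once all goroutines finish. The cost is that all connect and disconnect events serialize on the engine lock.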

Kubuxu · 2016-12-09T14:31:07Z

So now it is thread safe, but does the findOrCreate function still make sense now that we have introduced ref counting?

Kubuxu · 2016-12-09T14:33:21Z

Also, I am still not a fan of those two locks, as some unrelated change can introduce a deadlock (taking the engine lock while holding some ledger lock) and we might not catch it when we introduce it. We should really look into actor-oriented communication and how bad or good it would be.

Kubuxu · 2016-12-09T17:24:08Z

I rebased it to run coverage on it.

Kubuxu · 2016-12-09T17:46:19Z

It isn't tested anywhere; it might be worth adding a test.

whyrusleeping · 2016-12-09T21:13:26Z

@Kubuxu this part of the code used to be actor oriented, and it was vastly more complicated and much more difficult to get working properly, in addition to requiring a very large number of coroutines to run.

Kubuxu · 2016-12-09T21:52:31Z

I am positive that we can make it clean and not so complicated with enough layers of sugarcoating.

I am just almost sure that we will introduce a deadlock around this place sooner or later, and it won't be diagnosed for a long time because reproducing it will be almost impossible.

Also, for someone to report a deadlock like this one, they would have to 1. encounter the deadlock, 2. not try resetting the node, 3. capture a goroutine dump, and 4. have us find the goroutines blocked on this lock. I miss Java's features in this regard.

This change LGTM if I get some tests. For features that are not directly tested by sharness, I would like the codecov/patch build check to be green.

Kubuxu · 2016-12-19T16:40:43Z

OK, it shows as if there were no coverage, due to the lack of cross-package coverage testing.

ghost · 2016-12-20T23:36:53Z

Can I add the RFM label here? Let's continue the locking discussion in #3506.

whyrusleeping added the status/in-progress and need/review labels Nov 29, 2016

whyrusleeping assigned ghost and Kubuxu Nov 29, 2016

Kubuxu requested changes Dec 6, 2016

whyrusleeping force-pushed the feat/bitswap-cleanup-ledger branch from 73be519 to 3c34085 December 7, 2016 20:59

whyrusleeping force-pushed the feat/bitswap-cleanup-ledger branch from 3c34085 to efb2e39 December 8, 2016 00:18

Kubuxu approved these changes Dec 9, 2016

Kubuxu force-pushed the feat/bitswap-cleanup-ledger branch from efb2e39 to 22ba9bb December 9, 2016 17:23

Kubuxu unassigned ghost and Kubuxu Dec 14, 2016

Kubuxu mentioned this pull request Dec 14, 2016

Mutex/Lock limit #3506

Closed

Kubuxu added the status/ready and status/in-progress labels and removed the status/in-progress and status/ready labels Dec 19, 2016

Kubuxu force-pushed the feat/bitswap-cleanup-ledger branch from 22ba9bb to 5064976 December 19, 2016 16:41

bitswap: clean up ledgers when disconnecting

f53dc7c

License: MIT Signed-off-by: Jeromy <why@ipfs.io>

test for partner removal

331e60b

License: MIT Signed-off-by: Jeromy <jeromyj@gmail.com>

whyrusleeping force-pushed the feat/bitswap-cleanup-ledger branch from 5064976 to 331e60b May 20, 2017 02:05

whyrusleeping merged commit ec43fe4 into master May 20, 2017

whyrusleeping deleted the feat/bitswap-cleanup-ledger branch May 20, 2017 02:23

whyrusleeping removed the status/in-progress label May 20, 2017
