Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swarm: send notifications synchronously #1562

Merged
merged 1 commit into from
Jun 1, 2022
Merged

Conversation

marten-seemann
Copy link
Contributor

Recreating libp2p/go-libp2p-swarm#295.

We've notified downstream (especially bitswap) a long time ago, so we'll go ahead and make this change.

@marten-seemann marten-seemann requested a review from MarcoPolo May 26, 2022 12:11
@marten-seemann marten-seemann mentioned this pull request May 26, 2022
41 tasks
@Stebalien
Copy link
Member

Given that this maintains ordering, it at least won't cause anything to be incorrect.

However:

We've notified downstream (especially bitswap) a long time ago, so we'll go ahead and make this change.

You need to keep your main downstream users happy. If you're introducing a change that'll adversely affect Lotus and go-ipfs, handling the fallout is going to be your problem.

But, if you're willing to handle that, then going ahead and making the change is a good way to make progress. But that means:

  1. Merging this change.
  2. Working with IPFS to test it out on the gateways to make sure it doesn't cause them to fall over.
  3. Work with IPFS to fix any issues.

@MarcoPolo
Copy link
Collaborator

Can we move the goroutine invocation to the netNotifee implementations? i.e. a patch like this https://gist.github.com/MarcoPolo/278d75da3ea2ce804d55ea0c8a6b0495 should give about the same semantics to bitswap as it currently has (the one difference is the notifs.RLock is held until all notifees return, but I'm not sure if they rely on that lock being held – they shouldn't).

This way current implementations can defer updating their logic by doing this transform that is mostly the same.

Another thing that might help here is a context parameter being passed in. This way the different semantics also have a different api and downstream users have to change their code. Otherwise if we make this change and no downstream code is changed things will just get automatically slower.

@Stebalien
Copy link
Member

Stebalien commented May 27, 2022

the one difference is the notifs.RLock is held until all notifees return, but I'm not sure if they rely on that lock being held – they shouldn

They are, and they need to. This is how we ensure that disconnect and connect events are delivered in-order.

A better solution would likely be to either:

  1. Use the correct connect/disconnect events from the event bus. Any idea how this interacts with transient connections?
  2. Write connect/disconnect events to a channel (synchronously) and manage them in a background event loop (in-order).

@marten-seemann
Copy link
Contributor Author

Use the correct connect/disconnect events from the event bus. Any idea how this interacts with transient connections?

Maybe we should rework our Connectedness event so that we can signal the transition from transient -> not transient?

@MarcoPolo
Copy link
Collaborator

MarcoPolo commented May 27, 2022

A better solution would likely be to either:
2. Write connect/disconnect events to a channel (synchronously) and manage them in a background event loop (in-order).

I agree, but just to clarify: This should be done by the protocol implementation itself and not the library. The implementation may want to keep a buffer here or spawn workers or something else. The library shouldn't do extra allocs or work to try to predict what a protocol implementation would want.

So the patch I linked above could be expanded to write to a channel and spawn a goroutine to do the work. Ordering would be guaranteed and you'd have the same semantics as now.

@Wondertan
Copy link
Contributor

Wondertan commented May 27, 2022

EDIT: just saw #1692

Maybe we should rework our Connectedness event so that we can signal the transition from transient -> not transient?

Yes, pls. Currently, this is what I've been doing when handling Connectedness event and it would be nicer to have transient flag on the event instead.

for _, c := range m.host.Network().ConnsToPeer(p) {
		if c.Stat().Transient {
			continue
		}
}

Copy link
Collaborator

@MarcoPolo MarcoPolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about changing the API to accept a new context parameter? I'd like to make this a breaking change for users of this interface because the semantics change a bit (now a notifiee can delay other notifiee for the same event).

@Stebalien
Copy link
Member

Maybe we should rework our Connectedness event so that we can signal the transition from transient -> not transient?

That would be great. For now, I'm playing around with some other fixes.

I agree, but just to clarify: This should be done by the protocol implementation itself and not the library. The implementation may want to keep a buffer here or spawn workers or something else. The library shouldn't do extra allocs or work to try to predict what a protocol implementation would want.

I agree that go-libp2p shouldn't have to spawn goroutines, that's technical debt that we need to handle in the users of go-libp2p.

However, go-libp2p definitely needs to continue delivering notifications in-order.

What do you think about changing the API to accept a new context parameter? I'd like to make this a breaking change for users of this interface because the semantics change a bit (now a notifiee can delay other notifiee for the same event).

I'm not sure what we'd use the context for?

p2p/net/swarm/swarm.go Show resolved Hide resolved
@marten-seemann
Copy link
Contributor Author

What do you think about changing the API to accept a new context parameter?

How would that context be used? Under which conditions would we cancel it?

@MarcoPolo
Copy link
Collaborator

How would that context be used? Under which conditions would we cancel it?

No specific thoughts here, but maybe a 1 second timeout would be good enough. It's not like we can guarantee that notifiees would use it anyways. And maybe in the future if we have a useful parent context we could pass that in.

@MarcoPolo
Copy link
Collaborator

I'm not sure what we'd use the context for?

My reasoning here is to purposely break the api so that downstream users notice this change.

@MarcoPolo
Copy link
Collaborator

No strong opinion re context here really.

@marten-seemann
Copy link
Contributor Author

I do like the idea of breaking the API, but I'm a bit worried introducing a context that's not really used. Let's merge it as is.

@Stebalien
Copy link
Member

Bitswap patch to avoid blocking: ipfs/go-bitswap#565.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants