Skip to content
This repository has been archived by the owner on May 26, 2022. It is now read-only.

[RFC] Delay dials to relay addresses #57

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

magik6k
Copy link
Contributor

@magik6k magik6k commented Feb 15, 2018

This is a somewhat naive way to do this, but I can't think of an easy way to make this smarter without making this ugly in some way.

One way this could be improved would be to switch from chan to an array in dialAddrs, I just wasn't sure what is the reason behind it being a channel, so I decided to not touch that for now.

@ghost ghost assigned magik6k Feb 15, 2018
@ghost ghost added the status/in-progress In progress label Feb 15, 2018
Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what happens if a peer only has a relay address?
shouldn't we dial that immediately instead of waiting for 2s?

@Stebalien
Copy link
Member

I see the issue but I'd like to discuss this thoroughly first (in an issue). More generally, we need some way to prefer some transports over others.

Note: We're using a channel because, technically, we'd like to dial and discover addresses in parallel. However, we don't currently do that. This does make things a bit annoying... but we can always try reading off what's on the channel into an array and then sorting there. We can even, e.g., introduce delays if we have different "preference" tiers.

@whyrusleeping
Copy link
Contributor

Yeah, ideally we are able to discover new addresses and inform the dial process of them during a dial. I did the refactor to move towards that, but the other side (the dht address finder) hasnt caught up yet. This is a hard problem.

@magik6k
Copy link
Contributor Author

magik6k commented Feb 15, 2018

so what happens if a peer only has a relay address?
shouldn't we dial that immediately instead of waiting for 2s?

That's the hard part

@Stebalien
Copy link
Member

That's the hard part

Expanding on my previous comment, I think the better (although certainly not perfect) solution here is to:

  1. Read off the channel until it blocks (using a select). Maybe with a timeout (e.g., wait until it blocks for more than a millisecond).
  2. Sort that slice of addresses into preference tiers.
  3. Add delays between preference tiers (iff there exists an address in the preference tier).

@magik6k magik6k force-pushed the feat/dial-priority branch 3 times, most recently from 454d440 to 2012f5d Compare February 15, 2018 23:36
@magik6k
Copy link
Contributor Author

magik6k commented Feb 15, 2018

Ok, I think I pushed something that makes sense, please review.

todo:

  • tests
  • a bit more abstracted dialOffset
    • dialLimiter doesn't exactly abstract it's checks.
    • Should I PR p2p-circuit entry to go-multiaddr and to mafmt?

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I don't think we really need sorting. Really, we just want to bucket. Also, preempting would be nice...

Here's a quick sketch of what I'm thinking (could probably use some optimizations (also, not tested at all)). Do you think this is too complicated? Probably overkill for what we need now but we'll want something like this eventually.

(also, mixing context stuff and probably buggy as hell)

	out := make(chan ma.Multiaddr)
	go func() {
		// We have a preset number of "tiers"
		var pending [NumTiers][]ma.Multiaddr
		lastTier := 0

		// put enqueues the mutliaddr
		put := func(addr ma.Multiaddr) {
			tier := getTier(addr)
			pending[tier] = append(pending[tier], addr)
		}

		// get gets the best multiaddr available.
		get := func() (ma.Multiaddr, int) {
			for i, tier := range pending[:] {
				if len(tier) > 0 {
					addr := tier[len(tier)-1]
					tier[len(tier)-1] = nil
					pending[i] = tier[:len(tier)-1]
					return addr, i
				}
			}
		}

		// Always delay 2 seconds between tiers.
		delay := timer.Timeout(time.Second * 2)
        defer delay.Stop()

	outer:
		for {
		fill:
			for {
				select {
				case addr, ok := <-addrs:
					if !ok {
						break outer
					}
					put(addr)
				default:
					break fill
				}
			}

			next, tier = get()

			// Nothing? Block!
			if next == nil {
				addr, ok := <-addrs
				if !ok {
					break outer
				}
				put(addr)
				continue
			}

			// Jumping a tier?
			if tier > lastTier {
            	// Wait the delay (preempt with new addresses)
				select {
				case addr, ok := <-addrs:
					put(addr)
					continue
				case <-delay.C:
					// So we always get a zero delay, there
					// are better ways to deal with this...
					delay.Reset(0)
				}
			}

			lastTier = tier

			select {
			case addr, ok := <-addrs:
				put(next)
				if !ok {
					break outer
				}
				put(addr)
				continue
			case out<-addr:
            	// Always count the timeout since the last dial.
				delay.Reset(time.Second * 2)
			}
		}

		// finish sending
		for {
			next, tier := get()
			if next == nil {
				return
			}
			if tier > lastTier {
				<-delay.C
				delay.Reset(time.Second * 2)
			}
			tier = lastTier
			out<-next
		}
	}

@whyrusleeping
Copy link
Contributor

cc @diasdavid. Something something relay dialing logic.

@magik6k
Copy link
Contributor Author

magik6k commented Feb 16, 2018

Refactored to use buckets

Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is crazy complicated...

dial_delay.go Outdated
const p_circuit = 290

const numTiers = 2
const tierDelay = 2 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make it just 1s? 2s might already be too much for a dial.

dial_delay.go Outdated
}
}

next, tier := get()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that with the current setup we don't ever reach this point in this loop as the channel is closed by now. All sending will happen in the loop in L127. I'll add some tests later today to test that it works.

@Stebalien
Copy link
Member

With a test, LGTM. You even covered the "out of addresses" case. Nice!

(I kind of expected you to tell me to tell me my idea was way too complicated and to take a hike...)

dial_delay.go Outdated
delay := time.NewTimer(tierDelay)
triggerNext := make(chan struct{}, 1)

go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have this broken out into a separate function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

dial_delay.go Outdated
}

// get gets the best (lowest tier) multiaddr available
get := func() (ma.Multiaddr, int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth noting that we're using a stack within each tier. Might not be the best approach (though definitely simpler and cheaper to implement)

@magik6k
Copy link
Contributor Author

magik6k commented Feb 19, 2018

  • Broken the code into some more functions
  • Added tests for all cases I could imagine
    • almost all cases covered (except contexts)
    • ran the tests 20 times, none have failed (not even on windows)

@whyrusleeping
Copy link
Contributor

I would also like @diasdavid to review this. This is a libp2p design decision I think he should have a say in.

dial_delay.go Outdated
var pending [numTiers][]ma.Multiaddr
lastTier := -1

// put enqueues the mutliaddr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/mutliaddr/multiaddr/

dial_delay.go Outdated
}
put(addr)
default:
break loop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Given that we have a function here, I'd just move the return true statement here and get rid of the named break.

// swarm and agrees on the |
// muxer to use /---\
// The result is distributed -> |||||
// to callers of Dial |||||
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien after the transport refactor only the part after 'dialConnSetup' changed, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

dial_delay.go Outdated

var TierDelay = 1 * time.Second

var relay = mafmt.Or(mafmt.And(mafmt.Base(p_circuit), mafmt.Base(ma.P_IPFS)), mafmt.And(mafmt.Base(ma.P_IPFS), mafmt.Base(p_circuit), mafmt.Base(ma.P_IPFS)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could have the transports have some way to indicate that they should be delayed, possibly an optional interface like

type ExpensiveTransport interface {
  TransportCost() int //basically the 'tier' to use here
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be aware of potential problems with changes in the circuit addressing here -- cf libp2p/specs#72

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might break with the changes in libp2p/go-libp2p-circuit#48
We should drop the trailing /ipfs part from the filter and just rely on the presence of /p2p-circuit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also problematic for explicit relay addresses.

@whyrusleeping whyrusleeping changed the title [RFC] Delay dials to realy addresses [RFC] Delay dials to relay addresses Jun 12, 2018
@bigs bigs added the status/blocked Unable to be worked further until needs are met label Sep 4, 2018
@bigs
Copy link
Contributor

bigs commented Sep 4, 2018

per @diasdavid's suggestion should we close this temporarily and move to an RFC?

edit: i actually mean move it to a discussion and specification session in specs?

@bigs bigs added status/deferred Conscious decision to pause or backlog and removed status/in-progress In progress labels Sep 4, 2018
@Stebalien
Copy link
Member

I really can't remember what this was blocked on other than "holy shit this is complicated".

@vyzo
Copy link
Contributor

vyzo commented Sep 27, 2018

Reopening this, as it's a high priority issue.

@vyzo vyzo added P1 High: Likely tackled by core team if no one steps up and removed status/deferred Conscious decision to pause or backlog labels Sep 27, 2018
@vyzo
Copy link
Contributor

vyzo commented Sep 27, 2018

So where are we blocked here?

@magik6k
Copy link
Contributor Author

magik6k commented Sep 27, 2018

#57 (comment)

holy shit this is complicated

No, but seriously, IIRC the core logic here is done, one thing that would be nice would be to have some method in the transport interface saying that the transport is lower tier so we don't hardcode relay here - https://github.com/libp2p/go-libp2p-swarm/pull/57/files#diff-46c472072de3b864e2fabb74ab784758R202, but I think this can by done separate to this PR

@magik6k magik6k removed the status/blocked Unable to be worked further until needs are met label Sep 27, 2018
dial_delay.go Outdated
mafmt "github.com/whyrusleeping/mafmt"
)

const p_circuit = 290
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably capitalize this to P_CIRCUIT -- that's what we do everywhere else.

dial_delay.go Outdated

var TierDelay = 1 * time.Second

var relay = mafmt.Or(mafmt.And(mafmt.Base(p_circuit), mafmt.Base(ma.P_IPFS)), mafmt.And(mafmt.Base(ma.P_IPFS), mafmt.Base(p_circuit), mafmt.Base(ma.P_IPFS)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be aware of potential problems with changes in the circuit addressing here -- cf libp2p/specs#72

@vyzo
Copy link
Contributor

vyzo commented Sep 27, 2018

it's still quite complicated, but it looks ok; will reread in the morning.

@Stebalien care for a refresher review? Let's move it forward.

Just the presence of /p2p-circuit in the addr is enough to warrant delayed dialing.
This accepts both explicit relay addresses and is also future-proof for changes in
go-libp2p-circuit#48 and specs#72.
@ghost ghost assigned vyzo Sep 28, 2018
@ghost ghost added the status/in-progress In progress label Sep 28, 2018
@vyzo
Copy link
Contributor

vyzo commented Sep 28, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P1 High: Likely tackled by core team if no one steps up status/in-progress In progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants