-
Notifications
You must be signed in to change notification settings - Fork 1k
Conversation
internal/gps/source.go
Outdated
return 0, err | ||
} | ||
} | ||
//TODO(jmank88) broadcast sg.src.upstreamURL() changes here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps now is the time to address this, but I don't have a simple solution. Remapping the URLs would catch many cases, but it seems like there is always a worst case scenario where two of (or sets of) sources/gateways/caches have to be merged after both being in use. However, I may be considering a more general case than is necessary. This partially depends on the possible URLs available from maybeSource
s, and whether a source
's upstreamURL
is always from this set (meaning we could map the possibilities ahead of time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think our best solution is likely to be in changing the scopes of responsibility, here. something like the following: we create some new indirection layer that basically acts like a persistent sourceGateway
factory: it takes a maybeSources
and occupies all the corresponding URL slots in the sourceCoordinator
's URL lookup map, and spits out a sourceGateway
on request.
now, the sourceCoordinator
is only responsible for mapping to families of related URLs, and the sourceGatewayFactory
(or whatever) is doing the final layer of mapping work. it's likely that we'll need to rework some of sourceCoordinator.getSourceGatewayFor()
to accommodate this, possibly including the way that sourceCoordinator.srcs
and sourceCoordinator.nameToURL
actually work.
that expands the scope of this PR significantly, though, and my gut currently says we'll be able to defer it until a later PR. for now, we'll probably be able to get away with just having a hard failure here where we might otherwise be able to recover by going back to other, possible upstream URLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now, we'll probably be able to get away with just having a hard failure here where we might otherwise be able to recover by going back to other, possible upstream URLs.
FWIW I've been running with a panic
on any url change and it hasn't happened yet (mostly ran tests and cockroachdb). A regular error is probably sufficient though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it takes a maybeSources and occupies all the corresponding URL slots in the sourceCoordinator's URL lookup map, and spits out a sourceGateway on request.
Can we be certain about all of the possible URLs ahead of time? e.g. gopkgin code currently returns one url from maybeSource
, but a different aliasURL
from gopkginSource
. That one would be trivial to change, and the others look pretty standard and static. For some reason I had it in my head that it would be possible to ping upstream and be forwarded to another remote or something. If that is not the case, then we may be able to get away with the existing data structures (with more mapping) and with some mechanism for merging additional maybeSource
s into a gateway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I've been running with a panic on any url change and it hasn't happened yet (mostly ran tests and cockroachdb).
I take that back. Now I'm seeing test failures on CI like: upstream source url changed from "ssh://git@github.com/sdboyer/gpkt" to "https://github.com/sdboyer/gpkt"
which seems like something we need to handle now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been fixed.
internal/gps/source.go
Outdated
existsCallsListVersions() bool | ||
// listVersionsRequiresLocal returns true if calling listVersions first | ||
// requires the source to exist locally. | ||
listVersionsRequiresLocal() bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments became out of sync. Any better name/doc suggestions before fixing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these seem fine for now. there's a refactoring in store for these, anyway, i think.
@sdboyer This is green now. Let me know what you think of the gateway mapping change, and whether I should squash any or all of the commits. |
Don't merge yet. Tagging as WIP since I'm getting real-world failures, even though tests are passing. |
After refactoring |
Rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
holy crap, there's a lot in here 🎉
by and large, this looks like great progress. i've got a couple of more specific questions, but i'd ideally like to get this in ASAP so that it can percolate as long as possible before next release.
internal/gps/error.go
Outdated
@@ -0,0 +1,34 @@ | |||
// Copyright 2017 The Go Authors. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm a little uneasy about giving this its own file. i appreciate that it doesn't really fit where it was, but...well, we already have a bunch of antipatterns with errors, and i think it's becoming increasingly harmful.
no change to make here, just making a note of it.
internal/gps/manager_test.go
Outdated
@@ -432,8 +476,20 @@ func TestSourceCreationCounts(t *testing.T) { | |||
ProjectIdentifier{ProjectRoot: ProjectRoot("github.com/sdboyer/gpkt"), Source: "https://github.com/sdboyeR/gpkt"}, | |||
mkPI("github.com/sdboyeR/gpkt"), | |||
}, | |||
namecount: 6, | |||
srccount: 1, | |||
names: []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sometimes i feel like you just follow my code around, tidying up icky tests ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, well this one had to change. Actually the name should probably be changed.
internal/gps/manager_test.go
Outdated
} else if srcg == srcg4 { | ||
t.Error("explicit http source should create a new src") | ||
} else if srcg != srcg4 { | ||
t.Error("explicit http source should reuse autodetected https source") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hang on, this feels like a significant departure from existing logic. is this suggesting that http should fold in with https?
what about doing this is worth breaking the simpler rules of the existing model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly are those rules? Since the map was keyed on both name and url, I thought it was in line with existing behavior to fold them all together. Was sharing a map only valid since the url was static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the "rule" here - no, not expressed anywhere, because i have, as we know, not been great at articulating these really low-level invariants anywhere, 💯 my fault - is that if an explicit request is made for e.g. an http
protocol access pattern for a URL, then we have to respect it, and that's how we access. it's only in the default inference case, where no protocol is provided (a bare import path) where we can "pick" from the various possibilities, and where the specific one used may vary from one run to the next (but, once selected, is fixed within the duration of any given SourceMgr
's lifetime).
this might seem to push us in a direction that i was considering (and i think we discussed somehwere?), where the sourceGateway
effectively becomes more of a factory/intermediate layer that hangs on to the entire maybeSources
set and multiplexes to children that are responsible for each individual URL, depending on which particular input pattern comes in.
(if we're thinking about that, one crucial reminder about the maybeSources
model is that e.g. bitbucket abstracts maybeSources
to allow for a project to be either git or hg - so we're encompassing in there the idea that ).
internal/gps/source.go
Outdated
@@ -268,43 +258,38 @@ func newSourceGateway(maybe maybeSource, superv *supervisor, cachedir string) *s | |||
return sg | |||
} | |||
|
|||
func (sg *sourceGateway) addMaybeSources(mb maybeSources) { | |||
sg.maybeMu.Lock() | |||
sg.maybe = append(sg.maybe, mb...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we're conjoining gateways and caches for multiple maybeSources
, then - only one ever active per SourceMgr
? interesting. seems like that can work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the big question i have (without having checked for myself) - is there a scenario where the user can get stuck if a repo exists on local disk for one scheme, but the credentials they'd previously configured to access that scheme no longer work/are missing? in such a case, we should ideally continue moving through the maybeSource
options - is that where this PR moves us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should not get stuck. In that case, whenever sourceExistsUpstream
is required and fails, another source would be chosen.
internal/gps/source.go
Outdated
} | ||
|
||
func (sg *sourceGateway) existsUpstream(ctx context.Context) bool { | ||
func (sg *sourceGateway) existsUpstream(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm...i think i may want us to more broadly refactor these. i think (not quite sure yet) that this PR is taking us into territory where local can diverge from upstream. hand-in-hand with that is being a lot more careful about being able to inquire about current state without having the SourceMgr
change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we have to allow divergence from upstream at some level to benefit from caching, but I hope I haven't done that unintentionally. Much of the language unfortunately becomes fuzzy. For example, when does 'exists' mean now and when does it mean within the cache window? I tried to be careful about these, and to document the sourceState
s in a helpful way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, that sounds reasonable for now, at least. but we're going to need to articulate this very, very clearly (i think i need to sketch a fuller FSM) if we have any hope of keeping track of this in a clear, understandable way.
internal/gps/source.go
Outdated
// otherwise it returns a slice of all errors. If force is true, then each | ||
// maybeSource may make a second attempt after removing the cache directory. testFn | ||
// is an optional check which may access src and return additional sourceState. | ||
func (sg *sourceGateway) setUpSource(ctx context.Context, force bool, testFn func(context.Context) (sourceState, error)) (sourceState, errorSlice) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're going to re-expose these methods, we need to be diligent about not calling them from the outside. there was a rather huge mess back before i refactored to use the FSM, and much of it arose from mixing locks with intra-source object calls.
in fact, i'd probably be more comfortable if we separated these out by hiding the ones we expect SourceMgr
itself to call (which should be mutex-protected) directly into an interface, and leaving the remainder unexposed.
internal/gps/source.go
Outdated
existsCallsListVersions() bool | ||
// listVersionsRequiresLocal returns true if calling listVersions first | ||
// requires the source to exist locally. | ||
listVersionsRequiresLocal() bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these seem fine for now. there's a refactoring in store for these, anyway, i think.
internal/gps/source_cache.go
Outdated
ImportRoot: string(pr), | ||
Packages: make(map[string]pkgtree.PackageOrErr), | ||
} | ||
// Return a copy, with full import paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use PackageTree.Copy()
here instead of re-rolling what's already there - even if it means another iteration pass over the packages.
Or, implement a new CopyWithRoot()
func that does the same as here, then reimplement Copy()
in terms of that.
internal/gps/source_test.go
Outdated
|
||
// Run test twice so that we cover both the existing and non-existing case; | ||
// only difference in results is the initial setup state. | ||
t.Run("empty", do(sourceIsSetUp|sourceExistsUpstream|sourceHasLatestVersionList)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't fully grokked changes to tests here yet, but i do look at the loss of these and worry that we're losing some of the meager set of state-changing cases that we do manage to cover now. no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back at this, it is still valuable to cover existing and non-existing cases (though the states will be the same). I'll restore that. Is there something else that is lost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think so, that's all that i was introducing with the second pass.
Rebased and ready for review again. |
Rebased and infinitely queued on OSX build. |
gps: source coord: set-up sources before returning gateways gps: source cache: improve PackageTree ProjectRoot handling
Green again. |
strcount = strcount + len(poe.P.Imports) + len(poe.P.TestImports) | ||
} | ||
pool := make([]string, strcount) | ||
|
||
for path, poe := range t.Packages { | ||
for path, poe := range p { | ||
var poe2 PackageOrErr | ||
|
||
if poe.Err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are drop all this? It's necessary to safely copy the various possible Err types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was broken. The new TestPackageTreeCopy
fails with the old logic - a build.NoGoError
's Dir
field is not copied.
Why do we need to copy them? Wouldn't it be a mistake to mutate an error anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could call .Error()
and make a new error from the string. This is how the cache persists them.
gps/source.go
Outdated
url = toFold(url) | ||
src, st, err := m.try(ctx, sc.cachedir, sc.supervisor) | ||
if err == nil { | ||
srcGate = newSourceGateway(st, src, sc.supervisor, sc.cachedir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer defer trying the source. Instead we try up front, and the gateway gets a single fixed source.
There is more work while holding sc.srcmut
, but I haven't noticed a slowdown.
@@ -104,7 +104,7 @@ type pathDeducer interface { | |||
// So the return of the above string would be | |||
// "github.com/some-user/some-package" | |||
deduceRoot(string) (string, error) | |||
deduceSource(string, *url.URL) (maybeSource, error) | |||
deduceSource(string, *url.URL) (maybeSources, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using concrete maybeSources
everywhere was more significant in a previous iteration, but is still an overall simplifying change - barring some test boilerplate.
@@ -21,34 +22,50 @@ import ( | |||
type sourceState int32 | |||
|
|||
const ( | |||
sourceIsSetUp sourceState = 1 << iota | |||
sourceExistsUpstream | |||
// sourceExistsUpstream means the chosen source was verified upstream, during this execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sourceIsSetUp
goes away, since it's implied by the existence of the gateway.
// treat it as a new not-yet-cloned repo. | ||
|
||
// TODO(marwan-at-work): warn/give progress of the above comment. | ||
os.RemoveAll(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was essentially inlined into each maybeSource
, which allows them to directly create their respective vcs.Repo
s, rather than switching on the type, at the cost of bit of duplication. IMHO it is clearer, especially when trying to unravel the chain of similarly named types we create.
panic(fmt.Sprintf("Unrecognized format: %v", s)) | ||
} | ||
// ensureCleaner is an optional extension of ctxRepo. | ||
type ensureCleaner interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generalized from git specific, and used in baseVCSSource.updateLocal()
@@ -52,73 +48,6 @@ func TestErrs(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestNewCtxRepoHappyPath(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps these tests should be revived and adapted to the maybeSource
s.
I wasn't very clear in the initial post, but I said it better in the changelog:
Basically, by adding the persistent cache, what was previously an optimization by going to the network sooner might undermine the cache entirely in situations where it's no longer necessary to eventually go to the network. |
@sdboyer Updated with a version of the changes discussed last night. Restoring those ctxRepo tests adapted for a maybeSource was going to be messy with the extra work that the |
Here is a clean preview of the next step: https://github.com/jmank88/dep/compare/source_opt...jmank88:opt_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Onward and upward!
What does this do / why do we need it?
This PR contains a set of changes primarily motivated by making sourceGateway state transitions more fine grained and special cases explicit, so that the persistent cache can be fully leveraged when integrated. From the changelog:
What should your reviewer look out for in this PR?
Should a note be added to theDid this.CHANGELOG
?Which issue(s) does this PR fix?
Toward #431
Fixes #415