-
Notifications
You must be signed in to change notification settings - Fork 24
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the use of big anonymous closures like this is dangerous. They're tricky to read, because you have to scan all the surrounding code to understand which things are free variables and which are created within. That leads to shadowing bugs like #187 quite quickly.
It also complicates tests and makes it difficult to follow logic while debugging.
I'd really encourage you to consider breaking those anonymous closures out, making them named functions or methods, so it's clearer what their inputs and outputs are. My experience with this was in debugging that error shadowing case, and it was only hard because of the use of closures and callbacks.
Ultimately, that's a style thing, not a correctness thing, so it's your call, of course. But I think correctness (or incorrectness!) would be easier to see.
deduce.go
Outdated
callMgr: cm, | ||
ctx: cm.getLifetimeContext(), | ||
rootxt: radix.New(), | ||
deducext: pathDeducerTrie(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a line for action: make(chan func())
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
deduce.go
Outdated
for { | ||
select { | ||
case <-dc.ctx.Done(): | ||
close(dc.action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This risks either a double close or a read of a nil
action in the next loop iteration. Do you need to return
in this branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, totally, thank you
deduce.go
Outdated
@@ -690,6 +695,253 @@ func (sm *SourceMgr) deduceFromPath(path string) (deductionFuture, error) { | |||
}, nil | |||
} | |||
|
|||
type deductionCoordinator struct { | |||
ctx context.Context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context
package warns against using contexts this way: "Do not store Contexts inside a struct type; instead, pass a Context explicitly to each function that needs it."
I think it's good advice. Do you have a reason you need the context to be a struct field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really follow most of that discussion, but at the very least, this is unusual, which in the Go community is a strong knock against it :)
It appears that the stored context is only really used in deduceRootPath
to check whether the coordinator has already been terminated. The fact that it's done with a stored context (but other times a context is passed in to the function call - why? what's the difference?) will make it harder to see how you go about canceling calls, I think. I could be wrong! There might be more inherent complexity here than I see, so maybe that difficulty is going to be here no matter what.
But I agree with Peter that I've regretted doing things this way every time. Letting the context flow down through method invocations has never confused me in the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so, I realize I actually conflated two things. THAT stored context doesn't necessarily need to be there; it could be grabbed from the callManager
on the fly in the actor loop. But that doesn't really change much, as there's still a context on the callManager
struct.
The basic issue here comes down to there being two different possible origins for cancellation of any given SourceManager
method call:
- A cancellation of the context passed directly into the method - the standard Go way of doing it
- A cancellation from the long-lived context (e.g., a signal is sent to a
dep
process, which then needs to tell thesourceMgr
to shut down via a context that's injected in theNewSourceManager()
call), which must propagate to all open calls
There are different possible approaches to this, but that's the essential reality, and dealing with it entails combining cancelation paths somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think vanilla contexts handle that pretty well with just one tree of contexts, I don't see how you need to combine cancelation paths. I imagine it looking something like this:
func main() {
ctx, cancel := context.WithCancel(context.Background())
go cancelOnSignal(cancel) // yknow, set up a loop on a chan signal.Signal, blah blah
sm := NewSourceManager(ctx, [...])
sm.doStuff(ctx)
}
If the root context is canceled because of a signal, that gets propagated down to all the children, which will be all the open calls, like you want.
For the second case, if you want a 'locally' cancelable function, you call context.WithCancel
again. The newly generated context will cancel if the old papa context made in main
gets canceled, or if the new local cancel
method is called. But the local cancel method won't close any of the context's ancestors. This seems to be what you want, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The little code block is fine for that set of calls - but in a dep
run that includes a solve the vast majority of SourceManager
calls are coming from the solver, not dep
itself. And yes, we could thread a context through that as well (in fact, doing so might be quite useful to establish running time limits on the algorithm), and that could be subctx'd down into the calls the solver makes to the SourceManager
.
But while that's a fine guarantee for an implementation to provide, within gps, we have no idea if the caller has wired the long-lived context together with the ephemeral call contexts. To ensure correctness, we still have to internally combine cancelation.
For the second case, if you want a 'locally' cancelable function, you call context.WithCancel again. The newly generated context will cancel if the old papa context made in main gets canceled, or if the new local cancel method is called. But the local cancel method won't close any of the context's ancestors. This seems to be what you want, right?
Yes, and this is mostly the counterargument that Peter made on reddit. (I also did up what you're describing in some pseudocode).
I'm not saying it's not possible to do that way. I'm saying it's fundamentally equivalent to the constext approach. And, while constext is a new thing to learn, it's also a reasonably well-defined formalism. Given that we have to solve this problem, I'm interested in exploring the formalism, rather than just following the guideline...for now 😄 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could thread a context through that as well
Yep! That's exactly what I'd expect to see naively.
But while that's a fine guarantee for an implementation to provide, within gps, we have no idea if the caller has wired the long-lived context together with the ephemeral call contexts. To ensure correctness, we still have to internally combine cancelation.
I don't mean to be dense, but I'm still not totally getting this. If the calls are coming from the solver, not dep
itself, then we seem to be sure of the provenance of the contexts, right? We know they came as children made by the solver off of the original one from dep
(or some other implementation using the solver).
Are you concerned that there are multiple entrypoints to this code, and you want to be sure the caller is using the same context for them? If so, I think the right approach is to shrink the entrypoints. Do you have a concrete case here? That might help me understand.
Yes, and this is mostly the counterargument that Peter made on reddit. (I also did up what you're describing in some pseudocode).
Hm, this pseudocode isn't what I was thinking of. There's no need for quitchan there - you'd just call cancel()
at the end of the function directly. Each spawned goroutine is checking whether the context is closed before (or while) it does significant work like network calls. If you want to only exit once all spawned goroutines have exited, you keep track of that with sync.WaitGroup
.
I agree that both approaches eventually work, of course. No question there! This is more about how easy those approaches are to keep up and maintain. I just think it's simpler to think about the one flow of contexts than this idea of merging two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rogpeppe has also responded on reddit, with a hopefully-clearer explanation :)
Are you concerned that there are multiple entrypoints to this code, and you want to be sure the caller is using the same context for them? If so, I think the right approach is to shrink the entrypoints.
Yes. I don't see how it would be possible to shrink the entrypoints - there's the creation of the sourceManager
, and there's calling methods on it - the context for the former cancels ALL running activities, and the context for the latter cancels only the current method.
These contexts are probably from the same tree, but we cannot know if they are from within the sourceMgr
, so we have to take precautions.
There's no need for quitchan there - you'd just call
cancel()
at the end of the function directly.
Without the quitchan, we don't know to cancel this running call if the long-lived context was canceled. We'd have to make the assumption that the two contexts are part of the same tree.
If you want to only exit once all spawned goroutines have exited, you keep track of that with
sync.WaitGroup
.
Yep, but this is entirely separate.
I just think it's simpler to think about the one flow of contexts than this idea of merging two.
I agree, if we could make that assumption, it would be the simpler way to do it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! That makes a lot more sense. The root cause here is that you don't trust the caller to responsibly use the right context
value.
So, the design tradeoff you're making is that you can be more confident that this code will work for callers who do weird things with the package (and using different contexts would be weird), but you are sacrificing simplicity to gain that confidence.
I don't think I agree that that's the right design decision here since gps isn't ever going to have, like, tons of direct users. It's likely that dep
is the only user for quite a while. I think clear documentation on the source manager's constructor (use the same context, or children of the original context, whenever you call its methods) would be sufficient.
But! That's just my feeling. I think we've actually gotten to the core of things now, and we have the tradeoff much more clearly on the table, which is good enough for me. 👍
deducers.go
Outdated
|
||
// The rest of the work needs its own goroutine, the results of which will | ||
// be re-joined to this call via the return chans. | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider making these anonymous closures into named functions or methods. That way, it's clear what the inputs and outputs are.
In this case, I think it's something like
func (sc *sourceCoordinator) computeSourceGateway(ctx context.Context, name string, rc srcReturnChans)
and the invocation becomes go sc.computeSourceGateway(ctx, normalizedName, rc)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Yeah, I've been leaning in that direction; I went heavy pseudo-actor model initially, and am slowly walking it back. The general issue I have with it is distinguishing between things that should be called by the wrapper object, vs those that should not. That's a smaller design consideration, though - like I said, I'm walking it back :)
This one isn't shouldn't be a problem to make its own method for, but the other one (in deductionCoordinator.deduce()
is absolutely not threadsafe unless it's being serialized through that channel.
deducers.go
Outdated
sc.srcmut.RLock() | ||
srcGate, has := sc.srcs[normalizedName] | ||
sc.srcmut.RUnlock() | ||
if has { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code might be quite a bit simpler if it used sync.Cond
to rendezvous the goroutines instead of channels. As it is, there are lots of opportunities for races in this code. For example, here it looks to me like we're unlocking too early - this should be
sc.srcmut.RLock()
srcGate, has := sc.srcs[normalizedName]
if has {
sc.srcmut.RUnlock()
doReturn(srcGate, nil)
return
}
sc.srcmut.RUnlock()
but of course this is all a sad state to be in - we've wasted time running deduceRootPath
twice. Instead of guarding the map with a lock, I think we'd be better off guarding the function invocation. sync.Cond
is the simplest way to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...well, while this is certainly my first, untested pass at the code, I'm not seeing the opportunity for races here. And I don't see how it would be possible for deduceRootPath
to be run multiple times for the same path.
The big reason there's so much staged locking/unlocking here is because we have to keep both the deduceRootPath()
AND sourceURL()
calls on a non-blocking path, as both may involve I/O, especially the latter.
That said, I'd totally forgotten that sync.Cond
even exists. It may well be a much better way of coordinating these fold-in behaviors. I'll have a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how it would be possible for
deduceRootPath
to be run multiple times for the same path.
Suppose getSourceGatewayFor
is called twice with the same arguments in two different goroutines, G1 and G2.
G1 does the L164 logic - it grabs the srcmut read lock, sees that sc.nameToURL doesn't have normalizedName, so it releases the lock. Now G2 is scheduled, and does exactly the same thing. They're both now waiting right in front of the go
call at L182.
G1 starts this goroutine - the one from L182 to L272. It grabs the psrcmut write lock, sees that there is no entry in p.protoSrcs
, so it creates one. It continues doing work, including running deduceRootPath
, all the way to running doReturn
. Again, it grabs the write lock, sends its output on the right channel, and then calls delete(sc.protoSrcs, normalizedName)
. It releases the lock and exits.
Now G2 is scheduled. It starts at the top of the big goroutine. It grabs the psrcmut lock. It checks if sc.protoSrcs
has normalized name. It doesn't, because G1 deleted its entry. So, G2 continues - it will run deduceRootPath again.
So, there we are. Certainly not a very common scheduling, but definitely possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, yes, that's possible - but that's the one I'm accounting for with this additional medial check we're commenting on here 😄
In that case, we do run deduceRootPath()
twice, but the results will be coming back from a trie - no regex, and definitely no HTTP call. So we're not paying a significant cost in cycles. And there's no unsynchronized data access or duplication of resources, as we return out early in the (fixed) version of this check we're commenting on.
(Also, the original code change you suggested is equivalent to what's there - I just hoisted the Unlock()
out of the conditional)
Are there other potential bad interleavings that you can see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the trie thing is good - I didn't realize that! But that kind of proves my point here, I think, which is that wowzers this is pretty tricky code. I think sync.Cond
could make it simpler. I'll scan for more bad schedulings, but no promises :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No argument here! I'd definitely like to see this decomposed into more comprehensible pieces. IMO it's already less complected than it was in the old system (where root deduction and source creation were inextricably bound up in those stacks of futures), but there's still a ways to go.
deducers.go
Outdated
// scheduled. Guard against that by checking the main sources map again | ||
// and bailing out if we find an entry. | ||
sc.srcmut.RLock() | ||
srcGate, has := sc.srcs[normalizedName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're checking here for normalizedName
, but later we set based on url
. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, thank you - I initially wrote this medial check before introducing the nameToURL
map, and forgot to update it when I changed the check at the top.
deducers.go
Outdated
doReturn(sa, nil) | ||
return | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This anonymous closure runs to several pages long in my editor, which makes it tough to keep track of everything. Could it be broken up into smaller function calls that have clear inputs/outputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure we can get it there :) Per the above response, once I've weaned off of the closure in general towards a method, splitting it up into parts would be easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for being receptive to review! 😀
deducers.go
Outdated
sc.srcmut.RLock() | ||
srcGate, has := sc.srcs[normalizedName] | ||
sc.srcmut.RUnlock() | ||
if has { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how it would be possible for
deduceRootPath
to be run multiple times for the same path.
Suppose getSourceGatewayFor
is called twice with the same arguments in two different goroutines, G1 and G2.
G1 does the L164 logic - it grabs the srcmut read lock, sees that sc.nameToURL doesn't have normalizedName, so it releases the lock. Now G2 is scheduled, and does exactly the same thing. They're both now waiting right in front of the go
call at L182.
G1 starts this goroutine - the one from L182 to L272. It grabs the psrcmut write lock, sees that there is no entry in p.protoSrcs
, so it creates one. It continues doing work, including running deduceRootPath
, all the way to running doReturn
. Again, it grabs the write lock, sends its output on the right channel, and then calls delete(sc.protoSrcs, normalizedName)
. It releases the lock and exits.
Now G2 is scheduled. It starts at the top of the big goroutine. It grabs the psrcmut lock. It checks if sc.protoSrcs
has normalized name. It doesn't, because G1 deleted its entry. So, G2 continues - it will run deduceRootPath again.
So, there we are. Certainly not a very common scheduling, but definitely possible.
Codecov Report
@@ Coverage Diff @@
## master #196 +/- ##
==========================================
+ Coverage 77.17% 78.22% +1.04%
==========================================
Files 27 28 +1
Lines 3987 4018 +31
==========================================
+ Hits 3077 3143 +66
+ Misses 692 655 -37
- Partials 218 220 +2
Continue to review full report at Codecov.
|
#202 is quite related to this, but I'm gonna leave it out of this PR to kinda sorta avoid scope creep just a teensy bit. |
Noting this here because I'm not sure if it'll remain an issue or not, so I don't want to make it a separate issue until this is merged. But, this PR as currently written has a problematic case. (The problem exists on Say there's a The role of the There's two definitions we could use for "a source exists": it exists upstream, or it exists in the cache. The latter, being local, is faster to check, but potentially wrong about the state of the world - even if the source exists in local cache, the upstream could have moved, been deleted, become inaccessible due to expired credentials, etc. For now, I think I'm going to just force the remote check to always happen. However, it seems likely that we may need to revisit this, and rely only on the local check initially. If/when we make that shift, we'll need to add a mechanism by which to reach back into the |
Very much a WIP.
Also change maybeSource.try() signatures to take an injected cache, as well as a context. ProjectAnalyzer was also removed as a struct field on baseVCSSource.
There are still a number of test failures, including some pointing to problems in the new implementation.
This was really always the intended model - there's no reason a SourceManager needs to be permanently coupled with just one analyzer. It's perfectly sufficient to provide one as an argument to the relevant methods. Fixes #195.
* Move baseVCSSource into vcs_source.go * Rename source_test.go to vcs_source_test.go * Remove sourceExistence type and remaining uses of it
This test isn't exhaustive, but it covers the basics.
Using actual signals was far too flaky in tests.
This more accurately reflects how we were actually using the method for some time, as well as what we'd more recently modified the method to do.
As-is, this is very likely to cause a perceived performance regression. It will end up serializing calls from the solver that previously had been parallelized, as it reverses the change made in 4815cd5. The fundamental problem is, we can't avoid that regression while maintaining the invariant I described as valuable earlier. However, this PR has been open for far too long, and diverged far too much as-is. Because it's accomplished its main purpose, I'm going to merge it now, and then we can revise and modify more incrementally. |
[WIP] Refactor sourceMgr
This PR refactors the sourceMgr in several areas:
ProjectAnalyzer
a method parameter - fixes Make ProjectAnalyzer an argument to SourceManager.GetManifestAndLock() #195The changes here also pave the way for persistent caching - #130.
Smaller TODOs:
update()
tofetch()
in our vcs repo impl