release-1.0: Backports and updates for table lease leak #22563
Conversation
@vivekmenezes Please look at the last commit.
@cockroachdb/build-prs I patched etcd directly in our vendored repo, which our linter is unhappy with. What should I be doing instead?
Fork etcd and point Gopkg.toml at the fork. If the patch is expected to be temporary, you can use the source field to avoid rewriting all the import paths. Then rerun dep and commit the diff.
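As a rough sketch of that approach (the fork URL and branch name below are hypothetical, not the actual change), a dep constraint using the source field would look something like:

```toml
# Gopkg.toml (sketch): keep the upstream import path but fetch the code
# from a patched fork, so no import paths need to be rewritten.
[[constraint]]
  name = "github.com/coreos/etcd"
  source = "https://github.com/cockroachdb/etcd"  # hypothetical fork location
  branch = "release-1.0-etcd-patch"               # hypothetical branch name
```

Running `dep ensure` afterwards updates vendor/ and Gopkg.lock, and that diff is what gets committed.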
Review status: 0 of 4 files reviewed at latest revision, all discussions resolved, some commit checks failed.

pkg/sql/lease.go, line 577 at r4:
How did you test this?
Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

pkg/sql/lease.go, line 577 at r4:
Previously, petermattis (Peter Mattis) wrote…

I haven't tested it yet (aside from running …). Moving this out of a defer is a good idea. I'll set a local variable here and move the rest of the code below …

The idea behind this change is that we're using a slightly non-standard refcounting pattern. Only the newest lease is allowed to increase its refcount (and, as an optimization, the newest lease continues to exist even while its refcount is zero, instead of being destroyed and recreated); older leases are deleted when their refcounts reach zero. If we created a new lease, we've transitioned the previous one from "newest" to "not newest". If its refcount is already zero, no one else will be coming along to clean it up, so we have to do it here.
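For illustration, here is a minimal, self-contained sketch of that refcounting pattern; the types and method names are hypothetical stand-ins, not the actual pkg/sql/lease.go code:

```go
package main

import "fmt"

// leaseState is a hypothetical stand-in for a single table lease.
type leaseState struct {
	version  int
	refcount int
}

// tableLeases holds the leases for one table, ordered oldest to newest.
// Only the last (newest) element may have its refcount incremented.
type tableLeases struct {
	active []*leaseState
}

// acquire hands out the newest lease and bumps its refcount.
func (t *tableLeases) acquire() *leaseState {
	l := t.active[len(t.active)-1]
	l.refcount++
	return l
}

// release drops a reference. A lease that is no longer the newest is
// removed once its refcount reaches zero; the newest lease is kept alive
// even at refcount zero so it can be handed out again.
func (t *tableLeases) release(l *leaseState) {
	l.refcount--
	if l != t.active[len(t.active)-1] && l.refcount == 0 {
		t.remove(l)
	}
}

// insert adds a freshly acquired lease as the newest version. The
// previously newest lease may already sit at refcount zero; since nothing
// will ever release it again, it has to be cleaned up right here.
func (t *tableLeases) insert(l *leaseState) {
	if n := len(t.active); n > 0 && t.active[n-1].refcount == 0 {
		t.remove(t.active[n-1])
	}
	t.active = append(t.active, l)
}

func (t *tableLeases) remove(l *leaseState) {
	for i, e := range t.active {
		if e == l {
			t.active = append(t.active[:i], t.active[i+1:]...)
			break
		}
	}
	fmt.Printf("removed lease for version %d\n", l.version)
}

func main() {
	tl := &tableLeases{}
	tl.insert(&leaseState{version: 1})
	l := tl.acquire()                  // version 1, refcount 1
	tl.insert(&leaseState{version: 2}) // version 1 still referenced, kept
	tl.release(l)                      // refcount hits zero, version 1 removed
}
```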
Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

pkg/sql/lease.go, line 577 at r4:
Previously, bdarnell (Ben Darnell) wrote…

Ok, I need to look at this in more detail later. For testing, you could manually replicate what I did in #20422 (comment).
Branch updated: fa0014a to acd96e2.
Previously, log.outputLogEntry could panic while holding the log mutex. This would deadlock any goroutine that logged while recovering from the panic, which is approximately all of the recover routines. Most annoyingly, the crash reporter would deadlock, swallowing the cause of the panic.

Avoid panicking while holding the log mutex and use l.exit instead, which exists for this very purpose. In the process, enforce the invariant that l.mu is held when l.exit is called. (The previous behavior was, in fact, incorrect, as l.flushAll should not be called without holding l.mu.)

Also add a Tcl test to ensure this doesn't break in the future.
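To make the failure mode concrete, here is a simplified sketch of the pattern the commit message describes (the types below are stand-ins for the real log package, keeping only the lock/exit shape):

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// loggingT is a stand-in for the logger; l.mu guards all output.
type loggingT struct {
	mu sync.Mutex
}

// exit flushes buffered output and terminates the process. The invariant
// is that l.mu is already held, so exit must neither re-acquire it nor log.
func (l *loggingT) exit(err error) {
	fmt.Fprintln(os.Stderr, "log: exiting because of error:", err)
	l.flushAllLocked()
	os.Exit(2)
}

// flushAllLocked flushes pending log data; it requires l.mu to be held.
func (l *loggingT) flushAllLocked() {
	// Flushing of buffered writers elided in this sketch.
}

// outputLogEntry writes an entry while holding l.mu. Panicking here would
// deadlock every goroutine that logs while recovering from the panic
// (including the crash reporter), so write errors are routed to l.exit
// instead of a panic.
func (l *loggingT) outputLogEntry(entry string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, err := fmt.Fprintln(os.Stderr, entry); err != nil {
		l.exit(err) // never panic while holding l.mu
	}
}

func main() {
	l := &loggingT{}
	l.outputLogEntry("hello, log")
}
```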
Branch updated: 9fc6047 to 5c3d46c.
OK, I've tested manually with the procedure from #20422 (comment) and verified that the number of leases holds steady (…). I've also done the necessary glide (not dep in 1.0) magic to make the linter happy.
Review status: 0 of 6 files reviewed at latest revision, all discussions resolved, some commit checks failed.

pkg/sql/lease.go, line 820 at r8:
This will make removeLease block holding exitingLease.mu. Seems like that could be problematic.
Review status: 0 of 6 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

pkg/sql/lease.go, line 820 at r8:
Previously, petermattis (Peter Mattis) wrote…

Maybe. But under normal conditions the number of leases will be small and the limit will rarely be reached (I could increase the limit to make this less likely). Do you think it's worth doing anything more clever on the 1.0 branch?
LGTM
pkg/sql/lease.go (Outdated)
-	if err := t.stopper.RunAsyncTask(ctx, func(ctx context.Context) {
-		m.LeaseStore.Release(ctx, t.stopper, lease)
-	}); err != nil {
+	if err := t.stopper.RunLimitedAsyncTask(ctx, removeLeaseSem, true,
I worry about changing this to use RunLimitedAsyncTask(), because it blocks while holding the lock over all leases for a table. I think 1.0 users are likely to use very few tables, so it's very likely there will be only a few of these async tasks created.

My worry is that blocking while holding that mutex could lead to deadlock.
A more conservative form of throttling would be to continue to use RunAsyncTask and put a semaphore-based throttle inside the task. We'd still have a mess of goroutines, but the load on the KV layer would be the same.
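A minimal sketch of that shape (illustrative names, not the actual lease.go change): spawning the task never blocks the caller, and a buffered-channel semaphore acquired inside the task bounds how many releases hit the KV layer at once.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// removeLeaseSem allows at most 5 lease releases to run concurrently.
var removeLeaseSem = make(chan struct{}, 5)

// releaseLease stands in for the KV write that deletes a lease record.
func releaseLease(id int) {
	time.Sleep(10 * time.Millisecond)
	fmt.Println("released lease", id)
}

func main() {
	var wg sync.WaitGroup
	for id := 0; id < 20; id++ {
		id := id
		wg.Add(1)
		// Spawning is non-blocking, so it is safe to do while holding a
		// mutex; only the goroutine itself waits for a semaphore slot.
		go func() {
			defer wg.Done()
			removeLeaseSem <- struct{}{}        // acquire a slot
			defer func() { <-removeLeaseSem }() // release the slot
			releaseLease(id)
		}()
	}
	wg.Wait()
}
```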
Addresses cockroachdb#20451 for the release-1.0 branch
This prevents a leak (only present in 1.0) of these leases, which could accumulate into a huge amount of work when PurgeOldLeases is called. Fixes cockroachdb#20422
Newer branches have a more sophisticated solution for this (cockroachdb#20542)
Branch updated: 7185a69 to d9bcbef.
Review status: 0 of 7 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.

pkg/sql/lease.go, line 820 at r8:
Previously, bdarnell (Ben Darnell) wrote…

Ack, this looks safer. If you see any test failures, you'd need to combine this with watching the stopper's done channel.
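A small sketch of that combination (hypothetical names; the real code would use the stopper's own shutdown channel rather than a bare chan): the goroutine selects between acquiring a semaphore slot and the shutdown signal, so it never hangs waiting for a slot once the server has been told to stop.

```go
package main

import (
	"fmt"
	"time"
)

// tryRelease waits for a semaphore slot, but gives up if the shutdown
// channel closes first, so tests that stop the server don't hang.
func tryRelease(id int, sem chan struct{}, shutdown <-chan struct{}) {
	select {
	case sem <- struct{}{}: // got a slot
		defer func() { <-sem }()
		time.Sleep(10 * time.Millisecond) // stand-in for the KV write
		fmt.Println("released lease", id)
	case <-shutdown:
		fmt.Println("skipped lease", id, "during shutdown")
	}
}

func main() {
	sem := make(chan struct{}, 1)
	shutdown := make(chan struct{})
	go tryRelease(1, sem, shutdown)
	time.Sleep(5 * time.Millisecond)
	close(shutdown) // simulate the stopper signalling shutdown
	tryRelease(2, sem, shutdown)
	time.Sleep(20 * time.Millisecond)
}
```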
Four commits: