Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: instrument RaftTransport workers with pprof labels #85909

Merged
merged 1 commit into from
Aug 12, 2022

Conversation

pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented Aug 10, 2022

The unused arguments in the method signature were used to identify goroutines
in traces. This no longer works after Go 1.17 started passing arguments via
registers.

This commit adds pprof labels when starting these goroutines, to have a cleaner
code, more readable traces, and to work around the new Go convention.

Release note: None

@pav-kv pav-kv requested a review from a team as a code owner August 10, 2022 17:37
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pav-kv pav-kv requested a review from tbg August 10, 2022 17:38
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these parameters are there on purpose, because they allow us to identify which node/class the goroutine belongs to in goroutine dumps. This is very useful during debugging, if something has stalled somewhere, since there can be lots of these goroutines running and we need to know which is which.

Might be worth a comment about this.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 10, 2022

@erikgrinaker Good to know that, I'll add a comment.

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Aug 10, 2022

That said, I believe the recent changes to Go function call conventions in 1.17 may have broken this. I forget what the current state is, maybe you remember @tbg? We should really be using pprof labels instead, if possible.

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Aug 10, 2022

Yeah, there's something about it here:

// sendWithoutRangeID used to be called sendWithRangeID, accepted a `_forStacks
// roachpb.RangeID` argument, and had the description below. Ever since Go
// switched to the register-based calling convention though, this stopped
// working, giving essentially random numbers in the goroutine dumps that were
// misleading. It has thus been "disarmed" until Go produces useful values
// again.
//
// See (internal): https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1641478596004700
//
// sendWithRangeID takes an unused rangeID argument so that the range
// ID will be accessible in stack traces (both in panics and when
// sampling goroutines from a live server). This line is subject to
// the whims of the compiler and it can be difficult to find the right
// value, but as of this writing the following example shows a stack
// while processing range 21 (0x15) (the first occurrence of that
// number is the rangeID argument, the second is within the encoded
// BatchRequest, although we don't want to rely on that occurring
// within the portion printed in the stack trace):
//
// github.com/cockroachdb/cockroach/pkg/storage.(*Replica).sendWithRangeID(0xc420d1a000, 0x64bfb80, 0xc421564b10, 0x15, 0x153fd4634aeb0193, 0x0, 0x100000001, 0x1, 0x15, 0x0, ...)

FWIW, we often call these _forStacks in function definitions to mark them as such:

func (r *Replica) rangeFeedWithRangeID(
_forStacks roachpb.RangeID, args *roachpb.RangeFeedRequest, stream rangefeed.Stream,
) *roachpb.Error {

@tbg
Copy link
Member

tbg commented Aug 11, 2022

The unused nodeID parameter was introduced in https://github.com/cockroachdb/cockroach/pull/8529/files#diff-1fa05e654163c64041e809fec538526cedc6723e47797f1f4e5df8993a1e60a9R247 and I don't think the intention was to be able to pull the nodeID from a stack trace. I also don't think the connection class parameter has this purpose, or we wouldn't have tacked it on at the end.

As Erik pointed out, the "correct" way to do this now is pprof annotations, which were added to the goroutine profile output recently!

If you don't mind, @pavelkalinnikov, add that here. Basically, we want to wrap this call

if err := t.processQueue(toNodeID, ch, stats, stream, class); err != nil {
log.Warningf(ctx, "while processing outgoing Raft queue to node %d: %s:", toNodeID, err)
}

in a pprof.Do:

		pprof.Do(ctx, pprof.Labels("remote_node_id", toNodeID.String()), func(ctx context.Context) {
			if err := t.processQueue(toNodeID, ch, stats, stream, class); err != nil {
				log.Warningf(ctx, "while processing outgoing Raft queue to node %d: %s:", toNodeID, err)
			}
		})

You can then ./dev build, run a (say) three-node cluster, and navigate to /debug/pprof/goroutine?debug=1 and look for a corresponding labels: line like this (but referencing your label).

image


For Replica.Send (i.e. fix up sendWithoutRangeID to use the same pattern) it's unfortunately harder due to the need for allocations while also wanting to reuse incoming label sets if there is one, which the API also makes difficult to discover in a low-alloc way (though not impossible). Filed #85948 for this.

@pav-kv pav-kv changed the title kvserver: remove unused method arguments kvserver: instrument RaftTransport workers with pprof labels Aug 12, 2022
@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 12, 2022

@tbg @erikgrinaker Done, PTAL.

@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 12, 2022

If you don't mind, @pavelkalinnikov, add that here. Basically, we want to wrap this call

@tbg I wrapped it a bit higher up the stack, in the place where the worker is launched.

The unused arguments in the method signature were used to identify goroutines
in traces. This no longer works after Go 1.17 started passing arguments via
registers.

This commit adds pprof labels when starting these goroutines, to have a cleaner
code, more readable traces, and to work around the new Go convention.

Release note: None
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

pkg/kv/kvserver/raft_transport.go Show resolved Hide resolved
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks!

@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 12, 2022

bors r=tbg,erikgrinaker

@craig craig bot merged commit ff3fc7e into cockroachdb:master Aug 12, 2022
@craig
Copy link
Contributor

craig bot commented Aug 12, 2022

Build succeeded:

@pav-kv pav-kv deleted the rm_unused_args branch August 12, 2022 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants