tracing: enable always-on tracing by default #58897
Wow, so much red in CI! Curious what this is all about.
Probably the span leak we found in #58902. I'm not sure what to do about it yet.
How can it be that? We're not checking for span leaks in this PR.
Was a stab in the dark, but I was thinking it could be due to some memory blow-up. I'll take a closer look in the morning.
Looks like it was something else, but I'm not really sure what to do about it. With always-on tracing, even spans with "nothing in them" generate recordings. Because it's always on, we'll at least generate the portion of the trace with the operation name.
With the legacy tracing mode, because we weren't explicitly recording, we used to return cockroach/pkg/util/tracing/crdbspan.go Lines 136 to 143 in ff1442b
Now that this is not nil, we'll attach it to our processor's cockroach/pkg/sql/execinfra/processorsbase.go Lines 710 to 712 in ff1442b
Most of the failing tests aren't expecting this: cockroach/pkg/sql/flowinfra/server_test.go Lines 109 to 113 in fb03099
I'm not sure what to do here. At first I figured if it's just tests, we could ignore the metas with trace data. Using something like:

    // ignoreTraceData takes a slice of metadata and returns the entries
    // excluding the ones with trace data.
    func ignoreTraceData(metas []execinfrapb.ProducerMetadata) []execinfrapb.ProducerMetadata {
        res := make([]execinfrapb.ProducerMetadata, 0)
        for _, m := range metas {
            if m.TraceData == nil {
                res = append(res, m)
            }
        }
        return res
    }

But I'm not sure that's right, and also it doesn't help cockroach/pkg/sql/execinfra/processorsbase.go Lines 627 to 632 in fb03099
I don't know what the right patch here would be. Should we not include the trace data in trailing meta if there are no remote+children spans (i.e. behave as we did earlier)? Should we change our handling of
I don't see that we need to do anything special at this moment (other than silencing, in tests, these TraceData meta as you suggested).
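For illustration, the test-side silencing suggested above could look roughly like the following. This is a minimal sketch, not the code from this PR: `producerMetadata` is a hypothetical stand-in for `execinfrapb.ProducerMetadata` that models only two fields, and `withoutTraceData` is a hypothetical name for the filter.

```go
package main

import (
	"errors"
	"fmt"
)

// producerMetadata is a hypothetical stand-in for
// execinfrapb.ProducerMetadata; only two fields are modeled here.
type producerMetadata struct {
	Err       error
	TraceData []string // stand-in for recorded spans
}

// withoutTraceData returns the entries of metas that carry no trace data,
// so that a test can assert only on the metadata it actually expects.
func withoutTraceData(metas []producerMetadata) []producerMetadata {
	res := make([]producerMetadata, 0, len(metas))
	for _, m := range metas {
		if m.TraceData == nil {
			res = append(res, m)
		}
	}
	return res
}

func main() {
	metas := []producerMetadata{
		{TraceData: []string{"span with only an operation name"}},
		{Err: errors.New("boom")},
	}
	filtered := withoutTraceData(metas)
	// Only the error-carrying entry survives the filter.
	fmt.Println(len(filtered), filtered[0].Err)
}
```

A test would apply the filter to whatever metadata it drained before comparing against its expectations, which keeps the assertion independent of whether tracing produced a recording.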
The contract of |
...in the import processor. The contract of `RowSource.Next` requires that at most one of the return values will be non-empty. This wasn't the case here, and caused opaque failures in cockroachdb#58897. cockroachdb#58897 tries to enable background tracing by default, which for us means that the trailing meta can be non-empty (they'll contain span recordings). That behaviour ends up tickling this bug, tripping up TestCSVImportCanBeResumed. Release note: None
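The "at most one non-empty return value" contract can be sketched as follows. This is an illustrative toy under stated assumptions, not the import processor's code: `row`, `meta`, `source`, and `drain` are hypothetical names, and the real `RowSource.Next` uses different types. The point is that trailing metadata (such as span recordings) is handed out one entry per call, never alongside a row.

```go
package main

import "fmt"

// row and meta are hypothetical stand-ins for the real row and
// ProducerMetadata types.
type row []string
type meta struct{ TraceData string }

// source is a toy producer: it returns all rows first, then drains its
// trailing metadata one entry per call.
type source struct {
	rows     []row
	trailing []*meta
}

// Next returns at most one non-nil value per call; (nil, nil) signals
// that the source is exhausted.
func (s *source) Next() (row, *meta) {
	if len(s.rows) > 0 {
		r := s.rows[0]
		s.rows = s.rows[1:]
		return r, nil
	}
	if len(s.trailing) > 0 {
		m := s.trailing[0]
		s.trailing = s.trailing[1:]
		return nil, m
	}
	return nil, nil
}

// drain consumes the source, counting rows and metadata, and panics if a
// single call ever returns both a row and metadata.
func drain(s *source) (rows, metas int) {
	for {
		r, m := s.Next()
		if r != nil && m != nil {
			panic("contract violated: both row and metadata returned")
		}
		if r == nil && m == nil {
			return rows, metas
		}
		if r != nil {
			rows++
		} else {
			metas++
		}
	}
}

func main() {
	s := &source{
		rows:     []row{{"a"}, {"b"}},
		trailing: []*meta{{TraceData: "span recording"}},
	}
	rows, metas := drain(s)
	fmt.Println(rows, metas)
}
```

With tracing off, the trailing metadata list was typically empty, so a producer that returned a row and metadata together could go unnoticed; once recordings are always present, a consumer like `drain` trips on the violation.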
This is follow-up work from cockroachdb#58712, where we measured the overhead for always-on tracing and found it to be minimal/acceptable. Let's switch this on by default to shake out the implications of doing so. We can reasonably expect two kinds of fallout: (1) Unexpected blow-up in memory usage due to resource leakage (which can be a problem now that we're always maintaining open spans in an internal registry, see cockroachdb#58721). (2) Performance degradation due to tracing overhead per-request (something cockroachdb#58712 was spot checking for). For (1), we'll introduce a future test in a separate PR. For (2), we'll monitor roachperf over the next few weeks. --- Also moved some of the documentation for the cluster setting into a comment form above. Looking at what's rendered in our other cluster settings (`SHOW ALL CLUSTER SETTINGS`), we default to a very pithy, unwrapped description. Release note: None
58995: util/log: stop syncing writes excessively r=itsbilal a=knz
Requested by @bdarnell
Fixes #58025
Release note (cli change): Previously, for certain log files CockroachDB would both flush individual writes (i.e. propagate them from within the `cockroach` process to the OS) and also synchronize writes (i.e. ask the OS to confirm the log data was written to disk). The per-write synchronization part was unnecessary and, in fact, found to be possibly detrimental to performance and operating cost, so it was removed. Meanwhile, the log data continues to be flushed as previously, and CockroachDB also periodically (every 30s) requests synchronization, also as previously.
Release note (cli change): The parameter `sync-writes` for file sink configurations has been removed. (This is not a backward-incompatible change because the configuration feature is new in v21.1.)
Release note (cli change): The parameter `buffered-writes` for file sink configurations has been added. It is set to `true` (writes are buffered) by default; and set to `false` (i.e. avoid buffering and flush every log entry) when the `auditable` flag is requested.
59115: importccl: account for the RowSource return constraint r=irfansharif a=irfansharif
...in the import processor. The contract of `RowSource.Next` requires that at most one of the return values will be non-empty. This wasn't the case here, and caused opaque failures in #58897. #58897 tries to enable background tracing by default, which for us means that the trailing meta can be non-empty (they'll contain span recordings). That behaviour ends up tickling this bug, tripping up TestCSVImportCanBeResumed.
Release note: None
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
bors r+
Build failed:
bors r+
Build succeeded:
59132: sql: introduce sql.statement_stats.sample_rate to sample execution stats r=RaduBerinde,dhartunian a=asubiotto Depends on #58897 Depends on #59103 This PR puts the "always-on" into always-on EXPLAIN ANALYZE. Take a look at separate commits for details. What actually goes on is that we're taking the slightly safer route of introducing a cluster setting which defines a sample rate for execution stats. These execution stats are collected and propagated using background tracing, which is cheaper than verbose tracing. This allows us to power the new DB Console statement stats views. Currently we still need user input in order to turn up this sample rate. The sample rate is 0 by default in this PR for safety reasons. I'd like to discuss the default value of this cluster setting or whether we need it at all separately before the 21.1 release, but this gives us a nice escape hatch if for whatever reason stats collection results in poor performance. Closes #54556 59536: colexec,bazel: pin the `types` dependency in generated files r=irfansharif a=irfansharif This is a workaround for bazel auto-generated code. goimports does not automatically pick up the right packages when run within the bazel sandbox, so we have to pin it by hand. Release note: None Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
This is follow-up work from #58712, where we measured the overhead for
always-on tracing and found it to be minimal/acceptable. Let's switch
this on by default to shake out the implications of doing so. We can
reasonably expect two kinds of fallout:
(1) Unexpected blow-up in memory usage due to resource leakage (which
can be a problem now that we're always maintaining open spans in an
internal registry, see "*: spans that are never explicitly Finish()-ed" #58721)
(2) Performance degradation due to tracing overhead per-request
(something "tracing,sql: introduce BenchmarkTracing" #58712 was spot checking for)
For (1), we'll introduce a future test in a separate PR. For (2), we'll
monitor roachperf over the next few weeks.
Also moved some of the documentation for the cluster setting into a
comment form above. Looking at what's rendered in our other cluster
settings (`SHOW ALL CLUSTER SETTINGS`), we default to a very pithy,
unwrapped description.
Release note: None