Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: memory budgeting #59188

Closed
tbg opened this issue Jan 20, 2021 · 9 comments
Closed

tracing: memory budgeting #59188

tbg opened this issue Jan 20, 2021 · 9 comments
Assignees
Labels
A-tracing Relating to tracing in CockroachDB. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity X-stale

Comments

@tbg
Copy link
Member

tbg commented Jan 20, 2021

Is your feature request related to a problem? Please describe.
Verbose trace spans collect possibly large amounts of data in memory, which can present a stability risk. With always-on tracing, the same holds for the structured payloads (though with a small factor, since a lot less data is collected).

Describe the solution you'd like
Heuristically count the number of bytes held in a Span's own recording (i.e. not traversing into the children). New data enters via these methods:

// LogStructured adds a Structured payload to the Span. It will be added to the
// recording even if the Span is not verbose; however it will be discarded if
// the underlying Span has been optimized out (i.e. is a noop span).
//
// The caller must not mutate the item once LogStructured has been called.
func (s *Span) LogStructured(item Structured) {
if s.isNoop() {
return
}
s.crdb.logStructured(item)
}
// Record provides a way to record free-form text into verbose spans.
//
// TODO(irfansharif): We don't currently have redactability with trace
// recordings (both here, and using LogStructured above). We'll want to do this
// soon.
func (s *Span) Record(msg string) {
if !s.hasVerboseSink() {
return
}
if s.ot.shadowSpan != nil {
s.ot.shadowSpan.LogFields(otlog.String(tracingpb.LogMessageField, msg))
}
if s.netTr != nil {
s.netTr.LazyPrintf("%s", msg)
}
s.crdb.record(msg)
}
// Recordf is like Record, but accepts a format specifier.
func (s *Span) Recordf(format string, args ...interface{}) {
if !s.hasVerboseSink() {
return
}
str := fmt.Sprintf(format, args...)
if s.ot.shadowSpan != nil {
s.ot.shadowSpan.LogFields(otlog.String(tracingpb.LogMessageField, str))
}
if s.netTr != nil {
s.netTr.LazyPrintf(format, args)
}
s.crdb.record(str)

As a set max size (per span) is exceeded or the root account runs out of quota, drop old messages every time a new message is added. Also set a flag on the span that indicates that the recording is incomplete, perhaps by retaining the most recent timestamp for which events were dropped. We should drop verbose messages first, then structured payloads.

Note that the opentracing and net/trace spans use a cardinality based limit:

netTr.SetMaxEvents(maxLogsPerSpan)

return lightStepManager{}, lightstep.NewTracer(lightstep.Options{
AccessToken: token,
MaxLogsPerSpan: maxLogsPerSpan,
MaxBufferedSpans: 10000,
UseGRPC: true,
})

I think for net/trace we may want to remove the limit and instead, upon first exceeding the memory budget for any given span, print a message to net/trace explaining that no more logs will be relayed there.

Describe alternatives you've considered

We could also try to attach a memory budget and use all of the infrastructure we already have in SQL, by attaching

// BoundAccount and BytesMonitor together form the mechanism by which
// allocations are tracked and constrained (e.g. memory allocations by the
// server on behalf of db clients). The primary motivation is to avoid common
// cases of memory or disk blow-ups due to user error or unoptimized queries; a
// secondary motivation in the longer term is to offer more detailed metrics to
// users to track and explain memory/disk usage.

to the Tracer

// Tracer is our own custom implementation of opentracing.Tracer. It supports:
//
// - forwarding events to x/net/trace instances
//
// - recording traces. Recording is started automatically for spans that have
// the verboseTracingBaggageKey baggage and can be started explicitly as well. Recorded
// events can be retrieved at any time.
//
// - lightstep traces. This is implemented by maintaining a "shadow" lightstep
// Span inside each of our spans.
//
// Even when tracing is disabled, we still use this Tracer (with x/net/trace and
// lightstep disabled) because of its recording capability (verbose tracing needs
// to work in all cases).
//
// Tracer is currently stateless so we could have a single instance; however,
// this won't be the case if the cluster settings move away from using global
// state.
type Tracer struct {

and use that to guide the decisions. This might actually be my favorite, assuming that the mon package is in good standing (I'm not familiar with it). Some notes:

  • this will likely require refactoring of the mon package, since the tracing package can not depend on util/log due to import cycles, which is fine.
  • we need to clear the recording in Span.Finish() and we need to prevent new recording data from entering a Finish'ed span (for otherwise it will allocate from the monitor, but never release back).
  • this is more brittle in the sense that a failure to return bytes to the monitor correctly will cause problems. However, we can add assertions that are active through large swathes of our test suite which should give us good confidence that this is not happening.
  • we need to determine where to take the bytes from. We could hang it off the sql memory monitor, or use a separate, fixed-size monitor.

Additional context

We will want some kind of basic safeguard in place for 21.1, even if it just extends the maxLogsPerSpan mechanism to also apply to payloads.

Jira issue: CRDB-3323

@tbg tbg added A-tracing Relating to tracing in CockroachDB. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Jan 20, 2021
@irfansharif irfansharif self-assigned this Feb 24, 2021
@tbg
Copy link
Member Author

tbg commented Mar 2, 2021

@irfansharif I just saw that we merged #60678 without fixing the large allocation for the payload ring buffer. Is your follow-up coming out soon? We're in the stability period and performance work is happening (see #61328). I don't think that particular allocation is the root cause (if there even is a single root cause, sigh) but it's certainly in the hot path and shouldn't stay there for long. This is next on your list, right?

@irfansharif
Copy link
Contributor

yup, just got a bit side-tracked with other stability issues on my plate.

@jordanlewis
Copy link
Member

I'm a little scared of attaching this work to SQL monitors. In particular, lifespan bugs in monitors are often hard to track down and debug. I think it could mean that tracing lifespan bugs would have a larger blast radius than they would otherwise.

I think the monitor package is probably suitable, it's just we'd need to be quite careful about how we use it.

irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 3, 2021
Touches cockroachdb#59188. Follow-on work from cockroachdb#60678.

Release justification: low risk, high benefit changes to existing
functionality

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 3, 2021
Touches cockroachdb#59188. We can introduce byte-limits for verbose logging
messages in a similar manner to what we've done for structured events.

This commit also:
- adds a _dropped tag to recordings with dropped logs/structured events.
- squashes a bug where reset spans (as used in SessionTracing) still
  held onto earlier structured events
- moves away from the internal usage of the opentracing.LogRecord type,
  it's unnecessary

Release justification: low risk, high benefit changes to existing
functionality

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 3, 2021
Touches cockroachdb#59188. We can introduce byte-limits for verbose logging
messages in a similar manner to what we've done for structured events.

This commit also:
- adds a _dropped tag to recordings with dropped logs/structured events.
- squashes a bug where reset spans (as used in SessionTracing) still
  held onto earlier structured events
- moves away from the internal usage of the opentracing.LogRecord type,
  it's unnecessary

Release justification: low risk, high benefit changes to existing
functionality

Release note: None
@tbg
Copy link
Member Author

tbg commented Mar 4, 2021

As of #61359, each Span individually limits the bytes in payloads and the bytes in verbose messages (both use a FIFO ring buffer).

We should apply something similar to these two slices:

children []*crdbSpan
// remoteSpan contains the list of remote child span recordings that
// were manually imported.
remoteSpans []tracingpb.RecordedSpan

As we can easily see unbounded growth in them, see for example #59560.

tbg added a commit to tbg/cockroach that referenced this issue Mar 4, 2021
This caps the size of the span registry at 5k, evicting "random" (map
order) existing entries when at the limit.
The purpose of this is to serve as a guardrail against leaked spans,
which would otherwise lead to unbounded memory growth.

Touches cockroachdb#59188.

Release justification: low risk, high benefit changes to existing
functionality
Release note: None
craig bot pushed a commit that referenced this issue Mar 4, 2021
59288: rfc: declarative, state-based schema changer r=lucy-zhang a=lucy-zhang

https://github.com/lucy-zhang/cockroach/blob/schema-changer-rfc/docs/RFCS/20210115_new_schema_changer.md

Release note: none

61459: tracing: cap registry size at 5k r=irfansharif a=tbg

This caps the size of the span registry at 5k, evicting "random" (map
order) existing entries when at the limit.
The purpose of this is to serve as a guardrail against leaked spans,
which would otherwise lead to unbounded memory growth.

Touches #59188.

Release justification: low risk, high benefit changes to existing
functionality
Release note: None


61471: server: disable flaky check in TestDecommissionedNodeCannotConnect r=tbg a=erikgrinaker

In #61356, `TestDecommissionedNodeCannotConnect` was extended to check
error propagation for a `Scan` operation after a node was decommissioned.
However, because of #61470 not all failure modes propagate errors
properly, causing the test to sometimes hang. This patch disables this
check until these problems are addressed.

Resolves #61452.

Release justification: non-production code changes
Release note: None

Co-authored-by: Lucy Zhang <lucy@cockroachlabs.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 5, 2021
Touches cockroachdb#59188. Follow-on work from cockroachdb#60678.

Release justification: low risk, high benefit changes to existing
functionality

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 5, 2021
Touches cockroachdb#59188. We can introduce byte-limits for verbose logging
messages in a similar manner to what we've done for structured events.

This commit also:
- adds a _dropped tag to recordings with dropped logs/structured events.
- squashes a bug where reset spans (as used in SessionTracing) still
  held onto earlier structured events
- moves away from the internal usage of the opentracing.LogRecord type,
  it's unnecessary

Release justification: low risk, high benefit changes to existing
functionality

Release note: None
craig bot pushed a commit that referenced this issue Mar 5, 2021
61359: tracing: use byte-limits for logs/structured events per span r=irfansharif a=irfansharif

Touches #59188. Follow-on work from #60678. We can introduce byte-limits for
verbose logging and structured events, instead of limiting things based on
count.

This PR also:
- adds a _dropped tag to recordings with dropped logs/structured events.
- squashes a bug where reset spans (as used in SessionTracing) still
  held onto earlier structured events
- moves away from the internal usage of the opentracing.LogRecord type,
  it's unnecessary

Release justification: low risk, high benefit changes to existing
functionality

Release note: None

---

+cc @knz / @erikgrinaker / @angelapwen for pod-visibility.

61482: jobs: add job metrics per-type to track success, failure, and cancel r=fqazi a=fqazi

Fixes: #59711

Previously, there were only over all counters tracking how many
jobs were completed, cancelled, or failed. This was inadequate
because it didn't make it easy to tell in aggregate what job
types they were. To address this, this patch will add counters
for different job types for tracking success, failure, and
cancellation.

Release justification: Low risk change only adding a metric inside
the crdb_internal.feature_usage table
Release note: None

61491: sqlsmith: add support for computed columns r=RaduBerinde a=RaduBerinde

This changes the random table generator to also create computed
columns (either STORED or VIRTUAL). Some example of definitions:
 - `col1_14 STRING NOT NULL AS (lower(CAST(col1_8 AS STRING))) VIRTUAL`
 - `col2_6 DECIMAL NOT NULL AS (col2_2 + 1:::DECIMAL) STORED`
 - `col1_13 INT4 AS (col1_0 + col1_10) STORED`

Release justification: non-production code change.

Release note: None

Informs #57608.

61509: sql: add a regression test r=RaduBerinde a=RaduBerinde

This commit adds a regression test for #58104 (the problem was already
fixed).

Resolves #58104.

Release justification: non-production code change.

Release note: None

61522: opt: fix fetch scope in UPDATE..FROM statements r=mgartner a=mgartner

Previously, the fetch scope incorrectly included columns in the FROM
clause of an UPDATE..FROM statement. As a result, column names shared by
the FROM clause and the mutating table lead to ambiguity when resolving
partial index DEL column expressions. This commit ensures that the fetch
scope does not include columns in the FROM clause.

Fixes #61284

Release justification: This is a low-risk bug fix to existing
functionality.

Release note (bug fix): An UPDATE..FROM statement where the FROM clause
contained column names that match table column names erred if the table
had a partial index predicate referencing those columns. This bug,
present since partial indexes were released in version 20.2.0, has been
fixed.

61553: ccl,server: error.Wrap on previously handled errors r=[dt,miretskiy] a=stevendanna

These errors.Wrap calls are wrapping errors that are nil and thus will
always return a nil error.

Release justification: Minor error handling fixes
Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 9, 2021
In cockroachdb#59560 we've seen that this can grow unboundedly. Set a limit out of
an abundance of caution. Fixes cockroachdb#59188.

Release note: None
@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@irfansharif
Copy link
Contributor

@abarganier the last PR I had for memory budgeting was #61714, that I'm not finding the time to get to. Mind if I throw the ball over to you? I don't think this is very pressing, that PR was mostly out of an abundance of caution. But still, would be nice to have as a safeguard.

@abarganier
Copy link
Contributor

@irfansharif SGTM, I think this would fit nicely into the tracing library handoff. Just to clarify, you closed the PR because you don't have the time, not because the solution is invalid, correct? I'd be happy to work on driving it home.

@irfansharif
Copy link
Contributor

Yup, just don't have the bandwidth for the small bit of polish+shepherding needed. That PR is pretty tiny though, and I'm happy to rubber stamp another attempt at it.

@abarganier
Copy link
Contributor

Cool sounds good, I'll get this scheduled on our end and tag you as a reviewer. I've updated the owning team accordingly. It might have to wait until a bit later in the month given other work on our plates, but I will keep you updated.

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tracing Relating to tracing in CockroachDB. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity X-stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants