Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/tracing: bug fixes and improvements #89785

Merged
merged 6 commits into from
Oct 19, 2022

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Oct 11, 2022

Please see individual commits.

Touches #87536
Epic: None

@andreimatei andreimatei requested review from abarganier and a team October 11, 2022 22:08
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Span.RecordStructured(item) was stating that the function takes
ownership of item, when in fact it effectively copies it. Correct it, so
that callers are free to use the item more liberally.

Release note: None
This patch fixes a bug around the returning of a span's recording.
Before this patch, the returned recording would alias the span's
internal copy of the recording. This was bad because the span's internal
copy is mutable: it changes when new spans are added to it. Besides
introducing the possibility of data races between accesses to the
returned recording and the internal recording, this aliasing can also
cause the returned recording to become inconsistent: a Trace's NumSpans
field can get out of sync with the children.

In practice, the trouble comes from scenarios like this:
1. when a span is finished, it calls openChildSp.getVerboseRecording()
   on its still-open children (if any)
2. the returned child's recording aliases the child's state
3. the returned child's recording is added to the finishing parent's
   trace, which is then added to the grand-parent. So now we have two
   open spans (the grand-parent and the grand-child) whose state aliases
   each other
4. being open, their state can change (when other child spans are
   finished). The grand-child's state changing can now cause the
   parent's state to become invalid.

This patch fixes the issue by making a deep copy of the recording before returning it.

Actually orchestrating a scenario where such state corruption leads to
something visible (i.e. a crash) proved a bit involved. So I've gone
with a randomized test that did reliably crash.

Release note: None
An unlock was misaligned.

Release note: None
Before this patch it was quite ambiguous whether the child spans in a
Trace are sorted by their start time or not. We have sorting calls
sprinkled through the code, but it was hard to say when they're needed
and whether they're sufficient. This patch does away with bespoke
sorting, instead opting to always maintain the children in a sorted
order. Most of the time this shouldn't cost anything, since we detect
the case where one child is added with a start time beyond the last one
(which should be the common case).

Release note: None
Whenever the trace span limit was hit (maxRecordedSpansPerTrace = 1000),
some spans are dropped from the trace in order to keep the total below
the limit. The structured events from the dropped spans are saved in
the parent though - we try not to throw structured events away. Each
span is supposed to have a size limit on the structured events that it
holds on to (maxStructuredBytesPerSpan = 10KB), but that limit was not
enforced on this code path - Trace.trimRecursive was copying the child's
events to the parent without respect for the limit. As a result, we've
seen traces for queries taking many minutes get huge because of this
omission.

This patch takes the opportunity to rework how the structured events
limit works. In addition to the per-span limit (10KB), it also
introduces a per-trace limit (1MB). The per-span limit remains, and
applies to structured records directly logged into a span. The per-trace
limit applies at the moment when either:
- a span is finished, and its recording is added to the parent's.
- a span's recording is requested, and the span collects all of its
  children's recordings.

In both these cases, the per-trace limit applies at the level of the
parent span that collects the children's recordings. Thus, it is
possible that multiple spans open at the same time, belonging to the same
trace, all temporarily grow to the limit. As they are finished and their
recordings are centralized, records will be dropped to until ultimately
there's a single recording, up to the limit.

In contrast to the previous per-span limit, the new scheme allows one
span to accumulate a large number of structured events from its
children. Thus, a trace can have one, or a few, large spans. Where
before a trace was not supposed to have any large spans (but we were
actually failing to properly enforce any limit, as the first paragraph
describes).

Release note: None
Before this patch, events in a structured recording were subject to the
per-span limit (10KB). This was not a principled thing, as it means that
a verbose trace could have more structured events than a structured
trace. The code came to be this way because doing something else was
difficult, until recently. This patch makes it so that the larger,
per-trace limit (1MB) applies. It does this by storing the structured
events coming from finished children in a structured recording in the
same way that structured events coming from dropped spans in a verbose
recording are stored (which is a good change on its own because it makes
the code more uniform).

Release note: None
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thanks for these fixes. Apologies for the review lag.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@andreimatei
Copy link
Contributor Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 19, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants