-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util/tracing: bug fixes and improvements #89785
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Span.RecordStructured(item) was stating that the function takes ownership of item, when in fact it effectively copies it. Correct it, so that callers are free to use the item more liberally. Release note: None
This patch fixes a bug around the returning of a span's recording. Before this patch, the returned recording would alias the span's internal copy of the recording. This was bad because the span's internal copy is mutable: it changes when new spans are added to it. Besides introducing the possibility of data races between accesses to the returned recording and the internal recording, this aliasing can also cause the returned recording to become inconsistent: a Trace's NumSpans field can get out of sync with the children. In practice, the trouble comes from scenarios like this: 1. when a span is finished, it calls openChildSp.getVerboseRecording() on its still-open children (if any) 2. the returned child's recording aliases the child's state 3. the returned child's recording is added to the finishing parent's trace, which is then added to the grand-parent. So now we have two open spans (the grand-parent and the grand-child) whose state aliases each other 4. being open, their state can change (when other child spans are finished). The grand-child's state changing can now cause the parent's state to become invalid. This patch fixes the issue by making a deep copy of the recording before returning it. Actually orchestrating a scenario where such state corruption leads to something visible (i.e. a crash) proved a bit involved. So I've gone with a randomized test that did reliably crash. Release note: None
An unlock was misaligned. Release note: None
Before this patch it was quite ambiguous whether the child spans in a Trace are sorted by their start time or not. We have sorting calls sprinkled through the code, but it was hard to say when they're needed and whether they're sufficient. This patch does away with bespoke sorting, instead opting to always maintain the children in a sorted order. Most of the time this shouldn't cost anything, since we detect the case where one child is added with a start time beyond the last one (which should be the common case). Release note: None
Whenever the trace span limit was hit (maxRecordedSpansPerTrace = 1000), some spans are dropped from the trace in order to keep the total below the limit. The structured events from the dropped spans are saved in the parent though - we try not to throw structured events away. Each span is supposed to have a size limit on the structured events that it holds on to (maxStructuredBytesPerSpan = 10KB), but that limit was not enforced on this code path - Trace.trimRecursive was copying the child's events to the parent without respect for the limit. As a result, we've seen traces for queries taking many minutes get huge because of this omission. This patch takes the opportunity to rework how the structured events limit works. In addition to the per-span limit (10KB), it also introduces a per-trace limit (1MB). The per-span limit remains, and applies to structured records directly logged into a span. The per-trace limit applies at the moment when either: - a span is finished, and its recording is added to the parent's. - a span's recording is requested, and the span collects all of its children's recordings. In both these cases, the per-trace limit applies at the level of the parent span that collects the children's recordings. Thus, it is possible that multiple spans open at the same time, belonging to the same trace, all temporarily grow to the limit. As they are finished and their recordings are centralized, records will be dropped to until ultimately there's a single recording, up to the limit. In contrast to the previous per-span limit, the new scheme allows one span to accumulate a large number of structured events from its children. Thus, a trace can have one, or a few, large spans. Where before a trace was not supposed to have any large spans (but we were actually failing to properly enforce any limit, as the first paragraph describes). Release note: None
Before this patch, events in a structured recording were subject to the per-span limit (10KB). This was not a principled thing, as it means that a verbose trace could have more structured events than a structured trace. The code came to be this way because doing something else was difficult, until recently. This patch makes it so that the larger, per-trace limit (1MB) applies. It does this by storing the structured events coming from finished children in a structured recording in the same way that structured events coming from dropped spans in a verbose recording are stored (which is a good change on its own because it makes the code more uniform). Release note: None
andreimatei
force-pushed
the
tracing.fix-recording
branch
from
October 12, 2022 15:59
1fb8aef
to
2d29175
Compare
abarganier
approved these changes
Oct 18, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for these fixes. Apologies for the review lag.
Reviewable status: complete! 1 of 0 LGTMs obtained
TFTR! bors r+ |
Build succeeded: |
This was referenced Nov 7, 2022
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Please see individual commits.
Touches #87536
Epic: None