Check entry empty state to ensure GC eligible #3634
Conversation
Codecov Report
@@               Coverage Diff              @@
##    r/index-active-block    #3634   +/-   ##
==============================================
  Coverage           56.3%    56.3%
==============================================
  Files                551      551
  Lines              62189    62214    +25
==============================================
+ Hits               35015    35066    +51
+ Misses             24035    24013    -22
+ Partials            3139     3135     -4
	return result

// IsEmpty returns true if the entry has no in-memory series data.
func (entry *Entry) IsEmpty() bool {
	return entry.Series.IsEmpty()
Instead of just doing entry.Series.IsEmpty(), IMO we will need to keep the idea of RelookupAndIncrementReaderWriterCount, which re-looks up the latest lookup.Entry in the shard map.
There are a few reasons for this, and we saw real problems without re-looking up the entry. I can explain it more in depth if required, but the problem is that sometimes a lookup.Entry is created for a new series and passed off to the indexing queue, and then a race between two datapoints for a series that doesn't exist yet causes only a single lookup.Entry to exist in the final shard map while the other becomes orphaned (the orphan could look empty, but the lookup.Entry that made it into the shard map will have a series that is not actually empty).
That's why the caller (from within mutable_segments.go) should perhaps perform something like the following:
isEmpty, ok := entry.RelookupAndReturnIsEmpty()
if !ok {
	// Should not happen since shard will not expire until
	// no more block starts are indexed.
	// We do not GC this series if shard is missing since
	// we open up a race condition where the entry is not
	// in the shard yet and we GC it since we can't find it
	// due to an asynchronous insert.
	return true
}
return !isEmpty
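For reference, a minimal self-contained sketch of what such a RelookupAndReturnIsEmpty helper could look like; the series and shardMap types, the LookupEntry method, and the Entry fields shown here are assumptions for illustration, not the actual m3 API:

type series interface {
	IsEmpty() bool
}

type shardMap interface {
	// LookupEntry returns the canonical entry currently held in the shard
	// map for the given series ID, if any.
	LookupEntry(id string) (*Entry, bool)
}

type Entry struct {
	ID     string
	Series series
	Shard  shardMap
}

// RelookupAndReturnIsEmpty re-looks up the entry in the shard map and reports
// whether the canonical entry's series is empty. The second return value is
// false when the entry can no longer be found in the shard, letting the
// caller decide how to treat that case (e.g. do not GC).
func (entry *Entry) RelookupAndReturnIsEmpty() (bool, bool) {
	canonical, ok := entry.Shard.LookupEntry(entry.ID)
	if !ok {
		return false, false
	}
	// Check the canonical entry rather than the receiver, so an orphaned
	// duplicate created by a racing insert is never trusted.
	return canonical.Series.IsEmpty(), true
}

The key point is that the emptiness check runs against the canonical entry retrieved from the shard map, not against the (possibly orphaned) receiver.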
Also, we should probably add a metric for when this edge case happens, using instrument.EmitAndLogInvariantError(...), so that integration tests and scenario tests panic if this edge case ever starts occurring.
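A small sketch of how such a guard could sit in the caller, building on the sketch above; emitInvariantViolation is a hypothetical stand-in for the real instrument helper, whose exact signature is not shown in this thread:

// shouldKeepSeries mirrors the caller snippet above; emitInvariantViolation
// is a hypothetical stand-in for the real instrument helper, not its actual
// signature.
func shouldKeepSeries(entry *Entry, emitInvariantViolation func(msg string)) bool {
	isEmpty, ok := entry.RelookupAndReturnIsEmpty()
	if !ok {
		// Should never happen: surface it loudly so integration and scenario
		// tests, which panic on invariant violations, catch it immediately.
		emitInvariantViolation("entry not found in shard during GC empty check")
		return true
	}
	return !isEmpty
}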
Interesting, I wasn't aware of this edge case, but it makes sense. Will add back this safeguard.
PR updated accordingly
src/dbnode/storage/flush.go (outdated)
	flushedShards map[shardFlush]bool
)

if indexEnabled {
	flushesForNs, ok := indexFlushes[n.ID().String()]
Do we need to check dataFlushes here as well? If not, why not?
Data flushes always precede index flushes, so it is safe to treat the "index" flush as the indicator that a "full" flush has completed. If the index is disabled, though, then we just use the data flush as the indicator.
This is something I am double checking w/ rob though so stay tuned. Assuming what I just said is true, I can add a comment here explaining the logic.
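A rough sketch of the selection logic described above, under the assumption that data flushes always precede index flushes; the shardFlush definition and the function and parameter names are illustrative, not the actual flush.go code:

// shardFlush identifies a flushed (shard, blockStart) pair; the real type in
// flush.go may differ, so this definition is only illustrative.
type shardFlush struct {
	shardID    uint32
	blockStart int64
}

// fullyFlushedShards picks which flush result marks a namespace's shard
// blocks as fully flushed to disk.
func fullyFlushedShards(
	indexEnabled bool,
	nsID string,
	dataFlushes, indexFlushes map[string]map[shardFlush]bool,
) map[shardFlush]bool {
	if indexEnabled {
		// Data flushes always run before index flushes, so with indexing
		// enabled the index flush is the later, authoritative signal that
		// both data and index are on disk.
		return indexFlushes[nsID]
	}
	// With indexing disabled there is no index flush, so the data flush
	// alone indicates a "full" flush.
	return dataFlushes[nsID]
}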
}] = s

for _, t := range i.blockStartsFromIndexBlockStart(block.StartTime()) {
	for _, s := range shards {
		s.MarkWarmIndexFlushStateSuccessOrError(t, err)
So, IIUC, blockStartsFromIndexBlockStart returns the data block starts between the index block start and the index block start + index block size. This is important because index block size >= data block size. If that's true, why do we need to MarkWarmIndexFlushStateSuccessOrError for each data block start time?
That's correct. And the reason is that MarkWarmIndexFlushStateSuccessOrError is still marking data blockStarts (not index blockStarts), but each block now has a flag for both data and index. This is because the state we track is always kept at the data block size, not the index block size. E.g. with a 1h block and a 2h indexBlock, in memory we now have:
1pm: {dataFlushed: bool, indexFlushed: bool}
2pm: {dataFlushed: bool, indexFlushed: bool}
3pm: {dataFlushed: bool, indexFlushed: bool}
4pm: {dataFlushed: bool, indexFlushed: bool}
...
and when determining whether a given block is eligible for GC we check both dataFlushed && indexFlushed. So when an index flush occurs, in this case it actually covers 2 blocks.
This is a somewhat confusing nuance though, so if you have any suggestions on making this easier to understand, let me know.
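A small self-contained sketch of the per-data-blockStart state described above; the type and method names are assumptions for illustration, not the actual m3 types:

// blockFlushState mirrors the per-data-blockStart bookkeeping described
// above; one flag each for the data flush and the index flush.
type blockFlushState struct {
	dataFlushed  bool
	indexFlushed bool
}

// flushStates is keyed by data blockStart (e.g. unix nanos).
type flushStates map[int64]*blockFlushState

// markWarmIndexFlushed marks every data blockStart covered by one index block
// as index-flushed; with 1h data blocks and a 2h index block, a single index
// flush covers two data blockStarts.
func (s flushStates) markWarmIndexFlushed(dataBlockStarts []int64) {
	for _, t := range dataBlockStarts {
		if state, ok := s[t]; ok {
			state.indexFlushed = true
		}
	}
}

// gcEligible reports whether a given data blockStart is eligible for GC:
// both its data and its index must have been flushed.
func (s flushStates) gcEligible(blockStart int64) bool {
	state, ok := s[blockStart]
	return ok && state.dataFlushed && state.indexFlushed
}

With 1h data blocks and 2h index blocks, a single warm index flush would call markWarmIndexFlushed with two data blockStarts, after which both flags must be set for gcEligible to return true.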
Yeah, may be worth adding a comment here just to clarify.
What this PR does / why we need it:
Before removing in-memory series from the index, we need to be sure that they've been flushed to disk. Before the change in this PR, the proxy for determining this was whether a given blockStart had been bootstrapped, the logic being that a block must have been flushed to disk for it to have been bootstrapped (i.e. blockTickResult.NumSegmentsBootstrapped != 0).
The problem with that approach is that it does not capture cold writes: those can be written to old blockStarts and still be in memory, not yet cold flushed to disk, while the bootstrapped state is still true.
Instead, we now check that a given series is truly "empty" (i.e. no TSDB data nor index data for that series in memory) before it is eligible for GC. One challenge with this check is that we only keep track of state (a) "is TSDB data still in-memory" and not (b) "is index data still in-memory"; in other words, we can be sure about (a) but not about (b). So this PR also makes the tracked state assert that both (a) and (b) are true, by marking it only after both the TSDB and index (warm and cold) flushes are truly complete.
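A condensed sketch of the resulting GC-eligibility condition, with the interface and callback entirely assumed for illustration (they are not the actual m3 APIs):

// seriesState abstracts the per-series in-memory state; an assumption for
// illustration only.
type seriesState interface {
	// IsEmpty reports whether the series holds no in-memory TSDB or index data.
	IsEmpty() bool
}

// gcEligibleSeries reports whether a series may be removed from the index:
// its block must be fully flushed (TSDB warm and cold flush plus index flush)
// and the series itself must be empty in memory.
func gcEligibleSeries(
	s seriesState,
	fullyFlushed func(blockStart int64) bool,
	blockStart int64,
) bool {
	return fullyFlushed(blockStart) && s.IsEmpty()
}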
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: