Check entry empty state to ensure GC eligible #3634
Conversation
Codecov Report
@@               Coverage Diff              @@
##    r/index-active-block    #3634   +/-   ##
==============================================
  Coverage           56.3%    56.3%
==============================================
  Files                551      551
  Lines              62189    62214    +25
==============================================
+ Hits               35015    35066    +51
+ Misses             24035    24013    -22
+ Partials            3139     3135     -4
	return result

// IsEmpty returns true if the entry has no in-memory series data.
func (entry *Entry) IsEmpty() bool {
	return entry.Series.IsEmpty()
Instead of just doing entry.Series.IsEmpty(), IMO we will need to keep the idea of RelookupAndIncrementReaderWriterCount, which re-looks up the latest lookup.Entry in the shard map.
There are a few reasons for this, and we saw real problems without re-looking up the entry. I can explain it more in depth if required, but the problem is that sometimes a lookup.Entry is created for a new series and passed off to the indexing queue, and then a race between two datapoints for a series that doesn't exist yet causes only a single lookup.Entry to exist in the final shard map while the other becomes orphaned (the orphan could look empty, but the lookup.Entry that made it into the shard map will have a series that is not actually empty).
That's why the caller (from within mutable_segments.go) should perhaps perform something like the following:
isEmpty, ok := entry.RelookupAndReturnIsEmpty()
if !ok {
	// Should not happen since shard will not expire until
	// no more block starts are indexed.
	// We do not GC this series if shard is missing since
	// we open up a race condition where the entry is not
	// in the shard yet and we GC it since we can't find it
	// due to an asynchronous insert.
	return true
}
return !isEmpty
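For reference, a minimal self-contained sketch of what such a RelookupAndReturnIsEmpty helper could look like; the series and shardMap types, the LookupEntry method, and the Entry fields shown here are assumptions for illustration, not the actual m3 API:

type series interface {
	IsEmpty() bool
}

type shardMap interface {
	// LookupEntry returns the canonical entry currently held in the shard
	// map for the given series ID, if any.
	LookupEntry(id string) (*Entry, bool)
}

type Entry struct {
	ID     string
	Series series
	Shard  shardMap
}

// RelookupAndReturnIsEmpty re-looks up the entry in the shard map and reports
// whether the canonical entry's series is empty. The second return value is
// false when the entry can no longer be found in the shard, letting the
// caller decide how to treat that case (e.g. do not GC).
func (entry *Entry) RelookupAndReturnIsEmpty() (bool, bool) {
	canonical, ok := entry.Shard.LookupEntry(entry.ID)
	if !ok {
		return false, false
	}
	// Check the canonical entry rather than the receiver, so an orphaned
	// duplicate created by a racing insert is never trusted.
	return canonical.Series.IsEmpty(), true
}

The key point is that the emptiness check runs against the canonical entry retrieved from the shard map, not against the (possibly orphaned) receiver.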
Also, we should probably add a metric for when this edge case happens, using instrument.EmitAndLogInvariantError(...), so that integration tests and scenario tests panic if this edge case ever starts occurring.
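A small sketch of how such a guard could sit in the caller, building on the sketch above; emitInvariantViolation is a hypothetical stand-in for the real instrument helper, whose exact signature is not shown in this thread:

// shouldKeepSeries mirrors the caller snippet above; emitInvariantViolation
// is a hypothetical stand-in for the real instrument helper, not its actual
// signature.
func shouldKeepSeries(entry *Entry, emitInvariantViolation func(msg string)) bool {
	isEmpty, ok := entry.RelookupAndReturnIsEmpty()
	if !ok {
		// Should never happen: surface it loudly so integration and scenario
		// tests, which panic on invariant violations, catch it immediately.
		emitInvariantViolation("entry not found in shard during GC empty check")
		return true
	}
	return !isEmpty
}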
Interesting, I wasn't aware of this edge case, but it makes sense. Will add back this safeguard.
PR updated accordingly
src/dbnode/storage/flush.go (outdated)
	flushedShards map[shardFlush]bool
)

if indexEnabled {
	flushesForNs, ok := indexFlushes[n.ID().String()]
Do we need to check dataFlushes here as well? If not, why not?
Data flushes always precede index flushes, so it is safe to treat the "index" flush as the indicator that a "full" flush has completed. If the index is disabled, though, then we just use the data flush as the indicator.
This is something I am double checking w/ rob though so stay tuned. Assuming what I just said is true, I can add a comment here explaining the logic.
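A rough sketch of the selection logic described above, under the assumption that data flushes always precede index flushes; the shardFlush definition and the function and parameter names are illustrative, not the actual flush.go code:

// shardFlush identifies a flushed (shard, blockStart) pair; the real type in
// flush.go may differ, so this definition is only illustrative.
type shardFlush struct {
	shardID    uint32
	blockStart int64
}

// fullyFlushedShards picks which flush result marks a namespace's shard
// blocks as fully flushed to disk.
func fullyFlushedShards(
	indexEnabled bool,
	nsID string,
	dataFlushes, indexFlushes map[string]map[shardFlush]bool,
) map[shardFlush]bool {
	if indexEnabled {
		// Data flushes always run before index flushes, so with indexing
		// enabled the index flush is the later, authoritative signal that
		// both data and index are on disk.
		return indexFlushes[nsID]
	}
	// With indexing disabled there is no index flush, so the data flush
	// alone indicates a "full" flush.
	return dataFlushes[nsID]
}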
}] = s

for _, t := range i.blockStartsFromIndexBlockStart(block.StartTime()) {
	for _, s := range shards {
		s.MarkWarmIndexFlushStateSuccessOrError(t, err)
So, IIUC, blockStartsFromIndexBlockStart returns the data block starts between the index block start and the index block start + index block size. This is important because index block size >= data block size. If that's true, why do we need to MarkWarmIndexFlushStateSuccessOrError for each data block start time?
That's correct. And the reason is that MarkWarmIndexFlushStateSuccessOrError is still marking data blockStarts (not index blockStarts), but each block now has a flag for both data and index. This is because the state we track is always kept at the data block size, not the index block size. E.g. with a 1h block and a 2h indexBlock, in memory we now have:
1pm: {dataFlushed: bool, indexFlushed: bool}
2pm: {dataFlushed: bool, indexFlushed: bool}
3pm: {dataFlushed: bool, indexFlushed: bool}
4pm: {dataFlushed: bool, indexFlushed: bool}
...
and when determining whether a given block is eligible for GC we check both dataFlushed && indexFlushed. So when an index flush occurs, in this case it actually covers 2 blocks.
This is a somewhat confusing nuance though, so if you have any suggestions on making this easier to understand, let me know.
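A small self-contained sketch of the per-data-blockStart state described above; the type and method names are assumptions for illustration, not the actual m3 types:

// blockFlushState mirrors the per-data-blockStart bookkeeping described
// above; one flag each for the data flush and the index flush.
type blockFlushState struct {
	dataFlushed  bool
	indexFlushed bool
}

// flushStates is keyed by data blockStart (e.g. unix nanos).
type flushStates map[int64]*blockFlushState

// markWarmIndexFlushed marks every data blockStart covered by one index block
// as index-flushed; with 1h data blocks and a 2h index block, a single index
// flush covers two data blockStarts.
func (s flushStates) markWarmIndexFlushed(dataBlockStarts []int64) {
	for _, t := range dataBlockStarts {
		if state, ok := s[t]; ok {
			state.indexFlushed = true
		}
	}
}

// gcEligible reports whether a given data blockStart is eligible for GC:
// both its data and its index must have been flushed.
func (s flushStates) gcEligible(blockStart int64) bool {
	state, ok := s[blockStart]
	return ok && state.dataFlushed && state.indexFlushed
}

With 1h data blocks and 2h index blocks, a single warm index flush would call markWarmIndexFlushed with two data blockStarts, after which both flags must be set for gcEligible to return true.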
Yeah, may be worth adding a comment here just to clarify.
What this PR does / why we need it:
Before removing in-memory series from the index, we need to be sure that they've been flushed to disk. Before the change in this PR, the proxy for determining this was whether a given blockStart had been bootstrapped, the logic being that a block must have been flushed to disk for it to have been bootstrapped (i.e. blockTickResult.NumSegmentsBootstrapped != 0).
The problem with that approach is that it does not capture cold writes: those can be written to old blockStarts and still be in memory, not yet cold flushed to disk, while the bootstrapped state is still true.
Instead, we now check that a given series is truly "empty" (i.e. no TSDB data nor index data for that series in memory) before it is eligible for GC. One challenge with this check is that we only keep track of state (a) "is TSDB data still in-memory" and not (b) "is index data still in-memory"; in other words, we can be sure about (a) but not about (b). So this PR also makes the tracked state assert that both (a) and (b) are true, by marking it only after both the TSDB and index (warm and cold) flushes are truly complete.
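A condensed sketch of the resulting GC-eligibility condition, with the interface and callback entirely assumed for illustration (they are not the actual m3 APIs):

// seriesState abstracts the per-series in-memory state; an assumption for
// illustration only.
type seriesState interface {
	// IsEmpty reports whether the series holds no in-memory TSDB or index data.
	IsEmpty() bool
}

// gcEligibleSeries reports whether a series may be removed from the index:
// its block must be fully flushed (TSDB warm and cold flush plus index flush)
// and the series itself must be empty in memory.
func gcEligibleSeries(
	s seriesState,
	fullyFlushed func(blockStart int64) bool,
	blockStart int64,
) bool {
	return fullyFlushed(blockStart) && s.IsEmpty()
}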
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: