Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a single index Results when querying across blocks #1474

Merged

Conversation

robskillington
Copy link
Collaborator

Fixes #1469

@codecov
Copy link

codecov bot commented Mar 19, 2019

Codecov Report

Merging #1474 into master will decrease coverage by 6.8%.
The diff coverage is 65%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #1474     +/-   ##
========================================
- Coverage    70.9%     64%   -6.9%     
========================================
  Files         841     834      -7     
  Lines       71895   71174    -721     
========================================
- Hits        50990   45614   -5376     
- Misses      17561   22333   +4772     
+ Partials     3344    3227    -117
Flag Coverage Δ
#aggregator 69.3% <ø> (-13.1%) ⬇️
#cluster 67.7% <ø> (-18.2%) ⬇️
#collector 47.9% <ø> (-15.8%) ⬇️
#dbnode 76.4% <65%> (-4.4%) ⬇️
#m3em 68.3% <ø> (-4.9%) ⬇️
#m3ninx 71.2% <ø> (-3%) ⬇️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.6% <ø> (ø) ⬆️
#msg 74.9% <ø> (ø) ⬆️
#query 56.3% <ø> (-9.8%) ⬇️
#x 69.4% <ø> (-7%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dec08fe...f05ce65. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 19, 2019

Codecov Report

Merging #1474 into master will decrease coverage by 7.5%.
The diff coverage is 72.5%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #1474      +/-   ##
=========================================
- Coverage    70.9%   63.3%    -7.6%     
=========================================
  Files         841     712     -129     
  Lines       71880   62527    -9353     
=========================================
- Hits        50982   39635   -11347     
- Misses      17564   19825    +2261     
+ Partials     3334    3067     -267
Flag Coverage Δ
#aggregator 61.1% <ø> (-21.2%) ⬇️
#cluster 84.3% <ø> (-1.5%) ⬇️
#collector 47.9% <ø> (-15.8%) ⬇️
#dbnode 68.8% <69.8%> (-12.1%) ⬇️
#m3em 68.3% <ø> (-4.9%) ⬇️
#m3ninx 69.8% <ø> (-4.5%) ⬇️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.5% <ø> (ø) ⬆️
#msg 74.9% <ø> (ø) ⬆️
#query 72.2% <ø> (+6.2%) ⬆️
#x 63.8% <100%> (-12.6%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fec4334...a8f8a2d. Read the comment docs.

Copy link
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general

)

type results struct {
sync.RWMutex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a question, but do we usually have a struct contain a mu sync.RWMutex or have it be a mutex (promoted?)? Is there a difference besides accessors?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on access pattern mainly: A RWMutex is useful when you have potential multiple readers that could share underlying data without clobbering each other; if all accessors always need r/w access - then a mutex is fine.

Copy link
Collaborator

@arnikola arnikola Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, meant the distinction between something like

type results struct {
 sync.RWMutex
}

v.s.

type results struct {
 mu sync.RWMutex
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the prior, unless it's an exported struct in which case the latter to reduce scope of calling those methods.

@@ -55,6 +58,15 @@ func NewResults(opts Options) Results {

func (r *results) AddDocument(
d doc.Document,
) (added bool, size int, err error) {
r.Lock()
added, size, err = r.addDocumentWithLock(d)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we commonly named returns? In this file it's a bit mixed up e.g. Namespace() uses standard returns, is that ok or would it be better to refactor these to the more standard return type?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to keep the existing stuff here. Usually we prefer to return structs actually instead of named returns.

noFinalize := r.noFinalize
r.RUnlock()

if noFinalize {
return
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this lock?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When putting back in the pool? Sure I'm not opposed to that, but technically just before you add it to the pool you're basically declaring you're done with the object.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thoughts, can't lock during entire call to Finalize as Reset itself acquires a write lock, that's why we release the lock earlier in the method.

@@ -170,6 +243,9 @@ func (r *results) Finalize() {
}

func (r *results) NoFinalize() {
r.Lock()
defer r.Unlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a defer? Seems straightforward with no branching

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, probably don't need it here.

}
return numPartialUpdates, size, nil
}

func (r *results) AddIDAndTags(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method is not used any more, may be able to remove both it and AddDocument

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah true, will do - need to remove the tests.

@@ -120,6 +132,22 @@ type Results interface {
) (added bool, size int, err error)
}

// AddDocumentsBatchResultsOptions is a set of options to use when adding
// results for a query.
type AddDocumentsBatchResultsOptions struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to add this to Options? This doesn't seem like something that you'd necessarily change per query, but may be wrong

}

func (r *results) Reset(nsID ident.ID) {
r.Lock()
defer r.Unlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a defer? Seems straightforward with no branching

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, probably don't need it here.

require.True(t, ident.NewTagIterMatcher(
ident.MustNewTagStringsIterator("bar", "baz")).Matches(
ident.NewTagsIterator(t1)))
results.WithMap(func(rMap *ResultsMap) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is used in multiple places, maybe make it into a test function?

// Reset batch
var emptyDoc doc.Document
for i := range batch {
batch[i] = emptyDoc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth doing some sort of Memset optimization here instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the memset optimization thankfully, just using a single zero var value and then setting everything with range i := ... to that value.

iterCloser := safeCloser{closable: iter}
execCloser := safeCloser{closable: exec}

defer func() {
b.docsPool.Put(batch)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to reset the batch here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No thankfully the Put method on the array will zero out each elem and also resize it to zero.

@@ -157,7 +226,11 @@ func (r *results) Reset(nsID ident.ID) {
}

func (r *results) Finalize() {
if r.noFinalize {
r.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how I feel about this. If there is any amount of concurrency here then you've completely misused the object and you have a really bad bug anyway. Using a lock here kind of confuses that point

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I figured I was locking everything and it is kind of strange to lock only some state rather than all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I dont have strong feelings either way, just took me a second to think about why you might be doing it

@@ -170,6 +243,9 @@ func (r *results) Finalize() {
}

func (r *results) NoFinalize() {
r.Lock()
defer r.Unlock()

// Ensure neither the results object itself, or any of its underlying
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is my typo originally, but can you do or->nor

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing.

@@ -895,24 +895,16 @@ func (i *nsIndex) Query(
exhaustive bool
returned bool
}{
merged: nil,
merged: i.resultsPool.Get(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to call this merged now that we push it all the way down? Maybe just call it results

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, renamed.

@@ -1064,8 +1017,6 @@ func (i *nsIndex) Query(
}

results.Lock()
// Signal not to add any further results since we've returned already.
results.returned = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we losing the ability to do this signaling? Seems like this might make the impact of timed out queries even worse since they'll keep updating this map...Actually isn't it broken because they'll keep trying to add results to a map that may have been returned to the pool?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, where do the results get returned to the pool? I don't see it in this method or in the RPC method.

Regardless, if we want to pool this thing you may need to add ref-counting or some type of unique query identifier or something

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I now use a lifetime to protect against writing to the results once we return from the Query call to the index, this prevents writing to results during cancellation or any other early return code path.

batch = batch[:0]
}

// Put last batch if remainding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remaining

}

func (r *results) Map() *ResultsMap {
return r.resultsMap
func (r *results) WithMap(fn func(results *ResultsMap)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this to WithReadOnlyMap? I initially thought this method made it safe to mutate the map but thats not the case

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing.

// from seriesID -> seriesTags, comprising index results.
// A function is required to ensure that the results is used
// while a read lock for the results is held.
WithMap(fn func(results *ResultsMap))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, how would you feel about adding a Seal() and changing Map() to return (*ResultsMap, error). Would allow callers to avoid the extra lambda.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this is actually the approach I'll take also to address the pooling issues that Richie was concerned about (i.e. not knowing when it's finalized and potentially adding values to it after it is finalized)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robskillington I think if you have an early time out / cancel, you have to give up and not return it to the pool cause even if you seal it there is no way to know when its safe to unseal it unless you ref count it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I now use a lifetime to protect against writing to the results once we return from the Query call to the index, this prevents writing to results during cancellation or any other early return code path.

resultsPool.Init(func() index.Results { return index.NewResults(indexOpts) })
resultsPool.Init(func() index.Results {
// NB(r): Need to initialize after setting the index opts so
// it seems the same reference of the options as is set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence reads like word salad lol. I think you were trying to say: "so it sees the same reference to the indexOptions"

}
}
}

// If block had more data but we stopped early, need to notify caller.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: This would read a lot better if you move this comment below the early return cause right now it seems like a bug at first glance cause you say need to notify the caller but then the code immediately returns without doing anything

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will do.

}
results.Unlock()

if alreadyNotExhaustive {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little weird, can you just move the break into the previous conditional block that has the same condition?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, this is a side effect of the refactor actually, good catch.

// If block had more data but we stopped early, need to notify caller.
if blockExhaustive {
return
}
results.exhaustive = false
state.exhaustive = false
}

for _, block := range blocks {
// Capture block for async query execution below.
block := block

// Terminate early if we know we don't need any more results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a second to understand this, could you clarify the comment to something like:

"We're looping through all the blocks that we need to query and kicking off parallel queries which are bounded by the queryWorkersPool's maximum concurrency. This means that it's possible at this point that we've completed querying one or more blocks and already exhausted the maximum number of results that we're allowed to return. If thats the case, there is no value in kicking off more parallel queries, so we break out of the loop."""

I know its a little verbose but I've seen multiple iterations of this code already and other people coming in with fresh eyes will probably have more trouble following.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing the comment, will do - hah 👍

results.Lock()
// Signal not to add any further results since we've returned already.
results.returned = true
state.Lock()
// Take reference to vars to return while locked, need to allow defer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont understand this comment. What deadlock are you protecting against? The only Lock/Defer that is see is in the execBlockQuery func and I don't see how that would deadlock with this code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove that statement, there used to be other stuff going on.

queryValid := cancellable.TryCheckout()
if !queryValid {
// Query not valid any longer, do not add results and return early
return batch, 0, errCancelledQuery
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this error will still get propagated up to the level above and stored in the state multierr.....I guess thats fine though since you take a copy of the multierr when you return

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we always lock when we access multierr.


r.opts = opts

// finalize existing held nsID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add periods to the comments in this file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, this is old code but I'll update it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS we should really create a lint rule so you don't have to manually ask this btw (not urgent, but something we should think about)

r.nsID = nsID

// reset all values from map first
for _, entry := range r.resultsMap.Iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt you be doing this in Finalize() not Reset?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see you call reset from finalize

}

// reset all keys in the map next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a quick comment saying this will finalize the keys

) (added bool, size int, err error) {
added = false
) (bool, int, error) {
added := false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: I'd personally find this more readable if you just explicitly returned false/true in each code path

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

// take a copy of the bytes backing the documents so the original can be
// modified after this function returns without affecting the results map.
// If documents with duplicate IDs are added, they are simply ignored and
// the first document added with an ID is returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a TODO here saying we may need to change the exact behavior here once the index becomes mutable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing.

// is already cancelled this will return false, otherwise it will return
// true and guarantee the lifetime is not cancelled until the checkout
// is returned.
func (l *CancellableLifetime) TryCheckout() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add to the comment: If this returns true you MUST call ReleaseCheckout later, but if it returns false calling ReleaseCheckout will panic (pretty sure unlocking an unlocked lock panics

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure thing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments.

require.False(t, ok)

// Ensure that cancel finished
for {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Instead of doing loop here, maybe add a channel that the cancel goroutine pushes a struct to after setting the bool?

}
alreadyNotExhaustive := opts.Limit > 0 && mergedSize >= opts.Limit
size := results.Size()
alreadyNotExhaustive := opts.LimitExceeded(size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can this be renamed to alreadyExceededLimit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

indexOpts.SegmentBuilderOptions(),
indexOpts.FSTSegmentOptions(),
compaction.CompactorOptions{
MmapDocsData: opts.ForegroundCompactorMmapDocsData,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there may be an error here; backgroundCompactor uses ForegroundCompactorMmapDocsData, and foregroundCompactor uses BackgroundCompactorMmapDocsData

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, good call - TY for this catch.

// Results is a collection of results for a query.
// Results is a collection of results for a query, it is synchronized
// when access to the results set is used as documented by the methods.
// It cannot be written to after it is sealed, until it's reopened by
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it can be sealed in the current implementation, so this comment may be wrong?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thanks will update.

return r.resultsMap
r.RLock()
v := r.resultsMap
r.RUnlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this make a copy of the resultsMap? Otherwise won't downstream consumers be able to modify this map without a lock or have their map changed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment to the call to Map about this being unsafe. We can't really take a copy since it will cause a ton of allocations.

DoAndReturn(func(q index.Query, opts index.QueryOptions, r index.Results) (bool, error) {
Query(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).
DoAndReturn(func(
l *resource.CancellableLifetime,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meganit: these can all be _

) (added bool, size int, err error) {
added = false
) (bool, int, error) {
added := false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

testOpts = optionsWithDocsArrayPool(NewOptions(), 1, 1)
}

func optionsWithDocsArrayPool(opts Options, size, capacity int) Options {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering what the purpose of this function is rather than just doing this in the init; just a convention?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We call this method elsewhere.

copy(allDocsCopy, allDocs)
fstData.DocsReader = docs.NewSliceReader(0, allDocsCopy)
} else {
// Otherwise encode and reference the encoded bytes as mmap'd bytes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with this path, but just verifying it's ok to have DocsReader be nil here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, if we supply the docs data and docs index data then its safe to have DocsReader be nil.

@robskillington robskillington merged commit b9205e8 into master Mar 21, 2019
@robskillington robskillington deleted the r/reuse-single-index-results-querying-across-blocks branch March 21, 2019 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants