-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a single index Results when querying across blocks #1474
Use a single index Results when querying across blocks #1474
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1474 +/- ##
========================================
- Coverage 70.9% 64% -6.9%
========================================
Files 841 834 -7
Lines 71895 71174 -721
========================================
- Hits 50990 45614 -5376
- Misses 17561 22333 +4772
+ Partials 3344 3227 -117
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #1474 +/- ##
=========================================
- Coverage 70.9% 63.3% -7.6%
=========================================
Files 841 712 -129
Lines 71880 62527 -9353
=========================================
- Hits 50982 39635 -11347
- Misses 17564 19825 +2261
+ Partials 3334 3067 -267
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general
) | ||
|
||
type results struct { | ||
sync.RWMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a question, but do we usually have a struct contain a mu sync.RWMutex
or have it be a mutex (promoted?)? Is there a difference besides accessors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depends on access pattern mainly: A RWMutex is useful when you have potential multiple readers that could share underlying data without clobbering each other; if all accessors always need r/w access - then a mutex is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, meant the distinction between something like
type results struct {
sync.RWMutex
}
v.s.
type results struct {
mu sync.RWMutex
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the prior, unless it's an exported struct in which case the latter to reduce scope of calling those methods.
src/dbnode/storage/index/results.go
Outdated
@@ -55,6 +58,15 @@ func NewResults(opts Options) Results { | |||
|
|||
func (r *results) AddDocument( | |||
d doc.Document, | |||
) (added bool, size int, err error) { | |||
r.Lock() | |||
added, size, err = r.addDocumentWithLock(d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we commonly named returns? In this file it's a bit mixed up e.g. Namespace()
uses standard returns, is that ok or would it be better to refactor these to the more standard return type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to keep the existing stuff here. Usually we prefer to return structs actually instead of named returns.
noFinalize := r.noFinalize | ||
r.RUnlock() | ||
|
||
if noFinalize { | ||
return | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When putting back in the pool? Sure I'm not opposed to that, but technically just before you add it to the pool you're basically declaring you're done with the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thoughts, can't lock during entire call to Finalize as Reset itself acquires a write lock, that's why we release the lock earlier in the method.
src/dbnode/storage/index/results.go
Outdated
@@ -170,6 +243,9 @@ func (r *results) Finalize() { | |||
} | |||
|
|||
func (r *results) NoFinalize() { | |||
r.Lock() | |||
defer r.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a defer? Seems straightforward with no branching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, probably don't need it here.
src/dbnode/storage/index/results.go
Outdated
} | ||
return numPartialUpdates, size, nil | ||
} | ||
|
||
func (r *results) AddIDAndTags( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method is not used any more, may be able to remove both it and AddDocument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah true, will do - need to remove the tests.
src/dbnode/storage/index/types.go
Outdated
@@ -120,6 +132,22 @@ type Results interface { | |||
) (added bool, size int, err error) | |||
} | |||
|
|||
// AddDocumentsBatchResultsOptions is a set of options to use when adding | |||
// results for a query. | |||
type AddDocumentsBatchResultsOptions struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to add this to Options
? This doesn't seem like something that you'd necessarily change per query, but may be wrong
src/dbnode/storage/index/results.go
Outdated
} | ||
|
||
func (r *results) Reset(nsID ident.ID) { | ||
r.Lock() | ||
defer r.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a defer? Seems straightforward with no branching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, probably don't need it here.
require.True(t, ident.NewTagIterMatcher( | ||
ident.MustNewTagStringsIterator("bar", "baz")).Matches( | ||
ident.NewTagsIterator(t1))) | ||
results.WithMap(func(rMap *ResultsMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is used in multiple places, maybe make it into a test function?
src/dbnode/storage/index/block.go
Outdated
// Reset batch | ||
var emptyDoc doc.Document | ||
for i := range batch { | ||
batch[i] = emptyDoc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth doing some sort of Memset optimization here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the memset optimization thankfully, just using a single zero var value and then setting everything with range i := ...
to that value.
iterCloser := safeCloser{closable: iter} | ||
execCloser := safeCloser{closable: exec} | ||
|
||
defer func() { | ||
b.docsPool.Put(batch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to reset the batch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No thankfully the Put method on the array will zero out each elem and also resize it to zero.
@@ -157,7 +226,11 @@ func (r *results) Reset(nsID ident.ID) { | |||
} | |||
|
|||
func (r *results) Finalize() { | |||
if r.noFinalize { | |||
r.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how I feel about this. If there is any amount of concurrency here then you've completely misused the object and you have a really bad bug anyway. Using a lock here kind of confuses that point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I figured I was locking everything and it is kind of strange to lock only some state rather than all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I dont have strong feelings either way, just took me a second to think about why you might be doing it
src/dbnode/storage/index/results.go
Outdated
@@ -170,6 +243,9 @@ func (r *results) Finalize() { | |||
} | |||
|
|||
func (r *results) NoFinalize() { | |||
r.Lock() | |||
defer r.Unlock() | |||
|
|||
// Ensure neither the results object itself, or any of its underlying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is my typo originally, but can you do or->nor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing.
src/dbnode/storage/index.go
Outdated
@@ -895,24 +895,16 @@ func (i *nsIndex) Query( | |||
exhaustive bool | |||
returned bool | |||
}{ | |||
merged: nil, | |||
merged: i.resultsPool.Get(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to call this merged now that we push it all the way down? Maybe just call it results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, renamed.
@@ -1064,8 +1017,6 @@ func (i *nsIndex) Query( | |||
} | |||
|
|||
results.Lock() | |||
// Signal not to add any further results since we've returned already. | |||
results.returned = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we losing the ability to do this signaling? Seems like this might make the impact of timed out queries even worse since they'll keep updating this map...Actually isn't it broken because they'll keep trying to add results to a map that may have been returned to the pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, where do the results get returned to the pool? I don't see it in this method or in the RPC method.
Regardless, if we want to pool this thing you may need to add ref-counting or some type of unique query identifier or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I now use a lifetime to protect against writing to the results once we return from the Query call to the index, this prevents writing to results during cancellation or any other early return code path.
src/dbnode/storage/index/block.go
Outdated
batch = batch[:0] | ||
} | ||
|
||
// Put last batch if remainding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remaining
src/dbnode/storage/index/results.go
Outdated
} | ||
|
||
func (r *results) Map() *ResultsMap { | ||
return r.resultsMap | ||
func (r *results) WithMap(fn func(results *ResultsMap)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this to WithReadOnlyMap
? I initially thought this method made it safe to mutate the map but thats not the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing.
src/dbnode/storage/index/types.go
Outdated
// from seriesID -> seriesTags, comprising index results. | ||
// A function is required to ensure that the results is used | ||
// while a read lock for the results is held. | ||
WithMap(fn func(results *ResultsMap)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of this, how would you feel about adding a Seal()
and changing Map()
to return (*ResultsMap, error)
. Would allow callers to avoid the extra lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this is actually the approach I'll take also to address the pooling issues that Richie was concerned about (i.e. not knowing when it's finalized and potentially adding values to it after it is finalized)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robskillington I think if you have an early time out / cancel, you have to give up and not return it to the pool cause even if you seal it there is no way to know when its safe to unseal it unless you ref count it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I now use a lifetime to protect against writing to the results once we return from the Query call to the index, this prevents writing to results during cancellation or any other early return code path.
…x query returns due timeout or other error
… github.com:m3db/m3 into r/reuse-single-index-results-querying-across-blocks
src/dbnode/server/server.go
Outdated
resultsPool.Init(func() index.Results { return index.NewResults(indexOpts) }) | ||
resultsPool.Init(func() index.Results { | ||
// NB(r): Need to initialize after setting the index opts so | ||
// it seems the same reference of the options as is set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence reads like word salad lol. I think you were trying to say: "so it sees the same reference to the indexOptions"
src/dbnode/storage/index.go
Outdated
} | ||
} | ||
} | ||
|
||
// If block had more data but we stopped early, need to notify caller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: This would read a lot better if you move this comment below the early return cause right now it seems like a bug at first glance cause you say need to notify the caller but then the code immediately returns without doing anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will do.
src/dbnode/storage/index.go
Outdated
} | ||
results.Unlock() | ||
|
||
if alreadyNotExhaustive { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little weird, can you just move the break into the previous conditional block that has the same condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, this is a side effect of the refactor actually, good catch.
src/dbnode/storage/index.go
Outdated
// If block had more data but we stopped early, need to notify caller. | ||
if blockExhaustive { | ||
return | ||
} | ||
results.exhaustive = false | ||
state.exhaustive = false | ||
} | ||
|
||
for _, block := range blocks { | ||
// Capture block for async query execution below. | ||
block := block | ||
|
||
// Terminate early if we know we don't need any more results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me a second to understand this, could you clarify the comment to something like:
"We're looping through all the blocks that we need to query and kicking off parallel queries which are bounded by the queryWorkersPool's maximum concurrency. This means that it's possible at this point that we've completed querying one or more blocks and already exhausted the maximum number of results that we're allowed to return. If thats the case, there is no value in kicking off more parallel queries, so we break out of the loop."""
I know its a little verbose but I've seen multiple iterations of this code already and other people coming in with fresh eyes will probably have more trouble following.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for writing the comment, will do - hah 👍
src/dbnode/storage/index.go
Outdated
results.Lock() | ||
// Signal not to add any further results since we've returned already. | ||
results.returned = true | ||
state.Lock() | ||
// Take reference to vars to return while locked, need to allow defer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont understand this comment. What deadlock are you protecting against? The only Lock/Defer that is see is in the execBlockQuery func and I don't see how that would deadlock with this code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove that statement, there used to be other stuff going on.
queryValid := cancellable.TryCheckout() | ||
if !queryValid { | ||
// Query not valid any longer, do not add results and return early | ||
return batch, 0, errCancelledQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this error will still get propagated up to the level above and stored in the state multierr.....I guess thats fine though since you take a copy of the multierr when you return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we always lock when we access multierr.
src/dbnode/storage/index/results.go
Outdated
|
||
r.opts = opts | ||
|
||
// finalize existing held nsID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add periods to the comments in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, this is old code but I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS we should really create a lint rule so you don't have to manually ask this btw (not urgent, but something we should think about)
r.nsID = nsID | ||
|
||
// reset all values from map first | ||
for _, entry := range r.resultsMap.Iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt you be doing this in Finalize()
not Reset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see you call reset from finalize
src/dbnode/storage/index/results.go
Outdated
} | ||
|
||
// reset all keys in the map next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a quick comment saying this will finalize the keys
src/dbnode/storage/index/results.go
Outdated
) (added bool, size int, err error) { | ||
added = false | ||
) (bool, int, error) { | ||
added := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: I'd personally find this more readable if you just explicitly returned false/true in each code path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
// take a copy of the bytes backing the documents so the original can be | ||
// modified after this function returns without affecting the results map. | ||
// If documents with duplicate IDs are added, they are simply ignored and | ||
// the first document added with an ID is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a TODO here saying we may need to change the exact behavior here once the index becomes mutable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing.
// is already cancelled this will return false, otherwise it will return | ||
// true and guarantee the lifetime is not cancelled until the checkout | ||
// is returned. | ||
func (l *CancellableLifetime) TryCheckout() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add to the comment: If this returns true you MUST call ReleaseCheckout
later, but if it returns false calling ReleaseCheckout
will panic (pretty sure unlocking an unlocked lock panics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sure thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments.
require.False(t, ok) | ||
|
||
// Ensure that cancel finished | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Instead of doing loop here, maybe add a channel that the cancel goroutine pushes a struct to after setting the bool?
src/dbnode/storage/index.go
Outdated
} | ||
alreadyNotExhaustive := opts.Limit > 0 && mergedSize >= opts.Limit | ||
size := results.Size() | ||
alreadyNotExhaustive := opts.LimitExceeded(size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can this be renamed to alreadyExceededLimit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call.
src/dbnode/storage/index/block.go
Outdated
indexOpts.SegmentBuilderOptions(), | ||
indexOpts.FSTSegmentOptions(), | ||
compaction.CompactorOptions{ | ||
MmapDocsData: opts.ForegroundCompactorMmapDocsData, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there may be an error here; backgroundCompactor
uses ForegroundCompactorMmapDocsData
, and foregroundCompactor
uses BackgroundCompactorMmapDocsData
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, good call - TY for this catch.
src/dbnode/storage/index/types.go
Outdated
// Results is a collection of results for a query. | ||
// Results is a collection of results for a query, it is synchronized | ||
// when access to the results set is used as documented by the methods. | ||
// It cannot be written to after it is sealed, until it's reopened by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can be sealed in the current implementation, so this comment may be wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, thanks will update.
return r.resultsMap | ||
r.RLock() | ||
v := r.resultsMap | ||
r.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this make a copy of the resultsMap? Otherwise won't downstream consumers be able to modify this map without a lock or have their map changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to the call to Map about this being unsafe. We can't really take a copy since it will cause a ton of allocations.
DoAndReturn(func(q index.Query, opts index.QueryOptions, r index.Results) (bool, error) { | ||
Query(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()). | ||
DoAndReturn(func( | ||
l *resource.CancellableLifetime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meganit: these can all be _
src/dbnode/storage/index/results.go
Outdated
) (added bool, size int, err error) { | ||
added = false | ||
) (bool, int, error) { | ||
added := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
testOpts = optionsWithDocsArrayPool(NewOptions(), 1, 1) | ||
} | ||
|
||
func optionsWithDocsArrayPool(opts Options, size, capacity int) Options { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering what the purpose of this function is rather than just doing this in the init
; just a convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call this method elsewhere.
copy(allDocsCopy, allDocs) | ||
fstData.DocsReader = docs.NewSliceReader(0, allDocsCopy) | ||
} else { | ||
// Otherwise encode and reference the encoded bytes as mmap'd bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really familiar with this path, but just verifying it's ok to have DocsReader be nil here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, if we supply the docs data and docs index data then its safe to have DocsReader be nil.
Fixes #1469