
Improve commit log reader speed #1724

Merged: robskillington merged 6 commits into master from r/commit-log-bootstrapper-enhancements on Jun 13, 2019

Conversation

robskillington (Collaborator)

What this PR does / why we need it:

Faster commit log bootstrapping with significantly fewer resources used.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

@richardartoul (Contributor) left a comment

LGTM if CI passes since these code paths are pretty extensively tested.

There was a bug in how you were calling the seriesPredicate (passing nil for seriesID and namespace) that the commitlog package did not catch but the commitlog bootstrapper package did.

I went ahead and fixed it and updated the commitlog package tests to catch it in the future. I pushed the changes to this branch already so you can take a look if you're curious.
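As a hedged illustration of the kind of test fix described (the package, names, and byte-slice predicate signature below are assumptions for the sketch, not the actual m3 commitlog API): a wrapper that fails the test whenever the predicate is invoked with nil arguments would catch this class of bug directly in the commitlog package tests.

// Hypothetical sketch only; not the real m3 commitlog types.
package commitlogtest

import "testing"

// SeriesPredicate stands in for the "should this series be read?" callback.
type SeriesPredicate func(namespace, seriesID []byte) bool

// RequireNonNilArgs wraps a predicate and fails the test if it is ever called
// with a nil namespace or seriesID, so the commitlog package tests catch the
// bug directly instead of relying on the bootstrapper package to surface it.
func RequireNonNilArgs(t *testing.T, p SeriesPredicate) SeriesPredicate {
	return func(namespace, seriesID []byte) bool {
		if namespace == nil || seriesID == nil {
			t.Fatalf("series predicate called with nil namespace or seriesID")
		}
		return p(namespace, seriesID)
	}
}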

decoderStream.Reset(arg.bytes[arg.offset:])
decoder.Reset(decoderStream)
entry, err := decoder.DecodeLogEntryRemaining(arg.decodeRemainingToken, arg.uniqueIndex)
entry, err = msgpack.DecodeLogEntryFast(r.logEntryBytes)
Contributor

woohoo
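For context, a hedged reading of the diff above: the old path reset a decoder stream and decoder for every entry, while the new path decodes each entry straight from a reusable byte slice. A generic, self-contained sketch of that shape (toy types and a toy binary format, not m3's msgpack code):

package main

import (
	"encoding/binary"
	"fmt"
)

// toyEntry stands in for a decoded commit log entry.
type toyEntry struct {
	index uint64
	value uint64
}

// decodeEntryFast decodes directly from a byte slice, analogous in shape to
// the msgpack.DecodeLogEntryFast(r.logEntryBytes) call shown above; no
// per-entry stream or decoder Reset is needed.
func decodeEntryFast(b []byte) (toyEntry, error) {
	if len(b) < 16 {
		return toyEntry{}, fmt.Errorf("short buffer: %d bytes", len(b))
	}
	return toyEntry{
		index: binary.BigEndian.Uint64(b[0:8]),
		value: binary.BigEndian.Uint64(b[8:16]),
	}, nil
}

func main() {
	buf := make([]byte, 16) // one buffer reused across reads
	binary.BigEndian.PutUint64(buf[0:8], 7)
	binary.BigEndian.PutUint64(buf[8:16], 42)
	fmt.Println(decodeEntryFast(buf))
}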

return seriesMetadata{}, errCommitLogReaderMissingMetadata
}

decoded, err := msgpack.DecodeLogMetadataFast(entry.Metadata)
Contributor

:D


idPool := r.opts.IdentifierPool()
tags = idPool.Tags()
Contributor

Are these the tags that will eventually get loaded into the series? I've been moving away from using this pool in places where things are long-lived, because the default slice size is large (12 I think? maybe more), so it ends up wasting a lot of memory if you have fewer tags.

Collaborator Author

True, ultimately it's not 100% clear whether they end up getting used, because this is just the "reader".

I would probably prefer that a caller copies to their own set of exact-length tags, then returns the one that was pulled from the pool here back to the pool.

In fact, maybe what I'll do is change it so we always reuse the ID and Tags between each read, forcing the caller to copy. That should reduce pool overuse in general for someone who just wants to read and doesn't necessarily need these things long term (i.e. just quickly reading through the commit log).

Are you ok with that?

Collaborator Author

So I think that has wider-sweeping implications (changing to return something that needs to be copied), because it means we can't keep around the seriesMetadata lookup map in the commit log reader itself.

I'm going to keep it to just returning pooled things that can be returned to the pool or copied as needed; we can perhaps iterate on this in later changes.
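A minimal sketch of the copy-then-return pattern discussed above, under assumed types (Tag, the release callback, and copyAndReleaseTags are illustrative, not the actual m3 identifier pool API):

package main

import "fmt"

// Tag is an illustrative stand-in for a pooled tag.
type Tag struct {
	Name, Value string
}

// copyAndReleaseTags copies pooled tags into an exact-length slice so a
// long-lived caller does not retain the pool's larger default capacity,
// then hands the pooled slice back via the release callback.
func copyAndReleaseTags(pooled []Tag, release func([]Tag)) []Tag {
	owned := make([]Tag, len(pooled))
	copy(owned, pooled)
	release(pooled)
	return owned
}

func main() {
	// Pretend the pool hands out capacity-12 slices by default (as noted above).
	pooled := append(make([]Tag, 0, 12),
		Tag{Name: "city", Value: "sf"},
		Tag{Name: "host", Value: "a1"},
	)
	owned := copyAndReleaseTags(pooled, func([]Tag) { /* return to pool */ })
	fmt.Println(len(owned), cap(owned)) // 2 2: exact-length copy
}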

if r.nextIndex == 0 {
return r.close()
}
return r.chunkReader.fd.Close()
Contributor

do you not need the nil check anymore?

@robskillington (Collaborator Author) Jun 12, 2019

I audited the chunk reader, and it seems it's not being set to nil anywhere and it gets created at construction time.

@codecov bot commented Jun 11, 2019

Codecov Report

Merging #1724 into master will decrease coverage by <.1%.
The diff coverage is 86.9%.


@@           Coverage Diff            @@
##           master   #1724     +/-   ##
========================================
- Coverage    71.9%   71.8%   -0.1%     
========================================
  Files         980     980             
  Lines       81991   81870    -121     
========================================
- Hits        58971   58859    -112     
+ Misses      19146   19139      -7     
+ Partials     3874    3872      -2
Flag Coverage Δ
#aggregator 82.4% <ø> (-0.1%) ⬇️
#cluster 85.7% <ø> (ø) ⬆️
#collector 63.9% <ø> (ø) ⬆️
#dbnode 79.9% <86.9%> (-0.1%) ⬇️
#m3em 73.2% <ø> (ø) ⬆️
#m3ninx 74% <ø> (ø) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.6% <ø> (ø) ⬆️
#msg 74.7% <ø> (ø) ⬆️
#query 66.3% <ø> (ø) ⬆️
#x 85.1% <ø> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 413b984...2aa9af1.

@richardartoul (Contributor)

This PR makes me happy :) Unfortunately we'll still need the lazy loading for this to be truly fast (the force merge stuff is really slow right now), but this is a great start and the lazy merge stuff isn't too bad.

r.infoDecoderStream.Reset(r.logEntryBytes)
r.infoDecoder.Reset(r.infoDecoderStream)
logInfo, err := r.infoDecoder.DecodeLogInfo()
return logInfo, err
Collaborator

nit: can just return r.infoDecoder.DecodeLogInfo()

@robskillington changed the title from "[WIP] Faster commit log bootstrapper" to "Improve commit log reader speed" on Jun 12, 2019
@robskillington (Collaborator Author)

I added an integration test whose number of series can dynamically be turned up or down, and have verified that this change is somewhat faster (though only roughly 10-20% faster at this point in time).

I'll iterate on the commit log bootstrapper to improve times and measure using the integration test.
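A hedged sketch of that tunable integration test idea (the env var, package, and test name below are hypothetical, not the actual m3 test): the series count is read from the environment, so the same test can be dialed up for benchmarking and down for CI.

package integration

import (
	"os"
	"strconv"
	"testing"
)

// numSeriesFromEnv lets the test size be turned up or down without code changes.
func numSeriesFromEnv(defaultNum int) int {
	if v := os.Getenv("TEST_NUM_SERIES"); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return defaultNum
}

func TestCommitLogBootstrapSpeed(t *testing.T) {
	numSeries := numSeriesFromEnv(1024)
	_ = numSeries // write numSeries series of commit log data, then time the bootstrap
	t.Skip("sketch only: the real test writes commit logs and measures bootstrap time")
}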

@richardartoul (Contributor)

@robskillington Yeah, it's probably only 10-20% faster because we reduced the parallelism from 4x to 1x. Your benchmark only covers the commit logs, not the snapshots, though. I'd probably land this as is since it's pretty contained and easy to review.

I'm happy to review the other PRs once they're ready.

@robskillington robskillington merged commit 089da87 into master Jun 13, 2019
@robskillington robskillington deleted the r/commit-log-bootstrapper-enhancements branch June 13, 2019 00:02
@robskillington (Collaborator Author)

@richardartoul agreed, I do want to speed it up much further, but this is a good stopping point for this change. And yeah, I understand it's because we sacrificed parallelism =]

@richardartoul (Contributor) commented Jun 13, 2019

@robskillington Out of curiosity, was there a point where you could benchmark both the concurrency and the fast decoding? Although I guess now you at least get 3 cores back, which is pretty nice.
