Implement (opt-in) WriteBatchRawV2 that can batch across namespaces #1974

Merged (6 commits) on Oct 4, 2019

Contributor

@richardartoul richardartoul commented Oct 1, 2019

For workloads that involve multiple namespaces, the existing implementation can issue many extraneous RPCs, which increases load on the M3DB nodes and can make them unstable. This PR adds a client-side opt-in (for use with versions of M3DB that support the new APIs) to a new API that transparently batches writes across namespaces, leading to improved performance.
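
To illustrate why cross-namespace batching reduces RPC volume, here is a minimal sketch that counts batch requests under both schemes. It is not the client's actual code: the `write` type, both function names, and the batch size of 128 are assumptions made for the example.

```go
package main

import "fmt"

// write is a hypothetical, simplified stand-in for a single datapoint write.
type write struct {
	namespace string
	seriesID  string
	value     float64
}

// rpcsPerNamespace counts the batch RPCs needed when each batch may only
// contain writes for a single namespace.
func rpcsPerNamespace(writes []write, batchSize int) int {
	perNS := make(map[string]int)
	for _, w := range writes {
		perNS[w.namespace]++
	}
	rpcs := 0
	for _, n := range perNS {
		rpcs += (n + batchSize - 1) / batchSize // ceil(n / batchSize)
	}
	return rpcs
}

// rpcsAcrossNamespaces counts the batch RPCs needed when a single batch may
// mix writes for different namespaces.
func rpcsAcrossNamespaces(writes []write, batchSize int) int {
	return (len(writes) + batchSize - 1) / batchSize
}

func main() {
	var writes []write
	// 10 namespaces with 3 writes each, all assumed to target the same host.
	for ns := 0; ns < 10; ns++ {
		for i := 0; i < 3; i++ {
			writes = append(writes, write{
				namespace: fmt.Sprintf("ns-%d", ns),
				seriesID:  fmt.Sprintf("series-%d", i),
				value:     float64(i),
			})
		}
	}
	fmt.Println("per-namespace batches:", rpcsPerNamespace(writes, 128))
	fmt.Println("cross-namespace batches:", rpcsAcrossNamespaces(writes, 128))
}
```

With many namespaces and few writes per namespace, the per-namespace count grows with the number of namespaces, while the cross-namespace count depends only on the total number of writes.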

codecov bot commented Oct 1, 2019

Codecov Report

Merging #1974 into master will increase coverage by <.1%.
The diff coverage is 74.4%.

@@            Coverage Diff            @@
##           master    #1974     +/-   ##
=========================================
+ Coverage    63.4%    63.4%   +<.1%     
=========================================
  Files        1119     1119             
  Lines      105570   105968    +398     
=========================================
+ Hits        66969    67253    +284     
- Misses      34315    34402     +87     
- Partials     4286     4313     +27
Flag Coverage Δ
#aggregator 79.7% <ø> (-0.1%) ⬇️
#cluster 56.3% <ø> (ø) ⬆️
#collector 63.7% <ø> (ø) ⬆️
#dbnode 64.8% <74.4%> (ø) ⬆️
#m3em 59.6% <ø> (ø) ⬆️
#m3ninx 61.1% <ø> (ø) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.7% <ø> (ø) ⬆️
#msg 74.9% <ø> (ø) ⬆️
#query 68.2% <ø> (ø) ⬆️
#x 75% <ø> (-0.1%) ⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 67da159...04bc245. Read the comment docs.


// UseV2BatchAPIs determines whether the V2 batch APIs are used. Note that the M3DB nodes must
// have support for the V2 APIs in order for this feature to be used.
UseV2BatchAPIs *bool `yaml:"useV2BatchAPIs"`
Collaborator

Thinking about this in the long term, is this sustainable? The config might be littered with different versions for different sets of APIs and it could get very messy. How about just a single APIVersion *string where users will put in "0.2.3" or something similar?

Contributor Author

The plan is to deprecate the old APIs soon, so I'm not super worried about it, but I can make it a string. I'll probably keep it a bool in the guts of the codebase, though, just to keep things simple.
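
As a rough sketch of the compromise discussed here (a string in the user-facing config, a bool internally), the conversion could look something like the following. The `clientConfig` type, the `APIVersion` field, and the accepted values `"v1"` and `"v2"` are all assumptions for illustration; the thread does not say what the final field or its values look like.

```go
package main

import "fmt"

// clientConfig is a hypothetical, trimmed-down configuration struct. The
// APIVersion field and its accepted values are assumptions for illustration.
type clientConfig struct {
	APIVersion *string
}

// useV2BatchAPIs converts the user-facing version string into the boolean the
// rest of the code would consult. A nil value keeps the default (non-V2)
// behaviour, and an unrecognized value is rejected up front.
func (c clientConfig) useV2BatchAPIs() (bool, error) {
	if c.APIVersion == nil {
		return false, nil
	}
	switch *c.APIVersion {
	case "v1":
		return false, nil
	case "v2":
		return true, nil
	default:
		return false, fmt.Errorf("unknown API version: %q", *c.APIVersion)
	}
}

func main() {
	version := "v2"
	cfg := clientConfig{APIVersion: &version}
	useV2, err := cfg.useV2BatchAPIs()
	fmt.Println(useV2, err)
}
```

Validating once at configuration time means the rest of the code only ever sees a bool and never has to deal with an invalid version.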

src/dbnode/generated/thrift/rpc.thrift (outdated)
struct WriteBatchRawV2RequestElement {
1: required binary id
2: required Datapoint datapoint
3: required i64 nameSpace
Collaborator

I understand that this might save a ton of memory, but it might be awkward to use from a user's perspective. I suppose there can be a thin wrapper around this interface to make it easier.

Why not go all out and have list<binary> ids at the request level and required i64 id here instead?

Contributor Author

No one will interact with this directly; it's all handled transparently by the client, so I think that's fine.

I've spoken with Rob before about the IDs thing, and it's pretty uncommon to have multiple datapoints for the same ID in one request, so I'm going to leave that out for now for simplicity.
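
The memory saving mentioned above presumably comes from each element carrying an integer `nameSpace` rather than the namespace bytes. A minimal sketch of how a client might build such a request follows, assuming the request holds a list of namespaces and each element's `nameSpace` is an index into that list. The `request` and `element` types and the `add` method are hypothetical mirrors, not the generated Thrift code.

```go
package main

import "fmt"

// element is a hypothetical Go mirror of a V2 request element.
type element struct {
	ID        []byte
	Value     float64
	NameSpace int64 // assumed to be an index into request.NameSpaces
}

// request is a hypothetical Go mirror of a V2 batch request.
type request struct {
	NameSpaces [][]byte
	Elements   []element
}

// add appends a write, storing each distinct namespace only once and
// referring to it by index from the element.
func (r *request) add(ns, id string, value float64) {
	idx := int64(-1)
	for i, existing := range r.NameSpaces {
		if string(existing) == ns {
			idx = int64(i)
			break
		}
	}
	if idx < 0 {
		r.NameSpaces = append(r.NameSpaces, []byte(ns))
		idx = int64(len(r.NameSpaces) - 1)
	}
	r.Elements = append(r.Elements, element{
		ID:        []byte(id),
		Value:     value,
		NameSpace: idx,
	})
}

func main() {
	var req request
	req.add("metrics", "cpu.user", 0.5)
	req.add("traces", "span.count", 12)
	req.add("metrics", "cpu.system", 0.1)
	fmt.Println(len(req.NameSpaces), len(req.Elements))
}
```

Since the client builds this mapping itself, callers never see the index, which matches the point that no one interacts with the struct directly.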

struct WriteTaggedBatchRawRequestElement {
1: required binary id
2: required binary encodedTags
3: required Datapoint datapoint
}

struct WriteTaggedBatchRawV2RequestElement {
1: required binary id
Collaborator

Same comment regarding list of ids at the request level here too.

Contributor Author

same response

src/dbnode/client/write_tagged_op.go
src/dbnode/client/write_op.go
writeOpBatchSize tally.Histogram
fetchOpBatchSize tally.Histogram
status status
serverSupportsV2APIs bool
Collaborator

Similar comment to above regarding versioning. Imagining a serverSupportsV3APIs and serverSupportsV4APIs tag here later on is quite painful.

Contributor Author

I think for this portion I'll leave it as a bool, because with an enum you need to handle the case where it's an invalid value. Hopefully we can just delete this code soon.
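
The trade-off can be sketched as follows. With an enum-like type, every switch needs a default branch for values outside the known set, whereas a bool has only two states. The `apiVersion` type and both functions are hypothetical, and the returned method names are used purely as labels.

```go
package main

import "fmt"

// apiVersion is a hypothetical enum-style alternative to the bool.
type apiVersion int

const (
	apiVersionV1 apiVersion = 1
	apiVersionV2 apiVersion = 2
)

// methodForVersion must handle values outside the known set, because any
// int can be converted to apiVersion.
func methodForVersion(v apiVersion) (string, error) {
	switch v {
	case apiVersionV1:
		return "WriteBatchRaw", nil
	case apiVersionV2:
		return "WriteBatchRawV2", nil
	default:
		return "", fmt.Errorf("invalid API version: %d", v)
	}
}

// methodForBool has no invalid state to handle.
func methodForBool(serverSupportsV2APIs bool) string {
	if serverSupportsV2APIs {
		return "WriteBatchRawV2"
	}
	return "WriteBatchRaw"
}

func main() {
	fmt.Println(methodForBool(true))
	_, err := methodForVersion(apiVersion(7))
	fmt.Println(err)
}
```

The enum scales better if more versions appear, which is the reviewer's concern; the bool is simpler if the old path is deleted soon, which is the author's expectation.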

src/dbnode/client/host_queue.go
}

seriesID := s.newPooledID(ctx, elem.ID, pooledReq)
batchWriter.Add(
Collaborator

Can batchWriter be nil at this stage?

Contributor Author

No, I don't think so. It starts off as nil and gets set in the first iteration of the loop. Every time after that where it gets set to nil (because a batch was written), we assign a new one.
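
The invariant described here (the writer is never nil at the point of the add) can be sketched in a simplified form. This is not the code under review: the `batch` type and `writeAll` function are hypothetical, and the flush condition used here (a size limit) is an assumption, since the snippet does not show what triggers a write.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for the batch writer.
type batch struct {
	items []string
}

// writeAll groups items into batches of at most maxBatch entries. The current
// batch starts as nil, is created lazily at the top of each iteration, and is
// reset to nil right after it is flushed, so it is never nil at the append.
func writeAll(items []string, maxBatch int) [][]string {
	var (
		current *batch
		flushed [][]string
	)
	for _, item := range items {
		if current == nil {
			current = &batch{}
		}
		current.items = append(current.items, item)
		if len(current.items) == maxBatch {
			flushed = append(flushed, current.items)
			current = nil // a new batch is assigned on the next iteration
		}
	}
	// Flush whatever is left over after the loop.
	if current != nil {
		flushed = append(flushed, current.items)
	}
	return flushed
}

func main() {
	fmt.Println(writeAll([]string{"a", "b", "c"}, 2))
}
```

Because the nil check sits at the top of the loop body, every path that reaches the append has already assigned a batch.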

}

seriesID := s.newPooledID(ctx, elem.ID, pooledReq)
batchWriter.AddTagged(
Collaborator

Can batchWriter be nil at this stage?

Contributor Author

same comment

Collaborator

@justinjc justinjc left a comment

LGTM

@richardartoul richardartoul merged commit 193cc7e into master Oct 4, 2019
@richardartoul richardartoul deleted the ra/write-batch-multi-ns branch October 4, 2019 13:45
2 participants