[m3msg] Use multiple connections for M3Msg writers #2230
Conversation
daef618 to e070f43 (compare)
Codecov Report
@@              Coverage Diff                                      @@
##    r/add-m3msg-aggregator-client-server    #2230    +/-    ##
======================================================================
  Coverage                                 ?    40.9%
======================================================================
  Files                                    ?      837
  Lines                                    ?    74980
  Branches                                 ?        0
======================================================================
  Hits                                     ?    30695
  Misses                                   ?    41270
  Partials                                 ?     3015
Continue to review full report at Codecov.
With a build from 5aa1c4e (go1.12.9) used for both the aggregator and the coordinator, I see high idle CPU.
When I reduced the number of m3msg connections on coordinators from 32 to 8, aggregator CPU usage went down from 1.4 to 1.1, while coordinator CPU remained unchanged. That cluster has 6 coordinators and 12 aggregators.
Likewise, dropping the connection count on aggregators from 32 to 8 resulted in an idle CPU drop on coordinators from 2.7 to 1.5.
… and set better defaults
func (w *consumerWriterImpl) reset(opts resetOptions) {
	w.writeState.Lock()
	prevConns := w.writeState.conns
	defer func() {
Would this work better replacing the conns in place, and closing the previous ones after they're replaced? Or is the size of conns not guaranteed to be the same on every reset?
	w.rw.Writer.Reset(conn)
	w.lastResetNanos = w.nowFn().UnixNano()
	w.writeState.lastResetNanos = opts.at.UnixNano()
	w.writeState.validConns = opts.validConns
Looks like there's no branching here; if it's impossible to replace conns in place, can we close them here rather than in the defer?
The main thing is trying to avoid holding the lock while doing IO work (closing a connection may block on IO, so we don't want to do that while holding any locks).
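As a minimal sketch of that pattern (the type and field names below are simplified stand-ins, not the real consumerWriterImpl), the swap happens under the lock and the potentially blocking Close calls happen only after it is released:

```go
package writer

import (
	"io"
	"sync"
)

// writeState is a hypothetical stand-in for the lock-guarded connection
// state inside the consumer writer.
type writeState struct {
	sync.Mutex
	conns []io.Closer
}

// reset swaps in new connections while holding the lock, but closes the
// previous connections only after the lock is released, so the blocking
// IO in Close never happens under the lock.
func (s *writeState) reset(newConns []io.Closer) {
	s.Lock()
	prevConns := s.conns
	s.conns = newConns
	s.Unlock()

	// Close may block on IO; do it without holding any locks.
	for _, c := range prevConns {
		_ = c.Close()
	}
}
```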
LGTM, will follow up on the remaining comments in a subsequent PR.
…mers in m3msg agg client
… of github.com:m3db/m3 into r/add-m3msg-aggregator-client-server-multi-connections
if acked {
	w.m.messageConsumeLatency.Record(time.Duration(w.nowFn().UnixNano() - initNanos))
	// w.m.messageConsumeLatency.Record(time.Duration(w.nowFn().UnixNano() - initNanos))
Hi @robskillington, a lot of these metrics are commented out; any reason for that? Can we uncomment them?
What this PR does / why we need it:
Allows M3Msg to use multiple connections to a single backend instance, and removes a lot of the default instrumentation that used Prometheus summaries for timers in the M3Msg aggregator client.
There are a bunch of other small fixes to the default connection options for high-throughput workloads using M3Msg over point-to-point connections (i.e. 50k-500k datapoints per second from a single instance to another instance, rather than the fan-in of metrics from thousands of hosts that the old rawtcp protocol was mainly designed for).
The default write timeout is now infinite, relying instead on fast TCP keep-alives (5-10s) to break stale connections, since the IO timeout on connections was frequently hit during bursts of traffic, which made things worse (i.e. it was hard to recover a connection once it degraded even slightly).
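As a rough illustration of that approach (a sketch only, not the actual m3msg connection options; the function name and timeout values are assumptions), a net.Dialer can enable short TCP keep-alives while the caller simply never sets a write deadline:

```go
package connsketch

import (
	"net"
	"time"
)

// dialWithKeepAlive opens a TCP connection with a short keep-alive period
// so stale peers are detected quickly, while no write deadline is ever set
// on the connection (i.e. the write timeout is effectively infinite).
func dialWithKeepAlive(addr string) (net.Conn, error) {
	dialer := net.Dialer{
		Timeout:   10 * time.Second, // connect timeout (illustrative)
		KeepAlive: 5 * time.Second,  // keep-alive probe period (illustrative)
	}
	conn, err := dialer.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	// Intentionally no conn.SetWriteDeadline(...): rely on keep-alive
	// probes rather than per-write IO timeouts to break bad connections.
	return conn, nil
}
```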
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: