[m3msg] Use multiple connections for M3Msg writers #2230
Conversation
daef618 to e070f43 (compare)
Codecov Report
@@              Coverage Diff                                      @@
##    r/add-m3msg-aggregator-client-server    #2230    +/-    ##
======================================================================
  Coverage                                 ?    40.9%
======================================================================
  Files                                    ?      837
  Lines                                    ?    74980
  Branches                                 ?        0
======================================================================
  Hits                                     ?    30695
  Misses                                   ?    41270
  Partials                                 ?     3015
Continue to review full report at Codecov.
With a build from 5aa1c4e (go1.12.9) used for both the aggregator and the coordinator, I see high idle CPU.
When I reduced the number of m3msg connections on coordinators from 32 to 8, aggregator CPU usage went down from 1.4 to 1.1, while coordinator CPU remained unchanged. That cluster has 6 coordinators and 12 aggregators.
Likewise, dropping the connection count on aggregators from 32 to 8 resulted in an idle CPU drop on coordinators from 2.7 to 1.5.
… and set better defaults
func (w *consumerWriterImpl) reset(opts resetOptions) {
	w.writeState.Lock()
	prevConns := w.writeState.conns
	defer func() {
Would this work better replacing the conns in place, and closing the previous ones after they're replaced? Or is the size of conns not guaranteed to be the same on every reset?
	w.rw.Writer.Reset(conn)
	w.lastResetNanos = w.nowFn().UnixNano()
	w.writeState.lastResetNanos = opts.at.UnixNano()
	w.writeState.validConns = opts.validConns
Looks like there's no branching here; if it's impossible to replace conns in place, can we close them here rather than in the defer?
The main thing is trying to avoid holding the lock while doing IO work (closing a connection may block on IO, so we don't want to do that while holding any locks).
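As a minimal sketch of that pattern (the type and field names below are simplified stand-ins, not the real consumerWriterImpl), the swap happens under the lock and the potentially blocking Close calls happen only after it is released:

```go
package writer

import (
	"io"
	"sync"
)

// writeState is a hypothetical stand-in for the lock-guarded connection
// state inside the consumer writer.
type writeState struct {
	sync.Mutex
	conns []io.Closer
}

// reset swaps in new connections while holding the lock, but closes the
// previous connections only after the lock is released, so the blocking
// IO in Close never happens under the lock.
func (s *writeState) reset(newConns []io.Closer) {
	s.Lock()
	prevConns := s.conns
	s.conns = newConns
	s.Unlock()

	// Close may block on IO; do it without holding any locks.
	for _, c := range prevConns {
		_ = c.Close()
	}
}
```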
LGTM, will follow up on the remaining comments in a subsequent PR.
…mers in m3msg agg client
… of github.com:m3db/m3 into r/add-m3msg-aggregator-client-server-multi-connections
if acked {
	w.m.messageConsumeLatency.Record(time.Duration(w.nowFn().UnixNano() - initNanos))
	// w.m.messageConsumeLatency.Record(time.Duration(w.nowFn().UnixNano() - initNanos))
Hi @robskillington, a lot of these metrics are commented out; any reason for that? Can we uncomment them?
What this PR does / why we need it:
Allows M3Msg to use multiple connections to a single backend instance, and removes a lot of the default instrumentation that used Prometheus summaries for timers in the M3Msg aggregator client.
There are a bunch of other small fixes to the default connection options for high-throughput workloads using M3Msg over point-to-point connections (i.e. 50k-500k datapoints per second from a single instance to another instance, rather than the fan-in of metrics from thousands of hosts that the old rawtcp protocol was mainly designed for).
The default write timeout is now infinite, relying instead on fast TCP keep-alives (5-10s) to break stale connections, since the IO timeout on connections was frequently hit during bursts of traffic, which made things worse (i.e. it was hard to recover a connection once it degraded even slightly).
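As a rough illustration of that approach (a sketch only, not the actual m3msg connection options; the function name and timeout values are assumptions), a net.Dialer can enable short TCP keep-alives while the caller simply never sets a write deadline:

```go
package connsketch

import (
	"net"
	"time"
)

// dialWithKeepAlive opens a TCP connection with a short keep-alive period
// so stale peers are detected quickly, while no write deadline is ever set
// on the connection (i.e. the write timeout is effectively infinite).
func dialWithKeepAlive(addr string) (net.Conn, error) {
	dialer := net.Dialer{
		Timeout:   10 * time.Second, // connect timeout (illustrative)
		KeepAlive: 5 * time.Second,  // keep-alive probe period (illustrative)
	}
	conn, err := dialer.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	// Intentionally no conn.SetWriteDeadline(...): rely on keep-alive
	// probes rather than per-write IO timeouts to break bad connections.
	return conn, nil
}
```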
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: