Add downsampling capabilities for coordinator #796
Conversation
Codecov Report
@@ Coverage Diff @@
## master #796 +/- ##
==========================================
- Coverage 77.91% 77.63% -0.28%
==========================================
Files 368 374 +6
Lines 31799 32418 +619
==========================================
+ Hits 24775 25167 +392
- Misses 5342 5537 +195
- Partials 1682 1714 +32
Continue to review full report at Codecov.
numRollups := matchResult.NumRollups()
for i := 0; i < numRollups; i++ {
rollup, ok := matchResult.RollupsAt(i, now.UnixNano())
nit: Better to flip this and add samplesAppender if ok
Sure thing.
id := a.encodedTagsIteratorPool.Get()
id.Reset(unownedID)
now := time.Now()
fromNanos, toNanos := now.Add(-1*a.clockOpts.MaxNegativeSkew()).UnixNano(),
nit: might be easier to read as:
fromNanos := now.Add(-a.clockOpts.MaxNegativeSkew()).UnixNano()
toNanos := now.Add(a.clockOpts.MaxPositiveSkew()).UnixNano()
Sure thing.
func (a *metricsAppender) SamplesAppender() (SamplesAppender, error) {
// Sort tags
sort.Sort(a.tags)
Would it be better to insert the tags in the sorted location?
That requires an alloc, heh - I'm going out of my way to avoid the alloc for each time this needs to happen.
Aren't you appending new tags anyway? Would putting it in the right place require an additional alloc instead?
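For context, a minimal sketch of what in-place sorted insertion could look like, with illustrative pair types rather than the PR's actual tag structures; whether it avoids an allocation depends on the slice's spare capacity, since append only reallocates when capacity is exhausted.

```go
package main

import (
	"fmt"
	"sort"
)

type tagPair struct {
	name, value string
}

// insertSorted places p at its sorted position by name, reusing the
// slice's backing array when capacity allows, so it does not necessarily
// cost an extra allocation per insert.
func insertSorted(pairs []tagPair, p tagPair) []tagPair {
	i := sort.Search(len(pairs), func(j int) bool {
		return pairs[j].name >= p.name
	})
	pairs = append(pairs, tagPair{}) // grow by one (may reuse capacity)
	copy(pairs[i+1:], pairs[i:])     // shift the tail right
	pairs[i] = p                     // drop the new pair into place
	return pairs
}

func main() {
	tags := []tagPair{{"app", "m3"}, {"host", "a"}}
	tags = insertSorted(tags, tagPair{"env", "prod"})
	fmt.Println(tags) // [{app m3} {env prod} {host a}]
}
```

The trade-off being debated is this shift-on-insert cost paid per append versus a single sort.Sort when the samples appender is built.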
func (w *downsamplerFlushHandlerWriter) Write(
mp aggregated.ChunkedMetricWithStoragePolicy,
) error {
w.wg.Add(1)
It seems a little odd to me to have this function touching the internal wait group; what's the difference between doing it this way and calling Flush at the end vs. the calling function handling the parallelism?
Accruing the outputs with Write and only asking the storage to write the samples out when Flush is called would add a large amount of latency to the whole process; you ideally want to start writing as soon as possible.
The whole Write and Flush interleaving is much better suited to pipe-style communication, like TCP, which the aggregator interfaces unfortunately seem to have been optimized for.
So this is why I'm complying with the interfaces but ensuring that we can reduce latency and pending requests by writing immediately, and just using Flush to synchronize with the caller.
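A rough sketch of that shape, with hypothetical metric and storage stand-ins rather than the real aggregated/storage types: Write issues the storage write immediately on its own goroutine, and Flush only waits for writes that have already been started.

```go
package main

import (
	"fmt"
	"sync"
)

type metric struct {
	id    string
	value float64
}

// storageStub stands in for the real storage client.
type storageStub struct{}

func (storageStub) Write(m metric) error {
	fmt.Println("wrote", m.id, m.value)
	return nil
}

type flushHandlerWriter struct {
	store storageStub
	wg    sync.WaitGroup
}

// Write issues the storage write immediately rather than buffering it,
// so latency is not deferred until Flush.
func (w *flushHandlerWriter) Write(m metric) error {
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		if err := w.store.Write(m); err != nil {
			fmt.Println("write error:", err)
		}
	}()
	return nil
}

// Flush only synchronizes with the caller: it waits for the writes that
// Write already kicked off.
func (w *flushHandlerWriter) Flush() error {
	w.wg.Wait()
	return nil
}

func main() {
	w := &flushHandlerWriter{}
	_ = w.Write(metric{id: "cpu", value: 0.5})
	_ = w.Flush()
}
```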
}, nil
}

func (h *downsamplerFlushHandler) Close() { |
nit: can we get a comment here? Either `// noop` or `// TODO`, depending on what's required.
Sure thing, I'll add no-op.
type encodedTagsIteratorPool struct {
tagDecoderPool serialize.TagDecoderPool
pool pool.ObjectPool
Worth adding a `xpool.CheckedBytesWrapperPool` too?
Hm, doesn't seem like it's needed anywhere currently?
// TagValue returns the value for a tag name.
func (it *encodedTagsIterator) TagValue(tagName []byte) ([]byte, bool) {
it.tagDecoder.Reset(it.bytes)
Should this clone the tag iterator first, then reset that? If I'm reading this right, any call to TagValue will mess with the current iterator position, and if you were trying to get the value for a missing name, you'd use up the iterator.
Would it be worth decoding the tagDecoder to a list of tags on Reset for this?
Fair, I'll change it to clone.
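For reference, the alternative floated above, decoding once on Reset so that TagValue is a plain scan and never consumes an iterator, could look roughly like this; the types are simplified stand-ins, not the real serialize/ident types.

```go
package main

import "fmt"

type tag struct {
	name, value []byte
}

// tagsIndex keeps all tags decoded up front so lookups never consume an
// underlying iterator or move its position.
type tagsIndex struct {
	tags []tag
}

// Reset would decode from the encoded bytes in the real code; here it
// simply takes the decoded pairs directly for illustration.
func (t *tagsIndex) Reset(decoded []tag) {
	t.tags = append(t.tags[:0], decoded...)
}

// TagValue is a simple scan over the decoded list; a miss leaves the
// state untouched, unlike advancing a shared iterator.
func (t *tagsIndex) TagValue(name []byte) ([]byte, bool) {
	for _, tg := range t.tags {
		if string(tg.name) == string(name) {
			return tg.value, true
		}
	}
	return nil, false
}

func main() {
	var idx tagsIndex
	idx.Reset([]tag{{[]byte("env"), []byte("prod")}})
	v, ok := idx.TagValue([]byte("env"))
	fmt.Println(string(v), ok) // prod true
}
```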
) *encodedTagsIterator {
return &encodedTagsIterator{
tagDecoder: tagDecoder,
bytes: checked.NewBytes(nil, nil),
Worth getting this from a pool?
Since `encodedTagsIterator` itself is already pooled, this cost is only paid when a single `encodedTagsIterator` is created in the pool, so pooling the `checked.NewBytes(...)` as well would cost about the same. Because the entire `encodedTagsIterator` is reused anyhow, we don't allocate `checked.NewBytes(...)` over and over.
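That reasoning, sketched with sync.Pool and a plain byte buffer standing in for the real object pool and checked.Bytes types: the buffer is allocated once when a pooled instance is constructed, and every Get/Reset/Put cycle afterwards reuses it.

```go
package main

import (
	"fmt"
	"sync"
)

// pooledIterator stands in for encodedTagsIterator; buf stands in for
// the checked bytes it holds onto.
type pooledIterator struct {
	buf []byte
}

// Reset reuses the buffer that was allocated at construction time.
func (it *pooledIterator) Reset(encoded []byte) {
	it.buf = append(it.buf[:0], encoded...)
}

var iterPool = sync.Pool{
	New: func() interface{} {
		// Paid once per pooled instance, not once per use.
		return &pooledIterator{buf: make([]byte, 0, 1024)}
	},
}

func main() {
	it := iterPool.Get().(*pooledIterator)
	it.Reset([]byte("encoded-tags"))
	fmt.Println(len(it.buf))
	iterPool.Put(it) // returned for reuse; buf travels with it
}
```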
}
// Validate will validate the dynamic downsampling options.
func (o DownsamplerOptions) Validate() error {
nit: Should `Validate()` have the responsibility of checking that `o.StorageFlushConcurrency > 0`, and setting it to a default if not?
Does `Validate()` need to be public? It seems it's only called internally?
Sure thing, I'll remove visibility.
Also: validate steps should never mutate anything, hence I'm pretty against mutating during validate.
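A small sketch of that separation, with hypothetical option names rather than the PR's actual fields: defaults are applied in a step that returns a copy, and validation only reports errors without mutating anything.

```go
package main

import (
	"errors"
	"fmt"
)

const defaultStorageFlushConcurrency = 4

type downsamplerOptions struct {
	storageFlushConcurrency int
}

// withDefaults returns a copy with zero values filled in; it never
// touches the receiver it was called on.
func (o downsamplerOptions) withDefaults() downsamplerOptions {
	if o.storageFlushConcurrency <= 0 {
		o.storageFlushConcurrency = defaultStorageFlushConcurrency
	}
	return o
}

// validate only checks invariants; it does not mutate anything.
func (o downsamplerOptions) validate() error {
	if o.storageFlushConcurrency <= 0 {
		return errors.New("storage flush concurrency must be positive")
	}
	return nil
}

func main() {
	opts := downsamplerOptions{}.withDefaults()
	fmt.Println(opts.validate(), opts.storageFlushConcurrency) // <nil> 4
}
```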
namespaces = clusters.ClusterNamespaces()
downsampler downsample.Downsampler
)
if n := namespaces.NumAggregatedClusterNamespaces(); n > 0 {
nit: Pull this out into a function?
Hm, this is a function, `NumAggregatedClusterNamespaces()`? Or do you mean something like `AnyAggregatedClusterNamespaces()`?
Sorry, I meant the whole block that builds the downsampler, not specifically this line haha
Sure thing, done.
initAllocTagsSliceCapacity = 32
shardSetID = uint32(0)
instanceID = "downsampler_local"
placementKVKey = "/placement"
nit: Is this the convention used in this repo? Usually we don't have the `/` prefix in keys?
This is directly from the aggregator.
}

// MetricsAppender is a metrics appender that can
// build a samples appender, only valid to use
Should "only valid to use with a single caller at a time" be a comment on the implementation instead? Will there ever be a thread-safe implementation of this?
}

// Validate will validate the dynamic downsampling options.
func (o DownsamplerOptions) Validate() error {
Does `Validate()` need to be public? It seems it's only called internally?
type newAggregatorResult struct {
aggregator aggregator.Aggregator

nit: remove empty lines in the struct?
Sure thing.
shardSet[i] = shard.NewShard(uint32(i)).
SetState(shard.Initializing).
SetCutoverNanos(0).
SetCutoffNanos(math.MaxInt64)
Do you need to set `Cutover` and `Cutoff` nanos? The default values should be good?
Sure thing, can leave as default.
return newAggregatorResult{}, err
}

placementStore := mem.NewStore()
nit: might be cleaner to create a `placement.Service` with the `mem.Store` to init the placement with this instance and the shard IDs with one replica, rather than creating the placement and saving the proto, etc?
I tried this; it was too difficult to create a placement service from `mem.Store`. Also, you can't create a `services.LeaderService` without a concrete etcd client, which means we can't use that approach.
return
}

err = w.handler.storage.Write(w.ctx, &storage.WriteQuery{
Do you want any retry here?
I believe for this we'll allow the M3DB session client to retry the write (it has a configurable write retrier), just so there is one way to do it. We do it at a higher level in the ingester, but only because we want to know when we're performing retries, etc., with metrics.
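As a generic illustration of that kind of client-side retrying, and not the M3DB session client's actual retrier or its defaults, a simple exponential-backoff wrapper around a write function might look like this:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWrite retries a write with exponential backoff; the attempt count
// and backoff values here are arbitrary examples, not M3DB defaults.
func retryWrite(attempts int, backoff time.Duration, write func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = write(); err == nil {
			return nil
		}
		time.Sleep(backoff)
		backoff *= 2 // exponential backoff between attempts
	}
	return fmt.Errorf("write failed after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryWrite(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 2 {
			return errors.New("transient error")
		}
		return nil
	})
	fmt.Println(err, calls) // <nil> 2
}
```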
return nil
}

type newAggregatorResult struct {
The name seems a bit odd, should this be `aggregator` instead?
It needs to be something other than `aggregator`, otherwise it'll shadow the import. I can use just `agg` perhaps.
)

// Ensure encodedTagsIterator implements id.SortedTagIterator
var _ id.SortedTagIterator = &encodedTagsIterator{}
Is `encodedTagsIterator` implementing more than just the `id.SortedTagIterator` interface? If so, can you add some comments to the struct? Otherwise just return `id.SortedTagIterator` from the constructor?
I'll create an interface that combines both `id.SortedTagIterator` and `id.ID`.
}
}
if p.rollupTagIndex == -1 {
p.rollupTagIndex = len(p.tagPairs)
Why do we need to treat the rollup tag differently?
Because it's not part of the tag pairs: the rollup ID provider actually wraps the tag pairs and, when enumerated, gives you back the rollup tag at the right spot. It's to save re-allocating the tag pairs each time you need to insert the tag; you can just get a pooled rollup ID provider, set the tag pairs, and then give it to another component that can iterate them and get back the rollup tag at the right position when enumerating the tags.
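In sketch form, with simplified tag types rather than the real ones: the provider holds the existing sorted pairs and synthesizes the rollup tag at its sorted position while enumerating, instead of allocating a new slice with the tag spliced in.

```go
package main

import "fmt"

type tagPair struct{ name, value string }

// rollupTag is an illustrative tag name, not the actual one used in the PR.
var rollupTag = tagPair{name: "__rollup__", value: "true"}

// rollupIDProvider wraps already-sorted tag pairs and yields the rollup
// tag at its sorted position during iteration, without copying the pairs.
type rollupIDProvider struct {
	pairs          []tagPair
	rollupTagIndex int
}

func (p *rollupIDProvider) reset(pairs []tagPair) {
	p.pairs = pairs
	p.rollupTagIndex = len(pairs) // default: rollup tag sorts last
	for i, pair := range pairs {
		if pair.name > rollupTag.name {
			p.rollupTagIndex = i
			break
		}
	}
}

// tagAt enumerates len(pairs)+1 tags, injecting the rollup tag at the
// position it would occupy if it were stored in the sorted slice.
func (p *rollupIDProvider) tagAt(i int) tagPair {
	switch {
	case i < p.rollupTagIndex:
		return p.pairs[i]
	case i == p.rollupTagIndex:
		return rollupTag
	default:
		return p.pairs[i-1]
	}
}

func main() {
	p := &rollupIDProvider{}
	p.reset([]tagPair{{"__name__", "requests"}, {"city", "sf"}})
	for i := 0; i < len(p.pairs)+1; i++ {
		fmt.Println(p.tagAt(i)) // __name__, __rollup__, city in sorted order
	}
}
```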
}

func (t *tags) Current() ident.Tag {
t.nameBuf = append(t.nameBuf[:0], t.names[t.idx]...)
This is copying the bytes? Do you need the copy?
It's actually turning a string, `t.names[...]`, into bytes, using a re-usable buffer so it's cheap.
Might need to add some tests for at least the basic use cases.
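The buffer trick mentioned above, isolated into a small sketch with simplified types: appending into a reused buffer converts the string to bytes while reusing the buffer's capacity across calls where possible.

```go
package main

import "fmt"

type tags struct {
	names   []string
	idx     int
	nameBuf []byte
}

// currentName reuses nameBuf's backing array where capacity allows, so
// repeated calls avoid allocating a fresh []byte for each conversion.
func (t *tags) currentName() []byte {
	t.nameBuf = append(t.nameBuf[:0], t.names[t.idx]...)
	return t.nameBuf
}

func main() {
	t := &tags{names: []string{"__name__", "city"}}
	fmt.Println(string(t.currentName())) // __name__
	t.idx = 1
	fmt.Println(string(t.currentName())) // city
}
```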
flushWorkers := xsync.NewWorkerPool(storageFlushConcurrency)
flushWorkers.Init()
handler := newDownsamplerFlushHandler(o.Storage, sortedTagIteratorPool,
Yeah, I agree; we're initializing pools, rollups, setting up placements, etc. Maybe split them into smaller functions?
err := iter.Err()
iter.Close()
if err != nil {
logger.Debugf("downsampler flush error preparing write: %v", err)
Since we don't return the error, maybe make these logger.Error? Otherwise we would probably lose these.
Sure thing.
},
})
if err != nil {
logger.Debugf("downsampler flush error failed write: %v", err)
Same comment as above, consider logger.Error?
Sure thing.
}

func (t *tags) Duplicate() ident.TagIterator {
return &tags{idx: -1, names: t.names, values: t.values}
Is Duplicate meant to clone? If yes, you will probably have to copy the slices?
It's not meant to clone, it's just meant to be a duplicate iterator with the position reset.
downsampler downsample.Downsampler,
scope tally.Scope,
) (http.Handler, error) {
if store == nil && downsampler == nil {
Curious about this condition. Traditionally we don't check whether arguments passed to a struct are nil, except when they come from a config.
I prefer to be defensive.
"github.com/m3db/m3x/ident"
)

type tags struct {
Good idea to unit test the Current() & Next() usage?
Yup, adding coverage.
}

func (t *tags) Next() bool {
hasNext := t.idx+1 < len(t.names)
Seems like other functions in this package are appending to t.names. In that case, Next() can return false first but then return true later, which makes it a bit confusing.
You're meant to construct it first, then use it as an iterator. I don't want to create intermediate structures or else they can't be pooled easily.
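A usage sketch of that contract with simplified stand-in types, not the PR's tags type: all appends happen while constructing, and the value is then used purely as an iterator, so Next never flips from false back to true.

```go
package main

import "fmt"

type tagsIter struct {
	names, values []string
	idx           int
}

func newTagsIter() *tagsIter { return &tagsIter{idx: -1} }

// add is only called during construction, before iteration starts.
func (t *tagsIter) add(name, value string) {
	t.names = append(t.names, name)
	t.values = append(t.values, value)
}

func (t *tagsIter) Next() bool {
	hasNext := t.idx+1 < len(t.names)
	if hasNext {
		t.idx++
	}
	return hasNext
}

func (t *tagsIter) Current() (string, string) {
	return t.names[t.idx], t.values[t.idx]
}

func main() {
	// Construct first...
	t := newTagsIter()
	t.add("__name__", "requests")
	t.add("city", "sf")
	// ...then iterate; no adds once Next has been called.
	for t.Next() {
		name, value := t.Current()
		fmt.Println(name, value)
	}
}
```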
writeUnaggregatedErr error
writeAggregatedErr error
)
if h.downsampler != nil {
This block could be another function? Similarly, the h.store != nil block could be another function.
Yup, good call. Done.
request := newLocalWriteRequest(write, h.store)
requests = append(requests, request)
}
writeUnaggregatedErr = execution.ExecuteParallel(ctx, requests)
Seems like we are using two very different ways of writing aggregated vs unaggregated (multiErr vs failing on the first error). If you think multiErr is better, then let's switch to that?
Just to clarify: my worry here is that for downsampled metrics we would end up writing to all endpoints, ignoring any intermediate errors, but for unaggregated metrics we will fail on the first error. I think that behavior should be consistent? Maybe consider hiding the actual downsampled and unaggregated writes behind another object instead of putting this in remote/write.go? We also have native/write.go, and you don't want to replicate the logic in two places.
Sounds good, I've refactored this to always try and write every incoming value regardless of how many errors we encounter.
Were you planning on adding some coverage?
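For reference, the write-everything-and-accumulate behavior described a couple of comments up, sketched with a plain error slice standing in for the real multiErr type:

```go
package main

import (
	"errors"
	"fmt"
)

type writeFn func() error

// writeAll attempts every write regardless of earlier failures and
// reports the accumulated errors at the end, instead of failing fast.
func writeAll(writes []writeFn) error {
	var errs []error
	for _, w := range writes {
		if err := w(); err != nil {
			errs = append(errs, err)
		}
	}
	if len(errs) > 0 {
		return fmt.Errorf("%d of %d writes failed: %v", len(errs), len(writes), errs)
	}
	return nil
}

func main() {
	err := writeAll([]writeFn{
		func() error { return nil },
		func() error { return errors.New("downsampled write failed") },
		func() error { return nil }, // still attempted despite the earlier error
	})
	fmt.Println(err)
}
```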
flushWorkers := xsync.NewWorkerPool(storageFlushConcurrency)
flushWorkers.Init()
handler := newDownsamplerFlushHandler(o.Storage, sortedTagIteratorPool,
Thoughts about splitting it?
wg.Done()
}()
}
wg.Wait()?
Good call, done.
I'll split it up, sounds good.
This is all refactored, and good to review.
if len(mp.ChunkedID.Suffix) != 0 {
expected++
}
tags := make(models.Tags, expected)
Consider always going make(models.Tags, iter.NumTags()+1) and having a comment that sometimes we can have a suffix. Might be easier to read.
Sure thing.

"github.com/m3db/m3cluster/services/leader/campaign"
)

type localLeaderService struct {
Comment on what this does?
Will do.
valueBuf []byte
}

var _ ident.TagIterator = &tags{}
var _ ident.TagIterator = (*tags)(nil)
Sure thing.
Going to merge; there's some issue with the Prometheus integration test running out of disk space. I need to rename the repository to make progress today, so going to merge this first.
This change builds on the multi-cluster support for the coordinator and adds embedded downsampling to the coordinator.
This is ready to review; the API is stable. The only thing remaining is test coverage, which is being worked on now.