Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[coordinator] Store metrics type into the annotation #2628

Merged
merged 18 commits into from
Oct 6, 2020

Conversation

linasm
Copy link
Collaborator

@linasm linasm commented Sep 11, 2020

What this PR does / why we need it:
It introduces an annotation.Payload protobuf message which (marshalled to bytes) is used for structured storage of data within time series annotations.
Currently it has just a single field - metrics type - which gets set within remote write flow (if the value is provided to the /prom/remote/write endpoint).

Special notes for your reviewer:
Open questions:

  1. There was a naming clash in src/query/generated/proto/prompb/types.proto (type field and its enum values) with metric types used in Graphite ingestion flow. Apparently, binary (tag based) encoding is used, so I have renamed the clashing elements (keeping protobuf tag values the same). This was discussed with @shreyassrivatsan. It might be a good idea to eventually migrate Graphite ingestion flow to use the same type field (and merge the enum values), but this would require a phased deployment, so here I am not doing anything in that direction.

  2. I have introduced a separate set of internal (Go code, not protobuf) metrics types for Prometheus (vs Graphite), which is not very nice. I believe those sets could be merged into one already, but for that I would need to know how to handle the additional metrics types in current flows (such as https://github.com/m3db/m3/compare/linasm/metrics-type-metadata?expand=1#diff-889532daf887c0868320b34b83371b72R157-L163 and https://github.com/m3db/m3/compare/linasm/metrics-type-metadata?expand=1#diff-3668488cd7a0545acaa9c635c5bdb46eL487-R493).

  3. Do we want to compress the protobuf message persisted in the annotation? Currently the message size would be either 0 (default values) or 2 bytes and the compression would probably just add an overhead. But this has to be considered looking into the future. Alternatively, if we need to include some large field in future (say, help), that field could be compressed individually.

  4. Currently DBNode thrift API reads/writes annotations as binary fields, and this is what I have used here (for writing). Would we want to replace that with something more structured on the API level (since we are trying to structure it internally already)?

Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE

Does this PR require updating code package or user-facing documentation?:
NONE

@@ -58,7 +58,7 @@ type SamplesAppenderResult struct {
type SampleAppenderOptions struct {
Override bool
OverrideRules SamplesAppenderOverrideRules
MetricType ts.MetricType
MetricType ts.M3MetricType
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this as MetricType

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we can. But I renamed it in order to match PromMetricType in https://github.com/m3db/m3/pull/2628/files/aeb8036681303b0de6312eb47006d49f59a4a631#diff-0535bd9002d5b34759706aebeec1ee46
I believe having PromMetricType and just MetricType at the same scope/level would be confusing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that MetricType is implicitly scoped to M3MetricType just because we're in the M3 project, so all types can be considered M3Type by default

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish it were so simple :) Unfortunately we bring in a protobuf with a different MetricType from Prometheus project which makes this whole point moot.

@@ -0,0 +1,18 @@
syntax = "proto3";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This probably makes more sense to put in query, since it's an external value sent from prom remote write

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we will have to use it in the aggregation process on dbnode.

@@ -284,6 +288,98 @@ func TestPromWriteAggregatedMetricsWithHeader(t *testing.T) {
require.Equal(t, http.StatusOK, resp.StatusCode)
}

func TestPromWriteMetricsTypes(t *testing.T) {
ctrl := gomock.NewController(t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use xtest.NewController instead of the default one, it has some helper functions that fail fast when a function is not EXPECTed rather than hanging

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Type type = 101;
Source source = 102;
M3Type m3_type = 101;
Source source = 102;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mind lining these up with the defs before the comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}

func (i *promTSIter) Next() bool {
i.idx++
return i.idx < len(i.tags)
if i.idx >= len(i.tags) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need an error check now, i.e. if i.err != nil { return false }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// NB: These are custom fields that M3 uses. They start at 101 so that they
// should never clash with prometheus fields.
Type type = 101;
Source source = 102;
M3Type m3_type = 101;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we keep these as Type? Otherwise it stutters heavily, e.g. ts.M3_Type == M3Type. M3_COUNTER

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well thats just an unfortunate consequence of having to add MetricType type to match Prometheus. On the message name, see my comment above. And on the enum value name (M3_COUNTER etc.) - Protobuf does not allow enums sharing with the same named values in the same scope.
Long term, I think we should merge M3Type into MetricType and get rid of it (actually, only M3_TIMER would have to be added, with some high tag number to avoid future collision with something added by Prometheus). But that would require a phased migration of whole ingestion pipeline.
See my comments on this PR for more details.

Comment on lines 96 to 97
M3Type: M3MetricTypeGauge,
PromType: PromMetricTypeUnknown,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these are both default values, can remove them here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove. Just for the record, forcing to look into 3 different places to understand what you are getting as a default hurts readability.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah understandable, just Unknown etc are probably expected as defaults for enums

)

// PromMetricType is the enum for Prometheus metric types.
type PromMetricType int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; These can probably be uint8 rather than int

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

src/query/storage/converter.go Show resolved Hide resolved
src/query/storage/converter.go Show resolved Hide resolved
return false
}

i.annotation, err = annotationPayload.Marshal()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use MarshalAppend rather than Marshal? Marshal always allocs a slice, but we can pass in a zeroed i.annotation into MarshalAppend to avoid that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using MarshalTo(data) initially, but later discovered that this leads to concurrency issues because the output of promTSIter is used within multiple goroutines.

Comment on lines 87 to 89
M3Type M3MetricType
PromType PromMetricType
PromFamilyType PromMetricFamilyType
Copy link
Collaborator

@arnikola arnikola Oct 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we split this into two types? One which covers the input prom types, and one that is converted into that only describes the m3 type? Otherwise there's potential situations where these may drift

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is changing now that I am removing the FamilyType concept (as per our earlier discussion). With that in mind, can you clarify what exactly do you want to split (the comment is not very clear to me)?

@linasm linasm requested a review from arnikola October 5, 2020 09:52
Comment on lines +6 to +7
MetricType metric_type = 1;
bool handle_value_resets = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use camel case rather than underscoring here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using snake case for field names is suggested by the official protobuf style guide. Unfortunately we have mixed both snake and camel in various places, but for new files (like in this case) the decision is to stick to the style guide.

Comment on lines 11 to 18
UNKNOWN = 0;
COUNTER = 1;
GAUGE = 2;
HISTOGRAM = 3;
GAUGE_HISTOGRAM = 4;
SUMMARY = 5;
INFO = 6;
STATESET = 7;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meganit; mind aligning these?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


const (
// PromMetricTypeUnknown is the unknown Prometheus metric type.
PromMetricTypeUnknown PromMetricType = iota
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; can remove MetricType from the name, so these would be PromCounter and M3Counter?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started doing it, and then realized that names such as PromUnknownand PromInfo would lose too much context. Decided to keep the current names, also consistent with the preexisting M3MetricTypeGauge etc.

@@ -20,22 +20,51 @@

package ts

// MetricType is the enum for metric types.
type MetricType int
// M3MetricType is the enum for M3 metric types.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mind adding a comment why our list is shorter than proms? Would be useful especially for people who aren't familiar with the problem space

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +79 to +82
M3Type M3MetricType
PromType PromMetricType
Source SourceType
HandleValueResets bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just following up here since GH removed the other comment about splitting this; was just thinking it's probably better not to have both prom and m3 types in a single SeriesAttributes, rather have them separate. Happy to leave it as is though 👍

src/query/storage/converter.go Show resolved Hide resolved
@@ -275,3 +276,128 @@ func BenchmarkFetchResultToPromResult(b *testing.B) {
benchResult = FetchResultToPromResult(fr, false)
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test that converts from annotations to attributes then back to annotations, then back to attributes to ensure the fields get set correctly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, would be nice, but there is no such round-trip conversion. The conversion flow goes like this:
API protobuf -> attributes -> annotation payload protobuf

Copy link
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG with nits

@linasm linasm merged commit ac2ef9b into master Oct 6, 2020
@linasm linasm deleted the linasm/metrics-type-metadata branch October 6, 2020 08:11
ChrisChinchilla pushed a commit that referenced this pull request Oct 6, 2020
commit 47c219a
Author: Linas Medžiūnas <linasm@users.noreply.github.com>
Date:   Tue Oct 6 13:31:37 2020 +0300

    [dbnode] GetNamespaceFn returns public Namespace interface (#2696)

commit ac2ef9b
Author: Linas Medžiūnas <linasm@users.noreply.github.com>
Date:   Tue Oct 6 11:11:50 2020 +0300

    [coordinator] Store metrics type into the annotation (#2628)

    * Rename existing protobuf TimeSeries.type field to m3_type to avoid collision

    * Add new Prometheus protobuf fields

    * Rename the internal MetricsType to M3MetricsType

    * Implement conversions

    * Write metrics type as annotation payload

    * Avoid reusing annotation slices

    * Fix test

    * Introduce metric family type

    * Address review feedback

    * Revert "Introduce metric family type"

    This reverts commit d108b4f.

    * Introduce annotation.Payload.handle_value_resets field

    * Minor changes according to PR feedback

commit 2df33bf
Author: Linas Medžiūnas <linasm@users.noreply.github.com>
Date:   Tue Oct 6 09:23:48 2020 +0300

    [dbnode] Extended namespace options (#2644)

    * [dbnode] Extended namespace options

    * Add ExtendedOptions to integration tests

    * Fix build

    * Fix build (OptionsToProto)

    * go.sum

    * Imports

    * Allow updating extended namespace options

    * Lint

    * Fixed volumeIndex increment

    * Revert "Fixed volumeIndex increment"

    This reverts commit 2e0342b.

    * Fix TestLocalTypeWithAggregatedNamespace

    * Added protection against golang/protobuf/jsonpb use

    * More protection against non-gogo protobuf package

    * Make namespace tests pass

    * Reduce reptitiveness of protobuf.Any construction

    * Extract ExtendedOptions test infra to x package

    * Add copyright

    * Move test ExtendedOptionsConverter to the packages where it is needed

    * xtest.NewExtendedOptionsProto returns error

    Co-authored-by: Gediminas <gediminas@chronosphere.io>

commit 84f5b98
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Mon Oct 5 14:36:45 2020 -0700

    [query] Implemented the Graphite `applyByNode` function (#2654)

commit 08117d2
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Mon Oct 5 13:28:07 2020 -0700

    [query] Fix snapping bug affecting "moving" function (movingMedian, movingAverage, etc.) (#2694)

commit db4e9a3
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Mon Oct 5 12:49:06 2020 -0700

    [query] ParseTime function supports 95%+ of `from / until` time formats (#2621)

commit 89f6fcc
Author: Gediminas Guoba <gediminas@chronosphere.io>
Date:   Mon Oct 5 21:36:32 2020 +0300

    [largetiles] Fixed volumeIndex increment (#2681)

    * Fixed volumeIndex increment

    * minor refactoring

    Co-authored-by: Linas Medžiūnas <linasm@users.noreply.github.com>
    Co-authored-by: Rob Skillington <rob.skillington@gmail.com>

commit 1df3eed
Author: Linas Medžiūnas <linasm@users.noreply.github.com>
Date:   Mon Oct 5 20:45:50 2020 +0300

    [dbnode] Fix granularity of LeaseManager.UpdateOpenLeases (#2695)

commit 5bcb1ba
Author: Linas Medžiūnas <linasm@users.noreply.github.com>
Date:   Fri Oct 2 16:59:22 2020 +0300

    [dbnode] Support dynamically registered background processes in mediator (#2634)

    * [dbnode] Support dynamically registered background processes in mediator

    * [dbnode] Pass background processes via RunOptions

    * Add a comment for BackgroundProcess

    * Add integration test TestRunCustomBackgroundProcess

    * Include backgroundProcess in TestDatabaseMediatorOpenClose

    * Move the source of BackgroundProcess asynchrony outside for consistency/easier testing

    * Fix test

    * Formatting

    * Update BackgroundProcess doc

    * go mod tidy

    * Verify reporting in TestRunCustomBackgroundProcess

    * Address review feedback

    * Resurrect BackgroundProcess.Start()

    * Enforce mediator.RegisterBackgroundProcess before Open

    * Remove locking from mediator.Report

    * Fix comment

commit ef9ba0a
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Fri Oct 2 02:39:33 2020 -0400

    [changelog] Add changelog for 0.15.16 (#2688)

commit 81c3e19
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Fri Oct 2 01:54:34 2020 -0400

    [coordinator] Configurable writes to leaving shards count towards consistency, add read level unstrict all (#2687)

commit a700d56
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Thu Oct 1 23:09:09 2020 -0400

    [coordinator] Continue write request when context is cancelled (#2682)

commit 526da79
Author: Ryan Allen <rallen090@gmail.com>
Date:   Thu Oct 1 15:48:37 2020 -0400

    [query] - Include FetchQuery in InspectSeries arguments (#2685)

commit 3dedba5
Author: arnikola <artem@chronosphere.io>
Date:   Thu Oct 1 15:18:27 2020 -0400

    [query] Additional testing on Prometheus engine path (#2686)

    * [query] BlockMetadata no longer required for Prometheus engine path

    * Revert code change

    * Exposes ApplyRangeWarnings

commit f996e2d
Author: Linas Medžiūnas <linasm@users.noreply.github.com>
Date:   Thu Oct 1 21:49:33 2020 +0300

    [dbnode] Large tile aggregation improvements (#2668)

    * [dbnode] Large tile aggregation improvements

    * Fix mocking

    * Log block start and volume on error

    * Infra for reverse index sharing

    * StorageOptions.AfterNamespaceCreatedFn

    * Adjust test timing

    * Do not fail on fs.ErrCheckpointFileNotFound

    * mockgen

    * Improve handling of shared reverse index

    * Improve Shard.AggregateTiles UT coverage

    * Remove unused reverseIndex from test

    * Address review feedback (partial)

    * Rearrange imports

commit d5fbe4b
Author: Bo Du <bo@chronosphere.io>
Date:   Thu Oct 1 10:12:38 2020 -0600

    [dbnode] No empty TSDB snapshots (#2666)

commit 06cca59
Author: Asaf Mesika <amesika@logz.io>
Date:   Thu Oct 1 12:18:21 2020 +0300

    [docs] Add short description of 2019 monitorama talk (#2515)

    * Add short description of 2019 monitorama talk

    * Update docs/overview/media.md

    Co-authored-by: Chris Chinchilla <chris@chrischinchilla.com>

    Co-authored-by: Chris Chinchilla <chris@chrischinchilla.com>
    Co-authored-by: Chris Chinchilla <chris@chronosphere.io>

commit 68ba5bb
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Wed Sep 30 12:18:42 2020 -0700

    [query] Implemented the Graphite `interpolate` function  (#2650)

commit 77455be
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Wed Sep 30 07:41:27 2020 -0700

    [query] Implemented the Graphite `grep` function (#2655)

commit 43ff200
Author: nate <nbroyles@gmail.com>
Date:   Tue Sep 29 18:00:14 2020 -0400

    [query] Update /database/create endpoint to support creating an aggregated namespace (#2670)

commit 41ac33c
Author: Ryan Hall <ryanhall07@gmail.com>
Date:   Tue Sep 29 12:37:01 2020 -0700

    Update debug/dump cmd with Cluster-Environment-Name header (#2672)

    This is needed since the dump includes the PlacmentSource

commit 503d68d
Author: nate <nbroyles@gmail.com>
Date:   Tue Sep 29 15:10:00 2020 -0400

    [docs] Add Fileset Migration docs to Operational Guide nav (#2680)

commit ac8259a
Author: nate <nbroyles@gmail.com>
Date:   Tue Sep 29 14:44:18 2020 -0400

    [dbnode][query] Add ability to set and retrieve AggregationOptions via namespace APis (#2661)

commit 946161b
Author: Ryan Hall <ryanhall07@gmail.com>
Date:   Tue Sep 29 10:54:43 2020 -0700

    Only acquire writeState exclusive lock when resetting connections (#2678)

    This is a shared lock across all goroutines, so ideally we limit the
    amount of sync points. There is no need to acquire the exclusive lock
    when flushing a connection. Instead the shared read lock can be acquired
    and individual connection exclusive locks can be acquired to ensure only
    a single writer is using the connection at a time.

    This is part of a broader fix to prevent deadlocks in the m3msg communication.
    Before this change and adding write timeouts, the producer and consumer could deadlock
    and stop all messages from flowing. If a producer was sending more messages than
    it was acking, it would eventually fill up the TCP buffers and begin to block
    writes. If the producer was flushing a connection and holding the exclusive lock,
    it would end up blocking with lock. Since the lock was held, the ACKing process
    could not clear the buffer and the flush would block forever.

commit 192f611
Author: Ryan Hall <ryanhall07@gmail.com>
Date:   Tue Sep 29 08:19:58 2020 -0700

    Fix data race in TestSeekerManagerCacheShardIndices (#2679)

commit c717d40
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Mon Sep 28 23:27:09 2020 -0400

    [documentation] Add documentation for disk read bytes per second limit (#2677)

commit 3c81774
Author: Ryan Hall <ryanhall07@gmail.com>
Date:   Mon Sep 28 11:25:08 2020 -0700

    Add a configurable write timeout for consumer ACKs (#2675)

    Without a timeout this can potentially block forever.

    The producer and consumer can potentially dead lock trying to send/receive messages:

    Producer -> msg -> Consumer. Consumer is not attempting to read since it's trying to ACK.
    Consumer -> ack -> Producer. Producer is not attempting to read since it might be stuck trying to get a lock on the connection.

    For backwards compatibility this defaults to 0. We should set it to a sane default (5s) in the future.

commit 80c9f9e
Author: Ryan Hall <ryanhall07@gmail.com>
Date:   Mon Sep 28 09:39:19 2020 -0700

    Add queue processing metrics (#2671)

    * Add queue processing metrics

    I found these metrics useful when tracking down an issue with the queue
    processor. Additionally, added an optional consumer label for the
    message writer metrics so you can easily tell which downstream consumer
    is causing issues.

    Unfortunately, setting the consumer label is a little awkward since it
    only makes since for the replicated shard writer, which only has a single
    consumer instance at one time. For the shared shard writer, the empty string
    is used.

commit 66b0b24
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Mon Sep 28 03:51:22 2020 -0700

    [query] Implemented the Graphite `sortByMinima` function

commit a7b499d
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Sun Sep 27 20:12:44 2020 -0700

    [query] Implemented the Graphite `cumulative` function

commit 13290be
Author: Ryan Hall <ryanhall07@gmail.com>
Date:   Fri Sep 25 16:34:55 2020 -0700

    Add metrics for bytes added/removed to the m3msg buffer (#2656)

    Helpful for understanding how to size the buffer to survive a m3aggregator
    deploy. Due to the graceful shutdown of a m3agregator, it can take almost a
    minute between connections closing on the old instances and connections
    start taking traffic on the new instances. If a coordinator takes 10MB/s
    of traffic, it only takes 10s to fill up a 100MB buffer and start dropping requests.

commit 06d23e2
Author: Matt Schallert <mattschallert@gmail.com>
Date:   Fri Sep 25 15:06:20 2020 -0700

    [deps] grpc v1.27 -> v1.29 (#2667)

    We're stuck in a tough spot with our gRPC version. We are on a
    relatively old version, and gRPC frequently introduces breaking changes
    to APIs that are experimental. Our own query code, and the version of
    etcd we depend on, use experimental APIs which were later deprecated.

    We can't upgrade too far as we won't be able to import etcd. However
    v1.29 is far more compatible with other parts of the ecosystem, and
    makes it easier to integrate M3 in other environments (as well as
    integrate newer tools into M3).

commit 5b95438
Author: Ryan Hall <ryanhall07@gmail.com>
Date:   Fri Sep 25 13:53:08 2020 -0700

    [msg] Move writer README to top level msg README  (#2669)

commit 38bc187
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Fri Sep 25 14:12:05 2020 -0400

    [query] Add metrics for remote storage backends (#2657)

commit 21cceab
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Fri Sep 25 10:42:52 2020 -0400

    [dbnode] Add example for using the Go DB node client to query/write metrics (#2658)

commit 41b4bc4
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Fri Sep 25 10:42:00 2020 -0400

    [query] Add ability to enable optimized Graphite fanout if all agg namespaces have all the data (#2665)

commit aefca00
Author: Linas Medžiūnas <linasm@users.noreply.github.com>
Date:   Fri Sep 25 09:09:46 2020 +0300

    [dbnode] Large tiles aggregation flow v2 (#2643)

    * Refactor and cleanup

    * Refactor interfaces to more closely match design

    * Update frame iterator read from a encoding.ReaderIterator

    * Removing unnecessary files

    * az

    * Add utility to apply tile calculations on data.

    * test fix

    * Added concurrency

    * Concurrency logging

    * [dbnode] A noop AggregateTiles thrift RPC

    * Add AggregateTilesRequest.rangeType

    * sourceNameSpace / targetNameSpace

    * Drop AggregateTilesRequest.shardId

    * A partial implementation of AggregateTiles

    * Open DataFileSetReader and iterate through it

    * Decompress the data read

    * Add explicit FileSetType

    * [dbnode] Add OrderedByIndex option for DataFileSetReader.Open

    * Remove dbShard.TagsFromSeriesID

    * Regenerate mocks

    * Unit tests

    * Mockgen

    * Fix test

    * Resurrect rpc_mock.go

    * Remove accidentally committed files

    * Trigger build

    * Add step parameter

    * Write aggregated data to other namespace

    * Fix tests

    * Introduced AggregateTilesOptions

    * Minor improvements

    * Cleanup

    * PR response

    * Add headers

    * Remove unnecessary stuff.

    * [dbnode] A noop AggregateTiles thrift RPC

    * Add AggregateTilesRequest.rangeType

    * sourceNameSpace / targetNameSpace

    * Drop AggregateTilesRequest.shardId

    * A partial implementation of AggregateTiles

    * Open DataFileSetReader and iterate through it

    * Decompress the data read

    * Add explicit FileSetType

    * Remove dbShard.TagsFromSeriesID

    * Regenerate mocks

    * Unit tests

    * Mockgen

    * Fix test

    * Resurrect rpc_mock.go

    * Remove accidentally committed files

    * Trigger build

    * Add step parameter

    * Write aggregated data to other namespace

    * Fix tests

    * Introduced AggregateTilesOptions

    * Minor improvements

    * Cleanup

    * [dbnode] Integrate arrow iterators into tile aggregation

    * Fix close error after EOF

    * Can already close the SeriesBlockIterator

    * Update to use concurrent iteration and prefer single metadata

    * [dbnode] Cross block series reader

    * Assert on OrderedByIndex

    * Tests

    * Mocks

    * Dont test just the happy path

    * Compute and validate block time frames

    * [dbnode] Integration test for large tiles (#2478)

    * [dbnode] Create a virtual reverse index for a computed namespace

    * Return processedBlockCount from AggregateTiles

    * Improve error handling

    * Validate AggregateTilesOptions

    * Unnest read locks

    * Use default instead of constant

    * Fix test

    * minor refactoring

    * Addressed review feedback

    * Legal stuff

    * Refactor recorder

    * Allow using flat buffers rather than arrow

    * [dbnode] persist manager for large tiles

    * revert of .ci

    * minor

    * Adding better comparisons for arrow vs flat

    * Some fixes for query_data_files

    * An option to read from all shards

    * Fix large_tiles_test

    * Fix TestDatabaseAggregateTiles

    * Read data ordered by index

    * Generate mocks

    * Fix TestAggregateTiles

    * Group Read() results by id

    * Remodel CrossBlockReader as an Iterator

    * Mockgen

    * Erase slice contents before draining them

    * Resolve merge conflicts

    * Align with master

    * Integrate CrossBlockReader

    * Make a defensive copy of dataFileSetReaders

    * avoid panics

    * Improve TestNamespaceAggregateTiles

    * Added TODO on TestAggregateTiles

    * Align query_data_files

    * Mockgen

    * Added cross block iterator to be able to read multiple BlockRecords.
    Also removed concurrency from tile iterators and cleaned up utility

    * Add HandleCounterResets to AggregateTilesOptions

    * Additional tests and cleanup.

    * [dbnode] Large Tiles fs.writer experimental implementation

    * Implement DownsampleCounterResets

    * Improve readablitiy

    * Use pointer arguments to get the results

    * Reduce code duplication

    * Refine comments

    * Remove dependency on SeriesBlockFrame

    * [dbnode] Add OrderedByIndex option for DataFileSetReader.Open (#2465)

    * [dbnode] Cross-block series reader (#2481)

    * Fix build

    * Integrate DownsampleCounterResets

    * Introduce DownsampledValue struct

    * Update DownsampleCounterResets integration

    * Preallocate capacity for downsampledValues

    * Large tiles parrallel indexing.

    * Checkpoint fixed

    * Successful write/fetch with some hardcoded values.

    * Some FIXME solved

    * [dbnode] AggregateTiles RPC - minimal E2E flow (#2466)

    * minor fixes

    * codegen fix

    * Address feedback from PR 2477

    * TestShardAggregateTiles using 2 readers

    * Fix large_tiles_test.go

    * integration test fix

    * Bug fix and test

    * [large tiles] Fix refcounting in CrossBlockReader

    * Workaround for negative reference count

    * Integration test fix

    * [large-tiles] Try detect double finalize

    * [dbnode] Large tiles concurrency test

    * Batch writer is used for waster writes

    * race fix

    * Fix compilation error

    * Comment out some noise

    * Fix data races on time field (bootstrapManager, flushManager)

    * Fix misplaced wd.Add

    * Close context used for aggregation (in test code)

    * Close encoders during large tile writes

    * removing some debug code

    * Close series in test code

    * Move series.Close() after err check (test code)

    * Update to series frame API

    * Additional tests

    * PR

    * work on integ test

    * tags close

    * [large-tiles] Fix management of pooled objects

    * Fix query_data_files tool

    * Use mock checked bytes in cross_block_reader_test.go

    * query test

    * Query in place of fetch

    * test fix

    * Bug reproduced

    * Heavier concurrency test

    * Fix session related races in large_tiles_test.go

    * Fix string conversion

    * Use mocks for pooled objects in TestShardAggregateTiles

    * Add build integration tag

    * test fix

    * minor refactoring

    * increased the amount of series to 5k

    * Remove a noop

    * Log the finish of namespace aggregation

    * some minor refactorings

    * Streaming aggregation, reusing resources for each series

    * Do not allocate minHeapEntries

    * Cleanup

    * Fix query_data_files

    * Fix build

    * go.sum

    * Align with StreamingWriter API changes

    * Add FIXME WRT segment.Tail finalizing

    * Exclude query_data_files

    * Exclude counter_resets_downsampler.go

    * Remove arrow related code

    * Cleanup

    * Use explicit EncodedTags type

    * Rename processedBlockCount to processedTileCount

    * Fix build

    * Exclude read_index_ids changes

    * Address review feedback

    * Increase fetch timeout to stabilize TestAggregationAndQueryingAtHighConcurrency

    * Abort the writer in case of any error during aggregation

    Co-authored-by: Artem <artem@chronosphere.io>
    Co-authored-by: Gediminas <gediminas@chronosphere.io>

commit ac530fa
Author: Matt Schallert <mattschallert@gmail.com>
Date:   Thu Sep 24 14:07:46 2020 -0700

    [deps] Bump tchannel v1.12 -> v1.14 (#2659)

commit 62a31fc
Author: Rob Skillington <rob.skillington@gmail.com>
Date:   Thu Sep 24 15:06:15 2020 -0400

    [dbnode] Better defaults for block retriever config (#2664)

commit 767a517
Author: nate <nbroyles@gmail.com>
Date:   Thu Sep 24 10:11:56 2020 -0400

    [dbnode] Load and cache info files during bootstrap  (#2598)

commit 4d81e4a
Author: teddywahle <69990143+teddywahle@users.noreply.github.com>
Date:   Wed Sep 23 12:43:21 2020 -0700

    [query] Implemented the Graphite `highest` and `lowest` functions (#2623)

commit b4575f6
Author: arnikola <artem@chronosphere.io>
Date:   Wed Sep 23 10:07:11 2020 -0400

    [query] Add resolution exceeds query range warning (#2429)

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>
Copy link

@christopherzli christopherzli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes m3 not compatible with prometheus metric scraper due to difference in definition: https://github.com/prometheus/prometheus/blob/main/prompb/types.proto#L128

@@ -14,10 +14,14 @@ message Sample {
message TimeSeries {
repeated Label labels = 1 [(gogoproto.nullable) = false];
repeated Sample samples = 2 [(gogoproto.nullable) = false];
MetricType type = 3;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes m3 not compatible with prometheus metric scraper due to difference in definition: https://github.com/prometheus/prometheus/blob/main/prompb/types.proto#L128

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants