[coordinator] Store metrics type into the annotation #2628

linasm · 2020-09-11T08:45:35Z

What this PR does / why we need it:
It introduces an annotation.Payload protobuf message which (marshalled to bytes) is used for structured storage of data within time series annotations.
Currently it has just a single field - metrics type - which gets set within remote write flow (if the value is provided to the /prom/remote/write endpoint).

Special notes for your reviewer:
Open questions:

There was a naming clash in src/query/generated/proto/prompb/types.proto (type field and its enum values) with metric types used in Graphite ingestion flow. Apparently, binary (tag based) encoding is used, so I have renamed the clashing elements (keeping protobuf tag values the same). This was discussed with @shreyassrivatsan. It might be a good idea to eventually migrate Graphite ingestion flow to use the same type field (and merge the enum values), but this would require a phased deployment, so here I am not doing anything in that direction.
I have introduced a separate set of internal (Go code, not protobuf) metrics types for Prometheus (vs Graphite), which is not very nice. I believe those sets could be merged into one already, but for that I would need to know how to handle the additional metrics types in current flows (such as https://github.com/m3db/m3/compare/linasm/metrics-type-metadata?expand=1#diff-889532daf887c0868320b34b83371b72R157-L163 and https://github.com/m3db/m3/compare/linasm/metrics-type-metadata?expand=1#diff-3668488cd7a0545acaa9c635c5bdb46eL487-R493).
Do we want to compress the protobuf message persisted in the annotation? Currently the message size would be either 0 (default values) or 2 bytes and the compression would probably just add an overhead. But this has to be considered looking into the future. Alternatively, if we need to include some large field in future (say, help), that field could be compressed individually.
Currently DBNode thrift API reads/writes annotations as binary fields, and this is what I have used here (for writing). Would we want to replace that with something more structured on the API level (since we are trying to structure it internally already)?

Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE

Does this PR require updating code package or user-facing documentation?:
NONE

…llision

arnikola · 2020-09-11T12:16:05Z

src/cmd/services/m3coordinator/downsample/downsampler.go

@@ -58,7 +58,7 @@ type SamplesAppenderResult struct {
 type SampleAppenderOptions struct {
 	Override      bool
 	OverrideRules SamplesAppenderOverrideRules
-	MetricType    ts.MetricType
+	MetricType    ts.M3MetricType


Can we keep this as MetricType

Well, we can. But I renamed it in order to match PromMetricType in https://github.com/m3db/m3/pull/2628/files/aeb8036681303b0de6312eb47006d49f59a4a631#diff-0535bd9002d5b34759706aebeec1ee46
I believe having PromMetricType and just MetricType at the same scope/level would be confusing.

My thinking was that MetricType is implicitly scoped to M3MetricType just because we're in the M3 project, so all types can be considered M3Type by default

I wish it were so simple :) Unfortunately we bring in a protobuf with a different MetricType from Prometheus project which makes this whole point moot.

arnikola · 2020-09-11T12:19:30Z

src/dbnode/generated/proto/annotation/annotation.proto

@@ -0,0 +1,18 @@
+syntax = "proto3";


nit: This probably makes more sense to put in query, since it's an external value sent from prom remote write

Well, we will have to use it in the aggregation process on dbnode.

arnikola · 2020-09-11T12:26:32Z

src/query/api/v1/handler/prometheus/remote/write_test.go

@@ -284,6 +288,98 @@ func TestPromWriteAggregatedMetricsWithHeader(t *testing.T) {
 	require.Equal(t, http.StatusOK, resp.StatusCode)
 }

+func TestPromWriteMetricsTypes(t *testing.T) {
+	ctrl := gomock.NewController(t)


nit: use xtest.NewController instead of the default one, it has some helper functions that fail fast when a function is not EXPECTed rather than hanging

arnikola · 2020-09-11T12:28:13Z

src/query/generated/proto/prompb/types.proto

-  Type type = 101;
-  Source source = 102;
+  M3Type m3_type = 101;
+  Source source  = 102;


nit: mind lining these up with the defs before the comment?

arnikola · 2020-09-30T21:05:53Z

src/query/api/v1/handler/prometheus/remote/write.go

 }

 func (i *promTSIter) Next() bool {
 	i.idx++
-	return i.idx < len(i.tags)
+	if i.idx >= len(i.tags) {


This will need an error check now, i.e. if i.err != nil { return false }

arnikola · 2020-09-30T21:20:38Z

src/query/generated/proto/prompb/types.proto

  // NB: These are custom fields that M3 uses. They start at 101 so that they
  // should never clash with prometheus fields.
-  Type type = 101;
-  Source source = 102;
+  M3Type m3_type = 101;


nit: Can we keep these as Type? Otherwise it stutters heavily, e.g. ts.M3_Type == M3Type. M3_COUNTER

Well thats just an unfortunate consequence of having to add MetricType type to match Prometheus. On the message name, see my comment above. And on the enum value name (M3_COUNTER etc.) - Protobuf does not allow enums sharing with the same named values in the same scope.
Long term, I think we should merge M3Type into MetricType and get rid of it (actually, only M3_TIMER would have to be added, with some high tag number to avoid future collision with something added by Prometheus). But that would require a phased migration of whole ingestion pipeline.
See my comments on this PR for more details.

arnikola · 2020-09-30T21:23:32Z

src/query/ts/metadata.go

+		M3Type:   M3MetricTypeGauge,
+		PromType: PromMetricTypeUnknown,


nit: these are both default values, can remove them here

Will remove. Just for the record, forcing to look into 3 different places to understand what you are getting as a default hurts readability.

Yeah understandable, just Unknown etc are probably expected as defaults for enums

arnikola · 2020-09-30T21:24:21Z

src/query/ts/metadata.go

+)
+
+// PromMetricType is the enum for Prometheus metric types.
+type PromMetricType int


nit; These can probably be uint8 rather than int

src/query/storage/converter.go

arnikola · 2020-10-04T22:09:00Z

src/query/api/v1/handler/prometheus/remote/write.go

+		return false
+	}
+
+	i.annotation, err = annotationPayload.Marshal()


Should this use MarshalAppend rather than Marshal? Marshal always allocs a slice, but we can pass in a zeroed i.annotation into MarshalAppend to avoid that

I tried using MarshalTo(data) initially, but later discovered that this leads to concurrency issues because the output of promTSIter is used within multiple goroutines.

arnikola · 2020-10-04T22:56:45Z

src/query/ts/metadata.go

+	M3Type         M3MetricType
+	PromType       PromMetricType
+	PromFamilyType PromMetricFamilyType


nit: Can we split this into two types? One which covers the input prom types, and one that is converted into that only describes the m3 type? Otherwise there's potential situations where these may drift

This is changing now that I am removing the FamilyType concept (as per our earlier discussion). With that in mind, can you clarify what exactly do you want to split (the comment is not very clear to me)?

This reverts commit d108b4f.

arnikola · 2020-10-05T13:32:03Z

src/dbnode/generated/proto/annotation/annotation.proto

+    MetricType metric_type   = 1;
+    bool handle_value_resets = 2;


nit: use camel case rather than underscoring here

Using snake case for field names is suggested by the official protobuf style guide. Unfortunately we have mixed both snake and camel in various places, but for new files (like in this case) the decision is to stick to the style guide.

arnikola · 2020-10-05T13:32:51Z

src/dbnode/generated/proto/annotation/annotation.proto

+    UNKNOWN = 0;
+    COUNTER = 1;
+    GAUGE = 2;
+    HISTOGRAM = 3;
+    GAUGE_HISTOGRAM = 4;
+    SUMMARY = 5;
+    INFO = 6;
+    STATESET = 7;


meganit; mind aligning these?

arnikola · 2020-10-05T14:14:21Z

src/query/ts/metadata.go

+
+const (
+	// PromMetricTypeUnknown is the unknown Prometheus metric type.
+	PromMetricTypeUnknown PromMetricType = iota


nit; can remove MetricType from the name, so these would be PromCounter and M3Counter?

Started doing it, and then realized that names such as PromUnknownand PromInfo would lose too much context. Decided to keep the current names, also consistent with the preexisting M3MetricTypeGauge etc.

arnikola · 2020-10-05T14:16:52Z

src/query/ts/metadata.go

@@ -20,22 +20,51 @@

 package ts

-// MetricType is the enum for metric types.
-type MetricType int
+// M3MetricType is the enum for M3 metric types.


nit: mind adding a comment why our list is shorter than proms? Would be useful especially for people who aren't familiar with the problem space

arnikola · 2020-10-05T14:36:39Z

src/query/ts/metadata.go

+	M3Type            M3MetricType
+	PromType          PromMetricType
+	Source            SourceType
+	HandleValueResets bool


Just following up here since GH removed the other comment about splitting this; was just thinking it's probably better not to have both prom and m3 types in a single SeriesAttributes, rather have them separate. Happy to leave it as is though 👍

src/query/storage/converter.go

arnikola · 2020-10-05T14:48:29Z

src/query/storage/converter_test.go

@@ -275,3 +276,128 @@ func BenchmarkFetchResultToPromResult(b *testing.B) {
 		benchResult = FetchResultToPromResult(fr, false)
 	}
 }
+


Can we have a test that converts from annotations to attributes then back to annotations, then back to attributes to ensure the fields get set correctly?

Hmm, would be nice, but there is no such round-trip conversion. The conversion flow goes like this:
API protobuf -> attributes -> annotation payload protobuf

arnikola

LG with nits

commit 47c219a Author: Linas Medžiūnas <linasm@users.noreply.github.com> Date: Tue Oct 6 13:31:37 2020 +0300 [dbnode] GetNamespaceFn returns public Namespace interface (#2696) commit ac2ef9b Author: Linas Medžiūnas <linasm@users.noreply.github.com> Date: Tue Oct 6 11:11:50 2020 +0300 [coordinator] Store metrics type into the annotation (#2628) * Rename existing protobuf TimeSeries.type field to m3_type to avoid collision * Add new Prometheus protobuf fields * Rename the internal MetricsType to M3MetricsType * Implement conversions * Write metrics type as annotation payload * Avoid reusing annotation slices * Fix test * Introduce metric family type * Address review feedback * Revert "Introduce metric family type" This reverts commit d108b4f. * Introduce annotation.Payload.handle_value_resets field * Minor changes according to PR feedback commit 2df33bf Author: Linas Medžiūnas <linasm@users.noreply.github.com> Date: Tue Oct 6 09:23:48 2020 +0300 [dbnode] Extended namespace options (#2644) * [dbnode] Extended namespace options * Add ExtendedOptions to integration tests * Fix build * Fix build (OptionsToProto) * go.sum * Imports * Allow updating extended namespace options * Lint * Fixed volumeIndex increment * Revert "Fixed volumeIndex increment" This reverts commit 2e0342b. * Fix TestLocalTypeWithAggregatedNamespace * Added protection against golang/protobuf/jsonpb use * More protection against non-gogo protobuf package * Make namespace tests pass * Reduce reptitiveness of protobuf.Any construction * Extract ExtendedOptions test infra to x package * Add copyright * Move test ExtendedOptionsConverter to the packages where it is needed * xtest.NewExtendedOptionsProto returns error Co-authored-by: Gediminas <gediminas@chronosphere.io> commit 84f5b98 Author: teddywahle <69990143+teddywahle@users.noreply.github.com> Date: Mon Oct 5 14:36:45 2020 -0700 [query] Implemented the Graphite `applyByNode` function (#2654) commit 08117d2 Author: teddywahle <69990143+teddywahle@users.noreply.github.com> Date: Mon Oct 5 13:28:07 2020 -0700 [query] Fix snapping bug affecting "moving" function (movingMedian, movingAverage, etc.) (#2694) commit db4e9a3 Author: teddywahle <69990143+teddywahle@users.noreply.github.com> Date: Mon Oct 5 12:49:06 2020 -0700 [query] ParseTime function supports 95%+ of `from / until` time formats (#2621) commit 89f6fcc Author: Gediminas Guoba <gediminas@chronosphere.io> Date: Mon Oct 5 21:36:32 2020 +0300 [largetiles] Fixed volumeIndex increment (#2681) * Fixed volumeIndex increment * minor refactoring Co-authored-by: Linas Medžiūnas <linasm@users.noreply.github.com> Co-authored-by: Rob Skillington <rob.skillington@gmail.com> commit 1df3eed Author: Linas Medžiūnas <linasm@users.noreply.github.com> Date: Mon Oct 5 20:45:50 2020 +0300 [dbnode] Fix granularity of LeaseManager.UpdateOpenLeases (#2695) commit 5bcb1ba Author: Linas Medžiūnas <linasm@users.noreply.github.com> Date: Fri Oct 2 16:59:22 2020 +0300 [dbnode] Support dynamically registered background processes in mediator (#2634) * [dbnode] Support dynamically registered background processes in mediator * [dbnode] Pass background processes via RunOptions * Add a comment for BackgroundProcess * Add integration test TestRunCustomBackgroundProcess * Include backgroundProcess in TestDatabaseMediatorOpenClose * Move the source of BackgroundProcess asynchrony outside for consistency/easier testing * Fix test * Formatting * Update BackgroundProcess doc * go mod tidy * Verify reporting in TestRunCustomBackgroundProcess * Address review feedback * Resurrect BackgroundProcess.Start() * Enforce mediator.RegisterBackgroundProcess before Open * Remove locking from mediator.Report * Fix comment commit ef9ba0a Author: Rob Skillington <rob.skillington@gmail.com> Date: Fri Oct 2 02:39:33 2020 -0400 [changelog] Add changelog for 0.15.16 (#2688) commit 81c3e19 Author: Rob Skillington <rob.skillington@gmail.com> Date: Fri Oct 2 01:54:34 2020 -0400 [coordinator] Configurable writes to leaving shards count towards consistency, add read level unstrict all (#2687) commit a700d56 Author: Rob Skillington <rob.skillington@gmail.com> Date: Thu Oct 1 23:09:09 2020 -0400 [coordinator] Continue write request when context is cancelled (#2682) commit 526da79 Author: Ryan Allen <rallen090@gmail.com> Date: Thu Oct 1 15:48:37 2020 -0400 [query] - Include FetchQuery in InspectSeries arguments (#2685) commit 3dedba5 Author: arnikola <artem@chronosphere.io> Date: Thu Oct 1 15:18:27 2020 -0400 [query] Additional testing on Prometheus engine path (#2686) * [query] BlockMetadata no longer required for Prometheus engine path * Revert code change * Exposes ApplyRangeWarnings commit f996e2d Author: Linas Medžiūnas <linasm@users.noreply.github.com> Date: Thu Oct 1 21:49:33 2020 +0300 [dbnode] Large tile aggregation improvements (#2668) * [dbnode] Large tile aggregation improvements * Fix mocking * Log block start and volume on error * Infra for reverse index sharing * StorageOptions.AfterNamespaceCreatedFn * Adjust test timing * Do not fail on fs.ErrCheckpointFileNotFound * mockgen * Improve handling of shared reverse index * Improve Shard.AggregateTiles UT coverage * Remove unused reverseIndex from test * Address review feedback (partial) * Rearrange imports commit d5fbe4b Author: Bo Du <bo@chronosphere.io> Date: Thu Oct 1 10:12:38 2020 -0600 [dbnode] No empty TSDB snapshots (#2666) commit 06cca59 Author: Asaf Mesika <amesika@logz.io> Date: Thu Oct 1 12:18:21 2020 +0300 [docs] Add short description of 2019 monitorama talk (#2515) * Add short description of 2019 monitorama talk * Update docs/overview/media.md Co-authored-by: Chris Chinchilla <chris@chrischinchilla.com> Co-authored-by: Chris Chinchilla <chris@chrischinchilla.com> Co-authored-by: Chris Chinchilla <chris@chronosphere.io> commit 68ba5bb Author: teddywahle <69990143+teddywahle@users.noreply.github.com> Date: Wed Sep 30 12:18:42 2020 -0700 [query] Implemented the Graphite `interpolate` function (#2650) commit 77455be Author: teddywahle <69990143+teddywahle@users.noreply.github.com> Date: Wed Sep 30 07:41:27 2020 -0700 [query] Implemented the Graphite `grep` function (#2655) commit 43ff200 Author: nate <nbroyles@gmail.com> Date: Tue Sep 29 18:00:14 2020 -0400 [query] Update /database/create endpoint to support creating an aggregated namespace (#2670) commit 41ac33c Author: Ryan Hall <ryanhall07@gmail.com> Date: Tue Sep 29 12:37:01 2020 -0700 Update debug/dump cmd with Cluster-Environment-Name header (#2672) This is needed since the dump includes the PlacmentSource commit 503d68d Author: nate <nbroyles@gmail.com> Date: Tue Sep 29 15:10:00 2020 -0400 [docs] Add Fileset Migration docs to Operational Guide nav (#2680) commit ac8259a Author: nate <nbroyles@gmail.com> Date: Tue Sep 29 14:44:18 2020 -0400 [dbnode][query] Add ability to set and retrieve AggregationOptions via namespace APis (#2661) commit 946161b Author: Ryan Hall <ryanhall07@gmail.com> Date: Tue Sep 29 10:54:43 2020 -0700 Only acquire writeState exclusive lock when resetting connections (#2678) This is a shared lock across all goroutines, so ideally we limit the amount of sync points. There is no need to acquire the exclusive lock when flushing a connection. Instead the shared read lock can be acquired and individual connection exclusive locks can be acquired to ensure only a single writer is using the connection at a time. This is part of a broader fix to prevent deadlocks in the m3msg communication. Before this change and adding write timeouts, the producer and consumer could deadlock and stop all messages from flowing. If a producer was sending more messages than it was acking, it would eventually fill up the TCP buffers and begin to block writes. If the producer was flushing a connection and holding the exclusive lock, it would end up blocking with lock. Since the lock was held, the ACKing process could not clear the buffer and the flush would block forever. commit 192f611 Author: Ryan Hall <ryanhall07@gmail.com> Date: Tue Sep 29 08:19:58 2020 -0700 Fix data race in TestSeekerManagerCacheShardIndices (#2679) commit c717d40 Author: Rob Skillington <rob.skillington@gmail.com> Date: Mon Sep 28 23:27:09 2020 -0400 [documentation] Add documentation for disk read bytes per second limit (#2677) commit 3c81774 Author: Ryan Hall <ryanhall07@gmail.com> Date: Mon Sep 28 11:25:08 2020 -0700 Add a configurable write timeout for consumer ACKs (#2675) Without a timeout this can potentially block forever. The producer and consumer can potentially dead lock trying to send/receive messages: Producer -> msg -> Consumer. Consumer is not attempting to read since it's trying to ACK. Consumer -> ack -> Producer. Producer is not attempting to read since it might be stuck trying to get a lock on the connection. For backwards compatibility this defaults to 0. We should set it to a sane default (5s) in the future. commit 80c9f9e Author: Ryan Hall <ryanhall07@gmail.com> Date: Mon Sep 28 09:39:19 2020 -0700 Add queue processing metrics (#2671) * Add queue processing metrics I found these metrics useful when tracking down an issue with the queue processor. Additionally, added an optional consumer label for the message writer metrics so you can easily tell which downstream consumer is causing issues. Unfortunately, setting the consumer label is a little awkward since it only makes since for the replicated shard writer, which only has a single consumer instance at one time. For the shared shard writer, the empty string is used. commit 66b0b24 Author: teddywahle <69990143+teddywahle@users.noreply.github.com> Date: Mon Sep 28 03:51:22 2020 -0700 [query] Implemented the Graphite `sortByMinima` function commit a7b499d Author: teddywahle <69990143+teddywahle@users.noreply.github.com> Date: Sun Sep 27 20:12:44 2020 -0700 [query] Implemented the Graphite `cumulative` function commit 13290be Author: Ryan Hall <ryanhall07@gmail.com> Date: Fri Sep 25 16:34:55 2020 -0700 Add metrics for bytes added/removed to the m3msg buffer (#2656) Helpful for understanding how to size the buffer to survive a m3aggregator deploy. Due to the graceful shutdown of a m3agregator, it can take almost a minute between connections closing on the old instances and connections start taking traffic on the new instances. If a coordinator takes 10MB/s of traffic, it only takes 10s to fill up a 100MB buffer and start dropping requests. commit 06d23e2 Author: Matt Schallert <mattschallert@gmail.com> Date: Fri Sep 25 15:06:20 2020 -0700 [deps] grpc v1.27 -> v1.29 (#2667) We're stuck in a tough spot with our gRPC version. We are on a relatively old version, and gRPC frequently introduces breaking changes to APIs that are experimental. Our own query code, and the version of etcd we depend on, use experimental APIs which were later deprecated. We can't upgrade too far as we won't be able to import etcd. However v1.29 is far more compatible with other parts of the ecosystem, and makes it easier to integrate M3 in other environments (as well as integrate newer tools into M3). commit 5b95438 Author: Ryan Hall <ryanhall07@gmail.com> Date: Fri Sep 25 13:53:08 2020 -0700 [msg] Move writer README to top level msg README (#2669) commit 38bc187 Author: Rob Skillington <rob.skillington@gmail.com> Date: Fri Sep 25 14:12:05 2020 -0400 [query] Add metrics for remote storage backends (#2657) commit 21cceab Author: Rob Skillington <rob.skillington@gmail.com> Date: Fri Sep 25 10:42:52 2020 -0400 [dbnode] Add example for using the Go DB node client to query/write metrics (#2658) commit 41b4bc4 Author: Rob Skillington <rob.skillington@gmail.com> Date: Fri Sep 25 10:42:00 2020 -0400 [query] Add ability to enable optimized Graphite fanout if all agg namespaces have all the data (#2665) commit aefca00 Author: Linas Medžiūnas <linasm@users.noreply.github.com> Date: Fri Sep 25 09:09:46 2020 +0300 [dbnode] Large tiles aggregation flow v2 (#2643) * Refactor and cleanup * Refactor interfaces to more closely match design * Update frame iterator read from a encoding.ReaderIterator * Removing unnecessary files * az * Add utility to apply tile calculations on data. * test fix * Added concurrency * Concurrency logging * [dbnode] A noop AggregateTiles thrift RPC * Add AggregateTilesRequest.rangeType * sourceNameSpace / targetNameSpace * Drop AggregateTilesRequest.shardId * A partial implementation of AggregateTiles * Open DataFileSetReader and iterate through it * Decompress the data read * Add explicit FileSetType * [dbnode] Add OrderedByIndex option for DataFileSetReader.Open * Remove dbShard.TagsFromSeriesID * Regenerate mocks * Unit tests * Mockgen * Fix test * Resurrect rpc_mock.go * Remove accidentally committed files * Trigger build * Add step parameter * Write aggregated data to other namespace * Fix tests * Introduced AggregateTilesOptions * Minor improvements * Cleanup * PR response * Add headers * Remove unnecessary stuff. * [dbnode] A noop AggregateTiles thrift RPC * Add AggregateTilesRequest.rangeType * sourceNameSpace / targetNameSpace * Drop AggregateTilesRequest.shardId * A partial implementation of AggregateTiles * Open DataFileSetReader and iterate through it * Decompress the data read * Add explicit FileSetType * Remove dbShard.TagsFromSeriesID * Regenerate mocks * Unit tests * Mockgen * Fix test * Resurrect rpc_mock.go * Remove accidentally committed files * Trigger build * Add step parameter * Write aggregated data to other namespace * Fix tests * Introduced AggregateTilesOptions * Minor improvements * Cleanup * [dbnode] Integrate arrow iterators into tile aggregation * Fix close error after EOF * Can already close the SeriesBlockIterator * Update to use concurrent iteration and prefer single metadata * [dbnode] Cross block series reader * Assert on OrderedByIndex * Tests * Mocks * Dont test just the happy path * Compute and validate block time frames * [dbnode] Integration test for large tiles (#2478) * [dbnode] Create a virtual reverse index for a computed namespace * Return processedBlockCount from AggregateTiles * Improve error handling * Validate AggregateTilesOptions * Unnest read locks * Use default instead of constant * Fix test * minor refactoring * Addressed review feedback * Legal stuff * Refactor recorder * Allow using flat buffers rather than arrow * [dbnode] persist manager for large tiles * revert of .ci * minor * Adding better comparisons for arrow vs flat * Some fixes for query_data_files * An option to read from all shards * Fix large_tiles_test * Fix TestDatabaseAggregateTiles * Read data ordered by index * Generate mocks * Fix TestAggregateTiles * Group Read() results by id * Remodel CrossBlockReader as an Iterator * Mockgen * Erase slice contents before draining them * Resolve merge conflicts * Align with master * Integrate CrossBlockReader * Make a defensive copy of dataFileSetReaders * avoid panics * Improve TestNamespaceAggregateTiles * Added TODO on TestAggregateTiles * Align query_data_files * Mockgen * Added cross block iterator to be able to read multiple BlockRecords. Also removed concurrency from tile iterators and cleaned up utility * Add HandleCounterResets to AggregateTilesOptions * Additional tests and cleanup. * [dbnode] Large Tiles fs.writer experimental implementation * Implement DownsampleCounterResets * Improve readablitiy * Use pointer arguments to get the results * Reduce code duplication * Refine comments * Remove dependency on SeriesBlockFrame * [dbnode] Add OrderedByIndex option for DataFileSetReader.Open (#2465) * [dbnode] Cross-block series reader (#2481) * Fix build * Integrate DownsampleCounterResets * Introduce DownsampledValue struct * Update DownsampleCounterResets integration * Preallocate capacity for downsampledValues * Large tiles parrallel indexing. * Checkpoint fixed * Successful write/fetch with some hardcoded values. * Some FIXME solved * [dbnode] AggregateTiles RPC - minimal E2E flow (#2466) * minor fixes * codegen fix * Address feedback from PR 2477 * TestShardAggregateTiles using 2 readers * Fix large_tiles_test.go * integration test fix * Bug fix and test * [large tiles] Fix refcounting in CrossBlockReader * Workaround for negative reference count * Integration test fix * [large-tiles] Try detect double finalize * [dbnode] Large tiles concurrency test * Batch writer is used for waster writes * race fix * Fix compilation error * Comment out some noise * Fix data races on time field (bootstrapManager, flushManager) * Fix misplaced wd.Add * Close context used for aggregation (in test code) * Close encoders during large tile writes * removing some debug code * Close series in test code * Move series.Close() after err check (test code) * Update to series frame API * Additional tests * PR * work on integ test * tags close * [large-tiles] Fix management of pooled objects * Fix query_data_files tool * Use mock checked bytes in cross_block_reader_test.go * query test * Query in place of fetch * test fix * Bug reproduced * Heavier concurrency test * Fix session related races in large_tiles_test.go * Fix string conversion * Use mocks for pooled objects in TestShardAggregateTiles * Add build integration tag * test fix * minor refactoring * increased the amount of series to 5k * Remove a noop * Log the finish of namespace aggregation * some minor refactorings * Streaming aggregation, reusing resources for each series * Do not allocate minHeapEntries * Cleanup * Fix query_data_files * Fix build * go.sum * Align with StreamingWriter API changes * Add FIXME WRT segment.Tail finalizing * Exclude query_data_files * Exclude counter_resets_downsampler.go * Remove arrow related code * Cleanup * Use explicit EncodedTags type * Rename processedBlockCount to processedTileCount * Fix build * Exclude read_index_ids changes * Address review feedback * Increase fetch timeout to stabilize TestAggregationAndQueryingAtHighConcurrency * Abort the writer in case of any error during aggregation Co-authored-by: Artem <artem@chronosphere.io> Co-authored-by: Gediminas <gediminas@chronosphere.io> commit ac530fa Author: Matt Schallert <mattschallert@gmail.com> Date: Thu Sep 24 14:07:46 2020 -0700 [deps] Bump tchannel v1.12 -> v1.14 (#2659) commit 62a31fc Author: Rob Skillington <rob.skillington@gmail.com> Date: Thu Sep 24 15:06:15 2020 -0400 [dbnode] Better defaults for block retriever config (#2664) commit 767a517 Author: nate <nbroyles@gmail.com> Date: Thu Sep 24 10:11:56 2020 -0400 [dbnode] Load and cache info files during bootstrap (#2598) commit 4d81e4a Author: teddywahle <69990143+teddywahle@users.noreply.github.com> Date: Wed Sep 23 12:43:21 2020 -0700 [query] Implemented the Graphite `highest` and `lowest` functions (#2623) commit b4575f6 Author: arnikola <artem@chronosphere.io> Date: Wed Sep 23 10:07:11 2020 -0400 [query] Add resolution exceeds query range warning (#2429) Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

christopherzli

this makes m3 not compatible with prometheus metric scraper due to difference in definition: https://github.com/prometheus/prometheus/blob/main/prompb/types.proto#L128

christopherzli · 2023-10-07T10:33:35Z

src/query/generated/proto/prompb/types.proto

@@ -14,10 +14,14 @@ message Sample {
 message TimeSeries {
  repeated Label labels   = 1 [(gogoproto.nullable) = false];
  repeated Sample samples = 2 [(gogoproto.nullable) = false];
+  MetricType type         = 3;


this makes m3 not compatible with prometheus metric scraper due to difference in definition: https://github.com/prometheus/prometheus/blob/main/prompb/types.proto#L128

linasm added 5 commits September 10, 2020 13:20

Rename existing protobuf TimeSeries.type field to m3_type to avoid co…

78783c8

…llision

Add new Prometheus protobuf fields

cc62479

Rename the internal MetricsType to M3MetricsType

27f7f23

Implement conversions

62f4bfd

Write metrics type as annotation payload

06d5f81

linasm requested review from robskillington, gediminasgu, arnikola and shreyassrivatsan September 11, 2020 08:45

Merge branch 'master' into linasm/metrics-type-metadata

aeb8036

linasm added the area:tiles label Sep 11, 2020

linasm and others added 5 commits September 23, 2020 12:46

Avoid reusing annotation slices

521a732

Merge branch 'master' into linasm/metrics-type-metadata

ced1245

Fix test

08d50fb

Introduce metric family type

d108b4f

Merge branch 'master' into linasm/metrics-type-metadata

7cb1bf6

arnikola reviewed Oct 1, 2020

View reviewed changes

linasm and others added 2 commits October 1, 2020 14:08

Address review feedback

70124bf

Merge branch 'master' into linasm/metrics-type-metadata

7e1ea81

arnikola reviewed Oct 4, 2020

View reviewed changes

linasm and others added 3 commits October 5, 2020 11:08

Merge branch 'master' into linasm/metrics-type-metadata

a287fba

Revert "Introduce metric family type"

77753b9

This reverts commit d108b4f.

Introduce annotation.Payload.handle_value_resets field

1466bf7

linasm requested a review from arnikola October 5, 2020 09:52

arnikola reviewed Oct 5, 2020

View reviewed changes

arnikola approved these changes Oct 5, 2020

View reviewed changes

linasm and others added 2 commits October 6, 2020 10:16

Merge branch 'master' into linasm/metrics-type-metadata

b4dc990

Minor changes according to PR feedback

3eb2ad0

linasm merged commit ac2ef9b into master Oct 6, 2020

linasm deleted the linasm/metrics-type-metadata branch October 6, 2020 08:11

christopherzli reviewed Oct 7, 2023

View reviewed changes

[coordinator] Store metrics type into the annotation #2628

[coordinator] Store metrics type into the annotation #2628

Conversation

linasm commented Sep 11, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arnikola Oct 4, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arnikola left a comment

Choose a reason for hiding this comment

christopherzli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arnikola Oct 4, 2020 •

edited

Loading