[coordinator] [query] Add readiness probe for probing current consistency level achievability #2976

Merged
robskillington merged 10 commits into master from r/add-readiness-probe
Dec 4, 2020

robskillington
Collaborator

What this PR does / why we need it:

This adds a readiness probe that returns whether the cluster is available for reads and/or writes at the configured consistency levels. This is useful in environments where the process takes some time to achieve connectivity before it can be added behind a load balancer.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

ReadyHTTPMethod = http.MethodGet
)

// ReadyHandler tests whether the service is connected to underlying storage.
Collaborator

It would be good to document how this readiness check works (i.e. it returns a 5xx if writes are not ready and you requested writes).

Collaborator Author

Good point.

result.ReadyReads = append(result.ReadyReads, nsResult)
}

ready, err = ns.Session().WriteClusterAvailability()
Collaborator

What do you think about skipping the respective availability check if it was not requested? E.g. skip checking write availability if req.writes is false. My thinking is that it would be unfortunate to block reads if only reads were requested and the write availability check ran into an error.

Collaborator Author

These don't actually wait on anything; it's just a check of current state (the open connection count), not an attempt to test reachability, so it won't be any faster, slower, or blocking if one is unhealthy versus the other.

Collaborator

@linasm linasm left a comment

In general LGTM with a few readability nits.
But I think unit tests for session.WriteConsistencyLevel / session.ReadConsistencyLevel would be handy, as the logic there is not trivial.

Comment on lines +827 to +836
switch level {
case topology.ConsistencyLevelAll:
	clusterAvailableForShard = shardReplicasAvailable == replicas
case topology.ConsistencyLevelMajority:
	clusterAvailableForShard = shardReplicasAvailable >= majority
case topology.ConsistencyLevelOne:
	clusterAvailableForShard = shardReplicasAvailable > 0
default:
	return false, fmt.Errorf("unknown consistency level: %d", level)
}
Collaborator

Nit: I would extract this switch out of the loop and compute minReplicasRequired value in it (which would be topoMap.Replicas(), topoMap.MajorityReplicas(), or 1, respectively).
(not for the sake of performance, but for brevity and reduced nesting)

Comment on lines +837 to +839
if !clusterAvailableForShard {
	return false, nil
}
Collaborator

... and then, this would simply be:

Suggested change
- if !clusterAvailableForShard {
-     return false, nil
- }
+ if shardReplicasAvailable < minReplicasRequired {
+     return false, nil
+ }
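
Put together, the two suggestions above could look roughly like the sketch below. It is self-contained for illustration only: `consistencyLevel` and its constants stand in for the `topology` package types, the per-shard counts are passed as a plain slice rather than read from a topology map, and the majority is computed as `replicas/2 + 1`, which is an assumption here rather than the result of `topoMap.MajorityReplicas()`.

```go
package main

import "fmt"

// consistencyLevel is a hypothetical stand-in for topology.ConsistencyLevel.
type consistencyLevel int

const (
	levelOne consistencyLevel = iota + 1
	levelMajority
	levelAll
)

// minReplicasRequired resolves the consistency level once, outside the
// per-shard loop, as the review comment suggests.
func minReplicasRequired(level consistencyLevel, replicas int) (int, error) {
	switch level {
	case levelAll:
		return replicas, nil
	case levelMajority:
		return replicas/2 + 1, nil // assumed majority formula
	case levelOne:
		return 1, nil
	default:
		return 0, fmt.Errorf("unknown consistency level: %d", level)
	}
}

// clusterAvailable reports whether every shard currently has at least the
// required number of available replicas.
func clusterAvailable(level consistencyLevel, replicas int, availableByShard []int) (bool, error) {
	required, err := minReplicasRequired(level, replicas)
	if err != nil {
		return false, err
	}
	for _, shardReplicasAvailable := range availableByShard {
		if shardReplicasAvailable < required {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Three replicas configured; the last shard has one replica available.
	available := []int{3, 2, 1}
	for _, level := range []consistencyLevel{levelOne, levelMajority, levelAll} {
		ok, err := clusterAvailable(level, 3, available)
		fmt.Println(level, ok, err)
	}
}
```

The single `<` comparison matches the original three cases as long as a shard never reports more available replicas than the configured replica count, since `== replicas` and `>= replicas` then coincide.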

src/query/api/v1/handler/ready.go (outdated, resolved)
Comment on lines +110 to +114
if !ready {
	result.NotReadyReads = append(result.NotReadyReads, nsResult)
} else {
	result.ReadyReads = append(result.ReadyReads, nsResult)
}
Collaborator

Micro nit: invert then/else to avoid negation.

Comment on lines +122 to +126
if !ready {
	result.NotReadyWrites = append(result.NotReadyWrites, nsResult)
} else {
	result.ReadyWrites = append(result.ReadyWrites, nsResult)
}
Collaborator

Same micro nit as above wrt !ready.
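
For reference, the inverted form the reviewer is asking for might read as below. The `readyResult` type and `recordRead` helper are hypothetical simplifications (plain namespace names instead of the PR's per-namespace result values), used only to make the sketch self-contained.

```go
package main

import "fmt"

// readyResult is a simplified, hypothetical version of the handler's result,
// tracking namespaces by name only.
type readyResult struct {
	ReadyReads    []string
	NotReadyReads []string
}

// recordRead puts the positive branch first so the condition needs no negation.
func (r *readyResult) recordRead(ns string, ready bool) {
	if ready {
		r.ReadyReads = append(r.ReadyReads, ns)
	} else {
		r.NotReadyReads = append(r.NotReadyReads, ns)
	}
}

func main() {
	var result readyResult
	result.recordRead("default", true)        // hypothetical namespace names
	result.recordRead("aggregated_1m", false)
	fmt.Printf("%+v\n", result)
}
```

The same shape would apply to the write-side branch.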

@linasm linasm removed their assignment Dec 4, 2020
@robskillington
Collaborator Author

I refactored the code so that the existing connection code (which uses the same logic) is reused; thankfully, that code already has extensive test coverage.

@codecov

codecov bot commented Dec 4, 2020

Codecov Report

Merging #2976 (269215f) into master (421a988) will increase coverage by 0.5%.
The diff coverage is 51.4%.

@@           Coverage Diff            @@
##           master   #2976     +/-   ##
========================================
+ Coverage    71.8%   72.3%   +0.5%     
========================================
  Files        1078    1078             
  Lines      100253   99330    -923     
========================================
- Hits        72039   71881    -158     
+ Misses      23238   22470    -768     
- Partials     4976    4979      +3     
Flag Coverage Δ
aggregator 75.8% <ø> (-0.1%) ⬇️
cluster 85.3% <ø> (ø)
collector 84.3% <ø> (ø)
dbnode 78.8% <46.1%> (+1.0%) ⬆️
m3em 74.4% <ø> (ø)
m3ninx 73.1% <ø> (-0.1%) ⬇️
metrics 19.9% <ø> (ø)
msg 74.0% <ø> (-0.1%) ⬇️
query 67.3% <55.5%> (+0.4%) ⬆️
x 80.2% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 421a988...269215f.

@robskillington robskillington merged commit b6dfbaa into master Dec 4, 2020
@robskillington robskillington deleted the r/add-readiness-probe branch December 4, 2020 23:22
robskillington added a commit that referenced this pull request Dec 4, 2020
robskillington added a commit that referenced this pull request Dec 7, 2020
anuprout added a commit to anuprout/m3 that referenced this pull request Feb 9, 2021
…as downsample option is set to "all : false" (#2)

* [dbnode] Refactor wide query path (m3db#2826)

* [dbnode] Introduce Aggregator type (m3db#2840)

* Improve some slow tests (m3db#2881)

* Improe some slow tests

* lint

* lint

* goddamn imports

* [query] Remove dead code in prom package (m3db#2871)

* [dbnode] Refactoring dbShard (m3db#2848)

* [ci] Use gofmt to rename generated identifiers (m3db#2883)

* Order index segments in multi segments builder in descending order. (m3db#2875)

* [all] Return a 504 error if timeout. (m3db#2886)

* [cluster] Store shards in sorted form (m3db#2890)

* [docs] Update buildkite doc path (m3db#2891)

* Add new website relaunch (m3db#2892)

* [docs] Fix operator docs no longer being built and published (m3db#2895)

* [aggregator] Remove msgpack support (m3db#2894)

* [dbnode] Remove dead code in ts package (m3db#2898)

* [aggregator] Reduce delays in integration tests (m3db#2896)

* [query] Query handlers refactoring (m3db#2872)

* Initial commit for proposed query handlers refactoring:
 * updated CustomHandler interface
 * unified how query routes are added. no need to pass router around and wrapping is done in one place.
 * routes are registered with names so they could be easily found later when custom handlers are registered.

* trying to make linter happy

* linter fixes

* revert old behaviour

* Make sure route methods are taken into account when adding and searching for named route.

* fixed code formatting

* [dbnode] Refactor wide query path (m3db#2826)

* [dbnode] Introduce Aggregator type (m3db#2840)

* [coordinator] Set default namespace tag to avoid colliding with commonly used "namespace" label (m3db#2878)

* [coordinator] Set default namespace tag to avoid colliding with common "namespace" default value

* Use defined constant

* Add downsampler test case to demonstrate override namespace tag

Co-authored-by: Wesley Kim <wesley@chronosphere.io>

* Improve some slow tests (m3db#2881)

* Improe some slow tests

* lint

* lint

* goddamn imports

* Changes after code review.

* [query] Remove dead code in prom package (m3db#2871)

* Register separate route for each method.

* linter fixes

* removed code duplication in hasndler_test

* Fail if route was already registered.

* formatted code

* Update src/query/api/v1/httpd/handler_test.go

Co-authored-by: Vilius Pranckaitis <vpranckaitis@gmail.com>

* Update src/query/api/v1/httpd/handler_test.go

Co-authored-by: Vilius Pranckaitis <vpranckaitis@gmail.com>

* More handler tests.

Co-authored-by: arnikola <artem@chronosphere.io>
Co-authored-by: Linas Medžiūnas <linasm@users.noreply.github.com>
Co-authored-by: Rob Skillington <rob.skillington@gmail.com>
Co-authored-by: Wesley Kim <wesley@chronosphere.io>
Co-authored-by: Vytenis Darulis <vytenis@uber.com>
Co-authored-by: Vilius Pranckaitis <vpranckaitis@gmail.com>

* [docs] Add missing single quote to shell command (m3db#2893)

* [docs] Update theme rebased (m3db#2899)

* Update README to temporarily remove the logo (m3db#2902)

* [docs] Small changes to website/docs (m3db#2900)

* [changelog] v1.0.0 changelog update (m3db#2904)

* [DOCS] Fix broken images and add formatting to links (m3db#2903)

* [DOCS] Correct grammar issue on home page (m3db#2905)

* Fix seedNodes config in local stack yaml (m3db#2907)

* [dtest] Fix API incompatibilities in docker harness (m3db#2849)

* Correct old typo (m3db#2906)

* [dtest] API tests for /query and /query_range endpoints (m3db#2873)

* [dbnode] Query logging (m3db#2888)

* [query] Add lookback duration from query config (m3db#2913)

* [query] Fix invalid query resulting in 500 (m3db#2910)

* [query] Fix /m3query returning 500 on invalid query argument (m3db#2916)

* [linter] more optimal linter invocation for "all" case (m3db#2863)

* [dbnode] Remove dead code in encoding package (m3db#2920)

* [dbnode] Fix client config respecting value for connect consistency (m3db#2914)

* [storage] Add Aliyun storage class (m3db#2908)

* Update ci submodule to latest (m3db#2921)

This was accidentally reverted to an earlier commit in
https://github.com/m3db/m3/pull/2907/files.

* [m3cluster/etcd] Use zap logger for etcd integration test store (m3db#2915)

* [docs] Parameterize menus (m3db#2918)

* [coordinator] Validate placement on set placement endpoint unless force set (m3db#2922)

* Update development guide (m3db#2832)

* [dbnode] Properly rebuild index segments if they fail verification. (m3db#2879)

* [dbnode] Add configurability for regexp DFA and FSA limits (m3db#2926)

* [DOCS] Separate config to allow for dev and prod setups (m3db#2909)

* [aggregator] Use NaN instead of math.MaxFloat64 and add sentinel guards (m3db#2929)

* [aggregator] Return NaN instead of 0 for NaN gauge values (m3db#2930)

* [query] Allow Graphite variadic functions to omit variadic args (m3db#2882)

* [query] Graphite query timeout propagation and add per endpoint status/latency metrics (m3db#2880)

* [query] Fix incorrect content type in m3query/ error response (m3db#2917)

* [dbnode] Remove dead code in fs package (m3db#2932)

* [dbnode] Add DocRef() to NS and multiple series metadata types  (m3db#2931)

* Revert "[dbnode] Properly rebuild index segments if they fail verification. (m3db#2879)" (m3db#2944)

This reverts commit c6fe28d.

* [aggregator/client] Include instance ID in write errors (m3db#2945)

* [dbnode] Update series metadata type name. (m3db#2946)

* [dbnode] Remove some dead code from peersSource (m3db#2779)

* Prom read instant handler refactoring (m3db#2928)

* [query] Allow customizing request parsing for /query handler

* fix lint errors

* fix another lint error

* PR comments

* add an interface for hooks

* Removed read instant handler.

* Added some comments.

* Uses functional options pattern.

* Added comments for exported functions.

* Simplified options.

* Got rid of 2 fields in readHandler.

Co-authored-by: Vilius Pranckaitis <vpranckaitis@gmail.com>
Co-authored-by: Rob Skillington <rob.skillington@gmail.com>

* [coordinator] Extract new namespace validation logics (m3db#2919)

* [linter] Fix linter not finding any issues (m3db#2948)

* [dbnode] Add shard to ID batch (m3db#2764)

* [dbnode] Integrate ns cold flusher into tile aggregation. (m3db#2923)

* [tests] Use prometheus v2.22 docker image in docker-integration-tests (m3db#2951)

* [dbnode] Widen exposure of namespace method (m3db#2952)

* [dtest] Allow skipping docker image build by providing existing image (m3db#2942)

* [tests] Fix deprecated Prometheus metric in docker-integration-tests (m3db#2953)

* [lint] Update instances of Reuseable -> Reusable (m3db#2950)

* [integration] Expose TestIndexWrite methods (m3db#2956)

* [docs] Add diagram to comparator readme (m3db#2288)

* [coordinator] Consider uninitialized case for UnaggregatedClusterNamespace with dynamic clusters (m3db#2957)

* [coordinator] Consider uninitialized case for UnaggregatedClusterNamespace with dynamic clusters

* Fixup ready_test

* Apply suggestions from code review

Co-authored-by: arnikola <artem@chronosphere.io>

* PR feedback

* Revert go mod changes

Co-authored-by: arnikola <artem@chronosphere.io>

* [dbnode] Simplified interface for MatchesSeriesIters (m3db#2954)

* [dbnode] Add wide filter (m3db#2949)

* [dbnode] Fix integration code under race with TChannel channel options mutation issue (m3db#2958)

* [dbnode] Additional params for Aggregator (m3db#2959)

* [cluster] Expose underlying staged placement version in ActiveStagedPlacement (m3db#2963)

* [DOCS] Add versions to docs config (m3db#2935)

* Add versions to docs config

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Update Victor theme

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [DOCS] SEO fixes (m3db#2937)

* Change site title

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Remove duplicated file

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Reorder config items

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fix menus, title and relevant config

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Update Victor theme

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Change path configuration

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Change title back to param

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [docs] Update docs to clarify Flush and Buffer in Storage Engine (m3db#2511)

* * Expand a bit about the type of flushes and how buffer is structured due to accommodate the two types of writes
* Emphasis few terms that were hard to read at first time
* Fixes several typos and punctuation.

* Change links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fix links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Apply formatting

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Co-authored-by: Chris Chinchilla <chris@chronosphere.io>

* [aggregator] Split up TCP and M3Msg clients (m3db#2964)

* [aggregator] Add ActivePlacement method to TCP client (m3db#2971)

* [watchmanager] Do not leak tickers in stopped watches (m3db#2972)

* [etcd] Clean up logging in WatchManager (m3db#2973)

* [etcd] Log watch error source details (m3db#2974)

* [etcd] Expose cached etcd clients in cluster etcd config service (m3db#2975)

* [dbnode] Prevent potential division by 0 in StreamingWriter (m3db#2965)

* [dbnode] Fix shard time ranges superset logic. Cleanup and add tests. (m3db#2968)

* [docs] Fix link for July 2020 community meeting recording (m3db#2978)

* [query] Fix query metrics assigning wrong status codes for errors (m3db#2960)

* [cluster] Fix deadlock in new ActiveStagedPlacement method (m3db#2980)

* [coordinator] add graphite names to aggregation types (m3db#2970)

* Fix broken og meta tags on static pages (m3db#2982)

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [docs] Remove reference to adding namespace to coordinator config (m3db#2977)

Co-authored-by: Chris Chinchilla <chris@chronosphere.io>

* [coordinator] [query] Add readiness probe for probing current consistency level achievability (m3db#2976)

* [cluster] Fix watch resource leak/hang (m3db#2984)

* Remove odd uses of satori/go.uuid package (m3db#2985)

* [tools] Add benchmarking support to read_data_files (m3db#2986)

* [dbnode] Implement shard.ScanData (m3db#2981)

* [dbnode] Add TChannel channel configuration (m3db#2989)

* [aggregator] Fix panic on shutdown in ForwardedWriter (m3db#2987)

* [DOCS] Fix grammar and create code snippet for step in quickstart (m3db#2961)

* Fix grammar and create code snippet for step in quickstart

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Update other files with readiness

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [aggregator] Use a single goroutine to emit metrics for all TCP client queues (m3db#2991)

* [aggregator] Lockless ratelimiter (m3db#2988)

* [x/net/http] Add Proxy field to HTTPClientOptions (m3db#2994)

* [dbnode] Skip out of retention index segments during bootstrap. (m3db#2992)

* [aggregator] keep metric type during the aggregation (m3db#2941)

* Update k8s doc references (m3db#2998)

* [m3coordinator] Allow downsampler to configure custom rule store (m3db#2996)

* [rules] Set tags field of mapping rules history conversion (m3db#2999)

* [query] Fix for panic when deleting a namespace (m3db#2990)

* [coordinator] Update ApplyCustomRuleStore interface (m3db#3000)

* [cluster] Close etcd watches completely before recreating (m3db#2997)

* [coordinator] Add WriteNamespaces API to rules store (m3db#3001)

* [dbnode] Add ability to force enable cold writes from config (m3db#3002)

* Adding Hostinger Logo (m3db#3006)

* [DOCS] Add clustering getting started guides for binaries and kubernetes (m3db#2795)

* Move other quickstarts

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Move configuration from common steps as it no longer matches k8s steps

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Squashed commit of the following:

commit cfffc26414de48e6e5d4784f0908aaebb19743fe
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Mon Oct 26 15:57:52 2020 +0100

    wip

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit e8b9766
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Mon Oct 26 15:53:36 2020 +0100

    Move previous guides to clustering quickstart section

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Update section

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Fix links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Fix links add redirects

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove aggregator mention

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove aggregator mention

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Refactor

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove Docker fo rnow

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove and correct broken links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Revert "Remove Docker fo rnow"

This reverts commit 7b56da1.

Fix more broken links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Draft docker guide

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove duplicate content

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove embedded cluster note

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Move other quickstarts

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Squashed commit of the following:

commit cfffc26414de48e6e5d4784f0908aaebb19743fe
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Mon Oct 26 15:57:52 2020 +0100

    wip

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

commit e8b9766
Author: ChrisChinchilla <chris@chronosphere.io>
Date:   Mon Oct 26 15:53:36 2020 +0100

    Move previous guides to clustering quickstart section

    Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Update section

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Fix links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Fix links add redirects

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove aggregator mention

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove aggregator mention

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Refactor

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Remove and correct broken links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

Respond to feedback

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fix rendering issues

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Update theme

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Remove failing git info flag

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fix links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Respond to feedback

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Respond to feedback and fix rebased files

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Remove helm mentions

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Respond to feedback

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fix link

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Update theme to version 0.2.2

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Tidy Go mods

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Quick fix to broken code tabs on quickstart (m3db#3010)

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [cluster] Watch follow-ups (m3db#3007)

* [coordinator] Allow clients to pass in augmentM3Tags as a config (m3db#3014)

AugmentM3Tags will augment the metric type to aggregated metrics
to be used within the filter for rules. If enabled, for example,
your filter can specify '__m3_type__:gauge' to filter by gauges.
This is particularly useful for Graphite metrics today.
Furthermore, the option is automatically enabled if static rules are
used and any filter contain an __m3_type__ tag.

* [docs] Fix conflicting OpenAPI operation IDs (m3db#3009)

* [coordinator] Include tags in view.MappingRule Equals (m3db#3015)

* [dbnode] Emit consistencyResultError from fetchTaggedResultAccumulator (m3db#3016)

* [dbnode] Introduce Resource Exhausted RPC Error flag for query limits (m3db#3017)

* [DOCS] Fix new API paths and add Docker version (m3db#3018)

* Fix new API paths and add Docker version

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Force add missing shortcode

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [dbnode] Add server.StorageOptions to TestSetup (m3db#3023)

* [dbnode] Fix "unable to satisfy consistency requirements" error message (m3db#3025)

* [coordinator] Pass along instrument options into custom rule store fn (m3db#3027)

* [DOCS] Fix services ready path (m3db#3026)

* Fix services ready path

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Update theme

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [query] Add missing "status" field to PromQL error responses (m3db#2933)

* [dbnode] Add a few tests for query limit exceeded error (m3db#3030)

* [cluster] Add config wiring for etcd timeouts, retries, serialized gets (m3db#3035)

* [aggregator] Pack some tcp client structs for lower memory utilization (m3db#3037)

* [coordinator] Remove unused DisableAutoMappingRules config (m3db#3036)

* [query] Allow injecting function for rewriting error response (m3db#3008)

* [DOCS] SEO and config changes (m3db#3024)

* Add GitHub edit URL

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Add robots generation

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Update redirects

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Add 404 page

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fix paths

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fi title and description

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Update theme

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fix random titles

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [DOCS] Add back FAQ and troubleshooting sections (m3db#3029)

* FAQ and troubleshooting sections

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* Fix links

Signed-off-by: ChrisChinchilla <chris@chronosphere.io>

* [dbnode] Client borrow connection API (m3db#3019)

* [dbnode] Support checkpointing in cold flusher and preparing index writer per block (m3db#3040)

* [coordinator] Ignore rollup rule storage policies when deciding to utilize auto-mapping rules (m3db#3044)

* [coordinator] Ignore rollup rule storage policies when deciding to utiilize auto-mapping rules

* IsRollupRule -> IsMappingRule

* [aggregator] Fix panic on invalid unaggregated payloads (m3db#3041)

* [coordinator] Add Carbon ingest latency metrics (m3db#3045)

* [query] Graphite match regexp all values for __g0__ tag matcher instead of match field (m3db#3021)

* [query] Update Graphite quoted strings to just escape quotes and a flag for previous escape behavior (m3db#3022)

* [coordinator] Add Graphite Carbon ingest contains rule matcher for fast match (m3db#3046)

* [coordinator] Add Graphite rewrite cleanup directive for cleansing incoming metrics (m3db#3047)

* [coordinator] Add Graphite Carbon rewrite cleanup integration test (m3db#3049)

* [query] implemented Graphite's `powSeries` function (m3db#3038)

* [tools] Fix failing linter (m3db#3052)

* [query] Add Graphite support for ** with metric path selectors (m3db#3020)

* [aggregator/client] Metric for dropped metrics (m3db#3054)

Problem:
When collector fails to write a metric to aggregator, it logs an error but in practice it is almost impossible to tell whether it failed to write to both peers owning a shard or only one of them, i.e. whether data is lost or not.

Solultion:
Emit a metric indicating data loss.

* [r2ctl] Bump urijs from 1.19.1 to 1.19.5 in /src/ctl/ui (m3db#3055)

Bumps [urijs](https://github.com/medialize/URI.js) from 1.19.1 to 1.19.5.
- [Release notes](https://github.com/medialize/URI.js/releases)
- [Changelog](https://github.com/medialize/URI.js/blob/gh-pages/CHANGELOG.md)
- [Commits](medialize/URI.js@v1.19.1...v1.19.5)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [coordinator] Add ability to turn off rule matching cache (m3db#3059)

* Don't attempt to lint genny gen'd code (m3db#3063)

* [dbnode][m3ninx] Rename doc.Document -> doc.Metadata (m3db#3062)

* [dtest] Upgrade dockertest to v3.6.3 (m3db#3053)

* [dbnode] Add ExcludeOrigin to skip localhost on BorrowConnections (m3db#3066)

* [dtest] Add a method for Prom remote writes (m3db#3065)

* [dbnode][m3ninx] Add support for reading document containers from an index (m3db#3050)

Reading raw document metadata from an index can end up
being rather expensive for metadata with a lot of tags.
This commit introduces the concept of encoded metadata,
which wrap the relevant section of bytes and provides
an efficient reader to retrieve the non-encoded metadata
without ballooning memory usage. Additionally, introduce a
concept of a document that wraps either raw metadata
or encoded metadata, which can be used regardless of
whether an index segment is backed by metadata (from memory)
or encoded metadata (read from disk)

* [query] Fix aggregated namespace header to be bad request instead of internal server error (m3db#3070)

* [coordinator] Add TagFilter Matches comments (m3db#3072)

* [coordinator] Remove includeRollupsOnDefaultRuleFiltering flag (m3db#3073)

Confirmed this is working as expected, and is not
useful to configure otherwise.

Co-authored-by: Rob Skillington <rob.skillington@gmail.com>

* [dbnode] Remove per-series traces on read queries (m3db#3074)

* [x] Replace InnerError() method usages with xerrors.InnerError() (m3db#3068)

* [aggregator] Add ActivePlacementVersion to tcp client (m3db#3071)

* [coordinator] Rollout augmentM3Tags flag to true by default (m3db#3082)

* [coordinator] Rollout augmentM3Tags flag to true by default

* Fixup operator docs link

* Fixup doc site links

* [matcher/coordinator] Add latency metrics to rule matching (m3db#3083)

* [dbnode][m3ninx] Use new doc.Document in query results to reduce slice allocations (m3db#3057)

* [coordinator] Disable downsampler matcher cache by default (m3db#3080)

* [dtest] Run tests on an existing cluster launched with docker compose (m3db#3067)

* Additional metrics for peers bootstrapper (m3db#3060)

* refactored instrumentation code for peers bootstrapper.
added additional timers for peers bootstrapper to track individual steps.

* fixed linter warnings

* removed dead code

* removed dead code

* changes after review

* added new gauge - bootstrap-retries

* moved some instrumentation methods to instrumentationContexts for better encapsulation and easier usage.

* forgot to set nowFn

* added methods for creating instrumentation contexts.

* changes after code review

* [dbnode] Log namespace and shard upon invalid series read/write count (m3db#3092)

* [dbnode] Fix node serving reads/marked bootstrapped after bootstrap failure (m3db#3088)

* set bootstrap state to Bootstrapped when it returns no errors

* added bootstrap regression test.

* removed new test and updated old test to check for bootstrap states.

* [cluster] Fix flaky watchmanager tests (m3db#3091)

* [dbnode] Remove allocation per series ID when streaming block from disk (m3db#3093)

* [dbnode] Remove reverse index sharing leftovers (m3db#3095)

* Limit for time series read from disk (m3db#3094)

* Limit for time series read from disk

There is additional memory costs when time series need to be loaded from
disk (in addition to the actual bytes being read). This new limit allows
docLimits to be higher so series already in memory can easily be served
without hitting limits.

* [x] Make serialize.TagValueFromEncodedTagsFast() faster (m3db#3097)

* Revert "[aggregator] keep metric type during the aggregation" (m3db#3099)

* [m3db] Check bloom filter before stream request allocation (m3db#3103)

* [m3db] Check bloom filter before stream request allocation

* Add test assertions for bloom filer misses metric

* Remove redundant series-read metric

* Capture seekerMgr instead Rlock (m3db#3104)

seekerMgr could change outside the lock, which might result in a nil
pointer

* Replace bytes.Compare() == 0 with bytes.Equal() (m3db#3101)

* [dbnode] Faster search of tag bytes in convert.FromSeriesIDAndTags (m3db#3075)

* [tests] Skip flaky TestWatchNoLeader (m3db#3106)

* [dbnode] Direct conversion of encoded tags to doc.Metadata (m3db#3087)

* [query] Implemented Graphite's pow function (m3db#3048)

* [tests] test setups exported to allow us to use it from other packages (m3db#3042)

* Disable downsample in case of agg namespace

Check if there's at least one aggregated namespace with downsample disabled. If so, disabled the downsample object entirely

* minor

Co-authored-by: arnikola <artem@chronosphere.io>
Co-authored-by: Linas Medžiūnas <linasm@users.noreply.github.com>
Co-authored-by: Vytenis Darulis <vytenis@uber.com>
Co-authored-by: Vilius Pranckaitis <vpranckaitis@gmail.com>
Co-authored-by: Bo Du <bo@chronosphere.io>
Co-authored-by: Chris Chinchilla <chris@chronosphere.io>
Co-authored-by: Rob Skillington <rob.skillington@gmail.com>
Co-authored-by: λinas <soundvibe@users.noreply.github.com>
Co-authored-by: Wesley Kim <wesley@chronosphere.io>
Co-authored-by: nate <nbroyles@gmail.com>
Co-authored-by: Benjamin Raskin <ben@chronosphere.io>
Co-authored-by: Ryan Allen <rallen090@gmail.com>
Co-authored-by: Ryan Hall <ryanhall07@gmail.com>
Co-authored-by: Antanas Bastys <Antanukas@users.noreply.github.com>
Co-authored-by: 李国忠 <249032432@qq.com>
Co-authored-by: Matt Way <mway@users.noreply.github.com>
Co-authored-by: Alex Bublichenko <46664526+abliqo@users.noreply.github.com>
Co-authored-by: Asaf Mesika <asaf.mesika@gmail.com>
Co-authored-by: shreyassrivatsan <shreyas@chronosphere.io>
Co-authored-by: Jerome Froelich <jeromefroelich@hotmail.com>
Co-authored-by: Gediminas Guoba <gediminas@chronosphere.io>
Co-authored-by: Gibbs Cullen <59837500+gibbscullen@users.noreply.github.com>
Co-authored-by: Dean Wahle <60762514+DeanWahle@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: omarkhalid79 <omar.khalid79@gmail.com>
4 participants