*: new logging channels OPS and HEALTH #57171

knz · 2020-11-26T11:36:37Z

These two commits introduce the OPS and HEALTH channels.

See the individual commits for details.

cockroach-teamcity · 2020-11-26T11:36:46Z

This change is

knz · 2020-12-03T17:52:53Z

Rebased, RFAL

knz · 2020-12-14T15:08:53Z

I have rebased this off master so as to not make it dependent on #57170. Hopefully this makes it easier to review. @itsbilal can you have a look? Thanks

itsbilal

save for some minor comments

Reviewed 15 of 15 files at r1, 48 of 48 files at r2, 4 of 4 files at r3, 8 of 8 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)

pkg/gossip/gossip.go, line 495 at r2 (raw file):

	if newResolverFound {
		if log.V(1) {
			log.Health.Infof(ctx, "found new resolvers from storage; signaling bootstrap")

This seems more like an Ops event to me?

pkg/server/status/runtime.go, line 452 at r2 (raw file):

	mem := gosigar.ProcMem{}
	if err := mem.Get(pid); err != nil {
		log.Health.Errorf(ctx, "unable to get mem usage: %v", err)

Also seems like more of an Ops error, as we can probably stay healthy without this? (Same for all the other calls in this method). I don't feel strongly either way though.

pkg/util/log/logconfig/testdata/yaml, line 99 at r2 (raw file):

----
ERROR: yaml: unmarshal errors:
  line 26: field fluent-servers not found in type logconfig.SinkConfig

Might want to remove the fluent-servers part from this PR.

knz

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal)

pkg/gossip/gossip.go, line 495 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This seems more like an Ops event to me?

yep, good catch

pkg/server/status/runtime.go, line 452 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Also seems like more of an Ops error, as we can probably stay healthy without this? (Same for all the other calls in this method). I don't feel strongly either way though.

All good points. Done.

pkg/util/log/logconfig/testdata/yaml, line 99 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Might want to remove the fluent-servers part from this PR.

Good point. Done.

knz · 2020-12-15T22:05:19Z

TFYR!

bors r=itsbilal

craig · 2020-12-15T23:35:45Z

Merge conflict.

Release note (cli change): Logging events that are relevant to cluster operators are now categorized under the new OPS and HEALTH logging channels. These can now be redirected separately from other logging events.

The OPS channel is the channel used to report "point" operational events, initiated by user operators or automation:

- operator or system actions on server processes: process starts, stops, shutdowns, crashes (if they can be logged), including each time: command-line parameters, current version being run.
- actions that impact the topology of a cluster: node additions, removals, decommissions, etc.
- cluster setting changes.
- zone configuration changes.

The HEALTH channel is the channel used to report "background" operational events, initiated by CockroachDB or reporting on automatic processes:

- current resource usage, including critical resource usage.
- node-node connection events, including connection errors and gossip details.
- range and table leasing events.
- up-, down-replication; range unavailability.
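To make the OPS/HEALTH split above concrete, here is a minimal Go sketch of the categorization. It is not the code from this PR: the `Channel` type, the `channelFor` function, the event-kind strings, and the `OTHER` fallback are all hypothetical names chosen for illustration. Only the grouping of events follows the release note.

```go
package main

import "fmt"

// Channel is a hypothetical stand-in for a logging channel identifier.
type Channel string

const (
	OPS    Channel = "OPS"    // "point" events initiated by operators or automation
	HEALTH Channel = "HEALTH" // "background" events initiated by the database itself
	OTHER  Channel = "OTHER"  // illustrative fallback, not named in the release note
)

// channelFor maps an illustrative event kind to the channel that the
// release note assigns to that category of event.
func channelFor(kind string) Channel {
	switch kind {
	case "process_start", "process_stop", "node_decommission",
		"cluster_setting_change", "zone_config_change":
		return OPS
	case "resource_usage", "connection_error", "lease_event",
		"replication_change", "range_unavailable":
		return HEALTH
	default:
		return OTHER
	}
}

func main() {
	for _, kind := range []string{"node_decommission", "resource_usage", "query_plan"} {
		fmt.Printf("%s -> %s\n", kind, channelFor(kind))
	}
}
```

Because each event carries exactly one channel, a sink configuration can redirect OPS and HEALTH separately from everything else, which is the behavior the release note describes.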

Release note (cli change): Server terminations that are triggered when a node encounters an internal fatal error are now reported on the OPS channel. The exact text of the error is not reported on the OPS channel, however, as it may be complex (e.g. when there is a replica inconsistency) and the OPS channel is typically monitored by tools that just detect irregularities. The text of the message refers instead to the channel where the additional details can be found.
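The pattern described in this note can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: the `entry` type, the `reportFatal` function, and the message wording are hypothetical, and the channel that receives the full details is passed in as a parameter because the note does not name it.

```go
package main

import "fmt"

// entry is a hypothetical log record: a channel name plus a message.
type entry struct {
	channel string
	msg     string
}

// reportFatal returns the two entries this sketch emits for a fatal error:
// the full detail on detailChannel, and a short notice on OPS that points
// at detailChannel instead of repeating the (possibly complex) error text.
func reportFatal(detailChannel, detail string) []entry {
	return []entry{
		{channel: detailChannel, msg: "fatal error: " + detail},
		{
			channel: "OPS",
			msg: fmt.Sprintf(
				"server terminating due to a fatal error; see the %s channel for details",
				detailChannel),
		},
	}
}

func main() {
	for _, e := range reportFatal("DEV", "replica inconsistency detected") {
		fmt.Printf("[%s] %s\n", e.channel, e.msg)
	}
}
```

Keeping the OPS message short and fixed in shape suits monitoring tools that only need to detect that a termination happened, while the detail stays available on the other channel.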

Zone config changes interest the DBAs of the cluster as a whole more than they interest the DBA of the individual SQL application.

Release note (cli change): The notable events `set_zone_config` and `remove_zone_config` are now sent to the OPS channel.

knz · 2020-12-16T00:47:40Z

bors r=itsbilal

craig · 2020-12-16T01:25:10Z

Build succeeded:

GitHub CI (Cockroach)

knz requested a review from itsbilal November 26, 2020 11:36

knz requested a review from a team as a code owner November 26, 2020 11:36

knz mentioned this pull request Nov 26, 2020

util/log: new logging channels SQL_SCHEMA, USER_ADMIN and PRIVILEGES #51987

Merged

knz force-pushed the 20201126-log-ops branch from 7b82452 to de2e1a9 Compare November 30, 2020 19:39

knz mentioned this pull request Dec 1, 2020

SLI metric that tracks availability of nodes and possibly ranges at node liveness level #57071

Closed

knz force-pushed the 20201126-log-ops branch 3 times, most recently from 84bc526 to 1f86e4a Compare December 3, 2020 17:28

knz mentioned this pull request Dec 7, 2020

logging: standardize the event log and bring structured details to external logs #57629

Closed

26 tasks

knz force-pushed the 20201126-log-ops branch from 1f86e4a to e436c83 Compare December 14, 2020 15:07

knz force-pushed the 20201126-log-ops branch 2 times, most recently from 0d6fb5d to f7e9558 Compare December 14, 2020 16:07

itsbilal approved these changes Dec 14, 2020

knz force-pushed the 20201126-log-ops branch from f7e9558 to f0e7d32 Compare December 15, 2020 22:04

knz added 3 commits December 16, 2020 01:46

knz force-pushed the 20201126-log-ops branch from f0e7d32 to 3a65664 Compare December 16, 2020 00:47

craig bot merged commit 265122d into cockroachdb:master Dec 16, 2020

knz deleted the 20201126-log-ops branch December 16, 2020 12:50

This was referenced Jan 29, 2021

*: new logging channels OPS and HEALTH cockroachdb/docs#9536

Closed

*: new logging channels OPS and HEALTH cockroachdb/docs#9629

Closed

*: new logging channels OPS and HEALTH cockroachdb/docs#9630

Closed
