-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: new logging channels OPS and HEALTH #57171
Conversation
7b82452
to
de2e1a9
Compare
84bc526
to
1f86e4a
Compare
Rebased, RFAL |
1f86e4a
to
e436c83
Compare
0d6fb5d
to
f7e9558
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 15 of 15 files at r1, 48 of 48 files at r2, 4 of 4 files at r3, 8 of 8 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)
pkg/gossip/gossip.go, line 495 at r2 (raw file):
if newResolverFound { if log.V(1) { log.Health.Infof(ctx, "found new resolvers from storage; signaling bootstrap")
This seems more like an Ops event to me?
pkg/server/status/runtime.go, line 452 at r2 (raw file):
mem := gosigar.ProcMem{} if err := mem.Get(pid); err != nil { log.Health.Errorf(ctx, "unable to get mem usage: %v", err)
Also seems like more of an Ops error, as we can probably stay healthy without this? (Same for all the other calls in this method). I don't feel strongly either way though.
pkg/util/log/logconfig/testdata/yaml, line 99 at r2 (raw file):
---- ERROR: yaml: unmarshal errors: line 26: field fluent-servers not found in type logconfig.SinkConfig
Might want to remove the fluent-servers
part from this PR.
f7e9558
to
f0e7d32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal)
pkg/gossip/gossip.go, line 495 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
This seems more like an Ops event to me?
yep, good catch
pkg/server/status/runtime.go, line 452 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Also seems like more of an Ops error, as we can probably stay healthy without this? (Same for all the other calls in this method). I don't feel strongly either way though.
All good points. Done.
pkg/util/log/logconfig/testdata/yaml, line 99 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Might want to remove the
fluent-servers
part from this PR.
Good point. Done.
TFYR! bors r=itsbilal |
Merge conflict. |
Release note (cli change): Logging events that are relevant to cluster operators are now categorized under the new OPS and HEALTH logging channels. These can now be redirected separately from other logging events. The OPS channel is the channel used to report "point" operational events, initiated by user operators or automation: - operator or system actions on server processes: process starts, stops, shutdowns, crashes (if they can be logged), including each time: command-line parameters, current version being run. - actions that impact the topology of a cluster: node additions, removals, decommissions, etc. - cluster setting changes. - zone configuration changes. The HEALTH channel is the channel used to report "background" operational events, initiated by CockroachDB or reporting on automatic processes: - current resource usage, including critical resource usage. - node-node connection events, including connection errors and gossip details. - range and table leasing events. - up-, down-replication; range unavailability.
Release note (cli change): Server terminations that are triggered when a node encounters an internal fatal error are now reported on the OPS channel. The exact text of the error is not reported on the OPS channel however, as it may be complex (e.g. when there is a replica inconsistency) and the OPS channels is typically monitored by tools that just detect irregularities. The text of the message refers instead to the channel where the additional details can be found.
Zone config changes interest the DBAs of the cluster as a whole more than they interest the DBA of the individual SQL application. Release note (cli change): The notable events `set_zone_config` and `remove_zone_config` are now sent to the OPS channel.
f0e7d32
to
3a65664
Compare
bors r=itsbilal |
Build succeeded: |
These two commits introduce the OPS and HEALTH channel.
See the individual commits for details.