[VC-35738] Log with klog to stdout and stderr in Kubernetes text format #596

wallrj · 2024-10-16T22:35:21Z

I've settled on using klog and kubernetes component-base module for the reasons given in the comments in the logs.go file.

~~I think this logging UX is most likely to be familiar to Kubernetes platform admins who are likely to be deploying and debugging the venafi-kubernetes-agent.~~

After discussion with @maelvls we settled on the flags: -v, --log-level, --logging-format, and --vmodule
for maximum consistency with the other TLSPK components.

I gave up on Slog because it seemed to have unfamiliar severity / levels and I couldn't get the klog client-go logs to have the right levels with slog.

Using klog allows us to debug problems in client-go.
For example, you can view the API requests and responses using --vmodule=round_tripper=6.

In followup PRs I will convert the existing log lines to use context derived logr loggers.

$ go test ./pkg/logs/... -v -run TestLogs -count 1

...
=== RUN   TestLogs/vmodule-level-3
    logs_test.go:291: FLAGS
        --vmodule=logs_test=3
    logs_test.go:295: STDOUT
        I1025 17:39:47.529791   62093 logs.go:161] "log Print" source="vcert"
        I1025 17:39:47.529998   62093 logs_test.go:43] "slog Info"
        I1025 17:39:47.530019   62093 logs_test.go:46] klog Info
        I1025 17:39:47.530031   62093 logs_test.go:49] "klog InfoS" key="value"
        I1025 17:39:47.530035   62093 logs_test.go:51] "Contextual Info Level 3" logger="foo" key="value"

    logs_test.go:296: STDERR
        W1025 17:39:47.530012   62093 logs_test.go:44] "slog Warn"
        E1025 17:39:47.530015   62093 logs_test.go:45] "slog Error"
        W1025 17:39:47.530023   62093 logs_test.go:47] klog Warning
        E1025 17:39:47.530027   62093 logs_test.go:48] "klog Error" err="fake-error"
        E1025 17:39:47.530039   62093 logs_test.go:52] "Contextual error" err="fake-error" logger="foo" key="value"

--- PASS: TestLogs (0.07s)
    --- PASS: TestLogs/help (0.01s)
    --- PASS: TestLogs/unrecognized-flag (0.01s)
    --- PASS: TestLogs/v-long-form-not-available (0.01s)
    --- PASS: TestLogs/logging-format-unrecognized (0.01s)
    --- PASS: TestLogs/original-defaults (0.01s)
    --- PASS: TestLogs/modified-defaults (0.01s)
    --- PASS: TestLogs/logging-format-json (0.01s)
    --- PASS: TestLogs/log-json-split-stream-false (0.01s)
    --- PASS: TestLogs/logging-format-text (0.01s)
    --- PASS: TestLogs/log-text-split-stream-false (0.01s)
    --- PASS: TestLogs/v-level-3 (0.01s)
    --- PASS: TestLogs/log-level-3 (0.01s)
    --- PASS: TestLogs/vmodule-level-3 (0.01s)

$ ./hack/e2e/test.sh
...
{
  "ts": 1729108594511.0774,
  "caller": "log/log.go:245",
  "msg": "successfully gathered 30 items from \"k8s/replicasets\" datagatherer",
  "v": 0
}
{
  "ts": 1729108594511.1047,
  "caller": "log/log.go:245",
  "msg": "Posting data to: ",
  "v": 0
}
{"ts":1729108595121.0984,"caller":"log/log.go:245","msg":"Data sent successfully.","v":0}

wallrj · 2024-10-16T22:48:24Z

hack/e2e/test.sh

-  | grep -q "Data sent successfully"
+        --follow \
+        --namespace venafi \
+    | timeout 60 jq 'if .msg | test("Data sent successfully") then . | halt_error(0) end'


This is a hack but it halts the pipeline when the expected msg value is observed and prints the matching record to stderr:

https://jqlang.github.io/jq/manual/#halt_error

The timeout stops the pipeline after 60 seconds if the expected msg is not observed.

wallrj · 2024-10-16T22:49:01Z

pkg/logs/logs.go

+	"k8s.io/component-base/featuregate"
+	"k8s.io/component-base/logs"
+	logsapi "k8s.io/component-base/logs/api/v1"
+	_ "k8s.io/component-base/logs/json/register"


Importing this module, is what makes the json log format option available.

james-w · 2024-10-17T14:46:01Z

Hi, does cert-manager and the other components use klog? Do they use the conventions that this MR is adopting?

maelvls · 2024-10-18T07:50:48Z

Hey Richard, great job with adding unit tests around the logs. I really appreciate the effort. Thanks to you, I can just run go test -v to check what the various logging outputs are is great. Well done!

Does cert-manager and the other components use klog? Do they use the conventions that this MR is adopting?

To the question of consistency:

The library convention is klog for older projects such as cert-manager.
For newer projects, the convention is to use klogr as we prefer using structured logging
The default format convention seems to be klog's text format.

For that reason, the choice of klogr would make more sense in the Venafi Kubernetes Agent as it would bring structured logging. My first suggestion would be for us to start with klogr since klogr and klog don't have the same flags, and that would reduce the chance for people realizing they can't use --v anymore. We don't need to re-structure the logs right away, we can do that later (if ever).

Note that I found in kubernetes/klog@44eadc3 that slog is preferred over klogr since last year:

The klogr package doesn't get updated. It shouldn't be used anymore. To make that clearer, it gets marked as "deprecated".

My second suggestion, based on the fact that klogr is being phased out, would be to use std lib's log/slog and use klog as the backend. WDYT @wallrj?

Full picture:

project	lib	flag	values	default**	supports json format
cert-manager	klog	`-v`, `--v`	0 to 9	info, 2	yes with `--logging-format=json`, default to klog's text format
cert-manager webhook	klog	`-v`, `--v`	0 to 9	info, 2	yes with `--logging-format=json`, default to klog's text format
openshift-routes	klogr	`-v`, `--log-level`	0 to 9	info, 1	no, uses klog's text format
csi-driver	klogr	`-v`, `--log-level`	0 to 9	info, 1	no, uses klog's text format
trust-manager	klogr	`-v`, `--log-level`	0 to 9	info, 1	yes with `--log-format=json`, defaults to klog's text format
approver-policy	klogr	`-v`, `--log-level`	0 to 9	info, 1	yes with `--log-format=json`, defaults to klog's text format
venafi-enhanced-issuer	zap	`-zap-log-level`	debug, info, error, 0 to 9	info	yes with `--zap-encoder=json`, defaults to zap's text format
Agent (before this PR)	log + klog*	none			no, uses log's and klogs' text formats
Agent (with this PR)	klog	`-v`, `--v`	0 to 9	?	yes by default, can use klog's text format using `--logging-format=text`
venctl	slog***	`--log-level`	debug, info, warn, error	info	yes with `--log-format=json`, defaults to slog's text format
vcert	log	`--verbose`	true, false	false	no, uses log's text format

* log is the standard library's log package. It is used for the agent's code, while klog is used by the client-go library that the agent uses.
** For example, "info, 4" means that the default severity is "info" and the default level for the "info" severity is 4. The level is always for the "info" level.
*** "slog" and "log" belongs to the standard library (log/slog).

james-w · 2024-10-18T09:32:57Z

Hi, yep, consistency was exactly my concern, thanks for the detailed response. In my opinion it's favourable to have a smaller set to make updates easier down the line, even if that smaller set doesn't include the current best practice yet. If this project were to adopt the conventions of the majority of the other projects then we would have more consistency, and could figure out how/when to migrate them en-masse. Alternatively there may be an argument to update to the latest/greatest now, it depends on the details.

It sounds like you think that log/slog would be preferable to klog at this moment in this project?

maelvls · 2024-10-18T14:32:27Z

@wallrj I've reviewed your PR and made some small adjustments:

In d9afd95, I moved everything to slog (it was pretty easy). We can later migrate every logs.Log.Printf with slog.Info.
In 33f5a12, I added assertions to the stderr and stdout outputs of each test case so that we can track the changes I have made when I moved to slog.
In d9afd95, I've added a "source" to indicate whether a log line originates from VCert or from the agent itself.
In 4428a24, I've made sure the logs.Log.Fatalf logs are showing with the ERROR severity in slog.
In 6752d49, I removed --v and replaced it with --log-level to match the long form of -v that most of our components uses (approver-policy, trust-manager, openshift-routes, and csi-driver). The help now looks like this:
```
+  -v, --log-level Level         number for the log level verbosity
-  -v, --v Level                 number for the log level verbosity
```

Regarding the flags:

It seems we (the cert-manager project) have initiated the convention of the flags -v and --log-level. It has nothing to do with klogr; klogr itself doesn't have a convention around the flag names. Example: https://github.com/cert-manager/openshift-routes/blob/0da68aed69ef66c8c9cefde45d855410813a6b3c/internal/cmd/app/options/options.go#L114-L116

I propose that we ~~only expose the -v flag~~ expose --log-level instead of --v since that's the long form flag that is the most common across our components (approver-policy, trust-manager, openshift-routes, and csi-driver).

Regarding the slog severity problem:

I gave up on Slog because it seemed to have unfamiliar severity / levels and I couldn't get the klog client-go logs to have the right levels with slog.

I've investigated and was able to configure an slog logger in d9afd95. To avoid lots of unrelated changes, the agent logs are still using the standard library's *log.Logger.

Regarding the output format:

I found that none of our projects currently output JSON by default. You mentioned the Kubernetes logging conventions, which say that the JSON format must be enabled with --logging-format, which means that if not passed, the output shouldn't be JSON:

Requires passing --logging-format=json to enable.

@wallrj IMO we should set the default output to "text"... WDYT? I haven't reverted to "text". I'll let you decide.

@wallrj Feel free to merge this PR once you are happy with it. I'll approve it now.

maelvls · 2024-10-18T16:58:40Z

pkg/logs/logs_test.go

+			flags: "",
+			expectStdout: `
+{"ts":0000000000000.000,"caller":"logs/logs.go:000","msg":"log Print","source":"vcert","v":0}
+{"ts":0000000000000.000,"caller":"logs/logs_test.go:000","msg":"slog Info","v":0}


I wish we were able to show the real source of the log line rather than the generic logs/logs_test.go. I guess we can improve this once we have migrated the logs.Log.Printf calls to slog.Info.

wallrj · 2024-10-25T16:56:22Z

@wallrj IMO we should set the default output to "text"... WDYT? I haven't reverted to "text". I'll let you decide.

I agree that text format by default is more consistent with other TLSPK components. Done.

@wallrj Feel free to merge this PR once you are happy with it. I'll approve it now.

I'll merge this after the 1.2.0 release because I'd like to finish the other logging improvements before releasing to customers.

inteon · 2024-10-28T09:32:02Z

pkg/logs/logs.go

+	// the agent, which still uses log.Printf.
+	slog := slog.Default()
+
+	Log = &log.Logger{}


Can we just use slog.SetDefault(...) here?
Using slog.New(logr.ToSlogHandler(logger)) we can convert the klog logger to an slog logger.

Maybe. I'll consider that in a follow up.

inteon · 2024-10-28T16:23:08Z

pkg/logs/logs.go

+		if f.Name == "logging-format" {
+			f.Usage = `Sets the log format. Permitted formats: "json", "text".`
+		}
+		if f.Name == "log-text-split-stream" {


Why are we splitting the log streams? Doesn't this make it hard to reconstruct the order or messages?

Here's my current understanding:

I enabled log-json-split-stream by default to solve the problem that Google Logs Explorer otherwise can't distinguish between JSON formatted errors and info messages.

Severities: By default, logs written to the standard output are on the INFO level and logs written to the standard error are on the ERROR level. Structured logs can include a severity field, which defines the log's severity.
-- https://cloud.google.com/kubernetes-engine/docs/concepts/about-logs#best_practices

(klog JSON format does not include a severity field).

log-text-split-stream is not technically necessary to solve that problem because Google Logs Explorer seems to parse the E, W, and I prefix of the Kubernetes text format. So yes, I could remove this change.

Kubernetes introduced the split-stream options to solve performance problems. Info and warn messages can be buffered and flushed periodically while errors are flushed immediately.

JSON output streams kubernetes/kubernetes#104873

Move KEP-2845 to implementable kubernetes/enhancements#2912 (comment)

By default logs are not buffered and are flushed to stdout and stderr immediately, so there shouldn't be any out-of-order log messages.

If log buffering is enabled, then I think that the buffered info logs will be flushed whenever an error is logged, so there should be any out-of-order log messages in that case either.

Kubernetes plan to make split-stream JSON logging the default for the Kubernetes components.

Ability to route logs based on type/verbosity will be replaced with default splitting info and errors logs to stdout and stderr. We will make this change only in alternative logging formats (like JSON) as we don't want to introduce breaking change in default configuration. Splitting stream will allow treating info and errors with different priorities. It will unblock efforts like Make info logging non-blocking kubernetes/klog#209 to make info logs non-blocking.

-- https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components#logging-defaults

Signed-off-by: Richard Wall <richard.wall@venafi.com>

For some reason, `assert.Equal` doesn't cause the test sub-process to exit with 1. Instead, it returns 0 which means we can't notice the error from the parent process.

That's useful to keep track what the logs look like, especially now that I'm trying to switch from plain klog to slog.

Before, there was no way to know if a given log came from the VCert library or from the agent itself. I had to tweak logToSlogWriter's Writer to remove the extraneous "\n" that log.Printf prints.

Signed-off-by: Richard Wall <richard.wall@venafi.com>

The exact logs output is now verified. Signed-off-by: Richard Wall <richard.wall@venafi.com>

Signed-off-by: Richard Wall <richard.wall@venafi.com>

wallrj requested a review from maelvls October 16, 2024 22:42

wallrj mentioned this pull request Oct 16, 2024

WIP: Log with slog to stdout in JSON form #595

Closed

wallrj commented Oct 16, 2024

View reviewed changes

wallrj force-pushed the VC-35738/klog-3 branch 2 times, most recently from 411461d to 161382f Compare October 17, 2024 07:10

maelvls force-pushed the VC-35738/klog-3 branch 3 times, most recently from a87bfd3 to 4428a24 Compare October 18, 2024 15:26

maelvls reviewed Oct 18, 2024

View reviewed changes

maelvls force-pushed the VC-35738/klog-3 branch from 1eb699a to a41b832 Compare October 18, 2024 16:59

wallrj force-pushed the VC-35738/klog-3 branch from c2b6d69 to 9771a25 Compare October 25, 2024 16:36

wallrj changed the title ~~[VC-35738] Log with klog to stdout and stderr in JSON format~~ [VC-35738] Log with klog to stdout and stderr in Kubernetes text format Oct 25, 2024

inteon reviewed Oct 28, 2024

View reviewed changes

wallrj and others added 10 commits October 29, 2024 09:33

Log with klog to stdout and stderr in JSON format

e745fe0

Signed-off-by: Richard Wall <richard.wall@venafi.com>

logs_test.go: assert didn't make the test exit with 1, move to require

08d5e8c

For some reason, `assert.Equal` doesn't cause the test sub-process to exit with 1. Instead, it returns 0 which means we can't notice the error from the parent process.

logs_test.go: assert the logs themselves

dcca1b3

That's useful to keep track what the logs look like, especially now that I'm trying to switch from plain klog to slog.

logs.go: move to slog and show the src (vcert or agent) of the logs

ba052e4

Before, there was no way to know if a given log came from the VCert library or from the agent itself. I had to tweak logToSlogWriter's Writer to remove the extraneous "\n" that log.Printf prints.

logs.go: re-wrap the introductionary comment, AKA -> aka.

fda1d3e

logs.go: LogToSlogWriter wasn't setting the right logs as ERROR

e4bf7cb

logs_test.go: replaceWithStaticTimestamps breaks for short PIDs

d9bc223

logs.go: use --log-level instead of --v

c39594a

logs_test.go: replaceWithStaticTimestamps: ts may not have nanoseconds

f3b721e

Set the default logging format to text

6f8c52e

Signed-off-by: Richard Wall <richard.wall@venafi.com>

Remove redundant test for vcert log prefix

73526f3

The exact logs output is now verified. Signed-off-by: Richard Wall <richard.wall@venafi.com>

wallrj force-pushed the VC-35738/klog-3 branch from 9771a25 to 73526f3 Compare October 29, 2024 10:27

Fix some out-of-date comments

54eb5a5

Signed-off-by: Richard Wall <richard.wall@venafi.com>

wallrj mentioned this pull request Oct 30, 2024

[VC-35738] Move the pprof endpoints to the main HTTP server rather than starting a second server #600

Merged

wallrj changed the base branch from master to VC-35738/feature October 31, 2024 09:34

wallrj merged commit 415d434 into VC-35738/feature Oct 31, 2024
2 checks passed

wallrj deleted the VC-35738/klog-3 branch October 31, 2024 09:36

wallrj mentioned this pull request Oct 31, 2024

[VC-35738] Feature Branch #602

Merged

12 tasks

This was referenced Nov 20, 2024

Remove Redundant logs #576

Closed

A flag for setting the log level and more verbose log output #220

Closed

[VC-37264] Update E2E test to check for certificate in the API #629

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[VC-35738] Log with klog to stdout and stderr in Kubernetes text format #596

[VC-35738] Log with klog to stdout and stderr in Kubernetes text format #596

wallrj commented Oct 16, 2024 •

edited

Loading

wallrj Oct 16, 2024

wallrj Oct 16, 2024

james-w commented Oct 17, 2024

maelvls commented Oct 18, 2024 •

edited

Loading

james-w commented Oct 18, 2024

maelvls commented Oct 18, 2024 •

edited

Loading

maelvls Oct 18, 2024

wallrj commented Oct 25, 2024

inteon Oct 28, 2024

wallrj Oct 31, 2024

inteon Oct 28, 2024

wallrj Oct 29, 2024 •

edited

Loading

[VC-35738] Log with klog to stdout and stderr in Kubernetes text format #596

[VC-35738] Log with klog to stdout and stderr in Kubernetes text format #596

Conversation

wallrj commented Oct 16, 2024 • edited Loading

wallrj Oct 16, 2024

Choose a reason for hiding this comment

wallrj Oct 16, 2024

Choose a reason for hiding this comment

james-w commented Oct 17, 2024

maelvls commented Oct 18, 2024 • edited Loading

james-w commented Oct 18, 2024

maelvls commented Oct 18, 2024 • edited Loading

maelvls Oct 18, 2024

Choose a reason for hiding this comment

wallrj commented Oct 25, 2024

inteon Oct 28, 2024

Choose a reason for hiding this comment

wallrj Oct 31, 2024

Choose a reason for hiding this comment

inteon Oct 28, 2024

Choose a reason for hiding this comment

wallrj Oct 29, 2024 • edited Loading

Choose a reason for hiding this comment

wallrj commented Oct 16, 2024 •

edited

Loading

maelvls commented Oct 18, 2024 •

edited

Loading

maelvls commented Oct 18, 2024 •

edited

Loading

wallrj Oct 29, 2024 •

edited

Loading