Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VC-35738] Log with klog to stdout and stderr in Kubernetes text format #596

Merged
merged 12 commits into from
Oct 31, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions cmd/root.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import (
"os"
"strings"

"github.com/jetstack/preflight/pkg/logs"
"github.com/spf13/cobra"
"github.com/spf13/pflag"
)
Expand All @@ -17,6 +18,9 @@ var rootCmd = &cobra.Command{
configuration checks using Open Policy Agent (OPA).

Preflight checks are bundled into Packages`,
PersistentPreRun: func(cmd *cobra.Command, args []string) {
logs.Initialize()
},
}

func init() {
Expand All @@ -28,6 +32,8 @@ func init() {
// Execute adds all child commands to the root command and sets flags appropriately.
// This is called by main.main(). It only needs to happen once to the rootCmd.
func Execute() {
logs.AddFlags(rootCmd.PersistentFlags())

if err := rootCmd.Execute(); err != nil {
fmt.Println(err)
os.Exit(1)
Expand Down
7 changes: 4 additions & 3 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ module github.com/jetstack/preflight
go 1.22.0

require (
github.com/Venafi/vcert/v5 v5.7.1
github.com/cenkalti/backoff v2.2.1+incompatible
github.com/d4l3k/messagediff v1.2.1
github.com/fatih/color v1.17.0
Expand All @@ -22,13 +23,13 @@ require (
k8s.io/api v0.31.1
k8s.io/apimachinery v0.31.1
k8s.io/client-go v0.31.1
k8s.io/component-base v0.31.0
sigs.k8s.io/controller-runtime v0.19.0
sigs.k8s.io/yaml v1.4.0
)

require (
github.com/Khan/genqlient v0.7.0 // indirect
github.com/Venafi/vcert/v5 v5.7.1 // indirect
github.com/antlr4-go/antlr/v4 v4.13.0 // indirect
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 // indirect
github.com/aymerick/douceur v0.2.0 // indirect
Expand All @@ -37,6 +38,7 @@ require (
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/go-http-utils/headers v0.0.0-20181008091004-fed159eddc2a // indirect
github.com/go-logr/zapr v1.3.0 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/google/cel-go v0.20.1 // indirect
github.com/google/gnostic-models v0.6.9-0.20230804172637-c7be7c783f49 // indirect
Expand All @@ -62,7 +64,6 @@ require (
gopkg.in/ini.v1 v1.67.0 // indirect
k8s.io/apiextensions-apiserver v0.31.0 // indirect
k8s.io/apiserver v0.31.0 // indirect
k8s.io/component-base v0.31.0 // indirect
)

require (
Expand Down Expand Up @@ -100,7 +101,7 @@ require (
google.golang.org/protobuf v1.34.2 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v3 v3.0.1
k8s.io/klog/v2 v2.130.1 // indirect
k8s.io/klog/v2 v2.130.1
k8s.io/kube-openapi v0.0.0-20240430033511-f0e62f92d13f // indirect
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
Expand Down
13 changes: 5 additions & 8 deletions hack/e2e/test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -191,14 +191,11 @@ EOF
envsubst <application-team-1.yaml | kubectl apply -f -
kubectl -n team-1 wait certificate app-0 --for=condition=Ready

# Wait for log message indicating success.
# Filter out distracting data gatherer errors and warnings.
# Show other useful log messages on stderr.
# Wait 60s for log message indicating success.
# Parse logs as JSON using jq to ensure logs are all JSON formatted.
# Disable pipefail to prevent SIGPIPE (141) errors from tee
# See https://unix.stackexchange.com/questions/274120/pipe-fail-141-when-piping-output-into-tee-why
set +o pipefail
kubectl logs deployments/venafi-kubernetes-agent \
--follow \
--namespace venafi \
| tee >(grep -v -e "reflector\.go" -e "datagatherer" -e "data gatherer" >/dev/stderr) \
| grep -q "Data sent successfully"
--follow \
--namespace venafi \
| timeout 60 jq 'if .msg | test("Data sent successfully") then . | halt_error(0) end'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hack but it halts the pipeline when the expected msg value is observed and prints the matching record to stderr:

The timeout stops the pipeline after 60 seconds if the expected msg is not observed.

3 changes: 3 additions & 0 deletions hack/e2e/values.venafi-kubernetes-agent.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,6 @@ config:
authentication:
venafiConnection:
enabled: true

extraArgs:
- --logging-format=json
155 changes: 154 additions & 1 deletion pkg/logs/logs.go
Original file line number Diff line number Diff line change
@@ -1,8 +1,161 @@
package logs

import (
"bytes"
"fmt"
"log"
"log/slog"
"os"
"strings"

"github.com/spf13/pflag"
"k8s.io/apimachinery/pkg/util/runtime"
"k8s.io/apimachinery/pkg/util/sets"
"k8s.io/component-base/featuregate"
"k8s.io/component-base/logs"
logsapi "k8s.io/component-base/logs/api/v1"
_ "k8s.io/component-base/logs/json/register"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing this module, is what makes the json log format option available.

)

// venafi-kubernetes-agent follows [Kubernetes Logging Conventions] and writes
// logs in [Kubernetes text logging format] by default. It does not support
// named levels (aka. severity), instead it uses arbitrary levels. Errors and
// warnings are logged to stderr and Info messages to stdout, because that is
// how some cloud logging systems (notably Google Cloud Logs Explorer) assign a
// severity (INFO or ERROR) in the UI. The agent's and vcert's logs are written
// logged as Info messages with level=0.
//
// Further reading:
// - [Kubernetes logging conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md)
// - [Kubernetes text logging format](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md#text-logging-format)
// - [Why not named levels, like Info/Warning/Error?](https://github.com/go-logr/logr?tab=readme-ov-file#why-not-named-levels-like-infowarningerror)
// - [GKE logs best practices](https://cloud.google.com/kubernetes-engine/docs/concepts/about-logs#best_practices)
// - [Structured Logging KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/1602-structured-logging/README.md)
// - [Examples of using k8s.io/component-base/logs](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/component-base/logs/example),
// upon which this code was based.

var (
// This is the Agent's logger. For now, it is still a *log.Logger, but we
// mean to migrate everything to slog with the klog backend. We avoid using
// log.Default because log.Default is already used by the VCert library, and
// we need to keep the agent's logger from the VCert's logger to be able to
// remove the `vCert: ` prefix from the VCert logs.
Log *log.Logger

// All but the essential logging flags will be hidden to avoid overwhelming
// the user. The hidden flags can still be used. For example if a user does
// not like the split-stream behavior and a Venafi field engineer can
// instruct them to patch --log-json-split-stream=false on to the Deployment
// arguments.
visibleFlagNames = sets.New[string]("v", "vmodule", "logging-format")
// This default logging configuration will be updated with values from the
// logging flags, even those that are hidden.
configuration = logsapi.NewLoggingConfiguration()
// Logging features will be added to this feature gate, but the
// feature-gates flag will be hidden from the user.
features = featuregate.NewFeatureGate()
)

var Log = log.New(os.Stderr, "", log.LstdFlags)
func init() {
runtime.Must(logsapi.AddFeatureGates(features))
// Turn on ALPHA options to enable the split-stream logging options.
runtime.Must(features.OverrideDefault(logsapi.LoggingAlphaOptions, true))
}

// AddFlags adds log related flags to the supplied flag set.
//
// The split-stream options are enabled by default, so that errors are logged to
// stderr and info to stdout, allowing cloud logging systems to assign a
// severity INFO or ERROR to the messages.
func AddFlags(fs *pflag.FlagSet) {
var tfs pflag.FlagSet
logsapi.AddFlags(configuration, &tfs)
features.AddFlag(&tfs)
tfs.VisitAll(func(f *pflag.Flag) {
if !visibleFlagNames.Has(f.Name) {
tfs.MarkHidden(f.Name)
}

// The original usage string includes details about how
// JSON logging is only available when BETA logging features are
// enabled, but that's not relevant here because the feature is enabled
// by default.
if f.Name == "logging-format" {
f.Usage = `Sets the log format. Permitted formats: "json", "text".`
}
if f.Name == "log-text-split-stream" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we splitting the log streams? Doesn't this make it hard to reconstruct the order or messages?

Copy link
Member Author

@wallrj wallrj Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my current understanding:

  1. I enabled log-json-split-stream by default to solve the problem that Google Logs Explorer otherwise can't distinguish between JSON formatted errors and info messages.

    Severities: By default, logs written to the standard output are on the INFO level and logs written to the standard error are on the ERROR level. Structured logs can include a severity field, which defines the log's severity.
    -- https://cloud.google.com/kubernetes-engine/docs/concepts/about-logs#best_practices

    (klog JSON format does not include a severity field).

  2. log-text-split-stream is not technically necessary to solve that problem because Google Logs Explorer seems to parse the E, W, and I prefix of the Kubernetes text format. So yes, I could remove this change.

  3. Kubernetes introduced the split-stream options to solve performance problems. Info and warn messages can be buffered and flushed periodically while errors are flushed immediately.

  4. By default logs are not buffered and are flushed to stdout and stderr immediately, so there shouldn't be any out-of-order log messages.

  5. If log buffering is enabled, then I think that the buffered info logs will be flushed whenever an error is logged, so there should be any out-of-order log messages in that case either.

  6. Kubernetes plan to make split-stream JSON logging the default for the Kubernetes components.

    Ability to route logs based on type/verbosity will be replaced with default splitting info and errors logs to stdout and stderr. We will make this change only in alternative logging formats (like JSON) as we don't want to introduce breaking change in default configuration. Splitting stream will allow treating info and errors with different priorities. It will unblock efforts like Make info logging non-blocking kubernetes/klog#209 to make info logs non-blocking.

    -- https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components#logging-defaults

f.DefValue = "true"
runtime.Must(f.Value.Set("true"))
}
if f.Name == "log-json-split-stream" {
f.DefValue = "true"
runtime.Must(f.Value.Set("true"))
}

// Since `--v` (which is the long form of `-v`) isn't the standard in
// our projects (it only exists in cert-manager, webhook, and such),
// let's rename it to the more common `--log-level`, which appears in
// openshift-routes, csi-driver, trust-manager, and approver-policy.
// More details at:
// https://github.com/jetstack/jetstack-secure/pull/596#issuecomment-2421708181
if f.Name == "v" {
f.Name = "log-level"
f.Shorthand = "v"
}
})
fs.AddFlagSet(&tfs)
}

// Initialize uses k8s.io/component-base/logs, to configure the following global
// loggers: log, slog, and klog. All are configured to write in the same format.
func Initialize() {
// This configures the global logger in klog *and* slog, if compiled with Go
// >= 1.21.
logs.InitLogs()
if err := logsapi.ValidateAndApply(configuration, features); err != nil {
fmt.Fprintf(os.Stderr, "Error in logging configuration: %v\n", err)
os.Exit(2)
}

// Thanks to logs.InitLogs, slog.Default now uses klog as its backend. Thus,
// the client-go library, which relies on klog.Info, has the same logger as
// the agent, which still uses log.Printf.
slog := slog.Default()

Log = &log.Logger{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use slog.SetDefault(...) here?
Using slog.New(logr.ToSlogHandler(logger)) we can convert the klog logger to an slog logger.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I'll consider that in a follow up.

Log.SetOutput(LogToSlogWriter{Slog: slog, Source: "agent"})

// Let's make sure the VCert library, which is the only library we import to
// be using the global log.Default, also uses the common slog logger.
vcertLog := log.Default()
vcertLog.SetOutput(LogToSlogWriter{Slog: slog, Source: "vcert"})
// This is a work around for a bug in vcert where it adds a `vCert: ` prefix
// to the global log logger. It can be removed when this is fixed upstream
// in vcert: https://github.com/Venafi/vcert/pull/512
vcertLog.SetPrefix("")
}

type LogToSlogWriter struct {
Slog *slog.Logger
Source string
}

func (w LogToSlogWriter) Write(p []byte) (n int, err error) {
// log.Printf writes a newline at the end of the message, so we need to trim
// it.
p = bytes.TrimSuffix(p, []byte("\n"))

message := string(p)
if strings.Contains(message, "error") ||
strings.Contains(message, "failed") ||
strings.Contains(message, "fatal") ||
strings.Contains(message, "Failed") ||
strings.Contains(message, "While evaluating configuration") ||
strings.Contains(message, "data-path override present") ||
strings.Contains(message, "Cannot marshal readings") {
w.Slog.With("source", w.Source).Error(message)
} else {
w.Slog.With("source", w.Source).Info(message)
}
return len(p), nil
}
Loading