KEP-2845: Initial draft #2846

Merged 5 commits on Aug 26, 2021
@@ -0,0 +1,285 @@
# KEP-2845: Deprecate klog specific flags in Kubernetes Components

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [User Stories](#user-stories)
- [Writing logs to files](#writing-logs-to-files)
- [Caveats](#caveats)
- [Risks and Mitigations](#risks-and-mitigations)
- [Users don't want to use go-runner as replacement.](#users-dont-want-to-use-go-runner-as-replacement)
- [Log processing in parent process causes performance problems](#log-processing-in-parent-process-causes-performance-problems)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
- [Alpha](#alpha)
- [Beta](#beta)
- [GA](#ga)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Continue supporting all klog features](#continue-supporting-all-klog-features)
- [Release klog 3.0 with removed features](#release-klog-30-with-removed-features)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] e2e Tests for all Beta API Operations (endpoints)
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP proposes to deprecate, and in the future remove, a subset of the klog
command line flags from Kubernetes components, with the goal of making logging in
k8s core components simpler and easier for the community to maintain and extend.

## Motivation

Early on, Kubernetes adopted the glog library for logging. There was no
larger motivation for picking glog, as the Go ecosystem was in its infancy at
that time and there were no alternatives. As the Kubernetes community's needs grew,
glog was not flexible enough, prompting the creation of its fork, klog. By forking we
inherited a lot of glog features that we never intended to support. The introduction
of alternative log formats like JSON created a conundrum: should we implement
all klog features for JSON? Most of them don't make sense, and the method for
configuring them leaves much to be desired. Klog features are controlled by a set of
global flags that remain the last bastion of global state in the k/k repository. Those
flags don't follow a single naming standard (some start with a log prefix, some
don't), don't comply with k8s flag naming (they use underscores instead of hyphens), and
have many other problems. We need to revisit how logging configuration is done in
klog, so it can work with alternative log formats and comply with current best
practices.

A lack of investment and a growing number of klog features have impacted project quality.
Klog has multiple problems, including:
* performance is much worse than alternatives, for example 7-8x worse than the
[JSON format](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#logger-implementation-performance)
* throughput is insufficient to fulfill Kubernetes scalability requirements, see
[kubernetes/kubernetes#90804](https://github.com/kubernetes/kubernetes/pull/90804)
* complexity and confusion caused by maintaining backward compatibility for
legacy glog features and flags, for example
[kubernetes/klog#54](https://github.com/kubernetes/klog/issues/54)

Fixing all those issues would require a big investment in logging, but would not
solve the underlying problem of having to maintain a logging library. We have
already seen cases like [kubernetes/kubernetes#90804](https://github.com/kubernetes/kubernetes/pull/90804)
where it's easier to reimplement a klog feature in an external project than to fix
the problem in klog. To conclude, we should strive to reduce maintenance cost and
improve quality by narrowing the scope of the logging library.

As for which configuration options should be standardized across all logging formats,
I would look to the 12 factor app standard (https://12factor.net/). It defines
logs as streams of events and discourages applications from taking on
responsibility for log file management, log rotation and any other processing
that can be done externally. This is something that Kubernetes already
encourages by collecting stdout and stderr logs and making them available via
kubectl logs. It's somewhat confusing that K8s components don't comply with K8s
best practices.

### Goals

* Unblock development of alternative logging formats
* Narrow scope of logging with more opinionated approach and smaller set of features
* Reduce complexity of logging configuration and follow standard component configuration mechanism.

### Non-Goals

* Change klog output format

Member: We also don't want to remove support for these flags in klog yet, right?

Contributor Author: What do you mean by "these flags in klog"?


## Proposal

I propose that Kubernetes core components (kube-apiserver, kube-scheduler,
kube-controller-manager, kubelet) drop flags that extend logging beyond
event streams or that are klog specific features. This change should be
scoped to only those components and should not affect the broader klog community.

Member: When and on what sort of timeline?

Contributor Author: Deprecate in 1.23 and remove in 1.26, following the 3 release deprecation window.

With the removal of output stream manipulation flags we need to provide a set of sane
defaults and conventions that prevent logging format implementations from
diverging and reintroducing their own flags. As logs should be treated as event
streams, I propose that we separate two main streams, "info" and "error",
based on the log method called. As error logs should usually be treated with higher
priority, having two streams prevents a single pipeline from being clogged
(for example [kubernetes/klog#209](https://github.com/kubernetes/klog/issues/209)). For
logging formats writing to standard streams, we should follow the UNIX convention
of mapping "info" logs to stdout and "error" logs to stderr.

Member: For backwards compatibility, should we support a double set of flags during an overlapping period?

Contributor Author: There will not be a double set of flags, as we are just removing flags, not replacing them with alternatives.

Member: Is this backwards compatible?

Contributor Author: No, that's why I'm proposing a multi release plan to make this change.
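
The proposed mapping of "info" logs to stdout and "error" logs to stderr can be illustrated with a minimal shell sketch. The `emit_logs` function and its log lines are hypothetical stand-ins for a component and are not real klog output; the point is only that a consumer can route the two streams independently once they are split.

```shell
#!/bin/sh
# Hypothetical stand-in for a component: "info" lines go to stdout,
# "error" lines go to stderr. The messages are made up for illustration.
emit_logs() {
  echo "info: serving requests"
  echo "error: connection refused" >&2
}

workdir=$(mktemp -d)

# Capture each stream in its own file, so a burst of info logs
# does not delay or bury the error stream.
emit_logs >"$workdir/info.log" 2>"$workdir/error.log"

cat "$workdir/info.log"
cat "$workdir/error.log"
```

With both streams on separate file descriptors, the same split works with any supervisor that consumes stdout and stderr.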

Flags that should be deprecated:

* --log-dir, --log-file, --log-flush-frequency - responsible for writing to
files and syncs to disk.
Motivation: Not critical as there are easy to set up alternatives like:
shell redirection, systemd service management or docker log driver. Removing
them reduces complexity and allows development of non-text loggers like one
writing to journal.
* --logtostderr, --alsologtostderr, --one-output, --stderrthreshold -
responsible for enabling/disabling writing to stderr (vs file).
Motivation: Not needed if writing to files is removed.

Member: How will components log by default if this is removed? Will it be to stdout, stderr, both?

Contributor Author: By default they log to stderr. Flags are very confusing, so here are some examples:

  • Default configuration: log only to stderr
  • If logging to files is enabled via --log-file, logging to stderr is disabled.
  • To write to both stderr and files you need both --log-file and --alsologtostderr.

* --log-file-max-size, --skip-log-headers - responsible for configuration of file
rotation.
Motivation: Not needed if writing to files is removed.
* --add-dir-header, --skip-headers - klog format specific flags.
Motivation: They don't apply to other log formats.
* --log-backtrace-at - a legacy glog feature.
Motivation: No trace of anyone using this feature.

Flag deprecation should comply with standard k8s policy and require 3 releases before removal.

Member: Ah here we are.

Contributor Author: :P


This leaves two flags that should be implemented by all log formats:

* -v - controls global log verbosity of Info logs
* --vmodule - controls log verbosity of Info logs on a per-file level

Those flags were chosen because they have a direct effect on which logs are written,
directly impacting log volume and component performance.

### User Stories

#### Writing logs to files

We should use go-runner as an official fallback for users who want to keep
writing logs to files. go-runner runs as the parent process of the component's binary,
reads its stdout/stderr, and is able to route them to files. As go-runner is
already released as part of official K8s images, it should be as simple as changing:

```
/usr/local/bin/kube-apiserver --log-file=/var/log/kube-apiserver.log
```

to

```
/go-runner --log-file=/var/log/kube-apiserver.log /usr/local/bin/kube-apiserver
```
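The parent-process pattern behind this can be sketched in plain shell. This is not go-runner's implementation: `run_with_log_file` is a hypothetical helper, and `sh -c ...` stands in for a component binary. It only shows a wrapper starting a child and routing the child's stdout/stderr into a file.

```shell
#!/bin/sh
# Hypothetical helper, NOT go-runner itself: run a command as a child
# process and append its stdout and stderr to the given log file.
run_with_log_file() {
  log_file=$1
  shift
  "$@" >>"$log_file" 2>&1
}

log_file=$(mktemp)

# `sh -c ...` is a stand-in for a component binary such as kube-apiserver.
run_with_log_file "$log_file" sh -c 'echo "info line"; echo "error line" >&2'

cat "$log_file"
```

The component itself needs no knowledge of the file; it only writes to its standard streams.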

### Caveats

Is it ok for K8s components to drop support for subset of klog flags?

Technically K8s already doesn't support klog flags. Klog flags are renamed to
comply with K8s flag naming convention (underscores are replaced with hyphens).
Full klog support was never promised to users and removal of those flags should
be treated as removal of any other flag.

Member: That seems fair.


Is it ok for K8s components to drop support for writing to files?
Writing directly to files is an important feature still used by users, but this
doesn't necessitate direct support in components. By providing an
external solution like go-runner we can allow the community to develop more advanced
features while maintaining a high quality implementation within components.
Having a more extensible solution developed externally should be more beneficial
to the community than forcing a closed list of features on everyone.

### Risks and Mitigations

#### Users don't want to use go-runner as replacement.

There are multiple alternatives that allow users to redirect logs to a file.
The exact solution depends on the user's preferred way to run the process, but they share one
property: all of them support consuming stdout/stderr. Examples include shell
redirection, systemd service management or the
[docker logging driver](https://docs.docker.com/config/containers/logging/configure/).
Not all of them support log rotation, but it's the user's responsibility to know
the complementary tooling that provides it, for example
[logrotate](https://linux.die.net/man/8/logrotate).
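
As a rough illustration of the shell redirection alternative, the sketch below approximates two of the flag combinations discussed in this KEP. The `component` function is a hypothetical stand-in that logs to stderr (the current default), and the log line is made up.

```shell
#!/bin/sh
# Hypothetical stand-in for a component that logs only to stderr.
component() {
  echo "started" >&2
}

log_file=$(mktemp)

# Roughly what --log-file provided: append the logs to a file.
component 2>>"$log_file"

# Roughly what --log-file plus --alsologtostderr provided: append to the
# file and still show the logs on the terminal (here via tee's stdout).
component 2>&1 | tee -a "$log_file"
```

Rotation of the resulting file is left to external tooling, as noted above.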

#### Log processing in parent process causes performance problems

Passing logs through a parent process is a normal Linux pattern used by
systemd-run, docker or containerd. For Kubernetes we already use go-runner in
scalability testing to read apiserver logs and write them to a file. Before we
reach Beta we should conduct detailed throughput testing of go-runner to
validate its upper limit, but we don't expect any performance problems based on
the architecture alone.

## Design Details

Splitting stdout from stderr would be a breaking change in both klog and
Kubernetes components. To avoid that, I propose to introduce a new logging flag
`--logtostdout` in klog that will allow writing info logs to stdout. This flag
will be used to avoid introducing a breaking change in klog. For Kubernetes components
we would use this flag to start testing the change, and delay enabling it
by default by one release, until we hit Beta. Like any other klog flag, it
should be deprecated when this effort hits GA.

Member: All logs? What about error logs?

Contributor Author: This flag is meant to split stdout and stderr. Its name is complementary to "--logtostderr". If both are enabled, info will be sent to stdout, error to stderr.

### Test Plan

Go-runner is already used for scalability tests. We should ensure that we cover
all existing klog features.

### Graduation Criteria

#### Alpha

- Klog can be configured without registering flags
- Kubernetes logging configuration drops global state

Contributor: If there is no global logging state anymore, how will the current log calls like klog.InfoS be handled? That currently falls back to the globally configured klog.

I fully agree that we should get rid of that global state. I'd like to see a logger instance be passed into all functions which do logging, but there's no consensus on how to do that (attach to context vs. explicit parameter) and this KEP doesn't address this.

FWIW, I prefer the approach via context. Adding another parameter implies API breaks (client-go...) and other components already started using the context (logr, operator SDK), so if Kubernetes does the same, different code bases will become interoperable.

Contributor Author: I agree about using context to replace klog global state, but for now we want to start with removing configuration (flags etc.). This point is about configuration: basically removing global klog flags and passing a config struct through to register/validate and initialize logging (the same way other components are configured). At some point this configuration will be applied to global klog state, but at least we clean up configuration.

- Go-runner is feature complementary to klog flags planned for deprecation
- Projects in Kubernetes Org are migrated to go-runner

Member: Or systemd-run, etc.

Contributor Author: I think we can propose alternatives like systemd-run in the documentation, but k8s projects should pick a tool written in golang just for ease of maintenance.

- Add --logtostdout flag to klog disabled by default
- Use --logtostdout in kubernetes tests

#### Beta

- Go-runner project is well maintained and documented
- Documentation on migrating off klog flags is publicly available
- Kubernetes klog flags are marked as deprecated
- Split stdout and stderr logs in Kubernetes components by default

#### GA

- Kubernetes klog flags are removed

### Upgrade / Downgrade Strategy

N/A

### Version Skew Strategy

N/A

## Implementation History

- 20/06/2021 - Original proposal created in https://github.com/kubernetes/kubernetes/issues/99270
- 30/07/2021 - First KEP draft was created

## Drawbacks

Deprecating klog features outside klog might create confusion in the community.
A large part of the community doesn't know that klog was created out of necessity and
is not the end goal for logging in Kubernetes. We should do our due diligence to
let the community know about our plans and their impact on external components
depending on klog.

## Alternatives

### Continue supporting all klog features
At some point we should migrate all logging
configuration to Options or Configuration. Doing so while supporting all klog
features makes their future removal much harder.

### Release klog 3.0 with removed features
Removal of those features cannot be done without involving the whole k8s community, rather than
just k8s core components.

Member: Maybe we can do this as a follow-up eventually...

Contributor Author: Based on problems with klog 2.0, I would prefer to accelerate migration to logr as it would make more sense strategically. We would still need some wrapper code to configure an external library, but not something worth our own library.

@@ -0,0 +1,27 @@
title: Deprecate klog specific flags in Kubernetes components
kep-number: 2845
authors:
- "@serathius"
owning-sig: sig-instrumentation
participating-sigs:
- sig-arch
status: provisional
creation-date: 2021-07-30
reviewers:
- TBD
approvers:
- ehashman

see-also:
- "/keps/sig-instrumentation/1602-structured-logging"
replaces: []
stage: alpha
latest-milestone: "v1.23"
milestone:
alpha: "v1.23"
beta: "v1.24"
stable: "v1.25"

feature-gates: []
disable-supported: true
metrics: []