[release-1.25] Backports for 2023-10 release #8617

Merged on Oct 13, 2023 (21 commits)

Commits
09a8ad8 Disable HTTP on main etcd client port (brandond, Sep 22, 2023)
2d4c27b Don't ignore assets in home dir if system assets exist (brandond, Sep 26, 2023)
2863e53 Pass SystemdCgroup setting through to nvidia runtime options (brandond, Sep 27, 2023)
0b41b55 Bump containerd to v1.7.7-k3s1 (brandond, Oct 12, 2023)
d9a8b6b Bump busybox to v1.36.1 (brandond, Oct 12, 2023)
4ff80ab Add ADR for etcd snapshot CRD migration (brandond, Jul 28, 2023)
c3285ea Minor updates as per design review discussion (brandond, Aug 14, 2023)
23ae04b Add new CRD for etcd snapshots (brandond, Sep 8, 2023)
e25e83b Move etcd snapshot code into separate file (brandond, Sep 28, 2023)
33741cd Elide old snapshot data when apiserver rejects configmap with ErrRequ… (brandond, Sep 29, 2023)
8aa79be Tidy s3 upload functions (brandond, Sep 29, 2023)
46160e3 Consistently set snapshotFile timestamp (brandond, Sep 29, 2023)
d2ea032 Move s3 snapshot list functionality to s3.go (brandond, Sep 30, 2023)
7ad9ab3 Store extra metadata and cluster ID for snapshots (brandond, Oct 2, 2023)
4b8a4a0 Sort snapshots by time and key in tabwriter output (brandond, Oct 10, 2023)
9219298 Move snapshot delete into local/s3 functions (brandond, Oct 5, 2023)
0271fa6 Switch to managing ETCDSnapshotFile resources (brandond, Oct 3, 2023)
e92935c Add server token hash to CR and S3 (brandond, Oct 10, 2023)
268f45b Fix etcd snapshot integration tests (brandond, Oct 10, 2023)
8ecb3e2 Switch build target from main.go to a package. (#8342) (dlorenc, Oct 12, 2023)
8200982 Bump traefik, golang.org/x/net, google.golang.org/grpc (brandond, Oct 13, 2023)
16 changes: 10 additions & 6 deletions cmd/k3s/main.go
@@ -222,16 +222,20 @@ func getAssetAndDir(dataDir string) (string, string) {
 // extract checks for and if necessary unpacks the bindata archive, returning the unique path
 // to the extracted bindata asset.
 func extract(dataDir string) (string, error) {
-	// first look for global asset folder so we don't create a HOME version if not needed
-	_, dir := getAssetAndDir(datadir.DefaultDataDir)
+	// check if content already exists in requested data-dir
+	asset, dir := getAssetAndDir(dataDir)
 	if _, err := os.Stat(filepath.Join(dir, "bin", "k3s")); err == nil {
 		return dir, nil
 	}

-	asset, dir := getAssetAndDir(dataDir)
-	// check if target content already exists
-	if _, err := os.Stat(filepath.Join(dir, "bin", "k3s")); err == nil {
-		return dir, nil
+	// check if content exists in default path as a fallback, prior
+	// to extracting. This will prevent re-extracting into the user's home
+	// dir if the assets already exist in the default path.
+	if dataDir != datadir.DefaultDataDir {
+		_, defaultDir := getAssetAndDir(datadir.DefaultDataDir)
+		if _, err := os.Stat(filepath.Join(defaultDir, "bin", "k3s")); err == nil {
+			return defaultDir, nil
+		}
 	}

 	// acquire a data directory lock
60 changes: 60 additions & 0 deletions docs/adrs/etcd-snapshot-cr.md
@@ -0,0 +1,60 @@
# Store etcd snapshot metadata in a Custom Resource

Date: 2023-07-27

## Status

Accepted

## Context

K3s currently stores a list of etcd snapshots and associated metadata in a ConfigMap. Other downstream
projects and controllers consume the content of this ConfigMap in order to present cluster administrators with
a list of snapshots that can be restored.

On clusters with more than a handful of nodes, and reasonable snapshot intervals and retention periods, the snapshot
list ConfigMap frequently reaches the maximum size allowed by Kubernetes, and fails to store any additional information.
The snapshots are still created, but they cannot be discovered by users or accessed by tools that consume information
from the ConfigMap.

When this occurs, the K3s service log shows errors such as:
```
level=error msg="failed to save local snapshot data to configmap: ConfigMap \"k3s-etcd-snapshots\" is invalid: []: Too long: must have at most 1048576 bytes"
```

A side-effect of this is that snapshot metadata is lost if the ConfigMap cannot be updated, as the list is the only place that it is stored.

Reference:
* https://github.com/rancher/rke2/issues/4495
* https://github.com/k3s-io/k3s/blob/36645e7311e9bdbbf2adb79ecd8bd68556bc86f6/pkg/etcd/etcd.go#L1503-L1516

### Existing Work

Rancher already has a `rke.cattle.io/v1 ETCDSnapshot` Custom Resource that contains the same information after it's been
imported by the management cluster:
* https://github.com/rancher/rancher/blob/027246f77f03b82660dc2e91df6bf2cd549163f0/pkg/apis/rke.cattle.io/v1/etcd.go#L48-L74

It is unlikely that we would want to use this custom resource in its current package; we may be able to negotiate moving
it into a neutral project for use by both projects.

## Decision

1. Instead of populating snapshots into a ConfigMap using the JSON serialization of the private `snapshotFile` type, K3s
will manage creation of a new Custom Resource Definition with similar fields.
2. Metadata on each snapshot will be stored in a distinct Custom Resource.
3. The new Custom Resource will be cluster-scoped, as etcd and its snapshots are a cluster-level resource.
4. Snapshot metadata will also be written alongside snapshot files created on disk and/or uploaded to S3. The metadata
files will have the same basename as their corresponding snapshot file.
5. A hash of the server token will be stored as an annotation on the Custom Resource, and stored as metadata on snapshots uploaded to S3.
This hash should be compared to a current etcd snapshot's token hash to determine if the server token must be rolled back as part of the
snapshot restore process.
6. Downstream consumers of etcd snapshot lists will migrate to watching Custom Resource types, instead of the ConfigMap.
7. K3s will observe a three minor version transition period, during which both the new Custom Resources and the existing
ConfigMap will be used.
8. During the transition period, older snapshot metadata may be removed from the ConfigMap while those snapshots still
exist and are referenced by new Custom Resources, if the ConfigMap exceeds a preset size or key count limit.

## Consequences

* Snapshot metadata will no longer be lost when the number of snapshots exceeds what can be stored in the ConfigMap.
* There will be some additional complexity in managing the new Custom Resource, and working with other projects to migrate to using it.
32 changes: 17 additions & 15 deletions go.mod
@@ -44,10 +44,11 @@ replace (
go.opentelemetry.io/otel/trace => go.opentelemetry.io/otel/trace v0.20.0
go.opentelemetry.io/proto/otlp => go.opentelemetry.io/proto/otlp v0.7.0
golang.org/x/crypto => golang.org/x/crypto v0.0.0-20220315160706-3147a52a75dd
- golang.org/x/net => golang.org/x/net v0.7.0
+ golang.org/x/net => golang.org/x/net v0.17.0
golang.org/x/sys => golang.org/x/sys v0.2.0
google.golang.org/api => google.golang.org/api v0.60.0
google.golang.org/genproto => google.golang.org/genproto v0.0.0-20220107163113-42d7afdf6368
- google.golang.org/grpc => google.golang.org/grpc v1.40.0
+ google.golang.org/grpc => google.golang.org/grpc v1.58.3
gopkg.in/square/go-jose.v2 => gopkg.in/square/go-jose.v2 v2.2.2
k8s.io/api => github.com/k3s-io/kubernetes/staging/src/k8s.io/api v1.25.14-k3s1
k8s.io/apiextensions-apiserver => github.com/k3s-io/kubernetes/staging/src/k8s.io/apiextensions-apiserver v1.25.14-k3s1
@@ -139,10 +140,10 @@ require (
go.etcd.io/etcd/etcdutl/v3 v3.5.4
go.etcd.io/etcd/server/v3 v3.5.9
go.uber.org/zap v1.24.0
- golang.org/x/crypto v0.10.0
+ golang.org/x/crypto v0.14.0
golang.org/x/net v0.14.0
- golang.org/x/sync v0.2.0
- golang.org/x/sys v0.11.0
+ golang.org/x/sync v0.3.0
+ golang.org/x/sys v0.13.0
google.golang.org/grpc v1.57.0
gopkg.in/yaml.v2 v2.4.0
inet.af/tcpproxy v0.0.0-20200125044825-b6bb9b5b8252
@@ -164,7 +165,8 @@ require (
)

require (
- cloud.google.com/go v0.97.0 // indirect
+ cloud.google.com/go/compute v1.21.0 // indirect
+ cloud.google.com/go/compute/metadata v0.2.3 // indirect
github.com/Azure/azure-sdk-for-go v55.0.0+incompatible // indirect
github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 // indirect
github.com/Azure/go-autorest v14.2.0+incompatible // indirect
@@ -262,7 +264,7 @@ require (
github.com/google/gofuzz v1.2.0 // indirect
github.com/google/pprof v0.0.0-20230323073829-e72429f035bd // indirect
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
- github.com/googleapis/gax-go/v2 v2.1.1 // indirect
+ github.com/googleapis/gax-go/v2 v2.11.0 // indirect
github.com/gophercloud/gophercloud v0.1.0 // indirect
github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7 // indirect
github.com/grpc-ecosystem/go-grpc-middleware v1.3.0 // indirect
@@ -332,7 +334,7 @@ require (
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/pquerna/cachecontrol v0.1.0 // indirect
github.com/prometheus/client_golang v1.16.0 // indirect
- github.com/prometheus/client_model v0.3.0 // indirect
+ github.com/prometheus/client_model v0.4.0 // indirect
github.com/prometheus/common v0.42.0 // indirect
github.com/prometheus/procfs v0.10.1 // indirect
github.com/rs/xid v1.4.0 // indirect
@@ -377,18 +379,18 @@ require (
go.starlark.net v0.0.0-20200306205701-8dd3e2ee1dd5 // indirect
go.uber.org/atomic v1.10.0 // indirect
go.uber.org/multierr v1.9.0 // indirect
- golang.org/x/mod v0.10.0 // indirect
- golang.org/x/oauth2 v0.7.0 // indirect
- golang.org/x/term v0.11.0 // indirect
- golang.org/x/text v0.12.0 // indirect
+ golang.org/x/mod v0.11.0 // indirect
+ golang.org/x/oauth2 v0.10.0 // indirect
+ golang.org/x/term v0.13.0 // indirect
+ golang.org/x/text v0.13.0 // indirect
golang.org/x/time v0.3.0 // indirect
- golang.org/x/tools v0.9.3 // indirect
+ golang.org/x/tools v0.10.0 // indirect
golang.zx2c4.com/wireguard v0.0.0-20230325221338-052af4a8072b // indirect
golang.zx2c4.com/wireguard/wgctrl v0.0.0-20230429144221-925a1e7659e6 // indirect
gonum.org/v1/gonum v0.6.2 // indirect
- google.golang.org/api v0.60.0 // indirect
+ google.golang.org/api v0.126.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
- google.golang.org/genproto v0.0.0-20230306155012-7f2fa6fef1f4 // indirect
+ google.golang.org/genproto v0.0.0-20230711160842-782d3b101e98 // indirect
google.golang.org/protobuf v1.31.0 // indirect
gopkg.in/gcfg.v1 v1.2.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect