CHANGELOG-3.4.md

Previous change logs can be found at CHANGELOG-3.3.

v3.4.0 (TBD 2018-09)

See code changes and v3.4 upgrade guide for any breaking changes. Again, before running upgrades from any previous release, please make sure to read change logs below and v3.4 upgrade guide.

Improved

Breaking Changes

  • Make ETCDCTL_API=3 etcdctl default.
    • Now, etcdctl set foo bar must be ETCDCTL_API=2 etcdctl set foo bar.
    • Now, ETCDCTL_API=3 etcdctl put foo bar could be just etcdctl put foo bar.
  • Remove etcd --ca-file flag, instead use etcd --trusted-ca-file (etcd --ca-file flag has been marked deprecated since v2.1).
  • Remove etcd --peer-ca-file flag, instead use etcd --peer-trusted-ca-file (etcd --peer-ca-file flag has been marked deprecated since v2.1).
  • Remove pkg/transport.TLSInfo.CAFile field, instead use pkg/transport.TLSInfo.TrustedCAFile (CAFile field has been marked deprecated since v2.1).
  • Deprecate latest release container tag.
    • docker pull gcr.io/etcd-development/etcd:latest would not be up-to-date.
  • Deprecate minor version release container tags.
    • docker pull gcr.io/etcd-development/etcd:v3.3 would still work.
    • docker pull gcr.io/etcd-development/etcd:v3.4 would not work.
    • Use docker pull gcr.io/etcd-development/etcd:v3.4.x instead, with the exact patch version.
  • Drop ACIs from official release.
  • Exit on empty hosts in advertise URLs.
  • Exit on shadowed environment variables.
    • Address error on shadowed environment variables.
    • e.g. exit with error on ETCD_NAME=abc etcd --name=def.
    • e.g. exit with error on ETCD_INITIAL_CLUSTER_TOKEN=abc etcd --initial-cluster-token=def.
    • e.g. exit with error on ETCDCTL_ENDPOINTS=abc.com ETCDCTL_API=3 etcdctl endpoint health --endpoints=def.com.
  • Change etcdserverpb.AuthRoleRevokePermissionRequest/key,range_end fields type from string to bytes.
  • Rename etcd_debugging_mvcc_db_total_size_in_bytes Prometheus metric to etcd_mvcc_db_total_size_in_bytes.
  • Rename etcdserver.ServerConfig.SnapCount field to etcdserver.ServerConfig.SnapshotCount, to be consistent with the flag name etcd --snapshot-count.
  • Rename embed.Config.SnapCount field to embed.Config.SnapshotCount, to be consistent with the flag name etcd --snapshot-count.
  • Change embed.Config.CorsInfo of type *cors.CORSInfo to embed.Config.CORS of type map[string]struct{}.
  • Remove embed.Config.SetupLogging.
  • Rename etcd --log-output to etcd --log-outputs to support multiple log outputs.
    • etcd --log-output will be deprecated in v3.5.
  • Rename embed.Config.LogOutput to embed.Config.LogOutputs to support multiple log outputs.
  • Change embed.Config.LogOutputs type from string to []string to support multiple log outputs.
    • Now that etcd --log-outputs accepts multiple writers, etcd configuration YAML file log-outputs field must be changed to []string type.
    • Previously, the etcd --config-file etcd.config.yaml file could have a log-outputs: default field; now it must be log-outputs: [default].
  • Change v3 etcdctl snapshot exit codes with snapshot package.
    • Exit on error with exit code 1 (no more exit code 5 or 6 on snapshot save/restore commands).
  • Migrate dependency management tool from glide to golang/dep.
    • <= 3.3 puts vendor directory under cmd/vendor directory to prevent conflicting transitive dependencies.
    • 3.4 moves cmd/vendor directory to vendor at repository root.
    • Remove recursive symlinks in cmd directory.
    • Now go get/install/build on etcd packages (e.g. clientv3, tools/benchmark) enforce builds with etcd vendor directory.
  • Replace gRPC gateway endpoint /v3beta with /v3.
    • Deprecated /v3alpha.
    • To deprecate /v3beta in v3.5.
    • In v3.4, curl -L http://localhost:2379/v3beta/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}' still works as a fallback to curl -L http://localhost:2379/v3/kv/put -X POST -d '{"key": "Zm9v", "value": "YmFy"}', but the /v3beta request won't work in v3.5. Use the /v3 endpoint instead.
  • Change wal package function signatures to support structured logger and logging to file in server-side.
    • Previously, Open(dirpath string, snap walpb.Snapshot) (*WAL, error), now Open(lg *zap.Logger, dirpath string, snap walpb.Snapshot) (*WAL, error).
    • Previously, OpenForRead(dirpath string, snap walpb.Snapshot) (*WAL, error), now OpenForRead(lg *zap.Logger, dirpath string, snap walpb.Snapshot) (*WAL, error).
    • Previously, Repair(dirpath string) bool, now Repair(lg *zap.Logger, dirpath string) bool.
    • Previously, Create(dirpath string, metadata []byte) (*WAL, error), now Create(lg *zap.Logger, dirpath string, metadata []byte) (*WAL, error).
  • Remove pkg/cors package.
  • Change etcd --experimental-enable-v2v3 flag to etcd --enable-v2v3; v2 storage emulation is now stable.
  • Move internal packages to etcdserver.
    • "github.com/coreos/etcd/alarm" to "github.com/coreos/etcd/etcdserver/api/v3alarm".
    • "github.com/coreos/etcd/compactor" to "github.com/coreos/etcd/etcdserver/api/v3compactor".
    • "github.com/coreos/etcd/discovery" to "github.com/coreos/etcd/etcdserver/api/v2discovery".
    • "github.com/coreos/etcd/etcdserver/auth" to "github.com/coreos/etcd/etcdserver/api/v2auth".
    • "github.com/coreos/etcd/etcdserver/membership" to "github.com/coreos/etcd/etcdserver/api/membership".
    • "github.com/coreos/etcd/etcdserver/stats" to "github.com/coreos/etcd/etcdserver/api/v2stats".
    • "github.com/coreos/etcd/error" to "github.com/coreos/etcd/etcdserver/api/v2error".
    • "github.com/coreos/etcd/rafthttp" to "github.com/coreos/etcd/etcdserver/api/rafthttp".
    • "github.com/coreos/etcd/snap" to "github.com/coreos/etcd/etcdserver/api/snap".
    • "github.com/coreos/etcd/store" to "github.com/coreos/etcd/etcdserver/api/v2store".
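
For the gRPC gateway change above (/v3beta replaced by /v3), the JSON body in the curl example carries base64-encoded keys and values: "Zm9v" is "foo" and "YmFy" is "bar". A minimal Go sketch of building such a request follows. The helper names putBody and putURL are hypothetical, and the base address is simply the localhost example from the text.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// putRequest mirrors the JSON body shown in the curl example above.
type putRequest struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// putBody (hypothetical helper) base64-encodes key and value and
// returns the JSON body for a gateway put request.
func putBody(key, value string) (string, error) {
	b, err := json.Marshal(putRequest{
		Key:   base64.StdEncoding.EncodeToString([]byte(key)),
		Value: base64.StdEncoding.EncodeToString([]byte(value)),
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// putURL (hypothetical helper) builds the /v3 endpoint path.
// Per the changelog, /v3beta is only a fallback in v3.4.
func putURL(base string) string {
	return base + "/v3/kv/put"
}

func main() {
	body, err := putBody("foo", "bar")
	if err != nil {
		panic(err)
	}
	fmt.Println(putURL("http://localhost:2379"), body)
}
```

Keeping the path in one place makes the later /v3beta removal a one-line change for callers.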

Dependency

Metrics, Monitoring

Security, Authentication

See security doc for more details.

  • Support TLS cipher suite whitelisting.
  • Add etcd --host-whitelist flag, etcdserver.Config.HostWhitelist, and embed.Config.HostWhitelist, to prevent "DNS Rebinding" attack.
    • Any website can simply create an authorized DNS name, and direct DNS to "localhost" (or any other address). Then, all HTTP endpoints of an etcd server listening on "localhost" become accessible, and thus vulnerable to DNS rebinding attacks (CVE-2018-5702).
    • Client origin enforcement policy works as follows:
      • If client connection is secure via HTTPS, allow any hostnames.
      • If client connection is not secure and "HostWhitelist" is not empty, only allow HTTP requests whose Host field is listed in whitelist.
    • By default, "HostWhitelist" is "*", which means insecure server allows all client HTTP requests.
    • Note that the client origin policy is enforced whether authentication is enabled or not, for tighter controls.
    • When specifying hostnames, loopback addresses are not added automatically. To allow loopback interfaces, add them to whitelist manually (e.g. "localhost", "127.0.0.1", etc.).
    • e.g. etcd --host-whitelist example.com, then the server will reject all HTTP requests whose Host field is not example.com (also rejects requests to "localhost").
  • Support etcd --cors in v3 HTTP requests (gRPC gateway).
  • Support ttl field for etcd Authentication JWT token.
    • e.g. etcd --auth-token jwt,pub-key=<pub key path>,priv-key=<priv key path>,sign-method=<sign method>,ttl=5m.
  • Allow empty token provider in etcdserver.ServerConfig.AuthToken.
  • Fix TLS reload when certificate SAN field only includes IP addresses but no domain names.
    • In Go, server calls (*tls.Config).GetCertificate for TLS reload if and only if server's (*tls.Config).Certificates field is empty, or (*tls.ClientHelloInfo).ServerName is not empty with a valid SNI from the client. Previously, etcd always populated (*tls.Config).Certificates as non-empty on the initial client TLS handshake. Thus, the client was always expected to supply a matching SNI in order to pass the TLS verification and to trigger (*tls.Config).GetCertificate to reload TLS assets.
    • However, a certificate whose SAN field does not include any domain names but only IP addresses would request *tls.ClientHelloInfo with an empty ServerName field, thus failing to trigger the TLS reload on initial TLS handshake; this becomes a problem when expired certificates need to be replaced online.
    • Now, (*tls.Config).Certificates is created empty on initial TLS client handshake, first to trigger (*tls.Config).GetCertificate, and then to populate rest of the certificates on every new TLS connection, even when client SNI is empty (e.g. cert only includes IPs).
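
The client origin policy for etcd --host-whitelist described above can be sketched as a small predicate. This is an illustration only, not etcd's implementation: the function name hostAllowed is hypothetical, the Host value is compared verbatim, and treating an empty whitelist as "nothing to enforce" is an assumption inferred from the wording of the policy.

```go
package main

import "fmt"

// hostAllowed sketches the policy from the changelog text.
// It is a simplified illustration, not etcd's actual code.
func hostAllowed(secure bool, host string, whitelist map[string]struct{}) bool {
	if secure {
		return true // HTTPS connection: allow any hostname
	}
	if len(whitelist) == 0 {
		return true // assumption: empty whitelist means no enforcement
	}
	if _, ok := whitelist["*"]; ok {
		return true // default "*": insecure server allows all hosts
	}
	_, ok := whitelist[host]
	return ok // otherwise the Host field must be listed
}

func main() {
	// Equivalent in spirit to: etcd --host-whitelist example.com
	wl := map[string]struct{}{"example.com": {}}
	fmt.Println(hostAllowed(false, "example.com", wl)) // listed host
	fmt.Println(hostAllowed(false, "localhost", wl))   // loopback is not added automatically
}
```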

etcd server

  • Add etcd --initial-election-tick-advance flag to configure initial election tick fast-forward.
    • By default, etcd --initial-election-tick-advance=true, then local member fast-forwards election ticks to speed up "initial" leader election trigger.
    • This benefits the case of larger election ticks. For instance, a cross-datacenter deployment may require a longer election timeout of 10 seconds. If true, the local node does not need to wait up to 10 seconds. Instead, it forwards its election ticks to 8 seconds, and has only 2 seconds left before leader election.
    • Major assumptions are that: cluster has no active leader thus advancing ticks enables faster leader election. Or cluster already has an established leader, and rejoining follower is likely to receive heartbeats from the leader after tick advance and before election timeout.
    • However, when the network from the leader to the rejoining follower is congested, and the follower does not receive a leader heartbeat within the remaining election ticks, a disruptive election has to happen, thus affecting cluster availability.
    • Now, this can be disabled by setting etcd --initial-election-tick-advance=false.
    • Disabling this would slow down the initial bootstrap process for cross-datacenter deployments. Weigh this tradeoff when configuring etcd --initial-election-tick-advance: avoiding disruptive elections comes at the cost of a slower initial bootstrap.
    • If single-node, it advances ticks regardless.
    • Address disruptive rejoining follower node.
  • Add etcd --pre-vote flag to enable running an additional Raft election phase.
    • For instance, a flaky (or rejoining) member may drop in and out, and start a campaign. This member will end up with a higher term, and ignore all incoming messages with a lower term. In this case, a new leader eventually needs to get elected, which is disruptive to cluster availability. Raft implements a Pre-Vote phase to prevent this kind of disruption. If enabled, Raft runs an additional phase of election to check if the pre-candidate can get enough votes to win an election.
    • etcd --pre-vote=false by default.
    • v3.5 will enable etcd --pre-vote=true by default.
  • etcd --initial-corrupt-check flag is now stable (etcd --experimental-initial-corrupt-check has been deprecated).
    • etcd --initial-corrupt-check=true by default, to check cluster database hashes before serving client/peer traffic.
  • etcd --corrupt-check-time flag is now stable (etcd --experimental-corrupt-check-time has been deprecated).
    • etcd --corrupt-check-time=12h by default, to check cluster database hashes every 12 hours.
  • etcd --enable-v2v3 flag is now stable.
    • etcd --experimental-enable-v2v3 has been deprecated.
    • Added more v2v3 integration tests.
    • etcd --enable-v2=true --enable-v2v3='' by default, to enable v2 API server that is backed by v2 store.
    • etcd --enable-v2=true --enable-v2v3=/aaa to enable v2 API server that is backed by v3 storage.
    • etcd --enable-v2=false --enable-v2v3='' to disable v2 API server.
    • etcd --enable-v2=false --enable-v2v3=/aaa to disable v2 API server. TODO: error?
    • Automatically create parent directory if it does not exist (fix issue#9609).
    • v4.0 will configure etcd --enable-v2=true --enable-v2v3=/aaa to enable v2 API server that is backed by v3 storage.
  • Add etcd --discovery-srv-name flag to support custom DNS SRV name with discovery.
    • If not given, etcd queries _etcd-server-ssl._tcp.[YOUR_HOST] and _etcd-server._tcp.[YOUR_HOST].
    • If etcd --discovery-srv-name="foo", then query _etcd-server-ssl-foo._tcp.[YOUR_HOST] and _etcd-server-foo._tcp.[YOUR_HOST].
    • Useful for operating multiple etcd clusters under the same domain.
  • Support TLS cipher suite whitelisting.
  • Support etcd --cors in v3 HTTP requests (gRPC gateway).
  • Rename etcd --log-output to etcd --log-outputs to support multiple log outputs.
    • etcd --log-output will be deprecated in v3.5.
  • Add etcd --logger flag to support structured logger and multiple log outputs in server-side.
    • etcd --logger=capnslog will be deprecated in v3.5.
    • The main motivation is to promote automated etcd monitoring, rather than looking back at server logs when it starts breaking. Future development will make etcd log as little as possible, and make etcd easier to monitor with metrics and alerts.
    • etcd --logger=capnslog --log-outputs=default is the default setting and same as previous etcd server logging format.
    • etcd --log-outputs=default is not supported when etcd --logger=zap.
      • Instead, use etcd --logger=zap --log-outputs=stderr.
      • Or, use etcd --logger=zap --log-outputs=systemd/journal to send logs to the local systemd journal.
      • Previously, if etcd parent process ID (PPID) is 1 (e.g. run with systemd), etcd --logger=capnslog --log-outputs=default redirects server logs to local systemd journal. And if write to journald fails, it writes to os.Stderr as a fallback.
      • However, even with PPID 1, it can fail to dial systemd journal (e.g. run embedded etcd with Docker container). Then, every single log write will fail and fall back to os.Stderr, which is inefficient.
      • To avoid this problem, systemd journal logging must be configured manually.
    • etcd --logger=zap --log-outputs=stderr logs server operations in JSON-encoded format and writes logs to os.Stderr. Use this to override journald log redirects.
    • etcd --logger=zap --log-outputs=stdout logs server operations in JSON-encoded format and writes logs to os.Stdout. Use this to override journald log redirects.
    • etcd --logger=zap --log-outputs=a.log logs server operations in JSON-encoded format and writes logs to the specified file a.log.
    • etcd --logger=zap --log-outputs=a.log,b.log,c.log,stdout writes server logs to multiple files a.log, b.log and c.log at the same time and outputs to os.Stdout, in JSON-encoded format.
    • etcd --logger=zap --log-outputs=/dev/null will discard all server logs.
  • Fix mvcc "unsynced" watcher restore operation.
    • An "unsynced" watcher is a watcher that needs to catch up with events that have already happened.
    • That is, an "unsynced" watcher is a slow watcher that was requested on an old revision.
    • The "unsynced" watcher restore operation was not correctly populating its underlying watcher group, which could cause missing events from "unsynced" watchers.
    • A node gets network partitioned with a watcher on a future revision, and falls behind, receiving a leader snapshot after the partition gets removed. When applying this snapshot, etcd watch storage moves current synced watchers to unsynced, since synced watchers might have become stale during the network partition, and resets the synced watcher group to restart watcher routines. Previously, there was a bug when moving from the synced watcher group to unsynced, so the client would miss events when the watcher was requested on the network-partitioned node.
  • Fix mvcc server panic from restore operation.
    • Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, etcd server panicked during snapshot restore operation.
    • Now, this server-side panic has been fixed.
  • Fix server panic on invalid Election Proclaim/Resign HTTP(S) requests.
    • Previously, wrong-formatted HTTP requests to Election API could trigger panic in etcd server.
    • e.g. curl -L http://localhost:2379/v3/election/proclaim -X POST -d '{"value":""}', curl -L http://localhost:2379/v3/election/resign -X POST -d '{"value":""}'.
  • Fix revision-based compaction retention parsing.
    • Previously, etcd --auto-compaction-mode revision --auto-compaction-retention 1 was translated to revision retention 3600000000000.
    • Now, etcd --auto-compaction-mode revision --auto-compaction-retention 1 is correctly parsed as revision retention 1.
  • Prevent overflow by large TTL values for Lease Grant.
    • The TTL parameter of a Grant request is in units of seconds.
    • Leases with too large TTL values exceeding math.MaxInt64 expire in unexpected ways.
    • Server now returns rpctypes.ErrLeaseTTLTooLarge to client, when the requested TTL is larger than 9,000,000,000 seconds (which is >285 years).
    • Again, etcd Lease is meant for short-periodic keepalives or sessions, in the range of seconds or minutes. Not for hours or days!
  • Enable etcd server raft.Config.CheckQuorum when starting with ForceNewCluster.
  • Allow non-WAL files in etcd --wal-dir directory.
    • Previously, existing files such as lost+found in the WAL directory prevented etcd server boot.
    • Now, WAL directory that contains only lost+found or a file that's not suffixed with .wal is considered non-initialized.
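
On the revision-based compaction retention fix above: the old value 3600000000000 is exactly one hour expressed in nanoseconds, which suggests the bare integer was being scaled as hours even in revision mode. The following Go sketch shows mode-aware parsing under that reading. The helper parseRetention is hypothetical, and the "periodic" mode name with its hour-based interpretation is an assumption here rather than something this changelog states.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseRetention (hypothetical helper) interprets a bare integer as
// N revisions in "revision" mode and as N hours in "periodic" mode.
// The result is carried in a time.Duration only as a numeric container.
func parseRetention(mode, s string) (time.Duration, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		// Not a bare integer: treat it as a Go duration such as "30m".
		return time.ParseDuration(s)
	}
	switch mode {
	case "revision":
		return time.Duration(n), nil // plain count, no time scaling
	case "periodic":
		return time.Duration(n) * time.Hour, nil
	}
	return 0, fmt.Errorf("unknown compaction mode %q", mode)
}

func main() {
	r, _ := parseRetention("revision", "1")
	p, _ := parseRetention("periodic", "1")
	fmt.Println(int64(r), int64(p)) // prints "1 3600000000000"
}
```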

API

  • Add snapshot package for snapshot restore/save operations (see godoc.org/github.com/etcd/clientv3/snapshot for more).
  • Add watch_id field to etcdserverpb.WatchCreateRequest to allow user-provided watch ID to mvcc.
    • Corresponding watch_id is returned via etcdserverpb.WatchResponse, if any.
  • Add fragment field to etcdserverpb.WatchCreateRequest to request etcd server to split watch events when the total size of events exceeds etcd --max-request-bytes flag value plus gRPC-overhead 512 bytes.
    • The default server-side request bytes limit is embed.DefaultMaxRequestBytes which is 1.5 MiB plus gRPC-overhead 512 bytes.
    • If watch response events exceed this server-side request limit and watch request is created with fragment field true, the server will split watch events into a set of chunks, each of which is a subset of watch events below server-side request limit.
    • Useful when the client side has limited bandwidth.
    • For example, watch response contains 10 events, where each event is 1 MiB. And server etcd --max-request-bytes flag value is 1 MiB. Then, server will send 10 separate fragmented events to the client.
    • For example, watch response contains 5 events, where each event is 2 MiB. And server etcd --max-request-bytes flag value is 1 MiB and clientv3.Config.MaxCallRecvMsgSize is 1 MiB. Then, server will try to send 5 separate fragmented events to the client, and the client will error with "code = ResourceExhausted desc = grpc: received message larger than max (...)".
    • Client must implement fragmented watch event merge (which clientv3 does in etcd v3.4).
  • Add raftAppliedIndex field to etcdserverpb.StatusResponse for current Raft applied index.
  • Add errors field to etcdserverpb.StatusResponse for server-side error.
    • e.g. "etcdserver: no leader", "NOSPACE", "CORRUPT"
  • Add dbSizeInUse field to etcdserverpb.StatusResponse for actual DB size after compaction.
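
The fragmentation behavior described for the fragment field above can be pictured as greedy chunking of events against a byte limit. The sketch below is not etcd's implementation: the function fragment is hypothetical and works on event sizes only, using the limit of etcd --max-request-bytes plus 512 bytes from the text.

```go
package main

import "fmt"

const mib = 1 << 20

// fragment greedily groups event sizes (in bytes) into chunks whose
// total stays within limit. An event larger than limit still gets a
// chunk of its own, since a single event cannot be split here.
func fragment(sizes []int, limit int) [][]int {
	var chunks [][]int
	var cur []int
	total := 0
	for _, sz := range sizes {
		if len(cur) > 0 && total+sz > limit {
			chunks = append(chunks, cur)
			cur, total = nil, 0
		}
		cur = append(cur, sz)
		total += sz
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	limit := mib + 512 // 1 MiB request limit plus gRPC overhead
	events := make([]int, 10)
	for i := range events {
		events[i] = mib // ten events of 1 MiB each
	}
	fmt.Println(len(fragment(events, limit))) // prints "10"
}
```

With five 2 MiB events and the same limit, each event still ends up alone in its chunk, which is why a client with a 1 MiB receive limit would fail as in the second example above.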

Note: v3.5 will deprecate etcd --log-package-levels flag for capnslog; etcd --logger=zap --log-outputs=stderr will be the default. v3.5 will deprecate [CLIENT-URL]/config/local/log endpoint.

Package embed

Package integration

client v3

  • Add WithFragment OpOption to support watch events fragmentation when the total size of events exceeds etcd --max-request-bytes flag value plus gRPC-overhead 512 bytes.
    • Watch fragmentation is disabled by default.
    • The default server-side request bytes limit is embed.DefaultMaxRequestBytes which is 1.5 MiB plus gRPC-overhead 512 bytes.
    • If watch response events exceed this server-side request limit and watch request is created with fragment field true, the server will split watch events into a set of chunks, each of which is a subset of watch events below server-side request limit.
    • Useful when the client side has limited bandwidth.
    • For example, watch response contains 10 events, where each event is 1 MiB. And server etcd --max-request-bytes flag value is 1 MiB. Then, server will send 10 separate fragmented events to the client.
    • For example, watch response contains 5 events, where each event is 2 MiB. And server etcd --max-request-bytes flag value is 1 MiB and clientv3.Config.MaxCallRecvMsgSize is 1 MiB. Then, server will try to send 5 separate fragmented events to the client, and the client will error with "code = ResourceExhausted desc = grpc: received message larger than max (...)".
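
A client that opts into fragmentation has to reassemble the pieces before handing events to the application. Below is a minimal sketch of such a merge. The watchResponse type is a simplified stand-in rather than the clientv3 type, and it assumes each response carries a flag that stays true while more fragments follow.

```go
package main

import "fmt"

// watchResponse is a simplified, hypothetical stand-in for a watch
// response. Fragment is assumed true while more fragments follow.
type watchResponse struct {
	Events   []string
	Fragment bool
}

// mergeFragments combines consecutive fragmented responses into
// complete ones. A trailing run without a final non-fragment response
// is treated as incomplete and is not emitted.
func mergeFragments(in []watchResponse) []watchResponse {
	var out []watchResponse
	var pending []string
	for _, r := range in {
		pending = append(pending, r.Events...)
		if r.Fragment {
			continue // wait for the remaining fragments
		}
		out = append(out, watchResponse{Events: pending})
		pending = nil
	}
	return out
}

func main() {
	in := []watchResponse{
		{Events: []string{"a"}, Fragment: true},
		{Events: []string{"b"}, Fragment: true},
		{Events: []string{"c"}},
		{Events: []string{"d"}},
	}
	for _, r := range mergeFragments(in) {
		fmt.Println(r.Events)
	}
}
```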

etcdctl v3

gRPC proxy

  • Fix etcd server panic from restore operation.
    • Let's assume that a watcher had been requested with a future revision X and sent to node A that became network-partitioned thereafter. Meanwhile, cluster makes progress. Then when the partition gets removed, the leader sends a snapshot to node A. Previously if the snapshot's latest revision is still lower than the watch revision X, etcd server panicked during snapshot restore operation.
    • Especially, gRPC proxy was affected, since it detects a leader loss with a key "proxy-namespace__lostleader" and a watch revision "int64(math.MaxInt64 - 2)".
    • Now, this server-side panic has been fixed.

gRPC gateway

Package raft

Tooling

Go