Merge pull request #1100 from johnbelamaric/coredns
Automatic merge from submit-queue.

Add coredns proposal
Kubernetes Submit Queue authored Oct 23, 2017
2 parents e579a0e + 7d3bbce commit 7c42510
Showing 1 changed file with 223 additions and 0 deletions.
223 changes: 223 additions & 0 deletions contributors/design-proposals/network/coredns.md
@@ -0,0 +1,223 @@
# Add CoreDNS for DNS-based Service Discovery

Status: Pending

Version: Alpha

Implementation Owner: @johnbelamaric

## Motivation

CoreDNS is another CNCF project and is the successor to SkyDNS, which kube-dns is based on. It is a flexible, extensible
authoritative DNS server and directly integrates with the Kubernetes API. It can serve as cluster DNS,
complying with the [dns spec](https://github.com/kubernetes/dns/blob/master/docs/specification.md).

CoreDNS has fewer moving parts than kube-dns, since it is a single executable and a single process. It is written in Go, so
it is memory-safe (kube-dns includes dnsmasq, which is not). It supports a number of use cases that kube-dns does not
(see below). As a general-purpose authoritative DNS server, it has a lot of functionality that kube-dns could not reasonably
be expected to add. See, for example, the [intro](https://docs.google.com/presentation/d/1v6Coq1JRlqZ8rQ6bv0Tg0usSictmnN9U80g8WKxiOjQ/edit#slide=id.g249092e088_0_181) or [coredns.io](https://coredns.io) or the [CNCF webinar](https://youtu.be/dz9S7R8r5gw).

## Proposal

The proposed solution is to allow CoreDNS to be selected as an alternative to kube-dns during cluster deployment, with the
intent of making it the default in the future.

## User Experience

### Use Cases

* Standard DNS-based service discovery
* Federation records
* Stub domain support
* Adding custom DNS entries
* Making an alias for an external name [#39792](https://github.com/kubernetes/kubernetes/issues/39792)
* Dynamically adding services to another domain, without running another server [#55](https://github.com/kubernetes/dns/issues/55)
* Adding an arbitrary entry inside the cluster domain (for example TXT entries [#38](https://github.com/kubernetes/dns/issues/38))
* Verified pod DNS entries (ensure pod exists in specified namespace)
* Experimental server-side search path to address latency issues [#33554](https://github.com/kubernetes/kubernetes/issues/33554)
* Limit PTR replies to the cluster CIDR [#125](https://github.com/kubernetes/dns/issues/125)
* Serve DNS for selected namespaces [#132](https://github.com/kubernetes/dns/issues/132)
* Serve DNS based on a label selector
* Support for wildcard queries (e.g., `*.namespace.svc.cluster.local` returns all services in `namespace`)

By default, the user experience would be unchanged. For more advanced uses, existing users would need to modify the
ConfigMap that contains the CoreDNS configuration file.

### Configuring CoreDNS

The CoreDNS configuration file is called a `Corefile` and is syntactically the same as a
[Caddyfile](https://caddyserver.com/docs/caddyfile). The file consists of multiple stanzas called _server blocks_.
Each of these represents a set of zones for which that server block should respond, along with the list
of plugins to apply to a given request. More details on this can be found in the
[Corefile Explained](https://coredns.io/2017/07/23/corefile-explained/) and
[How Queries Are Processed](https://coredns.io/2017/06/08/how-queries-are-processed-in-coredns/) blog
entries.

### Configuration for Standard Kubernetes DNS

The intent is to make configuration as simple as possible. The following Corefile will behave according
to the spec, except that it will not respond to Pod queries. It assumes the cluster domain is `cluster.local`
and the cluster CIDRs are all within 10.0.0.0/8.

```
. {
    errors
    log
    cache 30
    health
    prometheus
    kubernetes 10.0.0.0/8 cluster.local
    proxy . /etc/resolv.conf
}
```

The `.` means that queries for the root zone (`.`) and below should be handled by this server block. Each
of the lines within `{ }` represents an individual plugin:

* `errors` enables [error logging](https://coredns.io/plugins/errors)
* `log` enables [query logging](https://coredns.io/plugins/log/)
* `cache 30` enables [caching](https://coredns.io/plugins/cache/) of positive and negative responses for 30 seconds
* `health` opens an HTTP port to allow [health checks](https://coredns.io/plugins/health) from Kubernetes
* `prometheus` enables Prometheus [metrics](https://coredns.io/plugins/metrics)
* `kubernetes 10.0.0.0/8 cluster.local` connects to the Kubernetes API and [serves records](https://coredns.io/plugins/kubernetes/) for the `cluster.local` domain and reverse DNS for 10.0.0.0/8 per the [spec](https://github.com/kubernetes/dns/blob/master/docs/specification.md)
* `proxy . /etc/resolv.conf` [forwards](https://coredns.io/plugins/proxy) any queries not handled by other plugins (the `.` means the root domain) to the nameservers configured in `/etc/resolv.conf`
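
The more advanced use cases listed earlier are switched on through options of the `kubernetes` plugin rather than through separate components. As a minimal sketch, assuming the `pods` and `namespaces` options described in the plugin documentation linked above (the namespace names are hypothetical, and the available options may differ between CoreDNS versions), verified Pod records limited to selected namespaces might look like this:

```
. {
    errors
    cache 30
    kubernetes 10.0.0.0/8 cluster.local {
        # Answer Pod queries only if a matching pod exists in the namespace.
        pods verified
        # Serve records only for these (hypothetical) namespaces.
        namespaces prod staging
    }
    proxy . /etc/resolv.conf
}
```

Everything outside the option block is unchanged from the standard configuration, so a default deployment is unaffected unless these options are added.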

### Configuring Stub Domains

To configure stub domains, you add additional server blocks for those domains:

```
example.com {
    proxy example.com 8.8.8.8:53
}
. {
    errors
    log
    cache 30
    health
    prometheus
    kubernetes 10.0.0.0/8 cluster.local
    proxy . /etc/resolv.conf
}
```

### Configuring Federation

Federation is implemented as a separate plugin. You simply list the federation names and
their corresponding domains.

```
. {
    errors
    log
    cache 30
    health
    prometheus
    kubernetes 10.0.0.0/8 cluster.local
    federation cluster.local {
        east east.example.com
        west west.example.com
    }
    proxy . /etc/resolv.conf
}
```

### Reverse DNS

Reverse DNS is supported for Services and Endpoints. It is not supported for Pods.

You have to configure the reverse zone to make it work. That means knowing the service CIDR and configuring that
ahead of time (until [#25533](https://github.com/kubernetes/kubernetes/issues/25533) is implemented).

Since reverse DNS zones are on classful boundaries, if you have a classless CIDR for your service CIDR
(say, a /12), then you have to widen that to the containing classful network. That leaves a subset of that network
open to the spoofing described in [#125](https://github.com/kubernetes/dns/issues/125); this is to be fixed
in [#1074](https://github.com/coredns/coredns/issues/1074).
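
To make the widening concrete, assume a service CIDR of 10.96.0.0/12 (a hypothetical value chosen for illustration). The containing classful network is 10.0.0.0/8, so that is the value given to the `kubernetes` plugin in this sketch:

```
. {
    errors
    # Hypothetical service CIDR: 10.96.0.0/12, which covers
    # 10.96.0.0 - 10.111.255.255. The reverse zone has to be
    # widened to the containing classful network, 10.0.0.0/8.
    kubernetes 10.0.0.0/8 cluster.local
    proxy . /etc/resolv.conf
}
```

Under these assumptions, the addresses inside 10.0.0.0/8 but outside 10.96.0.0/12 form the subset that remains exposed to the spoofing described above.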

PTR spoofing by manual endpoints
([#124](https://github.com/kubernetes/dns/issues/124)) would
remain an issue even once [#1074](https://github.com/coredns/coredns/issues/1074) is solved, just as it is an issue in kube-dns today. It could be
prevented when `pods verified` is enabled, but that is not implemented at this time.

### Deployment and Operations

When deployed for cluster DNS, CoreDNS is typically managed by a Deployment. The
CoreDNS pod contains only a single container, whereas the kube-dns pod requires three
containers. This simplifies troubleshooting.

The Kubernetes integration is stateless and so multiple pods may be run. Each pod will have its
own connection to the API server. If you (like OpenShift) run a DNS pod for each node, you should not enable
`pods verified` as that could put a high load on the API server. Instead, if you wish to support
that functionality, you can run another central deployment and configure the per-node
instances to proxy `pod.cluster.local` to the central deployment.
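
A per-node Corefile for that arrangement could be sketched as follows, reusing the server-block pattern shown for stub domains. The address `10.0.0.10:53` is a hypothetical service IP for the central deployment, which is assumed to be the only place where `pods verified` is enabled.

```
# Per-node instance: hand Pod queries to the central deployment.
pod.cluster.local {
    # 10.0.0.10:53 is a hypothetical address of the central CoreDNS service.
    proxy pod.cluster.local 10.0.0.10:53
}
. {
    errors
    cache 30
    # No `pods verified` here, to limit load on the API server.
    kubernetes 10.0.0.0/8 cluster.local
    proxy . /etc/resolv.conf
}
```

Because the `pod.cluster.local` block is the more specific zone, Pod queries are routed to it, and all other queries fall to the `.` block.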

All logging goes to standard output, and may be disabled if
desired. In very high queries-per-second environments, it is advisable to disable query logging to
avoid I/O for every query.
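
Since query logging is itself a plugin, disabling it amounts to leaving the `log` line out of the server block. A sketch based on the standard configuration, keeping `errors` so that failures are still reported:

```
. {
    errors
    # `log` is omitted, so individual queries are not written to standard output.
    cache 30
    health
    prometheus
    kubernetes 10.0.0.0/8 cluster.local
    proxy . /etc/resolv.conf
}
```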

CoreDNS can be configured to provide an HTTP health check endpoint, so that it can be monitored
by a standard Kubernetes HTTP health check. Readiness checks are not currently supported but
are in the works (see [#588](https://github.com/coredns/coredns/issues/588)). For Kubernetes, a
CoreDNS instance will be considered ready when it has finished syncing with the API.

CoreDNS performance metrics can be published for Prometheus.

When a change is made to the Corefile, you can send each CoreDNS instance a SIGUSR1, which will
trigger a graceful reload of the Corefile.

### Performance and Resource Load

The performance test was done in GCE with the following components:

* CoreDNS system with machine type: n1-standard-1 (1 CPU, 2.3 GHz Intel Xeon E5 v3 (Haswell))
* Client system with machine type: n1-standard-1 (1 CPU, 2.3 GHz Intel Xeon E5 v3 (Haswell))
* Kubemark Cluster with 5000 nodes

CoreDNS and the client ran out-of-cluster, because the cluster under test was a Kubemark cluster.

The following is the summary of the performance of CoreDNS. CoreDNS cache was disabled.

| Services (with 1% change per minute\*) | Max QPS\*\* | Latency (median) | CoreDNS memory (at max QPS) | CoreDNS CPU (at max QPS) |
| ------------ | ------------- | -------------- | --------------------- | ----------------- |
| 1,000 | 18,000 | 0.1 ms | 38 MB | 95% |
| 5,000 | 16,000 | 0.1 ms | 73 MB | 93% |
| 10,000 | 10,000 | 0.1 ms | 115 MB | 78% |

\* We simulated service change load by creating and destroying 1% of services per minute.

\*\* Max QPS with < 1% packet loss.

## Implementation

Each distribution project (kubeadm, minikube, kubespray, and others) will implement CoreDNS as an optional
add-on as appropriate for that project.

### Client/Server Backwards/Forwards compatibility

No changes to other components are needed.

The method for configuring the DNS server will change. Thus, in cases where users have customized
the DNS configuration, they will need to modify their configuration if they move to CoreDNS.
For example, if users have configured stub domains, they would need to modify that configuration.
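
As an illustration of such a migration, the sketch below shows a kube-dns stub domain entry (in a comment, assuming the `stubDomains` key of the kube-dns ConfigMap) next to the equivalent Corefile server block from the stub domain section. The domain and nameserver are the example values used earlier, not a recommendation.

```
# kube-dns ConfigMap entry, for comparison:
#   stubDomains: {"example.com": ["8.8.8.8"]}
#
# Equivalent CoreDNS server block:
example.com {
    proxy example.com 8.8.8.8:53
}
```

During the Alpha this translation would be done by hand, one server block per stub domain.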

When serving SRV requests for headless services, some responses are different from kube-dns, though still within
the specification (see [#975](https://github.com/coredns/coredns/issues/975)). In summary, these are:

* kube-dns uses endpoint names that have an opaque identifier. CoreDNS instead uses the pod IP with dashes.
* kube-dns returns a bogus SRV record with port = 0 when no SRV prefix is present in the query.
CoreDNS returns all SRV records for the service (see also [#140](https://github.com/kubernetes/dns/issues/140)).

Additionally, federation may return records in a slightly different manner (see [#1034](https://github.com/coredns/coredns/issues/1034)),
though this may be changed prior to completing this proposal.

In the plan for the Alpha, there will be no automated conversion of the kube-dns configuration. However, as
part of the Beta, code will be provided that will produce a proper Corefile based upon the existing kube-dns
configuration.

## Alternatives considered

Maintain existing kube-dns, add functionality to meet the currently unmet use cases above, and fix underlying issues.
Ensuring the use of memory-safe code would require replacing dnsmasq with another (memory-safe) caching DNS server,
or implementing caching within kube-dns.
