
KEP for CoreDNS to GA. #1956

Merged
merged 1 commit into kubernetes:master from corednskep on May 31, 2018

Conversation

rajansandeep
Contributor

@rajansandeep rajansandeep commented Mar 21, 2018

This KEP is defined to graduate CoreDNS to GA.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 21, 2018
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 21, 2018
@rajansandeep
Contributor Author

/sig network

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Mar 21, 2018
@k8s-github-robot k8s-github-robot added the sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. label Mar 21, 2018

## Motivation

* CoreDNS is more flexible and extensible than what kube-dns.
Member

@rajansandeep let's not indent these, we don't want it showing up as pre-formatted.

Member

s/than what/than/

@fturib

fturib commented Mar 21, 2018

I think you need to add the file that holds the NEXT-KEP-NUMBER: increment the number and include the file in this PR (to be sure no one else takes the number).

@fturib

fturib commented Mar 22, 2018

/review @bowei

Member

@stp-ip stp-ip left a comment

CoreDNS for GA \o/

### Non-Goals

* Translation of CoreDNS ConfigMap back to kube-dns (i.e., downgrade).
* Retaining the ConfigMap of a custom/modified image of CoreDNS during upgrade. That is, if the user has modified the Corefile in the ConfigMap, those changes will not be retained (the user should back it up and redo the changes on the new ConfigMap).
Member

Feels quite intrusive to overwrite user configuration on an upgrade without a path to prevent this. At least it needs to be documented or warned about, imo.

Contributor Author

Ok, I can document that and also make this statement a little clearer.

Just to clarify: at upgrade time, the ConfigMap is discarded only if the CoreDNS image we find is "unofficial" (not the one shipped with alpha (1.9) or beta (1.10)... ); in that case CoreDNS is upgraded to the latest image and the default ConfigMap.

Member

Agree - this is user-hostile. If you expose a ConfigMap, you are publishing it as a mechanism to configure the app. If you intend to blow it away, you should just not expose it at all.

Now we have to talk about defaults vs user-overrides.

Member

Ok, but this is just for the ConfigMap, right? That is, how should we handle changes to the manifest/deployment itself?

In that case, we should add the new, updated default ConfigMap under a different name, to make it easy for the user to switch to it if they want, or at least compare any differences. This also means that in the event of non-backwards-compatible changes to the Corefile (which we might make, but only when we bump the minor - not patch - version), we will need to do something special.

@rajansandeep can you make sure the 1.0.x -> 1.1.x upgrade remains backwards compatible for the Corefile? If so we don't need to worry about that for a while.


@johnbelamaric : what about this proposal for upgrading 1) the ConfigMap and 2) the Deployment:

1. Never update the ConfigMap (as you suggest). It is created the first time CoreDNS is installed (either by upgrade from kube-dns or by installation from scratch), and on subsequent upgrades we keep that ConfigMap, until we reach the issue of a non-backward-compatible Corefile. At that point we figure out a special upgrade.

NOTE: is it worth creating a "coredns-default" ConfigMap? It would show no more than the one in the coredns/deployment repository, since the kube-dns config no longer exists.

2. For the deployment part, the first installation creates the CoreDNS Deployment. Subsequent upgrades only change the CoreDNS image, unless the admin prevents it by removing the "kubeadm-auto-upgrade" label that the first installation sets.
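A sketch of what the proposed opt-out could look like on the Deployment. The `kubeadm-auto-upgrade` label name comes from the comment above, but the exact label key, value, and mechanics are assumptions for illustration, not a settled design:

```yaml
# Hypothetical manifest fragment: the first installation sets this label;
# removing it would tell kubeadm to leave the Deployment alone on
# subsequent upgrades (only the image is ever bumped automatically).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    kubeadm-auto-upgrade: "true"   # delete this label to opt out of auto-upgrades
```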


* Translation of CoreDNS ConfigMap back to kube-dns (i.e., downgrade).
* Retaining the ConfigMap of a custom/modified image of CoreDNS during upgrade. That is, if the user has modified the Corefile in the ConfigMap, those changes will not be retained (the user should back it up and redo the changes on the new ConfigMap).
* Translation configuration of kube-dns to equivalent CoreDNS that is defined outside of the kube-dns ConfigMap. For example, modifications to the manifest or `dnsmasq` configuration.
Member

Could be considered a breaking change and needs to be documented imo.

Member

This should be required, as there are users of kube-dns with the ConfigMap, and their setups will break if this is not provided.

Contributor

There isn’t anything that prevents them from continuing to use Kube-dns, which is backwards compatible.

I don’t think we should claim support for arbitrary customizations, and “customizer beware” is our default stance.


@bowei : we are converting the kube-dns ConfigMap to a CoreDNS ConfigMap. This "non-goal" statement warns that a customized deployment of kube-dns with extra options on the command lines of the dnsmasq or kube-dns containers will not be translated.

In other words, translation of "StubDomains", "UpstreamNameServer", and "Federation" is supported.

Member

To be clear - I read this as "we WILL support graceful upgrades from the kube-dns configmap, but we will NOT support upgrade from arbitrary other config" - is that right?

Member

@thockin, yes. The options that can be configured in the kube-dns ConfigMap will be retained in the conversion to the Corefile. But we aren't going to tweak things for arbitrary config changes to the dnsmasq CLI options, etc. If the user has that, they should probably retain the kube-dns installation until they can upgrade in a more considered manner.
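The scope described above (translate the documented kube-dns ConfigMap keys, ignore everything else) can be illustrated with a toy sketch. This is not kubeadm's actual conversion code; the emitted plugin names (`kubernetes`, `forward`, `cache`) and the exact stanza layout are one plausible rendering, chosen for illustration only:

```python
import json

def stub_domain_block(domain, servers):
    # One Corefile server block per stub domain, forwarding to its resolvers.
    return "%s:53 {\n    errors\n    cache 30\n    forward . %s\n}" % (
        domain, " ".join(servers))

def translate(kube_dns_data):
    """Toy translation of kube-dns ConfigMap keys into Corefile stanzas.

    Only the documented keys (stubDomains, upstreamNameservers) are
    handled; anything configured outside the ConfigMap is ignored,
    matching the non-goal discussed above.
    """
    blocks = []
    stub = json.loads(kube_dns_data.get("stubDomains", "{}"))
    for domain, servers in sorted(stub.items()):
        blocks.append(stub_domain_block(domain, servers))
    upstream = json.loads(kube_dns_data.get("upstreamNameservers", "[]"))
    target = " ".join(upstream) if upstream else "/etc/resolv.conf"
    blocks.append(
        ".:53 {\n    kubernetes cluster.local in-addr.arpa ip6.arpa\n"
        "    forward . %s\n    cache 30\n}" % target)
    return "\n".join(blocks)

corefile = translate({
    "stubDomains": '{"acme.local": ["1.2.3.4"]}',
    "upstreamNameservers": '["8.8.8.8", "8.8.4.4"]',
})
print(corefile)
```

A cluster whose kube-dns ConfigMap only uses these keys would translate cleanly; dnsmasq flag tweaks or manifest edits have no counterpart here, which is exactly what the non-goal excludes.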



## Graduation Criteria

* Passing all e2e conformance and DNS related tests.

Member

Scalability tests


@bowei : what do you have in mind?

Member

Sorry about the terseness; I mean that we should get CoreDNS running as part of the regular k8s scale tests.

Making it possible to run as the target of the DNS perf-tests would also be great.

Member

We have looked at the perf-test for DNS, we can do that but it will require a lot of changes to those tests. They are geared toward discovering/testing varying levels of resources devoted to dnsmasq and kube-dns.

Member

I think we should have an integration-style test for CoreDNS in that style, given the surprises we've experienced with resource limits etc. in the past.

Member

Note: there are two items:

  • CoreDNS as part of the regular k8s scale runs (e.g. up to 5k nodes)
  • Integration test style resource determination.


@bowei : Let me verify I understand your 2 points:

CoreDNS as part of the regular k8s scale runs (e.g. up to 5k nodes)

That means having CoreDNS be part of the regular k8s scale runs at 5k nodes, and therefore ensuring we have DNS e2e tests with the flag "[Feature:Performance]". They will automatically run with all perf-tests like "ci-kubernetes-kubemark-gce-scale" (which is the one running 5k nodes with Kubemark).

Integration test style resource determination

From this sentence, and your comment before, I understand you advise extending those DNS perf-tests and creating some dedicated tests for CoreDNS to measure the resources used by CoreDNS under query stress.

Is my understanding correct?

Member

lgtm


## Motivation

* CoreDNS is more flexible and extensible than what kube-dns.

drop "what" here.

### Goals

* Have CoreDNS run as the default DNS server both after a new installation and an upgrade of the existing installation.
* Make CoreDNS available as an [image in gcr.io](https://github.com/kubernetes/kubernetes/pull/59753) and define a workflow/process to update the CoreDNS versions in the future.

Should probably file a coredns issue for this as well.


@miekg : maybe, but I propose we wait until we have a better understanding of who is doing what in the new process, specifically who will have the responsibility for generating the image and pushing it to gcr.io.

Member

We should probably keep a fork of CoreDNS in github.com/kubernetes (or in a new, related org) and cut our own containers from there. We should be doing that for everything we ship.

## Graduation Criteria

* Passing all e2e conformance and DNS related tests.


autoscaling?


@miekg : autoscaling is generic and should work with CoreDNS.
However, I noticed an issue in the e2e autoscaling tests with DNS: the test itself waits for kube-dns to run. That will have to be fixed. See kubernetes/kubernetes#61176


### Goals

* Have CoreDNS run as the default DNS server both after a new installation and an upgrade of the existing installation.
Contributor

+1 on this.



* CoreDNS has fewer moving parts than kube-dns, taking advantage of the plugin architecture, making it a single executable and single process.
* It is written in Go, making it memory-safe (kube-dns includes dnsmasq which is not).
* CoreDNS has [better performance](https://github.com/kubernetes/community/pull/1100#issuecomment-337747482) than [kube-dns](https://github.com/kubernetes/community/pull/1100#issuecomment-338329100) in terms of greater QPS, lower latency, and lower memory consumption.
* It supports a number of [use cases](https://coredns.io/2017/05/08/custom-dns-entries-for-kubernetes/) that kube-dns doesn't. As a general-purpose authoritative DNS server it has a lot of functionality that kube-dns couldn't reasonably be expected to add.
Member

This is outside the DNS spec for Kube, so it is pretty irrelevant to this plan




* Supporting [Autopath](https://coredns.io/plugins/autopath/), which reduces the high query load caused by the long DNS search path in Kubernetes
* Making an alias for an external name [#39792](https://github.com/kubernetes/kubernetes/issues/39792)
* Dynamically adding services to another domain, without running another server [#55](https://github.com/kubernetes/dns/issues/55)
* Adding an arbitrary entry inside the cluster domain (for example TXT entries [#38](https://github.com/kubernetes/dns/issues/38))
Member

I do not think we should document, endorse, or emphasize operations that do not go through Kubernetes API, except with the explicit warning that it is coredns specific and non-portable.

Member

Ok. So are you thinking we should pull these from the KEP?

Member

Maybe instead of listing the various extra features of CoreDNS, just link to a CoreDNS specific document. That way it's linked, but can be made clear it's out of scope for this KEP and k8s in general.
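For readers following the links, a minimal Corefile sketch of the Autopath feature mentioned in the quoted list. This is illustrative only; consult the CoreDNS plugin documentation for the authoritative syntax (notably, `autopath @kubernetes` requires the `kubernetes` plugin to run with `pods verified`):

```
.:53 {
    errors
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified        # required so autopath can resolve the client pod
    }
    autopath @kubernetes     # answer search-path queries server-side, cutting query load
    forward . /etc/resolv.conf
    cache 30
}
```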

@fturib

fturib commented Apr 2, 2018

@bowei @thockin : the KEP has been updated with the comments. Can you have a second look to check it integrates all your concerns. Thanks!

* Fate of kube-dns in future releases, i.e. deprecation path.

## Graduation Criteria

Contributor

@rramkumar1 rramkumar1 Apr 5, 2018

Can we be more specific here? For example, can we say what is the minimum number of test runs that need to pass before we say we are "passing all e2e conformance and DNS related tests"? Similarly for all criteria listed here. That way, there are clear numbers that need to be hit and there is no ambiguity.

Contributor Author

@rramkumar1 Just to clarify:
I would submit a PR now making CoreDNS the default in kubeadm, kube-up... to see if the tests pass; if we're satisfied with the result, we keep the PR, else we revert.

And what is the right number of runs to call it a success?


@rramkumar1 : why would we need to specify an "amount of test runs"? I would think one is enough, no? As long as all tests are OK with CoreDNS, then CoreDNS is acceptable.

Are you worried about flaky tests? Or about other commits that would interfere and fail some of the "conformance" tests (independently of the DNS service)?

To avoid a wrong view due to other errors (not linked to the DNS service), we could propose to run one of the "conformance" + "DNS" test suites with kube-dns and one with CoreDNS, so we can validate that we have the same level of passing tests.


For the e2e tests, I would propose:

  • all tests filtered by "--ginkgo.skip=\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]"
    (the regular "xxx-kubernetes-e2e-gce" tests).

As the success of each test depends on the code itself (not only the DNS service), we would verify that CoreDNS is equivalent to kube-dns with regard to the passing e2e tests. That means validating
"ci-kubernetes-e2e-gce-gci-ci-master" against the exact same test job "ci-kubernetes-e2e-gce-gci-ci-master-kube-dns" with kube-dns deployed.

The results are supposed to always be the same, or at least none of the tests succeeding with kube-dns should fail with CoreDNS.

@fturib

fturib commented Apr 12, 2018

@rramkumar1 : the point on the e2e tests to validate has been modified to be more precise. We are not proposing some number of runs to be green, but rather a comparison, for the same commit, between CoreDNS and the former kube-dns: we want to validate that CoreDNS is able to replace kube-dns.

For the e2e performance tests, in the same way, we just want performance not to be hurt by CoreDNS; in other words, we want the tests to stay green.

Let me know if you agree with the new wording.
Thanks!

@rramkumar1
Contributor

@fturib LGTM. Thanks for clarifying!

@cblecker
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 16, 2018
@rajansandeep rajansandeep force-pushed the corednskep branch 2 times, most recently from 8bd12a2 to af066fe Compare May 23, 2018 14:59
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 23, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 23, 2018

* Translation of CoreDNS ConfigMap back to kube-dns (i.e., downgrade).
* Translation configuration of kube-dns to equivalent CoreDNS that is defined outside of the kube-dns ConfigMap. For example, modifications to the manifest or `dnsmasq` configuration.
* Fate of kube-dns in future releases, i.e. deprecation path.
Member

Making coredns the default in every installer

@thockin
Member

thockin commented May 24, 2018

one small nit - please xref the other KEP

@rajansandeep
Contributor Author

@thockin done.
Let me know if there are any other comments to be addressed for approval! Thanks!

Member

@johnbelamaric johnbelamaric left a comment

A couple minor tweaks.

creation-date: 2018-03-21
last-updated: 2018-05-18
status: provisional
succeeded by: https://github.com/kubernetes/community/pull/2167
Member

I think you were thinking of "superseded-by", but we don't want to use that in this KEP (it is for ones that get replaced by a new KEP). Instead use "see-also" (which takes a list).

* Translation of CoreDNS ConfigMap back to kube-dns (i.e., downgrade).
* Translation configuration of kube-dns to equivalent CoreDNS that is defined outside of the kube-dns ConfigMap. For example, modifications to the manifest or `dnsmasq` configuration.
* Fate of kube-dns in future releases, i.e. deprecation path.
* Making CoreDNS the default in every installer.
Member

Reference/link the new KEP here too.

@fturib

fturib commented May 29, 2018

@johnbelamaric : the KEP has been updated following your comments. Can you review/verify/approve? Thanks!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 30, 2018
KEP coredns for GA only
@thockin
Member

thockin commented May 31, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 31, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 31, 2018
@k8s-ci-robot k8s-ci-robot merged commit bc690fa into kubernetes:master May 31, 2018
@fturib

fturib commented May 31, 2018

Thanks @thockin !

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request May 31, 2018
Automatic merge from submit-queue (batch tested with PRs 63445, 63820). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

DNS record scale test

**What this PR does / why we need it**:

Adds e2e scalability test for querying DNS with a scaled up number of records.  Specifically, it creates ~~30 services per node~~ 10000 services, then queries the cluster DNS and validates the response. This relates to a graduation criteria listed in kubernetes/community#1956.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
none

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```