This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

CoreDNS pod will affect resource calculation for hived scheduler #5056

Closed
hzy46 opened this issue Nov 5, 2020 · 3 comments · Fixed by #5071

@hzy46
Contributor

hzy46 commented Nov 5, 2020

  1. Kubernetes deploys the coreDNS pods on nodes within the cluster, with no guarantee about which nodes they land on. Some PAI worker nodes may have one, while others may not.

  2. The coreDNS pod requests 1 CPU and 500MiB of memory, which affects resource calculation for the hived scheduler.

    e.g. A cluster has one master node and two worker nodes. Every worker node has 10 allocatable CPUs. In the beginning, the coreDNS pod is deployed on the master node, so the admin configures 10 CPUs for every worker node in the hived scheduler. Later, for some reason, a coreDNS pod is deployed on one of the worker nodes. Since that pod requests 1 CPU, only 9 CPUs remain available for jobs on that worker node. This may cause infinite job retries, because hived still believes the node has 10 CPUs and may keep scheduling the job to it.
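
The mismatch in this example can be sketched in a few lines of Python. This is only a simplified model of the arithmetic, not hived's actual scheduling logic; the class and function names (`WorkerNode`, `scheduler_accepts`, `node_can_run`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """Hypothetical, simplified view of one worker node."""
    name: str
    allocatable_cpus: int             # CPUs Kubernetes reports as allocatable
    configured_cpus: int              # CPUs the admin wrote into the hived config
    system_pod_cpu_requests: int = 0  # e.g. 1 if a coreDNS pod landed on this node

    def free_cpus(self) -> int:
        # CPUs actually left for jobs after system pods reserve their requests.
        return self.allocatable_cpus - self.system_pod_cpu_requests


def scheduler_accepts(node: WorkerNode, job_cpus: int) -> bool:
    # The scheduler only sees the statically configured number.
    return job_cpus <= node.configured_cpus


def node_can_run(node: WorkerNode, job_cpus: int) -> bool:
    # The node can only admit the job if the CPUs are really free.
    return job_cpus <= node.free_cpus()


if __name__ == "__main__":
    nodes = [
        WorkerNode("worker-0", allocatable_cpus=10, configured_cpus=10),
        # A coreDNS pod requesting 1 CPU has been placed on this node.
        WorkerNode("worker-1", allocatable_cpus=10, configured_cpus=10,
                   system_pod_cpu_requests=1),
    ]
    for node in nodes:
        print(node.name,
              "scheduler accepts:", scheduler_accepts(node, 10),
              "node can run:", node_can_run(node, 10))
```

For `worker-1`, the scheduler still accepts a 10-CPU job while the node cannot actually run it, which is the situation that can lead to repeated retries.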


@Binyang2014
Contributor

Removing coreDNS may cause pods that do not use hostNetwork to fail to access the internet. We may need to move it to the master node instead.
I'm not sure about the AKS environment, i.e. whether we can control coreDNS in AKS.

@hzy46
Contributor Author

hzy46 commented Nov 10, 2020

I set the coredns requests to zero, and tested the following items:

  • whether a pod can access the internet
  • whether a Service works within the cluster

The tests were OK.

@Binyang2014 Please help review #5071 .

@hzy46
Contributor Author

hzy46 commented Nov 10, 2020

For those who have already installed k8s, I suggest using kubectl edit to set the coreDNS requests to zero. Details:

  1. kubectl edit deployment/coredns -n kube-system

  2. set spec.template.spec.containers[0].resources.requests.cpu to 0m, and spec.template.spec.containers[0].resources.requests.memory to 0Mi


  3. save the config, and wait a few minutes.
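
For a non-interactive alternative to the manual edit above, the same change could be expressed as a JSON patch and applied with kubectl patch. The following is a minimal sketch, assuming the deployment is named coredns in the kube-system namespace, that the coreDNS container is at index 0, and that the requests fields already exist (the `replace` operation fails otherwise). The helper names are hypothetical, and the script only prints the command rather than running it.

```python
import json
import shlex
from typing import Dict, List


def build_zero_request_patch(container_index: int = 0) -> List[Dict[str, str]]:
    """Build a JSON patch that sets the CPU and memory requests to zero."""
    base = f"/spec/template/spec/containers/{container_index}/resources/requests"
    return [
        {"op": "replace", "path": f"{base}/cpu", "value": "0m"},
        {"op": "replace", "path": f"{base}/memory", "value": "0Mi"},
    ]


def build_kubectl_command(patch: List[Dict[str, str]]) -> List[str]:
    """Build the kubectl argument list; names follow the steps above."""
    return [
        "kubectl", "patch", "deployment", "coredns",
        "-n", "kube-system",
        "--type=json",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    command = build_kubectl_command(build_zero_request_patch())
    # Print a shell-quoted command so it can be reviewed before running it.
    print(shlex.join(command))
```

Printing instead of executing keeps the sketch safe to run anywhere; the printed command can be reviewed and then pasted into a shell that has access to the cluster.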

suiguoxin mentioned this issue Nov 16, 2020