
Can't access pods by service cluster IP except from the node where the pod is running #1958

Closed
Bowser1704 opened this issue Jun 26, 2020 · 6 comments

Comments

@Bowser1704 commented Jun 26, 2020

Version:

k3s version v1.17.5+k3s1 (58ebdb2)
K3s arguments:

/usr/local/bin/k3s server
Describe the bug

Pods can't be reached via their service cluster IP unless you access them from the node where the pod is running.
But I can reach them directly by pod IP.
To Reproduce

   [root@bowser1704 cert-manager]# kubectl get svc -n cert-manager
   NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
   cert-manager           ClusterIP   10.43.157.248   <none>        9402/TCP   20h
   cert-manager-webhook   ClusterIP   10.43.247.15    <none>        443/TCP    20h
   
   # can't access
   [root@bowser1704 cert-manager]# curl -k https://10.43.247.15:443
   ^C
   
   [root@iZuf6dq9lezw045stckkhsZ cert-manager]# kubectl describe pods cert-manager-webhook-7c6c464d7b-vp44p -n cert-manager
   
   ...
	Node:         foo1/172.19.145.208
   ...

Moving the pod from the other node to the master:

  # edit the pod yaml (one way to apply this is sketched after the transcript below)
  nodeSelector:
    k3s.io/internal-ip: 172.19.145.188
     
  # check the node where pod in
  [root@iZuf6dq9lezw045stckkhsZ cert-manager]# kubectl describe pods cert-manager-webhook-7c6c464d7b-vp44p -n cert-manager
  
  ...
       Node:         bowser1704/172.19.145.188
  ...
     
  # accessing from the master node
  [root@bowser1704 cert-manager]# curl -k https://10.43.247.15:443
  404 page not found
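
For reference, one way to apply that nodeSelector without hand-editing the pod yaml (a sketch; the deployment name is inferred from the pod name, and k3s.io/internal-ip is the node label used above):

# pin the webhook to the master node by patching its deployment
kubectl -n cert-manager patch deployment cert-manager-webhook \
  --patch '{"spec":{"template":{"spec":{"nodeSelector":{"k3s.io/internal-ip":"172.19.145.188"}}}}}'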

Expected behavior

A service's cluster IP should be reachable from any node, regardless of which node the backing pod runs on.
Actual behavior

The cluster IP only answers on the node hosting the pod; from every other node the connection hangs.
Additional context / logs

I checked the iptables rules; all the nodes have the same rules.
I use flannel with vxlan as the flannel backend.
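
For reference, a quick way to compare the kube-proxy NAT rules across nodes rather than eyeballing the full table (a sketch, using the webhook's cluster IP from above):

# run on each node; the KUBE-SERVICES/KUBE-SVC entries for the webhook should match
iptables-save -t nat | grep 10.43.247.15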

@brandond (Member) commented Jun 26, 2020

There are many open issues about this across multiple repos, but you can follow the thread from here: #1266 (comment)

tl;dr vxlan is broken at the moment due to a kernel bug; you can either switch to host-gw or work around it by running an ethtool command (that needs to be re-run every time hosts are rebooted) on every node.
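
For reference, the workaround from that thread looks roughly like this (a sketch; it assumes flannel's vxlan interface has the default name flannel.1 — see #1266 for details):

# disable checksum offload on the vxlan interface; must be re-run after every reboot
ethtool -K flannel.1 tx-checksum-ip-generic off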

@dweomer (Contributor) commented Jun 26, 2020

@brandond wrote:

There are many open issues about this across multiple repos, but you can follow the thread from here: #1266 (comment)

tl;dr vxlan is broken at the moment due to a kernel bug; you can either switch to host-gw or work around it by running an ethtool command (that needs to be re-run every time hosts are rebooted) on every node.

I had forgotten about the vxlan issue and assumed this was related to the ongoing iptables/nftables issue(s) that should be solved via #1914.
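
A quick way to see which mode a host's iptables binary is in (the legacy/nf_tables mismatch behind those issues; a sketch):

# prints e.g. "iptables v1.8.4 (nf_tables)" or "iptables v1.8.4 (legacy)"
iptables --version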

@brandond (Member) commented

@dweomer could be that too. Between vxlan and nftables, things can be a little rough to get going if you're new.

@Bowser1704 (Author) commented

@brandond wrote:

There are many open issues about this across multiple repos, but you can follow the thread from here: #1266 (comment)

tl;dr vxlan is broken at the moment due to a kernel bug; you can either switch to host-gw or work around it by running an ethtool command (that needs to be re-run every time hosts are rebooted) on every node.

Thanks.
I changed the flannel backend from vxlan to host-gw, but it doesn't seem to work.

  • method to change the flannel backend:
vim /etc/systemd/system/k3s.service
  • what I modified:
ExecStart=/usr/local/bin/k3s server --flannel-backend host-gw
  • restart k3s:
systemctl daemon-reload
systemctl restart k3s
  • check my k3s net-conf:
[root@bowser1704 ~]# cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json
{
	"Network": "10.42.0.0/16",
	"Backend": {
		"Type": "host-gw"
	}
}
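
If host-gw took effect, each node should have direct routes to the other nodes' pod subnets via their node IPs instead of via the flannel.1 vxlan device. A sketch of the check (the expected route is illustrative, built from the subnets and IPs above; the interface name will vary):

# expect something like: 10.42.1.0/24 via 172.19.145.208 dev eth0
ip route | grep 10.42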

But it still doesn't work.

[root@bowser1704 ~]# kubectl get svc -n food
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
redis          ClusterIP   10.43.105.118   <none>        7388/TCP   48d
food-backend   ClusterIP   10.43.105.114   <none>        8080/TCP   48d
[root@bowser1704 ~]# curl http://10.43.105.114:8080/sd/health
^C
[root@bowser1704 ~]# kubectl get ep -n food
NAME           ENDPOINTS         AGE
food-backend   10.42.1.4:8080    48d
redis          10.42.0.10:6388   48d
[root@bowser1704 ~]# curl http://10.42.1.4:8080/sd/health
OK
[root@bowser1704 ~]#
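
One way to narrow this down (a sketch; run on the node doing the curl) is to check whether kube-proxy installed NAT rules for the cluster IP, and whether the translated traffic actually leaves for the pod:

# is there a KUBE-SVC entry for the cluster IP?
iptables-save -t nat | grep 10.43.105.114
# watch for packets toward the backing pod while the curl hangs
tcpdump -ni any host 10.42.1.4 and port 8080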

Do you have any suggestions?
Thanks.

@Bowser1704 (Author) commented

@brandond wrote:

There are many open issues about this across multiple repos, but you can follow the thread from here: #1266 (comment)

tl;dr vxlan is broken at the moment due to a kernel bug; you can either switch to host-gw or work around it by running an ethtool command (that needs to be re-run every time hosts are rebooted) on every node.

Sorry, I just realized there are some differences with #1266 (comment):
I don't use the hostNetwork: true parameter.
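
To double-check that (a sketch; prints a name/hostNetwork pair per pod, where an empty second column means the default pod network):

kubectl get pods -n food -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.hostNetwork}{"\n"}{end}'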

Do you have any suggestions?
Thanks.

@Bowser1704 (Author) commented

similar to #1638
