[BUG] Cloud-edge pods cannot communicate when using raven #1773

Closed
fighterhit opened this issue Nov 7, 2023 · 7 comments

What happened:
Following the official docs, I joined an edge node to an existing Kubernetes cluster. After deploying OpenYurt and raven, cloud and edge pods cannot communicate with each other by IP.

Cluster node information:
Output of kubectl get no -owide (cloud master and other worker nodes omitted):

  • fuxi-dl-41 (cloud-side k8s worker node, with public IP 42.xxx)
  • debian11 (edge node, with public IP 183.xxx)
NAME         STATUS                     ROLES        AGE    VERSION     INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION    CONTAINER-RUNTIME
fuxi-dl-41   Ready                      developing   208d   v1.23.13    172.31.1.102    <none>        Debian GNU/Linux 11 (bullseye)   5.10.0-20-amd64   containerd://1.6.8
debian11     Ready,SchedulingDisabled   <none>       17h    v1.23.13    183.xxx         <none>        Debian GNU/Linux 11 (bullseye)   5.10.0-26-amd64   containerd://1.6.8

What you expected to happen:
Cloud and edge pods and services should communicate as if they were in a single Kubernetes cluster.

Anything else we need to know?:

  • CNI plugin: Calico v3.23.1; calico-node.yaml is shown below
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "9"
  labels:
    k8s-app: calico-node
  name: calico-node
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: calico-node
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2023-08-30T16:55:04+08:00"
      creationTimestamp: null
      labels:
        k8s-app: calico-node
    spec:
      containers:
      - env:
        - name: DATASTORE_TYPE
          value: kubernetes
        - name: KUBECONFIG
          value: "/host/etc/cni/net.d/calico-kubeconfig"
        - name: KUBERNETES_SERVICE_HOST
          value: "xxx"
        - name: KUBERNETES_SERVICE_PORT
          value: "yyy"
        - name: WAIT_FOR_DATASTORE
          value: "true"
        - name: NODENAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CALICO_NETWORKING_BACKEND
          valueFrom:
            configMapKeyRef:
              key: calico_backend
              name: calico-config
        - name: CLUSTER_TYPE
          value: k8s,bgp
        - name: IP
          value: autodetect
        - name: CALICO_IPV4POOL_IPIP
          value: Always
        - name: CALICO_IPV4POOL_VXLAN
          value: Never
        - name: CALICO_IPV6POOL_VXLAN
          value: Never
        - name: FELIX_IPINIPMTU
          valueFrom:
            configMapKeyRef:
              key: veth_mtu
              name: calico-config
        - name: FELIX_VXLANMTU
          valueFrom:
            configMapKeyRef:
              key: veth_mtu
              name: calico-config
        - name: FELIX_WIREGUARDMTU
          valueFrom:
            configMapKeyRef:
              key: veth_mtu
              name: calico-config
        - name: CALICO_IPV4POOL_CIDR
          value: 172.16.0.0/16
        - name: CALICO_DISABLE_FILE_LOGGING
          value: "true"
        - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
          value: ACCEPT
        - name: FELIX_IPV6SUPPORT
          value: "false"
        - name: FELIX_HEALTHENABLED
          value: "true"
        - name: FELIX_IPTABLESBACKEND
          value: Auto
        - name: IP_AUTODETECTION_METHOD
          value: "cidr=172.31.1.0/24,183.xxx/32"
        envFrom:
        - configMapRef:
            name: kubernetes-services-endpoint
            optional: true
        image: hub.fuxi.netease.com/library/calico/node:v3.23.1
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/calico-node
              - -shutdown
        name: calico-node
        resources:
          requests:
            cpu: 250m
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/etc/cni/net.d
          name: cni-net-dir
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
        - mountPath: /run/xtables.lock
          name: xtables-lock
        - mountPath: /var/run/calico
          name: var-run-calico
        - mountPath: /var/lib/calico
          name: var-lib-calico
        - mountPath: /var/run/nodeagent
          name: policysync
        - mountPath: /sys/fs/
          mountPropagation: Bidirectional
          name: sysfs
        - mountPath: /var/log/calico/cni
          name: cni-log-dir
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      initContainers:
      - command:
        - /opt/cni/bin/calico-ipam
        - -upgrade
        env:
        - name: KUBERNETES_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CALICO_NETWORKING_BACKEND
          valueFrom:
            configMapKeyRef:
              key: calico_backend
              name: calico-config
        - name: IP_AUTODETECTION_METHOD
          value: cidr=172.31.1.0/24
        envFrom:
        - configMapRef:
            name: kubernetes-services-endpoint
            optional: true
        image: hub.fuxi.netease.com/library/calico/cni:v3.23.1
        imagePullPolicy: IfNotPresent
        name: upgrade-ipam
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/cni/networks
          name: host-local-net-dir
        - mountPath: /host/opt/cni/bin
          name: cni-bin-dir
      - command:
        - /opt/cni/bin/install
        env:
        - name: CNI_CONF_NAME
          value: 10-calico.conflist
        - name: CNI_NETWORK_CONFIG
          valueFrom:
            configMapKeyRef:
              key: cni_network_config
              name: calico-config
        - name: KUBERNETES_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CNI_MTU
          valueFrom:
            configMapKeyRef:
              key: veth_mtu
              name: calico-config
        - name: SLEEP
          value: "false"
        - name: IP_AUTODETECTION_METHOD
          value: cidr=172.31.1.0/24
        envFrom:
        - configMapRef:
            name: kubernetes-services-endpoint
            optional: true
        image: hub.fuxi.netease.com/library/calico/cni:v3.23.1
        imagePullPolicy: IfNotPresent
        name: install-cni
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/opt/cni/bin
          name: cni-bin-dir
        - mountPath: /host/etc/cni/net.d
          name: cni-net-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-node-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      serviceAccount: calico-node
      serviceAccountName: calico-node
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoExecute
        operator: Exists
      volumes:
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
      - hostPath:
          path: /var/run/calico
          type: ""
        name: var-run-calico
      - hostPath:
          path: /var/lib/calico
          type: ""
        name: var-lib-calico
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock
      - hostPath:
          path: /sys/fs/
          type: DirectoryOrCreate
        name: sysfs
      - hostPath:
          path: /opt/cni/bin
          type: ""
        name: cni-bin-dir
      - hostPath:
          path: /etc/cni/net.d
          type: ""
        name: cni-net-dir
      - hostPath:
          path: /var/log/calico/cni
          type: ""
        name: cni-log-dir
      - hostPath:
          path: /var/lib/cni/networks
          type: ""
        name: host-local-net-dir
      - hostPath:
          path: /var/run/nodeagent
          type: DirectoryOrCreate
        name: policysync
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: OnDelete
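
Given the IP_AUTODETECTION_METHOD settings above, a quick way to check which address Calico actually auto-detected on each node is to read the node annotation calico-node maintains (a minimal sketch, assuming the standard projectcalico.org/IPv4Address annotation that Calico writes to the Kubernetes Node object):

# Print the address/CIDR calico-node auto-detected on the cloud and edge nodes
kubectl get node fuxi-dl-41 -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4Address}'
kubectl get node debian11 -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4Address}'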
  • Output of kubectl get gateways -oyaml
apiVersion: v1
items:
- apiVersion: raven.openyurt.io/v1alpha1
  kind: Gateway
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"raven.openyurt.io/v1alpha1","kind":"Gateway","metadata":{"annotations":{},"name":"gw-cloud"},"spec":{"endpoints":[{"nodeName":"fuxi-dl-41","underNAT":false}]}}
    creationTimestamp: "2023-11-07T02:21:06Z"
    generation: 2
    name: gw-cloud
    resourceVersion: "324168460"
    uid: fb63386d-41d3-4a56-aaca-fe9c255222f9
  spec:
    endpoints:
    - nodeName: fuxi-dl-41
      publicIP: 42.xxx
  status:
    activeEndpoint:
      nodeName: fuxi-dl-41
      publicIP: 42.xxx
    nodes:
    - nodeName: fuxi-dl-41
      privateIP: 172.31.1.102
      subnets:
      - 172.16.59.192/26
      - 172.16.60.0/26
- apiVersion: raven.openyurt.io/v1alpha1
  kind: Gateway
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"raven.openyurt.io/v1alpha1","kind":"Gateway","metadata":{"annotations":{},"name":"gw-hangzhou"},"spec":{"endpoints":[{"nodeName":"debian11","underNAT":true}]}}
    creationTimestamp: "2023-11-07T02:21:06Z"
    generation: 2
    name: gw-hangzhou
    resourceVersion: "324168413"
    uid: 4ed75f7e-c550-4114-9bf0-4f8c9cb1e50e
  spec:
    endpoints:
    - nodeName: debian11
      publicIP: 183.xxx
      underNAT: true
  status:
    activeEndpoint:
      nodeName: debian11
      publicIP: 183.xxx
      underNAT: true
    nodes:
    - nodeName: debian11
      privateIP: 183.xxx
      subnets:
      - 172.16.154.192/26
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
  • Pod IPs used for the cloud-edge communication test:
On cloud node fuxi-dl-41: 172.16.60.59
On edge node debian11: 172.16.154.205
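
The failing check is just a direct ping between these two pod IPs; a minimal reproduction, where cloud-pod is a hypothetical name for the pod running on fuxi-dl-41:

# From the cloud-side pod, try to reach the edge pod by IP
kubectl exec -it cloud-pod -- ping -c 3 172.16.154.205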

raven configuration:

apiVersion: v1
data:
  forward-node-ip: "true"
  metric-bind-addr: :18080
  vpn-driver: libreswan
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: raven-agent
    meta.helm.sh/release-namespace: kube-system
  labels:
    app.kubernetes.io/managed-by: Helm
  name: raven-agent-config
  namespace: kube-system
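
For reference, a hedged sketch of how forward-node-ip can be flipped (the DaemonSet name raven-agent is an assumption based on the Helm release name above; the agents are restarted so the change takes effect):

# Set forward-node-ip to "false" in the raven agent ConfigMap (assumed DaemonSet name: raven-agent)
kubectl -n kube-system patch configmap raven-agent-config --type merge -p '{"data":{"forward-node-ip":"false"}}'
kubectl -n kube-system rollout restart daemonset raven-agent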
  • VPN link status inside raven (run /usr/libexec/ipsec/whack --status inside the raven pod):

Environment:

  • OpenYurt version: stable v1.3.4 has a Calico-related bug, so we use the community-provided yurt-manager image with the Calico fix: luckymrwang/yurt-manager:hotfix-v1.3.4
  • Kubernetes version (use kubectl version): 1.23.13
  • OS (e.g: cat /etc/os-release): Debian 11 on both cloud and edge
  • Kernel (e.g. uname -a): cloud: Linux fuxi-dl-41 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux; edge: Linux debian11 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux
  • Install tools:
  • Others:

others

/kind bug

fighterhit added the kind/bug label Nov 7, 2023
River-sh (Contributor) commented Nov 7, 2023

It seems that the private IP address of debian11 in gw-hangzhou.status.nodes is a public address

fighterhit (Author) commented Nov 7, 2023

> It seems that the private IP address of debian11 in gw-hangzhou.status.nodes is a public address

@River-sh Yes, the debian11 edge node only has a public IP. Is there a way to solve this?

fighterhit (Author) commented:

When forward-node-ip is set to false:

vpn log:

River-sh (Contributor) commented Nov 8, 2023

> When forward-node-ip is set to false:
>
> vpn log:

The edge node is actively initiating the link but fails to establish it. You can use tcpdump to capture the packets and check whether the authentication packets arrive: tcpdump -i any udp port 4500
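
A slightly broader capture on the edge node, assuming libreswan's standard IKE ports (500 for IKE, 4500 for NAT-T):

# Watch for IKE negotiation and NAT-T traffic between the gateway nodes
tcpdump -i any -n 'udp port 500 or udp port 4500'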

River-sh (Contributor) commented Nov 8, 2023

> The edge node is actively initiating the link but fails to establish it. You can use tcpdump to capture the packets and check whether the authentication packets arrive: tcpdump -i any udp port 4500

You can set underNAT to false on the edge gateway's endpoint, because it has a public IP address.
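
A hedged sketch of that change, using the gateway name and endpoint index from the kubectl get gateways output above (if the short resource name gateway is ambiguous in your cluster, use gateways.raven.openyurt.io instead):

# Set underNAT to false on the first endpoint of the edge gateway
kubectl patch gateway gw-hangzhou --type json -p '[{"op":"replace","path":"/spec/endpoints/0/underNAT","value":false}]'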

fighterhit (Author) commented:

> You can set underNAT to false on the edge gateway's endpoint, because it has a public IP address.

@River-sh Unfortunately, setting it to false doesn't work either. I'm not sure what this value should be: at first I set it to false, but then in the DingTalk group 珩轩 told me that the edge node should be set to true. Neither seems to work.

stale bot commented Feb 9, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Feb 9, 2024
stale bot closed this as completed Feb 16, 2024