Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Healing receives the invalid monitor event #1586

Closed
glazychev-art opened this issue Feb 13, 2024 · 2 comments
Closed

Healing receives the invalid monitor event #1586

glazychev-art opened this issue Feb 13, 2024 · 2 comments
Assignees

Comments

@glazychev-art
Copy link
Contributor

Current Behavior

Sometimes healing receives the invalid monitor event.

Failure Information (for bugs)

Steps to Reproduce

  1. Create kind cluster with 2 nodes
  2. Deploy spire
  3. Deploy basic NSM setup
  4. Deploy 2 NSEs: with IPv4 and IPv6 CIDRs
  5. Deploy 4 NSC with disabled liveliness checker. Each one connects to both NSEs
  6. Delete the forwarders one by one. After each restart we check connectivity.
  7. Ping is not working and there is connection.state == DOWN in healing monitor events
Feb 12 10:44:50.519�[37m [DEBU] [id:nsc-kernel-86567c684c-54n2m-0] [heal:eventLoop] [type:networkService] �[0m(7.1)         received eventIn: connections:{key:"826d5f4c-6ec6-4e9e-b2e5-e0744db262b0" value:{id:"826d5f4c-6ec6-4e9e-b2e5-e0744db262b0" network_service:"component-restart-ipv4" mechanism:{cls:"LOCAL" type:"KERNEL" parameters:{key:"inodeURL" value:"inode://4/4026532548"} parameters:{key:"name" value:"nsm-1"}} context:{ip_context:{src_ip_addrs:"172.16.1.3/32" dst_ip_addrs:"172.16.1.2/32" src_routes:{prefix:"172.16.1.2/32"} dst_routes:{prefix:"172.16.1.3/32"} excluded_prefixes:"10.96.0.0/16" excluded_prefixes:"10.244.0.0/16"} dns_context:{} MTU:1446} labels:{key:"nodeName" value:"kind-worker2"} path:{index:1 path_segments:{name:"nsc-kernel-86567c684c-54n2m" id:"nsc-kernel-86567c684c-54n2m-0" token:"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWNvbXBvbmVudC1yZXN0YXJ0L3BvZC9uc2Mta2VybmVsLTg2NTY3YzY4NGMtNTRuMm0iLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9uc20tc3lzdGVtL3BvZC9uc21nci12OWtqdyJdLCJleHAiOjE3MDc3MzM5NjN9.6u5akvYWQYojfOy8Wcdx7frG66Jk2-xVr9JcCpbqFHRuOCXBmb2Vly6_Yg1OnJDbMEY4Ab7MuHFEntCu0HAc7Q" expires:{seconds:1707733963 nanos:197161874}} path_segments:{name:"nsmgr-v9kjw" id:"826d5f4c-6ec6-4e9e-b2e5-e0744db262b0" token:"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL25zbWdyLXY5a2p3IiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLXZwcC1qdmQ4ZiJdLCJleHAiOjE3MDc3MzQ0NDR9.-l4vna51dfJt4-MqqPxqESoWFQTpppinYnF3hhz3wiUsSHQwT2B12rCAc7BA9oXwmVZR52ioT6wWnwG8P7e-Ww" expires:{seconds:1707734444 nanos:702556546}} path_segments:{name:"forwarder-vpp-jvd8f" id:"8f4a4ebe-07dd-4e23-8aff-d1d2288380f0" token:"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL2ZvcndhcmRlci12cHAtanZkOGYiLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9uc20tc3lzdGVtL3BvZC9uc21nci04a2RmcyJdLCJleHAiOjE3MDc3MzQ0NDV9.UL65dJrG6Oc49MnnEqIgjEZXziztdgLcPs6VfZ0nW4IKM-X3UPd_2lO9qkAz-4hQyQYVNWz8cuVu4ki4_cStOA" expires:{seconds:1707734445 nanos:908047657} metrics:{key:"client_drops" value:"0"} metrics:{key:"client_interface" value:"VXLAN/vxlan_tunnel1"} metrics:{key:"client_rx_bytes" value:"0"} metrics:{key:"client_rx_packets" value:"0"} metrics:{key:"client_tx_bytes" value:"0"} metrics:{key:"client_tx_packets" value:"0"} metrics:{key:"server_drops" value:"0"} metrics:{key:"server_interface" value:"VIRTIO/tap7"} metrics:{key:"server_rx_bytes" value:"0"} metrics:{key:"server_rx_packets" value:"0"} metrics:{key:"server_tx_bytes" value:"0"} metrics:{key:"server_tx_packets" value:"0"}} path_segments:{name:"nsmgr-8kdfs" id:"babfe01e-0e09-464a-9339-f15da0fa7f0a" token:"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL25zbWdyLThrZGZzIiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLXZwcC1qdmQ4ZiJdLCJleHAiOjE3MDc3MzQ0NDh9.tRE-61OT9LgFNJdHCV-4XUcS-KwpL1x_hYjsAqZs-ik-SYXqvs_FiEEoZhWTRa5R8pdA3WwRMfXEz0lfP6V-Rg" expires:{seconds:1707734448 nanos:723775612}} path_segments:{name:"forwarder-vpp-ldn5x" id:"a6ecf1dc-6637-4a50-bc44-1f9b006e4810" token:"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL2ZvcndhcmRlci12cHAtbGRuNXgiLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9ucy1jb21wb25lbnQtcmVzdGFydC9wb2QvbnNlLWtlcm5lbC1pcHY0LTc3ZjZiZmY0OC1obXB4NCJdLCJleHAiOjE3MDc3MzQ0NDZ9.O-4JTU2FYpDOd59HoThJXaFW5Lp-Tv-UvrfqPET3jYR2Ky4M5VvAmgMP7OWOP7ngMuHG-ZR93xUsDW_eZpy1Tw" expires:{seconds:1707734446 nanos:911412285} metrics:{key:"client_drops" value:"0"} metrics:{key:"client_interface" value:"VIRTIO/tap0"} metrics:{key:"client_rx_bytes" value:"0"} metrics:{key:"client_rx_packets" value:"0"} metrics:{key:"client_tx_bytes" value:"0"} metrics:{key:"client_tx_packets" value:"0"} metrics:{key:"server_drops" value:"0"} metrics:{key:"server_interface" value:"VXLAN/vxlan_tunnel3"} metrics:{key:"server_rx_bytes" value:"0"} metrics:{key:"server_rx_packets" value:"0"} metrics:{key:"server_tx_bytes" value:"0"} metrics:{key:"server_tx_packets" value:"0"}} path_segments:{name:"nse-kernel-ipv4-77f6bff48-hmpx4" id:"58e2e403-c310-4cd7-98d8-1a9ce6ce587d" token:"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWNvbXBvbmVudC1yZXN0YXJ0L3BvZC9uc2Uta2VybmVsLWlwdjQtNzdmNmJmZjQ4LWhtcHg0IiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLXZwcC1sZG41eCJdLCJleHAiOjE3MDc3MzQ0NDZ9.VvTPEFKiLIXyJ7a9wwH7leQxXarUbc3GS23jifro14g7vyVP29Pv9dAOQj_rlQTKYv4sFFKOA7lS3JtgqT1oOA" expires:{seconds:1707734446 nanos:915109630} metrics:{key:"server_interface" value:"KERNEL/component--58e2"}}} network_service_endpoint_name:"nse-kernel-ipv4-77f6bff48-hmpx4" payload:"ETHERNET" state:DOWN}}, err: <nil>
Feb 12 10:44:50.519�[33m [WARN] [id:nsc-kernel-86567c684c-54n2m-0] [heal:eventLoop] [type:networkService] �[0m(7.2)         Control plane is down
Feb 12 10:44:50.519�[37m [DEBU] [id:nsc-kernel-86567c684c-54n2m-0] [heal:eventLoop] [type:networkService] �[0m(7.3)         Reconnect with reselect
Feb 12 10:44:50.519�[36m [INFO] [id:nsc-kernel-86567c684c-54n2m-0] [heal:eventLoop] [type:networkService] �[0m(7.4)         Heal success
Feb 12 10:44:50.520�[37m [TRAC] [id:nsc-kernel-86567c684c-54n2m-0] [type:networkService] �[0m(2.190)    close={"id":"nsc-kernel-86567c684c-54n2m-0","network_service":"component-restart-ipv4","mechanism":{"cls":"LOCAL","type":"KERNEL","parameters":{"inodeURL":"file:///proc/thread-self/ns/net","name":"nsm-1"}},"context":{"ip_context":{"src_ip_addrs":["172.16.1.7/32"],"dst_ip_addrs":["172.16.1.6/32"],"src_routes":[{"prefix":"172.16.1.6/32"}],"dst_routes":[{"prefix":"172.16.1.7/32"}],"excluded_prefixes":["10.96.0.0/16","10.244.0.0/16"]},"dns_context":{},"MTU":1446},"labels":{"nodeName":"kind-worker2"},"path":{"path_segments":[{"name":"nsc-kernel-86567c684c-54n2m","id":"nsc-kernel-86567c684c-54n2m-0","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWNvbXBvbmVudC1yZXN0YXJ0L3BvZC9uc2Mta2VybmVsLTg2NTY3YzY4NGMtNTRuMm0iLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9uc20tc3lzdGVtL3BvZC9uc21nci12OWtqdyJdLCJleHAiOjE3MDc3MzQ4MDh9.wg8LzTlqW5TTmlJBOXUr7zWAJnGbC1RvbN-Z95x2vcIlswmgWr0QcUhUzW6hmcQhFmWyhKO--H8Js_a1BwFmMQ","expires":{"seconds":1707734808,"nanos":254775768}},{"name":"nsmgr-v9kjw","id":"826d5f4c-6ec6-4e9e-b2e5-e0744db262b0","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL25zbWdyLXY5a2p3IiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLXZwcC1qdmQ4ZiJdLCJleHAiOjE3MDc3MzUyODh9.F8Txlm_qZKngGYplXo1RXoruMVDbBkyedLIE-7Wh7__yHTlEvQDrg5hkNXqrDkVebdIDqF_dTAXTEc6MDLnVdg","expires":{"seconds":1707735288,"nanos":294567422}},{"name":"forwarder-vpp-jvd8f","id":"6c463884-975e-4ba5-84c2-f89b3051f5f2","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL2ZvcndhcmRlci12cHAtanZkOGYiLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9uc20tc3lzdGVtL3BvZC9uc21nci04a2RmcyJdLCJleHAiOjE3MDc3MzUyODh9.hU-2cer32jO0jX8lhPy1WEh8RKCD_fOnjGAbzA2xiBKeJ2fP_GIeycfK5zLNvZQXf06sQyHDY8o3CbWm7W2Q9A","expires":{"seconds":1707735288,"nanos":349424042},"metrics":{"client_drops":"0","client_interface":"VXLAN/vxlan_tunnel4","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0","server_drops":"0","server_interface":"VIRTIO/tap8","server_rx_bytes":"0","server_rx_packets":"0","server_tx_bytes":"0","server_tx_packets":"0"}},{"name":"nsmgr-8kdfs","id":"babfe01e-0e09-464a-9339-f15da0fa7f0a","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL25zbWdyLThrZGZzIiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLXZwcC1sZG41eCJdLCJleHAiOjE3MDc3MzUyODh9.TxaP_pQMSMCJanAwigxsVOOAgRMBQGVL8w3-M5mq0OFbtIuvCzlB5PuhSUWJb2tuxqcBh9Bij6-jJKsA4XPdvQ","expires":{"seconds":1707735288,"nanos":384309864}},{"name":"forwarder-vpp-ldn5x","id":"a6ecf1dc-6637-4a50-bc44-1f9b006e4810","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL2ZvcndhcmRlci12cHAtbGRuNXgiLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9ucy1jb21wb25lbnQtcmVzdGFydC9wb2QvbnNlLWtlcm5lbC1pcHY0LTc3ZjZiZmY0OC1obXB4NCJdLCJleHAiOjE3MDc3MzUyODh9.KNMhuSrPKK9pBAG6MRz7CBiMhcF-pgFFy67EXl61mcFSIyawvtY_ZMNds1yIcpDChEXbAFx-HgNoPhVqxEu-mg","expires":{"seconds":1707735288,"nanos":430517940},"metrics":{"client_drops":"0","client_interface":"VIRTIO/tap4","client_rx_bytes":"0","client_rx_packets":"0","client_tx_bytes":"0","client_tx_packets":"0","server_drops":"0","server_interface":"VXLAN/vxlan_tunnel2","server_rx_bytes":"0","server_rx_packets":"0","server_tx_bytes":"0","server_tx_packets":"0"}},{"name":"nse-kernel-ipv4-77f6bff48-hmpx4","id":"58e2e403-c310-4cd7-98d8-1a9ce6ce587d","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWNvbXBvbmVudC1yZXN0YXJ0L3BvZC9uc2Uta2VybmVsLWlwdjQtNzdmNmJmZjQ4LWhtcHg0IiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLXZwcC1sZG41eCJdLCJleHAiOjE3MDc3MzUyODh9.21IQkHbMtpCb3RQGcp0925oF2nh13Sb0Dth51Yu2jcq9f9ZHP8c_9CT5Tzjetsjhy4muMZy4u7QaX3yhD_VWQQ","expires":{"seconds":1707735288,"nanos":431893475},"metrics":{"server_interface":"KERNEL/component--58e2"}}]},"network_service_endpoint_name":"nse-kernel-ipv4-77f6bff48-hmpx4","payload":"ETHERNET"}

This example with some modifications can be very helpful:
https://github.com/networkservicemesh/deployments-k8s/tree/bf89a8a5005e7745e0160318d9162cdcb96732a0/examples/heal_extended/component-restart

Context

This issue is hard to reproduce. You probably need to reduce the request timeout to 5s (or less).

Failure Logs

cl.log

@glazychev-art
Copy link
Contributor Author

Results from ci with a very similar behavior:
Build: https://github.com/networkservicemesh/integration-k8s-gke/actions/runs/7879640502/job/21500287770?pr=426
Logs: logs-1112.zip
(see finance-client-6f88bb9bcf-g9zm8)

@glazychev-art
Copy link
Contributor Author

Fixed - networkservicemesh/cmd-nsc#608

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

1 participant