Generic healing should work with ovs forwarder #9778
Comments
Related issue: networkservicemesh#9778 Signed-off-by: Laszlo Kiraly <laszlo.kiraly@est.tech>
Related issue: networkservicemesh/deployments-k8s#9778 Signed-off-by: Laszlo Kiraly <laszlo.kiraly@est.tech>
It seems that the NSM_MAX_TOKEN_LIFETIME default setting is 24h, instead of the 10-minute lifetime used in the vpp forwarder.
Same result with the NSM_MAX_TOKEN_LIFETIME=10min setting.
@ljkiraly Could you also attach logs from the NSC? It could help with the investigation.
@denis-tingaikin Just realized that I forgot to add the logs to the zip file. I have updated the attachment, and I am adding the NSC logs here as well. {forwarder log}
Sep 12 16:53:02.609 [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] (57.1) request-response={"id":"81d3be5c-cad4-4811-a6ef-5c4da4306c30","network_service":"local-forwarder-death","mechanism":{"cls":"LOCAL","type":"KERNEL","parameters":{"inodeURL":"inode://4/4026534093","name":"local-forw-f280","supportsVlan":"true"}},"context":{"ip_context":{"src_ip_addrs":["172.16.1.1/32"],"dst_ip_addrs":["172.16.1.0/32"],"src_routes":[{"prefix":"172.16.1.0/32"}],"dst_routes":[{"prefix":"172.16.1.1/32"}],"excluded_prefixes":["10.96.0.0/16","10.244.0.0/16"]},"dns_context":{}},"labels":{"nodeName":"kind-worker","podName":"alpine"},"path":{"index":2,"path_segments":[{"name":"alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b","id":"alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWxvY2FsLWZvcndhcmRlci1kZWF0aC9wb2QvYWxwaW5lIiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvbnNtZ3ItZjR2NGgiXSwiZXhwIjoxNjk0NTM4MTgyfQ.oYJibfRxCXQIcNEyvF_lWY-fMjzuBaktrGXvDcNnznI6X4z44MNRCtVd-NHJOGPtaAoxDUghUDYeE_6WcpFtxQ","expires":{"seconds":1694538182,"nanos":388924276}},{"name":"nsmgr-f4v4h","id":"b4b84141-cc9e-4e7b-be7f-cc3b89816e4a","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL25zbWdyLWY0djRoIiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLW92cy1iOTcybiJdLCJleHAiOjE2OTQ1MzgxODJ9.l5ssrZ5w4uhRW30WnjD3XAvti8S7TvszBtkw6ogCyltUwNZgX8TgRXQaqgEuBv9Zrv31Kl-8dhLWcnncrvMGfw","expires":{"seconds":1694538182,"nanos":407122067}},{"name":"forwarder-ovs-b972n","id":"81d3be5c-cad4-4811-a6ef-5c4da4306c30","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL2ZvcndhcmRlci1vdnMtYjk3Mm4iLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9ucy1sb2NhbC1mb3J3YXJkZXItZGVhdGgvcG9kL25zZS1rZXJuZWwtNmQ2NDc3ODc1Zi1ndGdkNyJdLCJleHAiOjE2OTQ1MzgxODJ9.ghEOIYMc0eYbRBxo269dicBL-_40o5vPxIwGQecEAQ5roOqR1FQHvWKQ3SKU3wrGNf_ylkbykrM7nVvR2VRS1g","expires":{"seconds":1694538182,"nanos":482070996}},{"name":"nse-kernel-6d6477875f-gtgd7","id":"f2807884-5a41-4d88-9a42-58347aada4c1","token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWxvY2FsLWZvcndhcmRlci1kZWF0aC9wb2QvbnNlLWtlcm5lbC02ZDY0Nzc4NzVmLWd0Z2Q3IiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLW92cy1iOTcybiJdLCJleHAiOjE2OTQ1MzgxODJ9.V7oLhExCJrFkTwZYIG-LASq5vMT4HpMW-szmg2S5HVMJ8Qyww8cRK0_sTtX7bBQz9_uRnWnJw5Pb8TViuRW4jw","expires":{"seconds":1694538182,"nanos":485355075}}]},"network_service_endpoint_name":"nse-kernel-6d6477875f-gtgd7","payload":"ETHERNET"}
Sep 12 16:53:02.611 [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] (53.1) request-response-diff={"mechanism":{"parameters":{"inodeURL":"file:///proc/799735/fd/16"}}}
Sep 12 16:53:02.649 [INFO] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] (42.1) /proc/sys/net/ipv4/ping_group_range was set to 0 2147483647
Sep 12 16:53:02.653 [DEBU] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [Dst:172.16.1.1/32] [Flags:0] [Gw:<nil>] [Scope:link] [duration:531.547µs] [link.Name:local-forw-f280] [netlink:RouteReplace] [type:networkService] (38.1) completed
Sep 12 16:53:02.654 [DEBU] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [Dst:172.16.1.1/32] [Flags:0] [Gw:<nil>] [Scope:universe] [duration:145.383µs] [link.Name:local-forw-f280] [netlink:RouteReplace] [type:networkService] (38.2) completed
Sep 12 16:53:02.798 [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] (33.2) request-response-diff={"mechanism":{"parameters":{"-supportsVlan":"true","inodeURL":"file:///proc/799735/fd/13","name":"nsm-1"}}}
Sep 12 16:53:02.801 [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] (24.1) request-diff={"connection":{"mechanism":{"cls":"LOCAL","parameters":{"+inodeURL":"file:///proc/799735/fd/13","+name":"nsm-1"},"type":"KERNEL"},"path":{"path_segments":{"2":{"expires":{"nanos":482070996},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zbS1zeXN0ZW0vcG9kL2ZvcndhcmRlci1vdnMtYjk3Mm4iLCJhdWQiOlsic3BpZmZlOi8vazhzLm5zbS9ucy9ucy1sb2NhbC1mb3J3YXJkZXItZGVhdGgvcG9kL25zZS1rZXJuZWwtNmQ2NDc3ODc1Zi1ndGdkNyJdLCJleHAiOjE2OTQ1MzgxODJ9.ghEOIYMc0eYbRBxo269dicBL-_40o5vPxIwGQecEAQ5roOqR1FQHvWKQ3SKU3wrGNf_ylkbykrM7nVvR2VRS1g"},"3":{"expires":{"nanos":485355075,"seconds":1694538182},"token":"eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzcGlmZmU6Ly9rOHMubnNtL25zL25zLWxvY2FsLWZvcndhcmRlci1kZWF0aC9wb2QvbnNlLWtlcm5lbC02ZDY0Nzc4NzVmLWd0Z2Q3IiwiYXVkIjpbInNwaWZmZTovL2s4cy5uc20vbnMvbnNtLXN5c3RlbS9wb2QvZm9yd2FyZGVyLW92cy1iOTcybiJdLCJleHAiOjE2OTQ1MzgxODJ9.V7oLhExCJrFkTwZYIG-LASq5vMT4HpMW-szmg2S5HVMJ8Qyww8cRK0_sTtX7bBQz9_uRnWnJw5Pb8TViuRW4jw"}}},"state":0},"mechanism_preferences":{"0":{"parameters":{"+inodeURL":"file:///proc/799735/fd/13","+name":"nsm-1","-supportsVlan":"true"}}}}
Sep 12 16:53:02.802 [INFO] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] (23.1) /proc/sys/net/ipv4/ping_group_range was set to 0 2147483647
Sep 12 16:53:03.819 [DEBU] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [Dst:172.16.1.0/32] [Flags:0] [Gw:<nil>] [Scope:link] [duration:127.581µs] [link.Name:nsm-1] [netlink:RouteReplace] [type:networkService] (21.1) completed
Sep 12 16:53:03.819 [DEBU] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [Dst:172.16.1.0/32] [Flags:0] [Gw:<nil>] [Scope:universe] [duration:85.979µs] [link.Name:nsm-1] [netlink:RouteReplace] [type:networkService] (21.2) completed
Sep 12 16:53:03.824 [TRAC] [id:81d3be5c-cad4-4811-a6ef-5c4da4306c30] [type:networkService] (11.1) request-response-diff={"mechanism":{"parameters":{"inodeURL":"inode://4/4026532527"}}}
Sep 12 16:51:38.061 [ERRO] [id:alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0] [dstIP:172.16.1.0] [srcIP:172.16.1.1] [type:networkService] (7.1) No packets received
Sep 12 16:51:38.061 [WARN] [id:alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0] [heal:eventLoop] [type:networkService] (7.2) Data plane is down
Sep 12 16:51:38.061 [DEBU] [id:alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0] [heal:eventLoop] [type:networkService] (7.3) Reconnect with reselect
Sep 12 16:51:38.061 [DEBU] [id:alpine-b626de3e-acbb-4b4e-9c31-a81a43ca7f2b-0] [heal:eventLoop] [type:networkService] (7.4) received eventIn: <nil>, err: rpc error: code = Canceled desc = context canceled
@ljkiraly Hello, it looks like there is a problem with closing a connection. I tested it the following way: I modified the NSC to do:
resp, err := nsmClient.Request(ctx, request)
if err != nil {
	logger.Fatalf("failed connect to NSMgr: %v", err.Error())
}
logger.Infof("successfully connected to %v. Response: %v", u.NetworkService(), resp)
time.Sleep(time.Minute)
_, err = nsmClient.Close(ctx, resp)
if err != nil {
	logger.Infof("Error while closing a connection to %v. %s", u.NetworkService(), err.Error())
}
time.Sleep(time.Second * 15)
resp, err = nsmClient.Request(ctx, request)
if err != nil {
	logger.Fatalf("failed connect to NSMgr: %v", err.Error())
}

The NSC fails to do the second request because there is an error in
If I use a new request with a new connection ID for the second request, it passes without any errors:

newID := "id-1"
request1 := &networkservice.NetworkServiceRequest{
	Connection: &networkservice.Connection{
		Id:             newID,
		NetworkService: u.NetworkService(),
		Labels:         u.Labels(),
	},
	MechanismPreferences: []*networkservice.Mechanism{
		u.Mechanism(),
	},
}
resp1, err := nsmClient.Request(ctx, request1)
if err != nil {
	logger.Fatalf("failed connect to NSMgr: %v", err.Error())
}

Could you please check all Closes in
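The symptom above (a reused connection ID failing after Close while a fresh ID succeeds) can be sketched with a small mock; `fakeClient` below is purely hypothetical and is not part of the NSM SDK, it only mirrors the reported behavior of stale per-connection state surviving a Close:

```go
package main

import "fmt"

// fakeClient is a hypothetical stand-in for the NSM client chain: it
// rejects a Request that reuses an ID it has already seen closed,
// mimicking stale state left behind by an incomplete Close.
type fakeClient struct{ closed map[string]bool }

func (c *fakeClient) Request(id string) error {
	if c.closed[id] {
		return fmt.Errorf("connection %q was closed; stale state remains", id)
	}
	return nil
}

func (c *fakeClient) Close(id string) { c.closed[id] = true }

func main() {
	c := &fakeClient{closed: map[string]bool{}}
	_ = c.Request("id-0")
	c.Close("id-0")
	// Reusing the old ID fails, mirroring the report above...
	if err := c.Request("id-0"); err != nil {
		fmt.Println("retry with old ID:", err)
	}
	// ...while a fresh ID succeeds.
	if err := c.Request("id-1"); err == nil {
		fmt.Println("retry with new ID: ok")
	}
}
```

A fix in the forwarder's Close path should make the first branch behave like the second, i.e. a re-Request with the original ID should succeed.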
Move an orphan nsm interface to host namespace and delete it. Related issue: networkservicemesh/deployments-k8s#9778 Related PR: networkservicemesh/sdk-ovs#275 Signed-off-by: Laszlo Kiraly <laszlo.kiraly@est.tech>
Move an orphan nsm interface to host namespace and delete it. Related issue: networkservicemesh/deployments-k8s#9778 Signed-off-by: Laszlo Kiraly <laszlo.kiraly@est.tech>
Expected Behavior
NSM should restore after forwarder death.
Current Behavior
A basic setup of NSM was used, containing forwarder-ovs. The connection between the NSC and the NSE uses the local 'kernel' mechanism. After a forwarder restart (local or remote) the connection is not restored (checked for more than 10 minutes after the forwarder restart).
Failure Information
No data path healing is expected, but the generic healing should work.
Steps to Reproduce
kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/ovs?ref=2ea1a26c7ddebd212224560e2706d9f1b03f140b
Failure Logs
local-forwarder-death.zip