msg="GRPC error: rpc error: code = NotFound desc = node XXXXX was not found" #328
Comments
I am having the same issue. Docker version 18.06.2-ce.
tridentctl logs
kubectl describe of the pod that's requesting the PVC
I think this may have something to do with an old install that did not clean up properly. How can we completely remove Trident to try again? I have tried clearing out the Trident entries in /var/lib/kubelet and in /var/lib/trident, but to no avail so far.
I am using CentOS 7 deployed through kubespray.
Cheers.
On Tue, Jan 14, 2020, 17:04 Balasubramanian Ramesh Babu wrote:
@titansmc and @kmwm3 can you share some more info on your k8s environment? Are you running vanilla k8s? What's the OS on your underlying nodes?
I have the same issue. From the logs, Trident did not pick up all of my cluster nodes; only some of them joined. As a result the PVC only mounts on specific nodes and fails on all the others.
time="2020-02-04T09:18:10Z" level=debug msg="Authenticated by HTTPS REST frontend." peerCert=trident-node
@teramucho, Kubernetes calls Trident's API to add the node once it is successfully registered. If a node in the cluster isn't added to Trident, then that node may not have registered properly. Check the Trident node and driver registrar sidecar logs for errors. Also, check the kubelet logs. If this doesn't resolve your issue, please contact NetApp Support.
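To see which nodes never registered, one option is to compare the node names Kubernetes knows about with the node names Trident has registered. The helper below is a hypothetical sketch, not part of Trident or tridentctl: it assumes the Kubernetes side comes from `kubectl get nodes -o name` (entries of the form `node/<name>`) and that the Trident side is a plain list of node names copied from Trident's own node listing. The node names in the example are illustrative.

```python
# Hypothetical helper: report Kubernetes nodes that Trident has not registered.
def unregistered_nodes(k8s_nodes, trident_nodes):
    """Return the sorted names of Kubernetes nodes missing from Trident.

    Entries in the `kubectl get nodes -o name` form ("node/<name>") are
    accepted; the "node/" prefix is stripped before comparing. Blank
    entries are ignored.
    """
    def normalize(name):
        name = name.strip()
        return name[len("node/"):] if name.startswith("node/") else name

    known = {normalize(n) for n in k8s_nodes if n.strip()}
    registered = {normalize(n) for n in trident_nodes if n.strip()}
    return sorted(known - registered)


if __name__ == "__main__":
    k8s = ["node/node017", "node/node018", "node/node019", "node/node020"]
    trident = ["node017", "node019"]
    print(unregistered_nodes(k8s, trident))  # ['node018', 'node020']
```

Any node this reports is a candidate for the node/registrar sidecar and kubelet log checks described above.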
All, a fix was just merged to address a situation where Kubernetes DNS is not configured properly, which can lead to the error reported in this issue. Trident patches that contain the fix will be released in the near future. Thanks for your patience.
This issue was fixed with the Trident 20.01.1 release.
@gnarl We still have the issue on one of our clusters:
$ tridentctl -n trident get backend
+------------------+----------------+--------------------------------------+--------+---------+
| NAME | STORAGE DRIVER | UUID | STATE | VOLUMES |
+------------------+----------------+--------------------------------------+--------+---------+
| <redacted> | ontap-nas | <redacted> | online | 1 |
+------------------+----------------+--------------------------------------+--------+---------+
$ tridentctl -n trident version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 20.01.1 | 20.01.0 |
+----------------+----------------+
Trident can't find a few of the nodes in the cluster:
time="2020-04-25T14:28:49Z" level=error msg="Node info not found." node=node020
time="2020-04-25T14:28:49Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node020 was not found"
time="2020-04-25T14:28:49Z" level=error msg="Node info not found." node=node020
time="2020-04-25T14:28:49Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node020 was not found"
time="2020-04-25T14:28:50Z" level=error msg="Node info not found." node=node018
time="2020-04-25T14:28:50Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node018 was not found"
time="2020-04-25T14:28:50Z" level=error msg="Node info not found." node=node018
time="2020-04-25T14:28:50Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node018 was not found"
Any ideas what to try to get them up and running? These machines were connected correctly before. We then reinstalled the cluster (as training for new ops), and since then the nodes don't get added anymore.
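When the controller log is long, it can help to pull out just the distinct nodes that Trident reports as missing. The sketch below assumes the logrus-style `key=value` / `key="quoted value"` format shown in the log lines above and keys off the exact message `Node info not found.`; it is a hypothetical triage helper, not a Trident tool.

```python
import re

# logrus-style fields: key=value or key="quoted value" (escaped quotes allowed).
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def missing_nodes(log_text):
    """Return node names from 'Node info not found.' lines, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        # Strip the surrounding quotes from quoted values.
        fields = {key: value.strip('"') for key, value in FIELD_RE.findall(line)}
        if fields.get("msg") == "Node info not found." and "node" in fields:
            if fields["node"] not in seen:
                seen.append(fields["node"])
    return seen


SAMPLE = '''\
time="2020-04-25T14:28:49Z" level=error msg="Node info not found." node=node020
time="2020-04-25T14:28:49Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node020 was not found"
time="2020-04-25T14:28:50Z" level=error msg="Node info not found." node=node018
time="2020-04-25T14:28:50Z" level=error msg="Node info not found." node=node018
'''

if __name__ == "__main__":
    print(missing_nodes(SAMPLE))  # ['node020', 'node018']
```

The `GRPC error` lines carry the node name only inside the free-text message, so the sketch deliberately matches the structured `node=` field instead of parsing prose.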
Is there any update on this issue?
I have this problem too, running OCP4: 80% of the nodes work and the other 20% fail.
./tridentctl version
Server Version: 4.4.9
The node is missing because the Trident object was not created.
Hi @presidenten, @ramancde, and @bigg01, we've investigated the issue and have not been able to reproduce it. If you see the issue again, please contact NetApp support and provide Trident logs so that we can determine what is causing the issue.
There are two likely reasons why Trident does not find a Kubernetes node: a networking issue within Kubernetes or a DNS issue. The Trident node DaemonSet that runs on each Kubernetes node must be able to communicate with the Trident controller to register the node with Trident. If networking changes occurred after Trident was installed, this problem may only be observed with new Kubernetes nodes that are added to the cluster.
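The two failure modes above can be told apart from inside a pod on an affected node: first check whether the controller's service name resolves, then whether a TCP connection to it succeeds. The following is a minimal sketch using only Python's standard `socket` module; it is not a Trident diagnostic, and the host and port you pass are up to you.

```python
import socket


def check_controller(host, port, timeout=3.0):
    """Check DNS resolution and TCP reachability of host:port.

    Returns a (status, detail) tuple. status is "dns" if the name did not
    resolve, "network" if it resolved but the TCP connect failed, or "ok"
    (detail is then the sorted list of resolved addresses).
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return "dns", str(exc)
    addresses = sorted({info[4][0] for info in infos})
    try:
        # create_connection tries each resolved address until one connects.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return "network", f"resolved to {addresses} but connect failed: {exc}"
    return "ok", addresses
```

For example, `check_controller("trident-csi.trident", port)` uses the service name mentioned later in this thread; the port is deliberately not assumed here and should be taken from the Service definition (for example `kubectl -n trident get svc trident-csi`). A `"dns"` result points at the DNS scenario, and a `"network"` result points at the networking scenario.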
This matches the kind of issue I am facing. Only newly added nodes won't register with Trident. I tried restarting the Trident pods and removing/re-adding the impacted nodes, but nothing helps. There have been no networking changes on the cluster, and I don't see any networking or DNS issues on the cluster. Any pointers on how I can investigate this further?
The same error about not finding the node (not registered with the Trident controller) seems to happen with K8s 1.17 and Trident 20.07 when the Kubernetes autoscaler adds a node to schedule a pod: as a consequence, the PV for the pod doesn't get added, and the pod stays Pending.
As indicated above, we haven't been able to reproduce this issue yet. Please open a case with NetApp support so that we can collect additional information. To open a case with NetApp, please go to https://mysupport.netapp.com/site/.
In my case, it turned out to be a DNS issue on some nodes: the trident-csi pod running on those nodes could not resolve the trident-csi.trident service and hence could not register the node.
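A short name like `trident-csi.trident` only resolves because the resolver appends the search domains from the pod's `/etc/resolv.conf`. The sketch below is a simplified model of the glibc stub-resolver rule (names with at least `ndots` dots are tried as-is first, otherwise the search list is tried first; a trailing dot means "absolute only"). It assumes a typical pod `resolv.conf` in the `trident` namespace with the default `cluster.local` cluster domain; the nameserver address is illustrative.

```python
def parse_resolv_conf(text):
    """Return (search_domains, ndots) from resolv.conf text (ndots defaults to 1)."""
    search, ndots = [], 1
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith(("#", ";")):
            continue  # blank line or comment
        if parts[0] == "search":
            search = parts[1:]
        elif parts[0] == "options":
            for opt in parts[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def query_order(name, search, ndots=1):
    """Simplified glibc rule: the FQDNs tried for `name`, in order."""
    if name.endswith("."):
        return [name]  # already absolute: no search-list expansion
    candidates = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + candidates
    return candidates + [absolute]


POD_RESOLV_CONF = """\
nameserver 10.96.0.10
search trident.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

if __name__ == "__main__":
    search, ndots = parse_resolv_conf(POD_RESOLV_CONF)
    for fqdn in query_order("trident-csi.trident", search, ndots):
        print(fqdn)
```

With that configuration, the second candidate, `trident-csi.trident.svc.cluster.local.`, is the Service's cluster DNS name. If a pod on a given node ends up with a `resolv.conf` that lacks the cluster search domains (for example, because it uses the node's own resolver settings), none of the names it tries is that Service name, which would produce exactly the per-node registration failure described above.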
@khatrig thanks for updating this issue.
For everyone that encountered this reported issue it was determined that either a DNS or a networking issue kept the Trident node DaemonSet from registering with the Trident controller. Commit 8e51987 improves the Info log message to help the Trident user resolve this registration issue. |
Describe the bug
Following the basic example in the documentation fails to attach the volume to the Pod.
Environment
Provide accurate information about the environment to help us reproduce the issue.
Docker
k8s version
To Reproduce
Follow the basic example
Expected behavior
The created volume is attached to the Pod.
Additional context
I also see in the logs errors related to iSCSI, which I believe we are not using.