Requesting elaboration for HCN error 2151350293 (0x803b0015) #485
Comments
@tzifudzi there are two likely conditions that can cause HCN_E_REQUEST_UNSUPPORTED to be returned when deleting a namespace:
Can you provide some more output from the logs before and after that message? That might help me narrow down exactly where this came from.
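For readers matching the decimal code in the title against its hex form, here is a minimal Go sketch that splits the value into the standard HRESULT fields (failure bit, facility, code). The type and function names are hypothetical and are not part of hcsshim or HCN; the sketch only does bit arithmetic and does not look up what the code means.

```go
package main

import "fmt"

// HRESULT holds the standard fields packed into a 32-bit Windows HRESULT.
// This is a hypothetical helper type for illustration only.
type HRESULT struct {
	Failure  bool   // bit 31: set when the value represents a failure
	Facility uint32 // bits 16-26: identifies the component that defined the code
	Code     uint32 // bits 0-15: facility-specific status code
}

// DecodeHRESULT splits a raw 32-bit value into its fields.
func DecodeHRESULT(raw uint32) HRESULT {
	return HRESULT{
		Failure:  raw>>31 == 1,
		Facility: (raw >> 16) & 0x7FF,
		Code:     raw & 0xFFFF,
	}
}

func main() {
	const raw uint32 = 2151350293 // decimal form from the issue title
	h := DecodeHRESULT(raw)
	fmt.Printf("0x%08x failure=%t facility=0x%x code=0x%x\n",
		raw, h.Failure, h.Facility, h.Code)
	// prints: 0x803b0015 failure=true facility=0x3b code=0x15
}
```

This confirms that 2151350293 and 0x803b0015 are the same value, which helps when searching logs that print the code in either form.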
Hey @grcusanz, thanks for the response! Regarding the request for additional logs: the error came from the containerd logs. Below is a larger snippet of the error:
I also tried looking through the WinEvent logs for HNS, but there wasn't anything that I saw of value. If there is any additional log that you would like me to fetch, please do let me know! |
@grcusanz Thanks for responding with early thoughts about this.
I think we can rule out an invalid GUID: I confirmed the syntax is correct, and it is also shown in the logs @VincentVTran shared. The GUID is
Interesting theory. I think we can also rule this out because, as you suggest, we would likely have gotten the explicit message you described. Perhaps some HCS/HNS traces can help rule this out.
I had to set up another reproduction environment, but was able to get the HNS log for the namespace delete request:
In this reproduction, the containerd error log is seen as below:
Also here is the namespace log using
The logs @VincentVTran shared suggest that what you mentioned below is likely what's happening.
@grcusanz Can you please confirm this based on the log output? In the above log you can clearly see a container still attached to the namespace.
When @VincentVTran co-debugged and tried to forcibly kill the container, the namespace still failed to delete and the container always showed up as attached. @VincentVTran will take a deeper dive to find more clues about why the namespace fails to release the container attachment. As a future improvement, it would be nice if the HCN operation response returned something more explicit in the error message.
Also when running |
Below is the HCS logs:
To confirm, the process does not exist anymore. Confirmed with |
Hi All, thanks for the additional information, in particular the HNS snippet. Sorry I was out for a few days which delayed my response. I traced the return path from the two places where this exception occurs and matched it with the message in the log. It looks like the detailed message does get lost as the exception works its way through the call stack, so if there is an active container you will still get a generic message in the logs. Given your further investigations this is almost certainly the case. Detach-HnsEndpoint should remove the container ID from the namespace, and then you should be able to delete the namespace. Try using that cmdlet to do the cleanup, then try to delete the namespace. Report back here with your findings. |
Hey @grcusanz , after cross-checking the endpoint IP with the remaining container IP - it seems like the HNSEndpoint for that container is already deleted. Using
When fetching the HNS-Endpoint using In contrast to normal working pods, the endpoints with the corresponding IP addresses appear. |
Also, before the endpoint was deleted, it had the name "cid-932331a3-b265-4fce-b4c8-079e70982379". So when trying to rerun the
Hi grcusanz, do you have any feedback about the potential next steps for this issue?
Hi All, I've been experimenting with this and managed to figure a few things out. First, run this command to get detailed info on the namespace: For example:
As long as you see items in the 'Containers' section you will not be able to delete the namespace. Fortunately, there is a way to clean that up.
If the endpoint still exists when you run get-hnsendpoint, then delete it using remove-hnsendpoint. If the endpoint no longer exists, then follow this example to remove the reference from the namespace (replace <namespace id> and <endpoint id> with the appropriate guids):
The Result should say Success:true. Note that if you re-run hnsdiag, it may still show up there. At this point, ignore it. Repeat for any other namespace ids that no longer exist. Now you should be able to remove the container references using a similar command (replace <namespace id> and <container id> with the appropriate IDs):
Repeat for each container id. If all of the above is successful, then you should be able to re-run: Now you can delete the namespace: Let me know if that helps. It's a little tricky to get right. If you get stuck, please share your output and the output of:
Thanks! |
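The check described in the comment above (a namespace cannot be deleted while it still lists containers) can be sketched in Go. This is an illustration under stated assumptions, not the HNS or hcsshim implementation: the local types only loosely mirror the namespace/resource shape used by hcsshim's hcn package, the JSON field names are assumed, and the sample document and its IDs are made up. Real `hnsdiag` output may be shaped differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// namespaceResource and namespace are local, hypothetical types.
// The field names are an assumption, not a confirmed schema.
type namespaceResource struct {
	Type string          `json:"Type"`
	Data json.RawMessage `json:"Data"`
}

type namespace struct {
	ID        string              `json:"ID"`
	Resources []namespaceResource `json:"Resources"`
}

// resourceRef captures only the ID carried in a resource's Data payload.
type resourceRef struct {
	ID string `json:"ID"`
}

// BlockingContainers returns the IDs of container resources still listed
// in the namespace document; a non-empty result means a delete would be
// expected to fail according to the thread.
func BlockingContainers(raw []byte) ([]string, error) {
	var ns namespace
	if err := json.Unmarshal(raw, &ns); err != nil {
		return nil, fmt.Errorf("decode namespace: %w", err)
	}
	var ids []string
	for _, r := range ns.Resources {
		if r.Type != "Container" {
			continue // endpoints and other resources are not counted here
		}
		var ref resourceRef
		if err := json.Unmarshal(r.Data, &ref); err != nil {
			return nil, fmt.Errorf("decode container resource: %w", err)
		}
		ids = append(ids, ref.ID)
	}
	return ids, nil
}

// sampleNamespace is made-up data for illustration; the IDs are placeholders.
const sampleNamespace = `{
  "ID": "example-namespace-id",
  "Resources": [
    {"Type": "Container", "Data": {"ID": "example-container-id"}},
    {"Type": "Endpoint",  "Data": {"ID": "example-endpoint-id"}}
  ]
}`

func main() {
	ids, err := BlockingContainers([]byte(sampleNamespace))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if len(ids) == 0 {
		fmt.Println("no container references; namespace delete can be attempted")
		return
	}
	fmt.Println("containers still attached:", ids)
}
```

A pre-check like this only reports which references remain; removing them still requires the manual steps described in the comment.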
Hi @grcusanz,
We are still not able to delete the HNS namespace, as can be seen in the command output of step #4. We can also see that step #2 was successful in deleting the container from the HNS namespace, even though the output shows a failure.
Hi @grcusanz , |
Hi @grcusanz, just follow up to see if you have any updates on this? Thanks! |
Hi @grcusanz, are there any updates on this yet that you can share? Thank you!
This issue has been open for 30 days with no updates. |
the issue still exists |
This issue has been open for 30 days with no updates. |
this issue still exists |
Summary
I am requesting more details about the HCN error code 2151350293 (0x803b0015). In a scenario I am facing, this error is returned during the HcnDeleteNamespace operation when invoked by containerd via hcsshim. To help me troubleshoot an ongoing issue and perhaps also benefit the broader community,
Further detail
When rapidly scaling containers (up and down) on a Kubernetes Windows node running containerd, I sporadically encounter errors while trying to terminate Kubernetes pods. The pods get stuck in Terminating status while containerd tries to remove the network namespace using the HcnDeleteNamespace operation. The error message in the logs is as follows:
More context
Current theories being explored, which more verbose errors could help rule out
Environment
Similar Issues
These issues cite the same error code and might benefit from a response to this issue
Terminating
rancher/rke2#5551