Deadlock in Delta XDS if number of resources is more than 10 #875
Comments
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
There is a PR open, so I wouldn't call it stale.
Some additional context.

Why did the deadlock appear after #752? The PR replaces the per-watch goroutines with a single muxed watch channel whose buffer is sized for the built-in xDS resource types.

Why is there no deadlock in SotW? SotW has 2 different implementations for xds and ads. ADS has […]

How can the deadlock be solved in Delta?

Probably there are some other options I haven't thought of. @valerian-roche @jpeach @alecholmez please take a look, I'd appreciate any input here, and I'd be happy to contribute the solution.
Hey, sorry for the delay, I have not had much capacity to work on this recently. I plan on spending some time on the control-plane in Q2, though likely not in the coming weeks. If you want to take a stab at an implementation I can review it.
Thanks for the reply @valerian-roche!
Just checking my understanding, does it mean the […]
So that each "not explicitly defined" type adds a […]
Hey, I'd like to not go in this direction. The old model of […]

Prior to the ads change, delta used to fork a goroutine for each watch. I believe this can be reused if the resource of the watch is not a standard xds resource considered within the main channel buffering. This would likely require having another muxed channel for those watches, as queuing in the other one could lead to deadlock, but adding an entry on the select should not create an issue. Overall I expect the change to be: […]
If my understanding is correct, having another muxed channel for those watchers with the "default" capacity can still lead to deadlocks. Imagine two things happening simultaneously: […]

If the Snapshot contains more resource types than the "default" capacity, the sends into the muxed channel fill the buffer and block. That's why I brought up the […]
@valerian-roche what do you think? Does it make sense to go with […]?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
Not stale, still the case |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Not stale, still the case |
In Kuma we're implementing the ResourceSnapshot interface and our snapshot has more than 20 distinct resource types. After upgrading to v0.12.0 we see deadlocks on the server, and it seems like it was caused by #752. AFAICT the potential deadlock was fixed by increasing the channel capacity, but it's not fixed for us since our snapshot has more resource types.

I'm looking for help or advice on what'd be the best way to fix it. It doesn't seem right that server depends on the number of resource types in the snapshot to work properly.