Delta ADS with simple cache not working for ECDS #612
Comments
@haoruolei It seems your issue is related to #613.
Thanks @AmitKatyal-Sophos. I took a look at your issue and I think it's the same as mine.
Could you please try out the fix and see if it solves your problem?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Hi, I finally got a chance to reproduce and debug this issue. By adding logs to go-control-plane and analyzing them, I can confirm that the root cause is the same as #613. In short, the resourceVersions map in the server stream state was not updated correctly because of a race condition, so subscriptions to resources could be lost. However, my testing showed that the fix in #615 is problematic. Please see below:
The above logic in the #615 fix only accounts for cases where the version of a resource is updated. Based on my testing, at least two scenarios are missing:
Given the above analysis, I propose that we revert #615. With #559, updating the resourceVersion map by overriding it is not a problem, because any possible loss of information was already handled when creating the delta response.
Hi @haoruolei, thank you for your update. I am looking at reverting #615, as I believe it breaks the logic in some cases: the version map no longer matches what was actually returned. Following #559, this workaround is no longer needed and is even harmful, because it no longer reflects additions and removals of resources in the version map. You mention that you have tests that fail with it. Can you confirm whether the same tests pass on 0.11.0, or whether there are other cases I need to address?
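To make the difference concrete, here is a minimal Go sketch of the two update strategies discussed above. It is not the go-control-plane code and does not reproduce #615: the helper names `overrideVersions` and `mergeUpdatedOnly`, the resource names, and the plain `map[string]string` state are all assumptions made for illustration. It only shows why a map that is patched for version bumps alone can drift from what the response actually returned.

```go
package main

import "fmt"

// overrideVersions replaces the tracked state with the versions carried by
// the response. Hypothetical helper, not go-control-plane's actual API.
func overrideVersions(next map[string]string) map[string]string {
	out := make(map[string]string, len(next))
	for name, version := range next {
		out[name] = version
	}
	return out
}

// mergeUpdatedOnly keeps the current state and only bumps the version of
// resources that are already tracked. Hypothetical helper: it models an
// update that handles version changes but ignores additions and removals.
func mergeUpdatedOnly(current, next map[string]string) map[string]string {
	out := make(map[string]string, len(current))
	for name, version := range current {
		out[name] = version
	}
	for name, version := range next {
		if _, tracked := out[name]; tracked {
			out[name] = version
		}
	}
	return out
}

func main() {
	// State before the response: two tracked resources (made-up names).
	current := map[string]string{"filter-a": "v1", "filter-b": "v1"}
	// Versions in the response: filter-a updated, filter-b removed,
	// filter-c newly added.
	next := map[string]string{"filter-a": "v2", "filter-c": "v1"}

	fmt.Println("override:", overrideVersions(next))
	fmt.Println("merge:   ", mergeUpdatedOnly(current, next))
}
```

In this sketch the override variant ends up tracking exactly what was returned, while the merge variant still lists the removed `filter-b` and never records the added `filter-c`.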
Hi @valerian-roche, thanks for taking this up. In my testing, 0.11.0 alone works fine and I don't see any issues.
Hi envoy go-control-plane,
We recently migrated our project from SOTW ADS to Delta ADS, and we also added support for ECDS. In production, we observed some abnormal behavior. We are posting here to see if anyone has information or can help investigate. We have several instances with the same setup for the management server and Envoy. The management server provides the same resources in the snapshot. The only difference we noticed is the start-up time (several minutes apart). We saw the following:
We were not able to get more information from the logs, but our guess is that there may be a concurrency issue. Any help will be appreciated. Thanks