Intermittent DNS Failure with Kong Gateway and multi-zone kuma service mesh #2829
Comments
No, I do not see anything strange anywhere else in the logs. I also tried some monitoring with
either by annotating the services:
or this:
or this:
Also, I have tried a complete reinstall of the entire mesh and the services, but with the same result. I have been stuck with this since
I recently tried changing the gateway to nginx, but the results are the same:
It happens intermittently, about once an hour, and fixes itself in around a minute.
This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed.
This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed.
This looks like the issue that was fixed in 1.3.1: #2756
I'm going to mark this as a duplicate, but please reopen if you disagree!
I have a similar issue and am not sure what the fix is. Can you tell me what the fix was? This is a Kuma service mesh; we have 6-zone clusters, but we notice these errors in only one cluster, which has about 600+ pods.
@sravanakinapally can you confirm this is intermittent? Could you show the log of the dataplane during this time? I'm wondering if we can track this down to something in the Envoy config.
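If it helps, something along these lines should capture both (a sketch assuming the default kuma-sidecar container name and Envoy admin port 9901; if wget isn't available in the sidecar image, kubectl port-forward to the admin port works too):

```sh
# Sketch: grab the Envoy config and cluster health from the sidecar of an
# affected pod while the error is happening. Container name "kuma-sidecar"
# and admin port 9901 are the defaults; adjust if your install differs.
POD=<affected-pod-name>   # placeholder: an affected pod
kubectl exec "$POD" -c kuma-sidecar -- \
  wget -qO- http://localhost:9901/config_dump > config_dump.json
kubectl exec "$POD" -c kuma-sidecar -- \
  wget -qO- http://localhost:9901/clusters > clusters.txt
```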
@sravanakinapally Hey. Do you have any updates for us on this? Is this still affecting you?
@sravanakinapally - any updates here?
@slonka let me know if you need more logs. This was intermittent, but it's happening more often now; we upgraded the cluster to
This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
@sravanakinapally could you update to a newer version of Kuma and let us know if it still happens? I would like to push this forward and resolve the issue if necessary.
@bartsmykla yes, this is still happening. I am working with John H on this.
This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
@johnharris85 did we figure out what was wrong here?
I'm not sure these issues are 100% related, but they do look similar. A short summary is that we've had some problems with Kong's DNS client. There was a quick fix in 3.5.0, and we've also merged a broader fix, but it has yet to be released. More info here: Kong/kong#9959 (comment)
This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed.
That honestly looks like #8301, which was fixed in 2.5.0 and backported to patch versions of earlier minors.
@arjunsalyan can you check if this still happens on a recent version?
Pinging @arjunsalyan again.
Sorry guys, we no longer have the setup on which we had the issue earlier, so there is no way for me to test or reproduce this. We can close this ticket if similar issues have been addressed.
Summary
I have a three-cluster setup on GKE:
Zone A has the Kong gateway and ingress controller installed, along with some other services. To expose the services on Zone B, I create an ExternalName service on Zone A pointing to the service on Zone B and then create an Ingress for it (a minimal sketch is below).
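Roughly, the manifests look like this. This is a minimal sketch: the Ingress name, path, ingress-class annotation, and the dev namespace for the Zone A objects are illustrative assumptions, while services-externalname and fe.dev.svc.80.mesh are the actual names from my setup.

```sh
# Minimal sketch of the Zone A manifests. Names marked as illustrative
# above may differ from the real setup; adjust to your Kong install.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: services-externalname
  namespace: dev
spec:
  type: ExternalName
  # Kuma DNS name of the "fe" service (port 80) running on Zone B
  externalName: fe.dev.svc.80.mesh
  ports:
  - port: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fe-ingress
  namespace: dev
  annotations:
    kubernetes.io/ingress.class: kong
spec:
  rules:
  - http:
      paths:
      - path: /fe
        pathType: Prefix
        backend:
          service:
            name: services-externalname
            port:
              number: 80
EOF
```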
Everything works fine, except that intermittently (4-5 times a day) Kong throws this error:
Here services-externalname is the name of the ExternalName service on Zone A, and fe.dev.svc.80.mesh is the Kuma DNS address for the service running on Zone B. And then this appears:
It stays unhealthy for a minute or so and then automatically returns to healthy. During this period Kong is unable to serve the service and throws this error when it is accessed through the gateway:
Failure to get a peer from the ring balancer
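A rough way to watch this flapping from outside (a sketch, assuming the Kong Admin API is reachable on its default port 8001, e.g. via kubectl port-forward; the upstream name below is a placeholder for whatever Kong generated for the ExternalName service):

```sh
# Sketch: poll Kong's Admin API to watch target health flap over time.
# Assumes the Admin API at localhost:8001 and requires jq.
UPSTREAM=<upstream-name-or-id>   # placeholder
while true; do
  date
  curl -s "http://localhost:8001/upstreams/${UPSTREAM}/health" \
    | jq '.data[] | {target, health}'
  sleep 10
done
```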
Steps To Reproduce
I am just listing the things I did:
GKE version: 1.19.12-gke.2101
Additional Details & Logs
I have tried to follow all the steps from the documentation. Did I make a mistake, or is this something we need to fix?