-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Active listeners" not active if at least one is waiting for an SDS update #6904
Comments
For the record, This can be tracked in stats by looking at |
@PiotrSikora where do you see the listener marked as "active"? That might be a bug or something we can expose better in stats/admin/etc. The 2nd part AFAICT is WAI, at least as expected today. The way Envoy init works is that everything goes through the init process before listeners are started and actually accept connections. This is so that we make sure the server is fully initialized prior to accepting traffic. We could obviously relax this so that listeners start taking traffic as their dependencies are initialized, but this would be a non-trivial code change. Perhaps open an explicit enhancement ticket on this part? |
Those listeners are listed under
Is there a better place to look for the status? In any case, the
Ah, I incorrectly assumed that this was already the case, thanks! But wouldn't this be effectively the enhancement ticket? |
It's an artifact of how warming works pre and post init. I agree we can do better here. envoy/source/server/listener_manager_impl.cc Lines 825 to 830 in 1345d53
It's hard for viewers to look at a giant ticket like this and grok what is going on. Now that we understand both issues, I would suggest opening two details tickets for each sub-issue that we can then action on, as they are independent issues (mostly). |
Having listeners warm independently also better lines up with the goal of better handling of partial xDS update failure handling (e.g. having a partial RDS update be able to unblock a bunch of listeners). So, +1 to this. |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions. |
This might happen if the listener is "added" but not at "replaced". If there is no draining listener at this port, it's nice to have the listen() before it's ready to server. The benefit is that the incoming connection are queued instead of rejected. Maybe we should give the listeners in this state a new name instead of "active". |
@PiotrSikora asked me to take a look, I go over the code and chatted with @silentdai , here're some thoughts (scope only to why
This is some discussion on original PR
@mattklein123 |
I think @asraa also bumped into something in this area the other day while developing a dynamic xDS fuzzer; there is some interaction between workers started and listener warming that can lead to a crash in early initialization phases. |
Yes! Sort of. I set up a crash by removing a listener before worker threads are started. I think this happens because:
I'll set up a separate issue detailing the test case. |
@asraa xDS fuzzer would be helpful! I am making changes to listener. Tests are breaking (including crash and expectations doesn't reach. Even with the crashing test It could be the test cases access the invalid address in the way that prod code wouldn't. |
I would need to go back through all this code. At a high level, I agree that during the first init/boot we can/should be using the warming/active lists. I think I probably did it the way that it is initially just because it was simpler for some reason, possibly due to global init manager processing. Unfortunately it's been long enough that I would have to go through and refresh my memory again. I'm happy to help review any changes in this area of someone wants to tackle. |
Thanks @mattklein123 for your feedback. I took another round of code scanning today, so far I haven't found anything that prevent us adding listeners to I'll try that approach, see what breaks, and get back to this thread if anything is unclear to me. |
I see the same same issue that global init manager is not yet initializing when the registered targets are ready. My solution is to add a It might be reasonable to make |
IIRC the way this is handled today is that the target should just immediately call the ready callback if it's ready. I believe this is done elsewhere. If that is not working I think we should debug why? |
envoy/source/common/init/target_impl.h Lines 71 to 72 in e1450a1
If the comment is true, InitManager should never knew if the ready() is invoked before. I can add another Target impl as so far I am the one who really need it. I believe this logic could simplify the existing code as well. |
I'm not sure that comment is correct. IIRC a target can just call ready() from within the initialize callback, which presumably it can do if it's already ready. cc @mergeconflict who recently worked on this code and should be able to comment. |
@mattklein123: @silentdai is right. If |
Sorry what I'm suggesting is to just immediately call |
Oh, my bad, I forgot to address your comment. That's one of the use cases I originally intended, and it does work (see e.g. this test), but I'm assuming that @silentdai has some use case where he has to call |
Back to the original issue, I talked with @silentdai and now have more understanding.
Not sure if we should proceed with my proposal above. If not, we still need to clarify in config_dump, have special logic to print active_listeners to not confuse users. @mattklein123 @PiotrSikora @htuch WDTY? |
We definitely have to block server start so we must use the global init manager. That can't change. However it would definitely be possible to make the code more complex and switch to using both the warming/active lists, and make sure that we take care of removal correctly during global init. This is definitely a bug and we should handle it. (This is the issue @asraa is seeing.) Personally, I would probably focus on two separate things here:
I would have to spend more time going back through the code to suggest specific fixes, but my advice is to iterate and let's come up with simple solutions to (1) and (2) above and go from there? |
@mergeconflict Yes, I have one use case. I have subscription callback(more precisely the filter chain update) as a dependency of listener warm up, which indicates it should register as a target. |
Closing this issue since the two problems proposed from this issue are both resolved.
|
Why is this WAI? |
It's WAI as the code was written today. Please open up a fresh issue on making this part better so we can keep things straightforward? |
Just want to make sure I understand what we agree on
1 makes real behavior changes, 2. only makes code/implementation easier to follow. |
Correct. If we want to track (1) please open a new issue on that. |
Description:
All listeners from LDS are marked as "active", but they don't even
listen()
if at least one of them is waiting for an SDS update.I believe there are 2 issues here, really:
Repro steps / Admin and Stats Output:
Start "fake" xDS server that will accept connection, but will never respond:
Start Envoy:
Verify that listeners from LDS are configured, and marked as active and not warming:
Try to connect to both listeners, discover that neither is listening:
Config:
Logs:
cc @mattklein123 @htuch @duderino @costinm @jplevyak @silentdai @philrud @howardjohn @wattli @quanjielin @JimmyCYJ
The text was updated successfully, but these errors were encountered: