trf: Add client traffic management to the CLI #3959
base: master
Conversation
The new smuggling facility allows mgt to abuse the heritage process and sneak file descriptors to the cache process via a socket pair. This new trade route will allow the cache to escape its jail and get privileged file descriptors anyway. The smuggler's fence is hidden in plain sight, in the heritage facility, which explains such a smug attitude.

Signed-off-by: Dridi Boukelmoune <dridi.boukelmoune@gmail.com>
It will be used by the file descriptor smuggler to pass nonces for listen sockets.
We use the Traffic VSM segment to pass the nonces to the cache process, and this is how the two processes synchronize in case of partial success. Though not likely during startup, the transfer of a file descriptor can still fail, for example when running into a limit. Such failures will become less unlikely once transfers may happen at any point of the cache run time.
The interface offered for operations is a set of `traffic.accept`, `traffic.refuse` and `traffic.status` commands with similar semantics as `start`, `stop` and `status`. The child process will eventually listen to the `traffic.accept` and `traffic.refuse` commands, but for now only a pseudo `accept_traffic` parameter is added in `struct params`. It will be used when tasks running on worker threads reach certain points, to decide whether to keep processing requests or not.
It will be used internally to interrupt blocking accept(2) calls.
The accept tasks pool themselves in order to later block on the `pool_accepting` variable. There could be a more efficient scheduling policy, but once we stop accepting new traffic the workload will mechanically wind down, and having one worker per pool per listen socket stuck in a sleep loop ought to be an acceptable trade-off, just like during the cache startup. This is triggered by MGT via the cache's CLI, using the same traffic management commands. When the acceptor shutdown is triggered, worker threads tracked as busy accepting new connections are signaled with the no-op SIGUSR1. The CLI thread waits for the last busy worker to signal that nothing else is using the listen sockets, so they can be closed safely, getting rid of an unfortunate race condition.
Stop accepting new streams but allow ongoing transactions to complete gracefully. Since this is an error condition that allows the rx loop to continue, avoid sending a GOAWAY frame more than once.
Otherwise it leads to this sequence of events:

- client sends request
- request proceeds
- client triggers `timeout_linger`
- client session enters waiter
- `traffic.refuse`
- client sends request
- client session leaves the waiter
- request proceeds

The last step is the one `traffic.refuse` is supposed to prevent.
It looks to me like I accidentally removed someone from the reviewer list. If I did, apologies, it was unintentional; I just wanted to add myself...
You did not, all good. edit: you probably saw a reviewer suggestion that went away once you added yourself.
I think this is a valuable prototype, but it does have a lot of "bolted on" aspects which I would prefer to avoid. As mentioned during bugwash, originally the Mgt/Child CLI pipe came close to being a socketpair(2) precisely so that fds could be passed through, but mostly as a matter of saving development time it became a pipe(2) instead. I think the way I would like to see this progress is the following:
This regains the advantage of sockets surviving restarts.
The issue of naming sockets in VCL is not really different from naming stevedores, but we need to decide if a socket is "abstract" or if it is its address, i.e. whether you can:
Also as mentioned, `traffic.` sounds to me like the place where bandwidth limiting and steering lives, and if we think we might go there, then maybe sockets should live there too. Otherwise I think `socket.` communicates better what goes on.
We considered using the CLI to pass file descriptors and decided against it. With a separate socket pair we can cleanly stash file descriptors in the cache process, and then engage with the actual operation via the CLI. This way the cache-main thread doesn't have to worry about both listening to the "plain" CLI stream and control messages for file descriptors. Nothing changes in VCLI.
Prior to this pull request, open/close happen in MGT and listen happens in cache. With this pull request that is still the case, except that:
I'm not sure how to interpret this one, but today, despite having a copy of the file descriptors in MGT and the ability to "survive" restarts, we reopen them regardless. One advantage of moving the file descriptors to the cache process is that sockstat and other similar programs report that listen sockets are owned by the cache process, while the CLI sockets are owned by MGT, showing a more accurate picture.
While I very much agree with the idea of managing listen sockets dynamically, why not first agree on a CLI definition for it?
No strong opinion on this one. As a reminder, this goes beyond closing sockets as we allow ongoing client traffic to complete and only bar new requests. I think we can still move from the all-or-nothing model to per-socket management without difficulty once we cross that bridge. Still, I would rather see us move incrementally there.
bugwash:
Based on the discussions so far, here is the list of commands we would ultimately aim for.

--->8----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8---
With no argument, this would be equivalent to

When

Opening an open socket would probably be a no-op. This is how

When both

If
With no argument, this would be equivalent to

When

Closing a closed socket would probably be a no-op. This is how
List all listen sockets, their status (open/closed) and probably other things like reference counters. This would replace
The

In the absence of a

The optional
New sockets are added open by default.
Close and remove all listen sockets for the

---8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<---

The

Once we reach consensus, I can turn this into a VIP.
LGTM.
That was implied. I didn't want to focus on how we'd do this. For example for
To begin my review of the proposed interface, I would like to recap the current
Path can now also be

I see some shortcomings with this syntax:
With this in mind, I would like to discuss the proposed CLI syntax. And before anyone shouts "but
I am missing a clarification of where the existing PROTO and path (UDS) options go. Also I would like to take the opportunity of a new CLI definition to introduce an acceptor type (implementation), analogous to the stevedore implementation. We are not there yet and we do not have a precedent, but I am working on alternative acceptor interfaces. These would be implemented as extensions. We could add a registration interface similar to the stevedore interface, and the CLI could offer the option to select the implementation. All in all, I would propose this syntax:
So for UDS, the user, group and mode options would come after the path, separated by

The protocol would gain options. "HTTP" would remain the default. In a first step, the only (and default) acceptor would be

Some examples with this proposed syntax
Overall, I agree with these semantics, and worst case they could be changed more easily than the syntax. Only regarding the last point I wonder whether it could not be a good idea to change parameters without ever closing a socket, so I would tend toward a declarative "socket.open applies the settings also to already open sockets".
The way I understand the main intention of this PR, we still want to facilitate traffic control, despite having moved to socket operations. So the two phases of a socket shutdown discussed here are
For the latter, VCL or other code would need to query the status, obviously. But should there not be a phase even before the socket closure to allow other cluster members, load balancers or DNS controllers to prepare even more smoothly for the shutdown? I would think that, for example, starting to send negative replies to health probes could be a good idea, like with this dummy VCL:
Thus, I propose a grace argument to
Upon reception of the command, varnish would set the socket to grace status but not yet close it and set a timer. When that expires, it would actually close the socket(s).
I do not understand why we need

I would see a point, however, in having
Thanks for the extensive review!
It's not so much a two-phase operation, but rather one active operation (closing) and then passively reacting to the closed session. When we close the listen sockets in this patch series, there are steps in the HTTP/1 and h2 state machines where we check the status to either prevent HTTP/1 keep-alive or send a GOAWAY frame. On a per-socket basis it would be the same, but the status check would be different (for example finding the

I'm not arguing against VCL support for this, only saying that I don't expect it as a day-1 requirement and rather something we can easily add. I'm however arguing for having the core code check the status and prevent subsequent requests on sessions that have their listen sockets closed.
For what it's worth it's just a VAV, so both spaces and commas act as separators. I haven't delved into the VAV code in a while, but I believe we can double-quote arguments. It of course comes with the inconvenience that your shell would require nested quoting, but I believe this is perfectly doable today.
By address here I mean either IP/host, path or abstract; maybe endpoint would have been a better word. The reason why there were no options in the first place was precisely

If we want to change something about a listening socket (and again, this is probably not the priority and something that could land later) I would rather go with a separate command:
I'm thinking
We can even make the protocol a special case since it doesn't depend (yet?) on the endpoint type:
At the very least I don't like the idea of shoving options in the endpoint:
Likewise, I don't like the idea of using commas as separator in the CLI, we already have blanks to delimit arguments:
(but I do like the idea of per-socket acceptor tuning, like the backlog)
I suppose I answered this partly already, but I would like to add that this would be a similar dichotomy as

To get the non-default, we can always add an option on the CLI side. The
Let's try one more time then:
Where the
I would think VCL should always be in control. But that's an implementation question, this conversation is regarding a CLI specification and I agree that does not need to be in the day-1 implementation. At any rate, this was only setting the stage for proposing a grace argument to
OK, good point, I did not look at the implementation.
(edit: renamed

The add/mod/open idea in combination with

Also I am fine with named options instead of optional positional arguments. What is more important to me is that some of the points I raised still seem not to be reflected.
My bad, I took things out of order and forgot some.
You are referring to this, right?
This is not something possible today; right now it is one or the other. But I understand you have ambitions to use the PROXY protocol for something other than HTTP. If that had to turn into a list of protocols, then why not? For example

If we ever cross that bridge, I'd rather make the PROXY protocol orthogonal to the application protocol. I worry that we may be looking a little too far ahead in this case.
Either we consistently give options a name like we do for UDS, or we grow new options:
The problem is that I don't understand. This patch series closes listen sockets upon
So first of all, the example I chose was probably rather bad. Yes, having a list of protocols could be an option, but what I meant was that the PROXY protocol would take HTTP as an option. Maybe a better example could be

At any rate, when we design an interface, I would like to see where such options have a place.
Can we please design an interface where it is clear what is an endpoint option, what is a protocol option and what is an acceptor option? It's OK if you do not like my optional positional arguments with comma separated options, but the above does not look like a valid alternative to me.
How would that work? Would some instance outside Varnish look after that? If yes, why not integrate it?
So here's a concrete proposal:
We discussed this on IRC this morning and settled on a minimal set of commands, see the first revision in VIP 33. https://github.com/varnishcache/varnish-cache/wiki/VIP33:-Socket.*-CLI-commands
I think the current revision of https://github.com/varnishcache/varnish-cache/wiki/VIP33%3A-Socket.%2A-CLI-commands could use some polish of the `socket.list` output. It was important to @dridi to get away from the comma-separated options, so I think the output should look something like this:
If we want to get away from an options separator and use whitespace, we will probably need to use quotes, as in this potential example:
In general, I would prefer if we could also have an agreement on the future interface design - with the option to change it for good reasons. I started an edit in https://etherpad.wikimedia.org/p/vip33 . Here's where I left off:

First revision
This will be the initial behavior without arguments, and commands will be refined when optional arguments are defined.

Future expansion of the interface

This interface is foreseen to evolve. The current proposal for the future direction is:
See #3959 for discussions.
Note from VDD: socket.add would probably take -t argument instead of -a and -p |
And after issuing or receiving a goaway, stop processing frames when there are no streams left. Initially submitted as part of listen sockets management. Refs #3959

Conflicts:
	bin/varnishd/http2/cache_http2_proto.c
The interface offered for operations is a set of `traffic.accept`, `traffic.refuse` and `traffic.status` commands with similar semantics as `start`, `stop` and `status`.

With this patch series we may start refusing traffic while the cache process is running. This can be used to implement a graceful shutdown, or dynamic listen endpoints, but the scope here is limited to traffic management as in accepting or refusing new client traffic, in a broad sense.
This has been discussed informally in the past, but for the details on what refusing traffic means, here is what the manual/help say about the new commands:
To circumvent the jail security model, @mbgrydeland implemented a facility to pass file descriptors from mgt to cache. This is partly related to the CLI and security barriers discussion started in #3906 (comment).