stats: Consider communicating stats across hot-restart via RPC rather than shared memory #4974
Comments
This seems related to #3804. Having a stats RPC protocol that enforces backwards compatibility (e.g. using proto3) should address the hot restart problem, I think. |
~~Either way, there will be a window of time where some stats will get dropped. Any other advantages of using RPC for this?~~ |
Ignore previous comment; I just realized there's already a ton of RPC going on in the hot-restart transfer, so layering some stats data in there is pretty incremental, and using a file is not going to simplify anything. Also some shared-memory locks are still needed per https://blog.envoyproxy.io/envoy-hot-restart-1d16b14555b5 |
@jmarantz yeah was going to reply with similar. We already do periodic "get parent stats." IMO it's actually a pretty incremental change to just get all the stats during this interval (the only complexity is that it will require pagination). IMO we should just get this done, because the other stats stuff has become so complicated that it dwarfs the complexity of changing how we do hot restart stats, and then we can delete a ton of code and focus on a single impl. |
In your hot-restart blog post you indicated the importance of keeping gauges accurate, e.g. outstanding connection count. It seems to me that for that specific stat, the best strategy might be not to transfer the gauge at all, as it really would be accurate for the new process if it just started fresh and wasn't adjusted based on the old process. Is that right? For others that we might want to persist, should we just delay the paginated transfer of stats from the old process to the new one until the old one has retired all outstanding requests? The new process will be accumulating its own stats, but they could just be combined with the transferred stats as they stream in, so nothing will be lost. |
The way it works currently is that every flush interval we ask the parent process for certain stats before we flush. This will work very well and relatively easily for all stats sinks that use flushing. For any pull/admin based stats, it's a little more complicated. IMO the admin code should keep a shadow copy of the parent stats, and on-demand add them to the local stats. I think this will be the easiest thing to do, and anyone that is not hot restarting won't pay any extra for this.
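(For the pull/admin path, a rough illustration of that shadow-copy idea; the names here, such as ParentStatsShadow and mergedValue, are made up for this sketch and are not Envoy's actual classes.)

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

// Illustrative only: a cache of the parent's last-reported stat values,
// refreshed on each "get parent stats" exchange during hot restart.
class ParentStatsShadow {
public:
  void update(const std::string& name, uint64_t parent_value) {
    shadow_[name] = parent_value;
  }

  // Admin/pull handlers combine the local value with the shadowed parent
  // value on demand. A process that never hot-restarts has an empty map,
  // so it pays essentially nothing for this.
  uint64_t mergedValue(const std::string& name, uint64_t local_value) const {
    auto it = shadow_.find(name);
    return local_value + (it == shadow_.end() ? 0 : it->second);
  }

private:
  std::unordered_map<std::string, uint64_t> shadow_;
};

int main() {
  ParentStatsShadow shadow;
  shadow.update("http.downstream_rq_total", 1000);  // value reported by the parent
  // Local value 250 plus shadowed parent value 1000 -> 1250 shown to the admin reader.
  std::cout << shadow.mergedValue("http.downstream_rq_total", 250) << "\n";
}
```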
I meant to ping you when you are active on slack and I am free... I didn't totally understand all of the above. But let me just ask some questions: |
This is configurable on the CLI. I have a use-case (involving TCP proxying) where I have it set to multiple hours, because there's no graceful way to tell a tcp-proxied connection that it's time to close soon.
Given the above (moderately long time windows), I think the stats should be sent periodically (probably with a configurable period), but no less often than once per minute. |
@ggreenway thanks -- then I think we'd need to atomically zero the old-process stat values as we transmit them to the new process, to avoid double-counting them? Or alternatively we could keep the old-process stat values separate in the new process and only combine them when the old process is allowed to die. Other ideas? |
Yeah, something along those lines. And you'd need to re-zero every time you send an updated value from the old to new. Or track the last value that was transmitted so you can compute deltas. Possibly use |
Zeroing in the old process, via https://en.cppreference.com/w/cpp/atomic/atomic/exchange, feels like a good approach as it doesn't require using any additional per-stat storage in the new process, and would work for gauges, and the stats-sink/admin-access to stats flow wouldn't need to change. @mattklein123 wdyt? This diverges a bit from what you were suggesting. |
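(A minimal sketch of that latch-and-zero step, assuming a bare std::atomic<uint64_t> standing in for a shared-memory counter slot; Envoy's real stat types are more involved.)

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for a counter slot in shared memory.
struct CounterSlot {
  std::atomic<uint64_t> value{0};
};

// Called in the old (parent) process when the new process asks for stats.
// exchange(0) atomically reads the current value and resets it, so any
// increments that race with the read simply land in the next transfer
// instead of being double-counted.
uint64_t latchAndZero(CounterSlot& slot) {
  return slot.value.exchange(0, std::memory_order_relaxed);
}

int main() {
  CounterSlot requests;
  requests.value.fetch_add(42, std::memory_order_relaxed);

  const uint64_t delta = latchAndZero(requests);
  std::cout << "delta sent to new process: " << delta << "\n";            // 42
  std::cout << "left in old process: " << requests.value.load() << "\n";  // 0
}
```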
Per @ggreenway this is configurable and can be a very long time. This is why we currently have the concept of a flush interval (which defaults to 5s). During every flush interval, we flush stats to output sinks in the primary process only, but before we do this we ask the parent process for certain stats, as well as merge histograms. It's during this configurable flush interval that I am suggesting asking the parent process for all stats.
IMO during every configurable flush interval, as we do today.
Yes, I'm referring to people that read stats via the admin interface, whether via Prometheus, JSON, etc.
Sorry, this is my own language. I call the newest hot restart process the primary, the one that came before it the parent, the one before that the grandparent, etc. Does that make sense?
For gauges, I think we just send the entire gauge value over, and we can add them in the primary process. As I said above, I think we need to audit all usages of gauges that call set(). For counters, I agree with @ggreenway that we can just use the pending increment. Since the parent process stops flushing, we can simply latch the pending increment when the primary asks the parent for its stats, and send the increment. This can be added to the latched increment and total in the new process. (One small point is whether you send the old counter total to the new process on hot restart. This would be nice, but is something we can discuss). |
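(Putting the counter-delta / gauge-total scheme above into a small sketch of what the primary might do each flush interval, before pushing to sinks. The ParentStatsUpdate layout is purely hypothetical.)

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical shape of one per-flush-interval transfer from the parent.
struct ParentStatsUpdate {
  std::map<std::string, uint64_t> counter_deltas;  // increments since the last transfer
  std::map<std::string, uint64_t> gauge_values;    // parent's full current gauge values
};

// Snapshot the primary is about to flush to its stats sinks.
struct FlushSnapshot {
  std::map<std::string, uint64_t> counters;
  std::map<std::string, uint64_t> gauges;
};

// Fold the parent's stats into the snapshot before flushing, so sinks see
// one combined view of both processes during the hot-restart window.
void mergeParentStats(const ParentStatsUpdate& update, FlushSnapshot& snapshot) {
  for (const auto& entry : update.counter_deltas) {
    snapshot.counters[entry.first] += entry.second;  // deltas accumulate
  }
  for (const auto& entry : update.gauge_values) {
    snapshot.gauges[entry.first] += entry.second;    // add parent gauge to local gauge
  }
}
```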
I discussed this with @jmarantz this morning, and looked into the code. I see where pending_increment_ fits in and how it would be useful. Having the child keep a periodically updated shadow copy of the parent's gauges, which it combines with its own when presenting to the outside world, makes sense too. All of that makes sense as things to be done for admin/pull-based stats. What isn't clear to me: what needs to be done for stat sinks? From the discussion ("We already do periodic 'get parent stats'"), it kind of sounds like they already handle hot restarts exactly in the way being discussed here.
I think sending the old counter total over should be straightforward. If counter communication is built around using pending_increment_ to send "how many more have there been since last flush/poll", then since the child will start off with an implicit 0 for all the parent's counters, we can just have the first "increment" the parent sends actually be the whole count.
I got briefly started on this. There are ~100 non-test uses of set() on gauges. I gave them a quick look-through, and it's kind of a grab bag. I see some that are counts/totals that would clearly be legitimate to sum, some that appear to be meant as booleans, some (like hot_restart_epoch) that are meaningful numbers but not summable, and some that are just completely opaque, like hashes. I'm tempted to suggest introducing a new stat type for these non-numerical opaque states, since something like a hash seems like an abuse of the gauge mechanism - for instance, it wouldn't even make sense to try to export a hash to Prometheus as a Prometheus gauge, right? So, they're not even being used as "actual" gauges. That said, a new type wouldn't completely solve the current problem: integer versions like hot_restart_epoch might reasonably seem gauge-worthy, but aren't summable. |
Correct, nothing needs to be done. Basically the push/sink model is a) get stats from parent, b) add them to local stats, and c) push/flush.
+1, this sounds like a great solution. |
It's actually way fewer than 100; I think I might have read the wrong number from the code search thing I was using. It's more like ~30 different gauges using set(), and 15 or 20 look like they wouldn't make sense to sum. |
Yeah, I was very surprised by the original number. This sounds more like what I was expecting. Still, I think your plan of having different, more intentional stat types sounds perfect to me, and I don't think it would be very hard to implement.
😊 haha yeah, whoops. (A link for completeness: I listed those uses over in #5790). |
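(To make the "more intentional stat type" idea concrete, here is one possible shape for an opaque, non-summable stat. This is a hypothetical interface sketched for illustration only, not the type Envoy actually adopted.)

```cpp
#include <mutex>
#include <string>

// Hypothetical "opaque state" stat: holds a value that is meaningful to
// display (a config hash, a version string, a hot-restart epoch) but that
// must never be summed across processes or exported as a numeric gauge.
class OpaqueStateStat {
public:
  void set(std::string value) {
    std::lock_guard<std::mutex> lock(mutex_);
    value_ = std::move(value);
  }

  std::string value() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return value_;
  }

private:
  mutable std::mutex mutex_;
  std::string value_;
};
```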
Quick check-in before I get too far into implementing: it appears that there are currently just two things (memory_allocated and num_connections) that are communicated via RPC. So, essentially all stats - both admin and sink-handled - will need to be covered by what I'm building. Or am I missing RPCs happening somewhere else?

I see that the current RPC setup uses sendmsg/recvmsg. That is necessary to enable fd passing, yes? Additionally, it's kept nice and simple by assuming all requests and replies are always <= 4096 bytes, including type+len info.

Here's what I'm envisioning. Communicate over a domain socket. In each direction, the protocol is a series of: The stats would be passed.... either in explicitly named protobuf fields, or in a statname->statvalue map. I'm not clear on whether e.g. It could instead be gRPC, although I think that's kind of overkill here.

Of course, it's totally possible to do everything within the existing framework. But I think it will be much easier to get the implementation right with a nice, flexible protobuf with repeated/map fields, rather than manually packing arbitrary, variable-size stats into raw memory, always being careful to move on to a new message before overrunning 4096 bytes. |
Nope that's it. You can use the same app flow and just extend this sequence to transmit all stats between the parent and new process, so the changes should be mostly at the RPC layer.
+1, we have talked about moving this to proto for quite some time, and I think a "gRPC light" is the right way to go here. Very excited for this work!! |
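(As a rough illustration of the pagination concern raised earlier in the thread - keeping each hot-restart message small - here is a sketch that splits a statname->value map into size-bounded pages. The StatsPage struct is a stand-in for whatever protobuf message is ultimately used, and the size estimate is deliberately crude.)

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for one "page" of stats sent over the hot-restart domain socket.
// Each page is kept under a byte budget (e.g. the existing 4096-byte frame
// limit) so an arbitrarily large stat set becomes a stream of small messages.
struct StatsPage {
  std::map<std::string, uint64_t> stats;
  size_t approx_bytes = 0;  // rough serialized size, not an exact proto size
};

std::vector<StatsPage> paginate(const std::map<std::string, uint64_t>& all_stats,
                                size_t max_bytes_per_page) {
  std::vector<StatsPage> pages(1);
  for (const auto& entry : all_stats) {
    // Crude per-entry estimate: name bytes + value bytes + a little framing.
    const size_t entry_bytes = entry.first.size() + sizeof(uint64_t) + 4;
    if (!pages.back().stats.empty() &&
        pages.back().approx_bytes + entry_bytes > max_bytes_per_page) {
      pages.emplace_back();  // start a new page rather than overrun the budget
    }
    pages.back().stats.emplace(entry.first, entry.second);
    pages.back().approx_bytes += entry_bytes;
  }
  return pages;
}
```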
Fixed! |
:) And thanks for the callout on Twitter: https://twitter.com/mattklein123/status/1123742661469454341 |
Yes, thank you; glad the change made you that happy! It was also a nice overview of the motivation and result, since I was entirely focused on the mechanical details the whole time, and not really seeing the big picture. |
Description:
Relevant Links:
https://blog.envoyproxy.io/envoy-stats-b65c7f363342
https://github.com/envoyproxy/envoy/blob/master/source/docs/stats.md