-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stats memory usage can probably be improved quite a bit #3585
Labels
Milestone
Comments
mattklein123
added
enhancement
Feature requests. Not bugs or questions.
help wanted
Needs help!
labels
Jun 11, 2018
This was referenced Jun 22, 2018
Merged
@ambuc here's the potato, let me know if you want to hold it. |
There's a symbol table PR out here: #3927. |
htuch
pushed a commit
that referenced
this issue
Jan 30, 2019
…olTable API without taking locks. (#5414) Adds an abstract interface for SymbolTable and alternate implementation FakeSymbolTableImpl, which doesn't take locks. Once all stat tokens are symbolized at construction time, this FakeSymbolTable implementation can be deleted, and real-symbol tables can be used, thereby reducing memory and improving stat construction time per #3585 and #4980 . Note that it is not necessary to pre-allocate all elaborated stat names because multiple StatNames can be joined together without taking locks, even in SymbolTableImpl. This implementation simply stores the characters directly in the uint8_t[] that backs each StatName, so there is no sharing or memory savings, but also no state associated with the SymbolTable, and thus no locks needed. Risk Level: low Testing: //test/common/stats/... Signed-off-by: Joshua Marantz <jmarantz@google.com>
danzh2010
pushed a commit
to danzh2010/envoy
that referenced
this issue
Jan 31, 2019
…olTable API without taking locks. (envoyproxy#5414) Adds an abstract interface for SymbolTable and alternate implementation FakeSymbolTableImpl, which doesn't take locks. Once all stat tokens are symbolized at construction time, this FakeSymbolTable implementation can be deleted, and real-symbol tables can be used, thereby reducing memory and improving stat construction time per envoyproxy#3585 and envoyproxy#4980 . Note that it is not necessary to pre-allocate all elaborated stat names because multiple StatNames can be joined together without taking locks, even in SymbolTableImpl. This implementation simply stores the characters directly in the uint8_t[] that backs each StatName, so there is no sharing or memory savings, but also no state associated with the SymbolTable, and thus no locks needed. Risk Level: low Testing: //test/common/stats/... Signed-off-by: Joshua Marantz <jmarantz@google.com>
fredlas
pushed a commit
to fredlas/envoy
that referenced
this issue
Mar 5, 2019
…olTable API without taking locks. (envoyproxy#5414) Adds an abstract interface for SymbolTable and alternate implementation FakeSymbolTableImpl, which doesn't take locks. Once all stat tokens are symbolized at construction time, this FakeSymbolTable implementation can be deleted, and real-symbol tables can be used, thereby reducing memory and improving stat construction time per envoyproxy#3585 and envoyproxy#4980 . Note that it is not necessary to pre-allocate all elaborated stat names because multiple StatNames can be joined together without taking locks, even in SymbolTableImpl. This implementation simply stores the characters directly in the uint8_t[] that backs each StatName, so there is no sharing or memory savings, but also no state associated with the SymbolTable, and thus no locks needed. Risk Level: low Testing: //test/common/stats/... Signed-off-by: Joshua Marantz <jmarantz@google.com> Signed-off-by: Fred Douglas <fredlas@google.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
#3508 is about not having to allocate stats memory as an NxM block. But really the much bigger prize here is to use a lot less stats memory by not storing gigabytes of repeated strings. Why we want the fully elaborated stat name in memory at all?
If we represented stats structurally with a format string with named variable substitutions, e.g.
prefix.$var1.keyword.$var2
and variable assignments:
var1="xxxx"
var2="yyyy"
The static strings there "prefix.$var1.keyword.$var2" and "var1" and "var2" would not be needed in dynamic memory at all but could live in the code text as a static const char[] or one of those lazy-static-initialized structs of strings. All we'd need to keep in dynamic memory are the substitutions xxxx and yyyy in that case. And I think typically many stats would have the same valued substitutions which could share the same memory. This would be a little complex as a shared-memory block, depending on whether we need to free and reallocate within the shm-block.
I was thinking about this because we are starting to count stats memory in the gigabytes (reference earlier bug (#3463) where uint32 wasn't enough for byte offsets for @ggreenway). Most of this memory is for strings, most of which are really variations on common patterns. It'd be nice to ultimately use that memory for data cache.
I think this would speed things up too -- I was kind of going in that direction in my earlier optimizations (I got a working prototype based on structured substitutions instead of regexes) but I found I was able to get enough of the startup speed improvements with hacks to skip regex lookups, but it'd be faster still if we just used a better rep in the first place. But the main goal here is scalability.
Of course RawStatData could have a name() method which could elaborate a string for printing for debug or whatever, but I think many structured stat sinks would benefit from having the structure be explicit in the representation. Of course we'd have to make maps of stats know about this structure to avoid hanging onto the elaborated string data.
A simpler variation on the above was suggested by @htuch is to store arrays of "." separated tokens, each of which could be part of a symbol table. This would be simpler to integrate but still require complex regex-based tag-token extraction.
A few questions remain about how to do this, especially in the context of hot restart, but opening this issue now to collect discussion.
The text was updated successfully, but these errors were encountered: