perf(accesscontrol,schema): replace LRUExpire cache #247
Stacked on #246
We are investigating high memory usage due to `accesscontrol.AccessSet` and `types.APISchemas`, which correlate with the objects being cached by `accesscontrol.AccessStore` and `schema.Collection` respectively. Understanding how memory is actually distributed is hard, since the size of each object depends on the number of permissions granted to every user.

Something I noticed is that the `LRUExpireCache` from `k8s.io/apimachinery`, which is being used in these packages, only removes items when explicitly requested and does not "actively" purge expired values from the cache. It only removes them when necessary. In practice, considering that our code never deletes items from the cache (only replaces them), this means that the size of the cache will slowly grow until it reaches the configured maximum, and then expired items are replaced one by one as needed.

In contrast, the same library also provides an `Expiring` implementation (without an LRU mechanism) that does have a GC mechanism to prune all expired entries, not only one by one.

One downside is that, since it does not keep a list of recent accesses, it is not possible to set a maximum cache size (my proposal here is to just log when the cache grows past the expected size). This also makes it more important to set a sensible TTL for the entries.

This makes the option not ideal, but I think it's still a reasonably good and immediate alternative, which does not involve adding a new dependency or adding our very own cache implementation (as too many exist already).
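To illustrate the difference described above, here is a minimal stdlib-only sketch. It does not reproduce either `k8s.io/apimachinery` implementation: `ttlCache`, `getLazy`, `prune` and `demo` are hypothetical names made up for this example. It only contrasts dropping an expired entry when it happens to be looked up with pruning every expired entry in a single pass.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a cached value with an absolute expiry time.
type entry struct {
	value   string
	expires time.Time
}

// ttlCache is a hypothetical, non-thread-safe TTL cache used only for illustration.
type ttlCache struct {
	items map[string]entry
}

func newTTLCache() *ttlCache {
	return &ttlCache{items: map[string]entry{}}
}

func (c *ttlCache) set(key, value string, ttl time.Duration, now time.Time) {
	c.items[key] = entry{value: value, expires: now.Add(ttl)}
}

// getLazy mimics lazy expiry: an expired entry is dropped only when it is looked up.
func (c *ttlCache) getLazy(key string, now time.Time) (string, bool) {
	e, ok := c.items[key]
	if !ok {
		return "", false
	}
	if !now.Before(e.expires) {
		delete(c.items, key)
		return "", false
	}
	return e.value, true
}

// prune mimics an active GC pass: every expired entry is removed at once.
func (c *ttlCache) prune(now time.Time) int {
	removed := 0
	for k, e := range c.items {
		if !now.Before(e.expires) {
			delete(c.items, k)
			removed++
		}
	}
	return removed
}

// demo returns the cache size after one lazy lookup and after one prune pass.
func demo() (afterLazy, afterPrune int) {
	start := time.Unix(0, 0)
	c := newTTLCache()
	for i := 0; i < 5; i++ {
		c.set(fmt.Sprintf("user-%d", i), "access-set", time.Minute, start)
	}
	later := start.Add(2 * time.Minute) // all five entries are expired by now

	c.getLazy("user-0", later) // removes only the entry that was looked up
	afterLazy = len(c.items)

	c.prune(later) // removes all remaining expired entries
	afterPrune = len(c.items)
	return afterLazy, afterPrune
}

func main() {
	lazy, pruned := demo()
	fmt.Println(lazy, pruned) // prints "4 0"
}
```

With lazy expiry alone, the four untouched entries stay in memory even though they are expired, which is the growth pattern described above.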
Also, I added a way to revert the cache to the previous implementation, by setting the environment variable `CATTLE_STEVE_CACHE_BACKEND` to `lru` or `LRU`.
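As a rough sketch of how such a switch could be read (not the code in this PR), assuming a hypothetical helper `useLRU` and a hypothetical constant `cacheBackendEnv`, and treating only the two documented values as a request for the previous backend:

```go
package main

import (
	"fmt"
	"os"
)

// cacheBackendEnv is the environment variable named in the description.
const cacheBackendEnv = "CATTLE_STEVE_CACHE_BACKEND"

// useLRU is a hypothetical helper: it reports whether the previous
// LRU-based cache was requested. Only "lru" and "LRU" are accepted here,
// matching the two values mentioned in the description.
func useLRU(value string) bool {
	return value == "lru" || value == "LRU"
}

func main() {
	if useLRU(os.Getenv(cacheBackendEnv)) {
		fmt.Println("cache backend: LRUExpireCache")
	} else {
		fmt.Println("cache backend: Expiring")
	}
}
```

Running it with `CATTLE_STEVE_CACHE_BACKEND=lru` selects the first branch; any other value, or leaving the variable unset, falls through to the new default.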