Tiered caching, off-heap cache support #1211

Bukhtawar · 2021-09-03T11:34:45Z

Describe the bug
The existing caches are on-heap, which causes the following problems:

The memory overhead of objects is very high, often causing the retained size of the data to overflow.
Java garbage collection becomes slow as the on-heap data grows (restricted to 10%).

Proposal
Because of these factors, using the filesystem and relying on the page cache is superior to maintaining an in-memory cache or other structure. Furthermore, such a cache stays warm even if the service is restarted, whereas an in-process cache must be rebuilt in memory (which likely means very poor initial performance).
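
The proposal can be illustrated with a two-tier design: a small bounded on-heap tier whose evicted entries spill to files, so later reads go through the filesystem and can be served by the OS page cache instead of the Java heap. This is a minimal sketch, assuming string keys and values; `TieredCacheSketch` and its methods are hypothetical names, not OpenSearch APIs. It deliberately omits invalidation, a size bound on the disk tier, and promotion of disk hits back to the heap tier. Whether a given read is actually served from the page cache depends on the OS and on memory pressure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical two-tier cache: a bounded on-heap LRU tier that spills evicted
 * entries to files in a directory (the "disk tier"). Reads of spilled entries
 * go through the filesystem, so the OS page cache can serve them off-heap.
 */
final class TieredCacheSketch {
    private final Path diskDir;
    private final LinkedHashMap<String, String> heapTier;
    private int diskHits = 0; // reads served by the disk tier

    TieredCacheSketch(int heapCapacity, Path diskDir) {
        this.diskDir = diskDir;
        try {
            Files.createDirectories(diskDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // accessOrder=true keeps the least-recently-used entry first
        this.heapTier = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > heapCapacity) {
                    spillToDisk(eldest.getKey(), eldest.getValue());
                    return true; // drop the entry from the heap tier
                }
                return false;
            }
        };
    }

    synchronized void put(String key, String value) {
        heapTier.put(key, value);
    }

    /** Checks the heap tier first, then falls back to the disk tier. */
    synchronized String get(String key) {
        String value = heapTier.get(key);
        if (value != null) {
            return value;
        }
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null; // miss in both tiers
        }
        try {
            String fromDisk = Files.readString(file, StandardCharsets.UTF_8);
            diskHits++;
            return fromDisk;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized int diskHits() {
        return diskHits;
    }

    private void spillToDisk(String key, String value) {
        try {
            Files.writeString(fileFor(key), value, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hex-encodes the key so that any string maps to a safe file name. */
    private Path fileFor(String key) {
        StringBuilder name = new StringBuilder();
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            name.append(String.format("%02x", b));
        }
        return diskDir.resolve(name.append(".val").toString());
    }
}
```

With a heap capacity of 2, putting "a", "b", and "c" spills "a" to disk, and a later `get("a")` is answered by the disk tier. Because spilled files outlive the process, a restarted instance pointed at the same directory could still read them, which is the "stays warm" property described in the proposal (ignoring the staleness problem this sketch does not address).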

Bukhtawar · 2021-09-03T11:35:57Z

Requesting admins to remove the BUG label; I missed categorizing it.

ben-manes · 2021-09-19T03:37:39Z

The custom CacheBuilder is incredibly slow. A simple concurrency benchmark shows it to be worse than a synchronized LRU cache.

Benchmark (throughput)	OpenSearch	Synchronized LinkedHashMap	Caffeine
read-only	2.5 M/s	7.4 M/s	130 - 150 M/s
read-write	2.3 M/s	7.7 M/s	100 - 115 M/s
write-only	1.6 M/s	7.7 M/s	50 - 65 M/s
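
For context, a synchronized LRU baseline like the one in the table can be built from the JDK alone. This is a minimal sketch, assuming a `LinkedHashMap` in access order wrapped with `Collections.synchronizedMap`; the class name `SynchronizedLru` is hypothetical, and this is not necessarily the exact code that was benchmarked.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical baseline: a bounded LRU map guarded by a single lock. */
final class SynchronizedLru {
    static <K, V> Map<K, V> create(int maxSize) {
        // accessOrder=true: get() moves an entry to the most-recently-used end
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the least-recently-used entry
            }
        };
        // One monitor guards every operation, including reads, because an
        // access-ordered get() mutates the internal linked list.
        return Collections.synchronizedMap(lru);
    }
}
```

Because every read takes the same lock, this design cannot scale with cores; the point of the comparison is that the OpenSearch cache is slower even than this. For reference, the Caffeine equivalent is `Caffeine.newBuilder().maximumSize(n).build()`.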

The hit rate of this cache is also poor: its LRU policy favors recency, whereas search workloads are typically frequency-biased.

Hit rate (trace @ cache size)	OpenSearch	Caffeine	Optimal
WS1 @ 2M	6.09 %	23.06 %	41.28 %
WS1 @ 4M	21.60 %	41.32 %	57.80 %
WS1 @ 6M	45.74 %	55.02 %	65.85 %
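
Why recency alone loses on frequency-biased workloads can be illustrated with a toy trace: two hot keys interleaved with one-off "scan" keys, replayed against a cache of two entries. This is a minimal sketch under stated assumptions. The frequency filter is a simplified stand-in inspired by TinyLFU-style admission (exact counts in a `HashMap`, no aging, no count-min sketch), not Caffeine's actual W-TinyLFU policy, and it does not reproduce the WS1 numbers above. `HitRateSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy hit-rate comparison: plain LRU vs. LRU with a frequency-based admission filter. */
final class HitRateSketch {
    /** Two hot keys interleaved with one-off "scan" keys: h1, h2, s0, h1, h2, s1, ... */
    static List<String> scanTrace(int rounds) {
        List<String> trace = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            trace.add("h1");
            trace.add("h2");
            trace.add("s" + i);
        }
        return trace;
    }

    /** Replays the trace against an LRU cache and returns the number of hits. */
    static int lruHits(List<String> trace, int capacity) {
        LinkedHashMap<String, Boolean> cache = new LinkedHashMap<>(16, 0.75f, true);
        int hits = 0;
        for (String key : trace) {
            if (cache.get(key) != null) {
                hits++;
                continue;
            }
            cache.put(key, Boolean.TRUE);
            if (cache.size() > capacity) {
                // first key in access order = least recently used
                cache.remove(cache.keySet().iterator().next());
            }
        }
        return hits;
    }

    /**
     * Same LRU order, but on a miss with a full cache the candidate replaces the
     * LRU victim only if it has been requested more often than the victim.
     */
    static int frequencyAdmissionHits(List<String> trace, int capacity) {
        LinkedHashMap<String, Boolean> cache = new LinkedHashMap<>(16, 0.75f, true);
        Map<String, Integer> frequency = new HashMap<>(); // exact counts, never aged
        int hits = 0;
        for (String key : trace) {
            frequency.merge(key, 1, Integer::sum);
            if (cache.get(key) != null) {
                hits++;
                continue;
            }
            if (cache.size() < capacity) {
                cache.put(key, Boolean.TRUE);
                continue;
            }
            String victim = cache.keySet().iterator().next();
            if (frequency.get(key) > frequency.get(victim)) {
                cache.remove(victim);
                cache.put(key, Boolean.TRUE);
            }
        }
        return hits;
    }
}
```

On `scanTrace(10)` (30 requests) with capacity 2, plain LRU scores 0 hits, because each scan key evicts a hot key just before it is requested again. The frequency filter scores 18 hits, because after the first round the hot keys are always more frequent than the scan keys, so the scan keys are never admitted.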

Regardless of how you evolve the caches, ideally they should not themselves be a bottleneck.

Bukhtawar · 2021-09-19T07:04:39Z

Thanks @ben-manes, would you be able to share details on the setup? The numbers are certainly interesting.

ben-manes · 2021-09-19T07:56:45Z

This JMH benchmark uses a Zipfian distribution to emulate hotspots in the request pattern. It references Elasticsearch, but the code has not been changed in this fork. A server-class run shows that Caffeine can scale linearly.
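
A Zipfian request pattern makes a few keys very hot and leaves a long tail of rarely requested keys. A minimal sketch of such a key generator, using inverse-CDF sampling over precomputed cumulative probabilities; `ZipfianKeys`, the exponent `s`, and the seed are illustrative assumptions, not necessarily the generator the JMH benchmark uses.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical Zipfian key sampler: P(rank k) is proportional to 1 / k^s. */
final class ZipfianKeys {
    private final double[] cdf; // cumulative probabilities for ranks 1..n
    private final Random random;

    ZipfianKeys(int n, double s, long seed) {
        cdf = new double[n];
        double sum = 0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);
            cdf[k - 1] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum; // normalize so the last entry is exactly 1.0
        }
        random = new Random(seed);
    }

    /** Returns a key index in [0, n), where index 0 is the hottest key. */
    int next() {
        double u = random.nextDouble(); // uniform in [0, 1)
        int idx = Arrays.binarySearch(cdf, u);
        // binarySearch returns (-(insertion point) - 1) when u is not an exact match;
        // the insertion point is the first rank whose cumulative probability exceeds u
        return idx >= 0 ? idx : -idx - 1;
    }
}
```

Feeding these indices as keys into a cache from many threads is what exposes lock contention: most threads repeatedly touch the same few hot entries.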

The simulator provides the hit rate analysis; the WebSearch traces can be downloaded from UMass. Of course, different workloads favor different policies, so this summary across the most interesting variety of workloads demonstrates that a policy can robustly be a top performer. The research papers include more scenarios.

anasalkouz · 2021-10-04T23:23:47Z

Hi @Bukhtawar, are you actively working on this? If yes, please assign it to yourself.

andrross · 2021-11-01T17:55:46Z

It's pretty clear that in isolation the cache used in OpenSearch is not particularly performant. I think the next steps here are to benchmark the performance in the full system, ensure we're capturing all the right cache-level metrics, and then prototype cache replacements and measure the impact. We're probably not going to pick this up in the short term but if there is more data suggesting that this is a more pressing problem please do provide that information here.
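
As one example of the cache-level metrics this would need, a cache can count hits, misses, and time spent loading values. This is a minimal sketch, assuming an unbounded `ConcurrentHashMap` as the backing store purely for brevity; `InstrumentedCache` and its methods are hypothetical and not an existing OpenSearch class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

/** Hypothetical wrapper that records cache-level hit, miss, and load-time metrics. */
final class InstrumentedCache<K, V> {
    // Unbounded for brevity; a real cache would also need an eviction policy.
    private final Map<K, V> delegate = new ConcurrentHashMap<>();
    // LongAdder keeps counter updates cheap under contention.
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();
    private final LongAdder loadNanos = new LongAdder();

    V get(K key, Function<? super K, ? extends V> loader) {
        V value = delegate.get(key);
        if (value != null) {
            hits.increment();
            return value;
        }
        misses.increment();
        long start = System.nanoTime();
        // ConcurrentHashMap applies the loader atomically per key,
        // so concurrent misses on the same key do not load it twice.
        value = delegate.computeIfAbsent(key, loader);
        loadNanos.add(System.nanoTime() - start);
        return value;
    }

    long hitCount() {
        return hits.sum();
    }

    long missCount() {
        return misses.sum();
    }

    long totalLoadNanos() {
        return loadNanos.sum();
    }

    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

Caffeine exposes comparable counters out of the box via `Caffeine.newBuilder().recordStats()` and `cache.stats()`, which would make a side-by-side comparison in the full system easier.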

ben-manes · 2021-11-01T18:31:35Z

@maosuhan opened elastic/elasticsearch#69646, which indicated a bottleneck due to this cache. It is not a theoretical problem, but there aren't many details on how to reproduce it, so we should not blindly jump to the conclusion that improving the cache would resolve that lock contention.

Bukhtawar added bug Something isn't working untriaged labels Sep 3, 2021

minalsha added enhancement Enhancement or improvement to existing feature or request distributed framework and removed bug Something isn't working labels Sep 7, 2021

anasalkouz removed the untriaged label Sep 14, 2021

andrross mentioned this issue Nov 11, 2022

[Searchable Snapshot] Design file caching mechanism for block based files #4964

Closed