
[Searchable Snapshot] Revisit using other open source LRUCache implementations for File System caching #6225

Open
aabukhalil opened this issue Feb 8, 2023 · 7 comments
Labels: distributed framework, enhancement, Search

Comments

@aabukhalil
Contributor

aabukhalil commented Feb 8, 2023

Coming from #4964 and based on concerns raised in #5641 (review).

More context is in #4964 (comment), #4964 (comment) and #5641 (comment)

So far we have evaluated using the Guava/Caffeine caches (#4964 (comment)), but they don't support custom eviction logic (preventing eviction of entries that are actively in use) and they are not open for extension. That's why #5641 uses a modified version of the Guava/Caffeine cache. A new suggestion on how to use the Caffeine cache came up here, so we need to re-evaluate using Guava/Caffeine again.

Apache JCS was suggested in #5641 and it looks promising because it is more open for extension rather than modification, but I'm still not able to find a straightforward path or a perfectly fitting implementation for our cache usage.

@aabukhalil added the enhancement and untriaged labels on Feb 8, 2023
@aabukhalil
Contributor Author

aabukhalil commented Feb 8, 2023

When evaluating which cache implementation to use or if to stick with in house implementation, we need to think about near future use cases (within 1 year). I can think of:

  • Will we give users the ability to pin/prefetch an IndexInput and/or make an IndexInput always available locally? If yes, then the cache should support enabling/disabling specific entries by calling an API on the cache (see the sketch after this list).
  • Will we support more advanced, smarter eviction logic?
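
For illustration only, here is a minimal sketch of what the first use case might require from the cache API. The interface name and methods below are hypothetical and do not exist in OpenSearch today; this is just the shape of capability being asked about:

```java
import java.util.function.Function;

// Hypothetical sketch: what pin/prefetch support could look like on the file cache.
interface PinnableFileCache<K, V> {
    V computeIfAbsent(K key, Function<K, V> loader);

    // Mark an entry as non-evictable, e.g. for a pinned/prefetched IndexInput.
    void pin(K key);

    // Make the entry eligible for eviction again.
    void unpin(K key);
}
```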

@aabukhalil
Contributor Author

This is exactly how we can utilize the Caffeine cache: #5641 (comment)
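
The linked suggestion isn't reproduced in this thread, but as a general illustration: Caffeine's size-based eviction skips entries whose weigher returns zero, which is one way to keep actively referenced chunks from being evicted. A sketch under that assumption, where CachedChunk and its refCount()/length() accessors are stand-ins for the real file-cache value type:

```java
import java.nio.file.Path;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

class CaffeinePinningSketch {
    // Stand-in for the real file-cache value object (illustrative only).
    record CachedChunk(long length, int refCount) {}

    static Cache<Path, CachedChunk> build(long cacheCapacityBytes) {
        return Caffeine.newBuilder()
            .maximumWeight(cacheCapacityBytes)
            // Entries whose weigher returns 0 are exempt from size-based
            // eviction, so actively referenced chunks are never removed.
            .weigher((Path path, CachedChunk chunk) ->
                chunk.refCount() > 0 ? 0 : (int) Math.min(chunk.length(), Integer.MAX_VALUE))
            .removalListener((Path path, CachedChunk chunk, RemovalCause cause) -> {
                // Delete the backing chunk file from disk here.
            })
            .build();
    }
}
```

One caveat with this approach: Caffeine only re-evaluates an entry's weight on write, so the entry would have to be re-put whenever its reference count transitions between zero and non-zero, which is part of the awkwardness discussed above.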

@andrross
Member

I put together a quick commit on my fork that shows how we can replace the hand-rolled LinkedDeque with a LinkedHashMap.
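
As a general illustration (not the exact code in the linked commit), a LinkedHashMap in access order gives the same move-to-front / evict-from-the-cold-end behavior as a hand-rolled linked structure:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: an access-ordered LinkedHashMap as an LRU structure. The third
// constructor argument (accessOrder = true) makes get() move an entry to the
// most-recently-used end; iteration order runs from LRU to MRU.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

The real file cache evicts by total bytes rather than entry count, so the actual change iterates and removes entries until a byte budget is met instead of overriding removeEldestEntry, but the underlying ordering mechanism is the same.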

@reta
Collaborator

reta commented Mar 13, 2023

I put together a quick commit on my fork that shows how we can replace the hand-rolled LinkedDeque with a LinkedHashMap.

Have you run the benchmarks (#6610) against the LinkedHashMap alternative? Just curious what the diff would be.

@andrross
Member

andrross commented Apr 4, 2023

Have you run the benchmarks (#6610) against the LinkedHashMap alternative? Just curious what the diff would be.

@reta I put a PR together here with the benchmark results: #6968

@andrross
Member

andrross commented Apr 4, 2023

Just want to provide some updates and more context here:

The overall functionality provided by this cache feature is that remote index data is pulled down from the object store in chunks (8MiB by default) and written to local disk, and searches are served using the chunks from the local disk. The total size of the disk available for use as the "cache" is fixed and defined by a node setting. When that total size is exhausted, unused chunks are deleted from the disk to make room for new chunks.

This is implemented (after #6968) by using two maps (a rough sketch follows the list below):

  • data: A plain HashMap that holds pointers to all chunks on disk that are actively being used. These cannot be deleted as ongoing searches hold references to them. If the total size of these chunks is larger than the disk cache size, then the amount of disk space being used will exceed the cache size setting. The intent is that the local disk cache is likely to be much larger than the size of the chunks that can be actively used by concurrent searches at any given time.
  • lru: A LinkedHashMap that holds pointers to chunks on disk that are not actively being used and are therefore eligible for eviction. When a request adds a new chunk and the disk cache size is exceeded, lru is iterated over and entries are removed (and the corresponding files deleted from disk) until we are under the limit.
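
A minimal sketch of that two-map layout; class and method names are illustrative, not the actual OpenSearch code:

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class TwoMapFileCacheSketch {
    record Chunk(long sizeInBytes, int refCount) {}

    private final long capacityBytes;
    private long usageBytes;

    // Actively referenced chunks: never evicted while a search holds a reference.
    private final Map<Path, Chunk> data = new HashMap<>();
    // Idle chunks, in insertion order: eligible for eviction when over capacity.
    private final Map<Path, Chunk> lru = new LinkedHashMap<>();

    TwoMapFileCacheSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    void put(Path path, Chunk chunk) {
        data.put(path, chunk);
        usageBytes += chunk.sizeInBytes();
        evictWhileOverCapacity();
    }

    // When the last reference is released, the chunk moves to the LRU side.
    void release(Path path) {
        Chunk chunk = data.remove(path);
        if (chunk != null) {
            lru.put(path, chunk);
        }
    }

    private void evictWhileOverCapacity() {
        Iterator<Map.Entry<Path, Chunk>> it = lru.entrySet().iterator();
        while (usageBytes > capacityBytes && it.hasNext()) {
            Map.Entry<Path, Chunk> eldest = it.next();
            it.remove();                                 // drop the cache entry ...
            usageBytes -= eldest.getValue().sizeInBytes();
            // ... and delete the chunk file from disk here.
        }
    }
}
```

Note that, as described above, only entries on the lru side can be evicted, so if the actively referenced chunks alone exceed the capacity, disk usage can overshoot the configured size.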

All operations on the cache are guarded by a simple ReentrantLock that prevents concurrent access. To provide additional concurrency, a segmented approach is taken where Runtime.getRuntime().availableProcessors() number of caches are created, and each cache is set to use disk cache size / Runtime.getRuntime().availableProcessors() as its capacity.
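
A rough sketch of that segmentation scheme, reusing the TwoMapFileCacheSketch from the sketch above (again illustrative names, not the real classes):

```java
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: one independently locked cache per processor, each with an equal
// share of the total capacity.
class SegmentedCacheSketch {
    private final TwoMapFileCacheSketch[] segments;
    private final ReentrantLock[] locks;

    SegmentedCacheSketch(long totalCapacityBytes) {
        int n = Runtime.getRuntime().availableProcessors();
        segments = new TwoMapFileCacheSketch[n];
        locks = new ReentrantLock[n];
        for (int i = 0; i < n; i++) {
            segments[i] = new TwoMapFileCacheSketch(totalCapacityBytes / n);
            locks[i] = new ReentrantLock();
        }
    }

    void put(Path path, TwoMapFileCacheSketch.Chunk chunk) {
        // A hot key always maps to the same segment, so that segment's lock
        // (and its slice of the capacity) becomes the bottleneck.
        int i = Math.floorMod(path.hashCode(), segments.length);
        locks[i].lock();
        try {
            segments[i].put(path, chunk);
        } finally {
            locks[i].unlock();
        }
    }
}
```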

In my opinion, the segmented approach adds complexity and is still vulnerable to hot keys. I think the main improvement we can make here is to get rid of the segmented cache and go to a single implementation with finer-grained locking. Best-effort semantics on the disk usage and eviction are likely okay, as over- or under-using the cache by a small percentage is probably acceptable. The current implementation is best-effort anyway as the size is enforced at each segment, and not the overall sum of the segments.

andrross added a commit to andrross/OpenSearch that referenced this issue Apr 5, 2023
The lock that guards `FileCache.compute` is per-cache-segment, and
therefore means unrelated keys can get stuck waiting for one another.
This refactors the code to do the download outside of the cache
operation, and uses a per-key latch mechanism to ensure that only
requests for the exact same blob will block on each other.

See [this issue][1] for details about the cache implementation. I think
it is possible to re-work the cache so that locking would be much more
precise and this change would not be necessary. However, that is a
bigger change potentially with other tradeoffs, so I think this fix is a
reasonable thing to do now.

[1]: opensearch-project#6225 (comment)

Signed-off-by: Andrew Ross <andrross@amazon.com>
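
For illustration, a minimal sketch of the per-key latch idea described in the commit message above: the first caller for a given blob performs the download, concurrent callers for the same blob wait on the same result, and callers for other blobs proceed independently. The actual change uses a latch; a CompletableFuture plays the same role here, and fetchFromObjectStore is an assumed placeholder:

```java
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class PerKeyDownloadLatchSketch {
    private final ConcurrentHashMap<Path, CompletableFuture<Path>> inFlight = new ConcurrentHashMap<>();

    Path download(Path blob) {
        CompletableFuture<Path> created = new CompletableFuture<>();
        CompletableFuture<Path> existing = inFlight.putIfAbsent(blob, created);
        if (existing != null) {
            return existing.join();            // another request is already downloading this blob
        }
        try {
            Path local = fetchFromObjectStore(blob);   // done outside any cache lock
            created.complete(local);
            return local;
        } catch (RuntimeException e) {
            created.completeExceptionally(e);
            throw e;
        } finally {
            inFlight.remove(blob);             // allow a re-download after a later eviction
        }
    }

    private Path fetchFromObjectStore(Path blob) {
        // Placeholder for the actual blob-store read + write to local disk.
        return blob;
    }
}
```
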
@reta
Collaborator

reta commented Apr 10, 2023

In my opinion, the segmented approach adds complexity and is still vulnerable to hot keys. I think the main improvement we can make here is to get rid of the segmented cache and go to a single implementation with finer-grained locking.

The complexity is certainly there, and hot keys could significantly reduce its effectiveness. +1 to looking for a simpler and more efficient way to (re)implement that.
