
[Searchable Snapshot] Revisit using other open source LRUCache implementations for File System caching #6225

Open
aabukhalil opened this issue Feb 8, 2023 · 7 comments
Labels: distributed framework, enhancement, Search

Comments

@aabukhalil
Contributor

aabukhalil commented Feb 8, 2023

Coming from #4964 and based on concerns raised in #5641 (review).

More context is in #4964 (comment), #4964 (comment) and #5641 (comment)

So far we have evaluated using the Guava/Caffeine caches (#4964 (comment)), but they don't support custom eviction logic (preventing eviction of entries that are actively in use) and they are not open for extension. That's why #5641 uses a modified version of the Guava/Caffeine cache. A new suggestion on how to use the Caffeine cache came up here, so we need to re-evaluate using Guava/Caffeine again.

Apache JCS was suggested in #5641 and it looks promising because it is more open for extension rather than modification, but I'm still not able to find a straightforward path or a perfectly fitting implementation for our cache usage.

@aabukhalil added the enhancement and untriaged labels on Feb 8, 2023
@aabukhalil
Contributor Author

aabukhalil commented Feb 8, 2023

When evaluating which cache implementation to use or if to stick with in house implementation, we need to think about near future use cases (within 1 year). I can think of:

  • Will we give users the ability to pin/prefetch an IndexInput and/or make an IndexInput always available locally? If yes, then the cache should support enabling/disabling specific entries by calling an API on the cache (see the sketch after this list).
  • Will we support more advanced, smarter eviction logic?
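
For illustration only, here is a minimal sketch of what the first use case might require from the cache API. The interface name and methods below are hypothetical and do not exist in OpenSearch today; this is just the shape of capability being asked about:

```java
import java.util.function.Function;

// Hypothetical sketch: what pin/prefetch support could look like on the file cache.
interface PinnableFileCache<K, V> {
    V computeIfAbsent(K key, Function<K, V> loader);

    // Mark an entry as non-evictable, e.g. for a pinned/prefetched IndexInput.
    void pin(K key);

    // Make the entry eligible for eviction again.
    void unpin(K key);
}
```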

@aabukhalil
Contributor Author

This is exactly how we can utilize the Caffeine cache: #5641 (comment)
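
The linked suggestion isn't reproduced in this thread, but as a general illustration: Caffeine's size-based eviction skips entries whose weigher returns zero, which is one way to keep actively referenced chunks from being evicted. A sketch under that assumption, where CachedChunk and its refCount()/length() accessors are stand-ins for the real file-cache value type:

```java
import java.nio.file.Path;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

class CaffeinePinningSketch {
    // Stand-in for the real file-cache value object (illustrative only).
    record CachedChunk(long length, int refCount) {}

    static Cache<Path, CachedChunk> build(long cacheCapacityBytes) {
        return Caffeine.newBuilder()
            .maximumWeight(cacheCapacityBytes)
            // Entries whose weigher returns 0 are exempt from size-based
            // eviction, so actively referenced chunks are never removed.
            .weigher((Path path, CachedChunk chunk) ->
                chunk.refCount() > 0 ? 0 : (int) Math.min(chunk.length(), Integer.MAX_VALUE))
            .removalListener((Path path, CachedChunk chunk, RemovalCause cause) -> {
                // Delete the backing chunk file from disk here.
            })
            .build();
    }
}
```

One caveat with this approach: Caffeine only re-evaluates an entry's weight on write, so the entry would have to be re-put whenever its reference count transitions between zero and non-zero, which is part of the awkwardness discussed above.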

@andrross
Member

I put together a quick commit on my fork that shows how we can replace the hand-rolled LinkedDeque with a LinkedHashMap.
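
As a general illustration (not the exact code in the linked commit), a LinkedHashMap in access order gives the same move-to-front / evict-from-the-cold-end behavior as a hand-rolled linked structure:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: an access-ordered LinkedHashMap as an LRU structure. The third
// constructor argument (accessOrder = true) makes get() move an entry to the
// most-recently-used end; iteration order runs from LRU to MRU.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

The real file cache evicts by total bytes rather than entry count, so the actual change iterates and removes entries until a byte budget is met instead of overriding removeEldestEntry, but the underlying ordering mechanism is the same.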

@reta
Collaborator

reta commented Mar 13, 2023

I put together a quick commit on my fork that shows how we can replace the hand-rolled LinkedDeque with a LinkedHashMap.

Have you run the benchmarks (#6610) against the LinkedHashMap alternative? Just curious what the diff would be.

@andrross
Member

andrross commented Apr 4, 2023

Have you run the benchmarks (#6610) against the LinkedHashMap alternative? Just curious what the diff would be.

@reta I put a PR together here with the benchmark results: #6968

@andrross
Member

andrross commented Apr 4, 2023

Just want to provide some updates and more context here:

The overall functionality provided by this cache feature is that remote index data is pulled down from the object store in chunks (8MiB by default) and written to local disk, and searches are served using the chunks from the local disk. The total size of the disk available for use as the "cache" is fixed and defined by a node setting. When that total size is exhausted, unused chunks are deleted from the disk to make room for new chunks.

This is implemented (after #6968) by using two maps (a rough sketch follows the list below):

  • data: A plain HashMap that holds pointers to all chunks on disk that are actively being used. These cannot be deleted as ongoing searches hold references to them. If the total size of these chunks is larger than the disk cache size, then the amount of disk space being used will exceed the cache size setting. The intent is that the local disk cache is likely to be much larger than the size of the chunks that can be actively used by concurrent searches at any given time.
  • lru: A LinkedHashMap that holds pointers to chunks on disk that are not actively being used and are therefore eligible for eviction. When a request adds a new chunk and the disk cache size is exceeded, lru is iterated over and entries are removed (and the corresponding files deleted from disk) until we are under the limit.
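
A minimal sketch of that two-map layout; class and method names are illustrative, not the actual OpenSearch code:

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class TwoMapFileCacheSketch {
    record Chunk(long sizeInBytes, int refCount) {}

    private final long capacityBytes;
    private long usageBytes;

    // Actively referenced chunks: never evicted while a search holds a reference.
    private final Map<Path, Chunk> data = new HashMap<>();
    // Idle chunks, in insertion order: eligible for eviction when over capacity.
    private final Map<Path, Chunk> lru = new LinkedHashMap<>();

    TwoMapFileCacheSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    void put(Path path, Chunk chunk) {
        data.put(path, chunk);
        usageBytes += chunk.sizeInBytes();
        evictWhileOverCapacity();
    }

    // When the last reference is released, the chunk moves to the LRU side.
    void release(Path path) {
        Chunk chunk = data.remove(path);
        if (chunk != null) {
            lru.put(path, chunk);
        }
    }

    private void evictWhileOverCapacity() {
        Iterator<Map.Entry<Path, Chunk>> it = lru.entrySet().iterator();
        while (usageBytes > capacityBytes && it.hasNext()) {
            Map.Entry<Path, Chunk> eldest = it.next();
            it.remove();                                 // drop the cache entry ...
            usageBytes -= eldest.getValue().sizeInBytes();
            // ... and delete the chunk file from disk here.
        }
    }
}
```

Note that, as described above, only entries on the lru side can be evicted, so if the actively referenced chunks alone exceed the capacity, disk usage can overshoot the configured size.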

All operations on the cache are guarded by a simple ReentrantLock that prevents concurrent access. To provide additional concurrency, a segmented approach is taken where Runtime.getRuntime().availableProcessors() number of caches are created, and each cache is set to use disk cache size / Runtime.getRuntime().availableProcessors() as its capacity.
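
A rough sketch of that segmentation scheme, reusing the TwoMapFileCacheSketch from the sketch above (again illustrative names, not the real classes):

```java
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: one independently locked cache per processor, each with an equal
// share of the total capacity.
class SegmentedCacheSketch {
    private final TwoMapFileCacheSketch[] segments;
    private final ReentrantLock[] locks;

    SegmentedCacheSketch(long totalCapacityBytes) {
        int n = Runtime.getRuntime().availableProcessors();
        segments = new TwoMapFileCacheSketch[n];
        locks = new ReentrantLock[n];
        for (int i = 0; i < n; i++) {
            segments[i] = new TwoMapFileCacheSketch(totalCapacityBytes / n);
            locks[i] = new ReentrantLock();
        }
    }

    void put(Path path, TwoMapFileCacheSketch.Chunk chunk) {
        // A hot key always maps to the same segment, so that segment's lock
        // (and its slice of the capacity) becomes the bottleneck.
        int i = Math.floorMod(path.hashCode(), segments.length);
        locks[i].lock();
        try {
            segments[i].put(path, chunk);
        } finally {
            locks[i].unlock();
        }
    }
}
```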

In my opinion, the segmented approach adds complexity and is still vulnerable to hot keys. I think the main improvement we can make here is to get rid of the segmented cache and go to a single implementation with finer-grained locking. Best-effort semantics on the disk usage and eviction are likely okay, as over- or under-using the cache by a small percentage is probably acceptable. The current implementation is best-effort anyway as the size is enforced at each segment, and not the overall sum of the segments.

andrross added a commit to andrross/OpenSearch that referenced this issue Apr 5, 2023
The lock that guards `FileCache.compute` is per-cache-segment, and
therefore means unrelated keys can get stuck waiting for one another.
This refactors the code to do the download outside of the cache
operation, and uses a per-key latch mechanism to ensure that only
requests for the exact same blob will block on each other.

See [this issue][1] for details about the cache implementation. I think
it is possible to re-work the cache so that locking would be much more
precise and this change would not be necessary. However, that is a
bigger change potentially with other tradeoffs, so I think this fix is a
reasonable thing to do now.

[1]: opensearch-project#6225 (comment)

Signed-off-by: Andrew Ross <andrross@amazon.com>
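
For illustration, a minimal sketch of the per-key latch idea described in the commit message above: the first caller for a given blob performs the download, concurrent callers for the same blob wait on the same result, and callers for other blobs proceed independently. The actual change uses a latch; a CompletableFuture plays the same role here, and fetchFromObjectStore is an assumed placeholder:

```java
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class PerKeyDownloadLatchSketch {
    private final ConcurrentHashMap<Path, CompletableFuture<Path>> inFlight = new ConcurrentHashMap<>();

    Path download(Path blob) {
        CompletableFuture<Path> created = new CompletableFuture<>();
        CompletableFuture<Path> existing = inFlight.putIfAbsent(blob, created);
        if (existing != null) {
            return existing.join();            // another request is already downloading this blob
        }
        try {
            Path local = fetchFromObjectStore(blob);   // done outside any cache lock
            created.complete(local);
            return local;
        } catch (RuntimeException e) {
            created.completeExceptionally(e);
            throw e;
        } finally {
            inFlight.remove(blob);             // allow a re-download after a later eviction
        }
    }

    private Path fetchFromObjectStore(Path blob) {
        // Placeholder for the actual blob-store read + write to local disk.
        return blob;
    }
}
```
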
@reta
Collaborator

reta commented Apr 10, 2023

In my opinion, the segmented approach adds complexity and is still vulnerable to hot keys. I think the main improvement we can make here is to get rid of the segmented cache and go to a single implementation with finer-grained locking.

The complexity is certainly there, and hot keys could significantly reduce its effectiveness. +1 to looking for a simpler and more efficient way to (re)implement that.
