-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use per-key latch to wait on file downloads #7015
Conversation
The lock that guards `FileCache.compute` is per-cache-segment, and therefore means unrelated keys can get stuck waiting for one another. This refactors the code to do the download outside of the cache operation, and uses a per-key latch mechanism to ensure that only requests for the exact same blob will block on each other. See [this issue][1] for details about the cache implementation. I think it is possible to re-work the cache so that locking would be much more precise and this change would not be necessary. However, that is a bigger change potentially with other tradeoffs, so I think this fix is a reasonable thing to do now. [1]: opensearch-project#6225 (comment) Signed-off-by: Andrew Ross <andrross@amazon.com>
@reta I'm interested in your thoughts here. There are performance implications that I hadn't considered in this discussion on a previous PR. |
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Ah I see, we indeed would block on segment, not a key .... I honestly don't feel great about latches (in this particular case) for basically 3 reasons:
It seems like we need a reliable primitive to access the value behind the particular key since its computation takes time and we don't want to lock the whole segment. The idea I wanted to try out is to use I have not tried to prototype it yet, but wdyt? |
Gradle check is fixed now rerun at your will. |
@reta Yeah this is exactly right. I'm thinking the cached object itself should not be an IndexInput at all, but rather something capable of creating (or returning a previously created) IndexInput. It would need to know the length of the file (for the cache weigher function to work) and then provide a thread-safe mechanism for getting/creating the IndexInput (perhaps using a CompletableFuture). The signature changes might have a bit of a ripple effect but the overall change shouldn't be too complex. Does that sound right? |
Absolutely, it does |
Moved to draft until the idea in #7015 (comment) is implemented |
The lock that guards
FileCache.compute
is per-cache-segment, and therefore means unrelated keys can get stuck waiting for one another. This refactors the code to do the download outside of the cache operation, and uses a per-key latch mechanism to ensure that only requests for the exact same blob will block on each other.See this issue for details about the cache implementation. I think it is possible to re-work the cache so that locking would be much more precise and this change would not be necessary. However, that is a bigger change potentially with other tradeoffs, so I think this fix is a reasonable thing to do now.
Issues Resolved
Closes #7031
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.