[Searchable Snapshots] Remove virtual file and fix duplicate clone #6345
A "virtual file" in this context is a small file whose entire contents are stored in the snapshot metadata rather than in a discrete file in the remote repository. OnDemandVirtualFileSnapshotIndexInput was a complicated wrapper that ultimately returned a ByteArrayIndexInput wrapping the file contents pulled from the metadata. This change simplifies the code by creating the ByteArrayIndexInput directly.

The other change (which led to the removal of the virtual file) removes a duplicate clone() call on the index input. The file cache design calls for keeping the "origin" IndexInput instance in the cache and always [returning clones][1]. OnDemandBlockIndexInput was incorrectly duplicating this clone operation.

[1]: https://github.com/opensearch-project/OpenSearch/blob/0ca51a774211184835c4825dfeff38b23198352e/server/src/main/java/org/opensearch/index/store/remote/utils/TransferManager.java#L105

Signed-off-by: Andrew Ross <andrross@amazon.com>
Gradle Check (Jenkins) Run Completed with:
Codecov Report
@@ Coverage Diff @@
## main #6345 +/- ##
============================================
- Coverage 70.73% 70.71% -0.02%
+ Complexity 59024 58995 -29
============================================
Files 4800 4799 -1
Lines 282453 282421 -32
Branches 40718 40717 -1
============================================
- Hits 199799 199727 -72
- Misses 66259 66316 +57
+ Partials 16395 16378 -17
WhiteSource check is failing for existing issues. No dependencies changed in this commit, so I'm going to merge.
A "virtual file" in this context is a small file whose entire contents are stored in the snapshot metadata rather than in a discrete file in the remote repository. OnDemandVirtualFileSnapshotIndexInput was a complicated wrapper that ultimately returned a ByteArrayIndexInput wrapping the file contents pulled from the metadata. This change simplifies the code by creating the ByteArrayIndexInput directly.

The other change (which led to the removal of the virtual file) removes a duplicate clone() call on the index input. The file cache design calls for keeping the "origin" IndexInput instance in the cache and always [returning clones][1]. OnDemandBlockIndexInput was incorrectly duplicating this clone operation.

[1]: https://github.com/opensearch-project/OpenSearch/blob/0ca51a774211184835c4825dfeff38b23198352e/server/src/main/java/org/opensearch/index/store/remote/utils/TransferManager.java#L105

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 1cdff3b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
In PR opensearch-project#6345 I removed a duplicate clone, but this resulted in cloning the IndexInput in the wrong place. When a requested file needs to be downloaded, a mechanism ensures that concurrent calls do not duplicate the download, which results in multiple threads being given the same instance. The clone must happen _after_ this point to ensure that each thread gets its own clone.

Signed-off-by: Andrew Ross <andrross@amazon.com>
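The ordering issue described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the OpenSearch implementation: `Input`, `Downloader`, `fetch`, and the file name `_0.cfs` are hypothetical, and `ConcurrentHashMap.computeIfAbsent` stands in for whatever deduplication mechanism the real code uses. The point it shows is that every concurrent caller receives the same shared origin from the deduplicated load, so the clone has to be taken after that step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class CloneAfterDedupSketch {

    /** Hypothetical input with a private read position (not Lucene's IndexInput). */
    static final class Input {
        private final byte[] data;
        private int pos;

        Input(byte[] data) {
            this.data = data;
        }

        byte readByte() {
            return data[pos++];
        }

        /** The copy shares the bytes but has its own read position. */
        @Override
        public Input clone() {
            Input copy = new Input(data);
            copy.pos = pos;
            return copy;
        }
    }

    /** Hypothetical downloader that deduplicates concurrent loads of the same file. */
    static final class Downloader {
        private final ConcurrentMap<String, Input> origins = new ConcurrentHashMap<>();
        final AtomicInteger downloads = new AtomicInteger();

        Input fetch(String name) {
            // The loader runs at most once per key, so concurrent callers
            // all receive the SAME origin instance from this call.
            Input origin = origins.computeIfAbsent(name, key -> {
                downloads.incrementAndGet(); // stands in for the real download
                return new Input(new byte[] {10, 20, 30});
            });
            // Clone AFTER the deduplicated load so each caller gets its own position.
            return origin.clone();
        }
    }

    public static void main(String[] args) throws Exception {
        Downloader downloader = new Downloader();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Byte>> tasks = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                tasks.add(() -> downloader.fetch("_0.cfs").readByte());
            }
            for (Future<Byte> result : pool.invokeAll(tasks)) {
                System.out.println("first byte read: " + result.get());
            }
        } finally {
            pool.shutdown();
        }
        System.out.println("downloads: " + downloader.downloads.get());
    }
}
```

If the clone were taken inside the loader instead, all callers would share that single clone and advance each other's read position.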
In PR #6345 I removed a duplicate clone, but this resulted in cloning the IndexInput in the wrong place. When a requested file needs to be downloaded, a mechanism ensures that concurrent calls do not duplicate the download, which results in multiple threads being given the same instance. The clone must happen _after_ this point to ensure that each thread gets its own clone.

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 5e5c83b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
A "virtual file" in this context is a small file whose entire contents are stored in the snapshot metadata rather than in a discrete file in the remote repository. OnDemandVirtualFileSnapshotIndexInput was a complicated wrapper that ultimately returned a ByteArrayIndexInput wrapping the file contents pulled from the metadata. This change simplifies the code by creating the ByteArrayIndexInput directly.
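As a rough illustration of the idea, the sketch below reads a virtual file straight from bytes held in its metadata entry. All names here (`FileEntry`, `ByteArrayReader`, `open`, the file name `segments_1`) are hypothetical stand-ins chosen for the example; they are not the OpenSearch or Lucene classes, and the remote-file path is deliberately left out.

```java
import java.nio.charset.StandardCharsets;

public class VirtualFileSketch {

    /** Hypothetical stand-in for a snapshot file entry (not the OpenSearch class). */
    static final class FileEntry {
        final String name;
        final byte[] inlineContent; // non-null only when the contents live in the metadata

        FileEntry(String name, byte[] inlineContent) {
            this.name = name;
            this.inlineContent = inlineContent;
        }

        boolean isVirtual() {
            return inlineContent != null;
        }
    }

    /** Hypothetical minimal reader over an in-memory byte array. */
    static final class ByteArrayReader {
        private final byte[] bytes;
        private int pos;

        ByteArrayReader(byte[] bytes) {
            this.bytes = bytes;
        }

        byte readByte() {
            return bytes[pos++];
        }

        int length() {
            return bytes.length;
        }
    }

    /** A virtual file needs no wrapper and no remote fetch: read the metadata bytes directly. */
    static ByteArrayReader open(FileEntry entry) {
        if (entry.isVirtual()) {
            return new ByteArrayReader(entry.inlineContent);
        }
        throw new UnsupportedOperationException("remote files are out of scope for this sketch");
    }

    public static void main(String[] args) {
        byte[] content = "tiny".getBytes(StandardCharsets.UTF_8);
        ByteArrayReader reader = open(new FileEntry("segments_1", content));
        System.out.println(reader.length() + " bytes, first byte = " + (char) reader.readByte());
    }
}
```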
The other change (which led to the removal of the virtual file) removes a duplicate clone() call on the index input. The file cache design calls for keeping the "origin" IndexInput instance in the cache and always returning clones. OnDemandBlockIndexInput was incorrectly duplicating this clone operation.
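The "origin in the cache, clones to callers" design might look roughly like the sketch below. `Input` and `Cache` are hypothetical, simplified types invented for this example rather than the actual file cache classes. Because `get` already returns a clone, a caller that clones the result again only creates a redundant object.

```java
import java.util.HashMap;
import java.util.Map;

public class OriginCloneSketch {

    /** Hypothetical input with a private read position (not Lucene's IndexInput). */
    static final class Input {
        private final byte[] data;
        private int pos;

        Input(byte[] data) {
            this.data = data;
        }

        byte readByte() {
            return data[pos++];
        }

        int position() {
            return pos;
        }

        /** The copy shares the bytes but has its own read position. */
        @Override
        public Input clone() {
            Input copy = new Input(data);
            copy.pos = pos;
            return copy;
        }
    }

    /** Hypothetical cache: the origin never leaves it, callers only ever see clones. */
    static final class Cache {
        private final Map<String, Input> origins = new HashMap<>();

        void put(String name, Input origin) {
            origins.put(name, origin);
        }

        Input get(String name) {
            Input origin = origins.get(name);
            return origin == null ? null : origin.clone();
        }

        int originPosition(String name) {
            return origins.get(name).position();
        }
    }

    public static void main(String[] args) {
        Cache cache = new Cache();
        cache.put("block0", new Input(new byte[] {1, 2, 3}));

        Input first = cache.get("block0");  // already a clone, no second clone() needed
        Input second = cache.get("block0");
        first.readByte();
        first.readByte();

        System.out.println("first position:  " + first.position());
        System.out.println("second position: " + second.position());
        System.out.println("origin position: " + cache.originPosition("block0"));
    }
}
```

Reading from one clone moves only that clone's position, which is why the cache can safely hand the same file to many readers.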
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.