[Searchable Snapshots] Remove virtual file and fix duplicate clone #6345

Merged: 1 commit, Feb 16, 2023

Conversation

andrross (Member)

A "virtual file" in this context is a small file whose entire contents are stored in the snapshot metadata rather than in a discrete file in the remote repository. OnDemandVirtualFileSnapshotIndexInput was a complicated wrapper that ultimately returned a ByteArrayIndexInput wrapping the file contents pulled from the metadata. This change removes that wrapper and creates the ByteArrayIndexInput directly.
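
A minimal sketch of the simplification described above, for readers unfamiliar with the idea. It does not use the real Lucene or OpenSearch classes: `InMemoryInput`, `SnapshotFileInfo`, and `VirtualFileSketch` are hypothetical stand-ins invented for illustration, and the only point shown is that a file whose bytes already live in the metadata can be opened by wrapping those bytes directly, with no intermediate wrapper type.

```java
/** Hypothetical stand-in for an in-memory index input (not the real ByteArrayIndexInput). */
final class InMemoryInput {
    private final byte[] bytes;
    private int position;

    InMemoryInput(byte[] bytes) {
        this.bytes = bytes;
    }

    byte readByte() {
        if (position >= bytes.length) {
            throw new IllegalStateException("read past end of input");
        }
        return bytes[position++];
    }

    long length() {
        return bytes.length;
    }
}

/** Hypothetical snapshot metadata entry; inlineContent is non-null only for "virtual" files. */
final class SnapshotFileInfo {
    final String name;
    final byte[] inlineContent;

    SnapshotFileInfo(String name, byte[] inlineContent) {
        this.name = name;
        this.inlineContent = inlineContent;
    }

    boolean isVirtual() {
        return inlineContent != null;
    }
}

final class VirtualFileSketch {
    /** Opens a virtual file by wrapping the metadata bytes directly. */
    static InMemoryInput open(SnapshotFileInfo info) {
        if (!info.isVirtual()) {
            // Assumption: non-virtual files take a separate, block-based path not shown here.
            throw new UnsupportedOperationException(info.name + " lives in the remote repository");
        }
        return new InMemoryInput(info.inlineContent);
    }

    public static void main(String[] args) {
        SnapshotFileInfo info = new SnapshotFileInfo("segments_1", new byte[] {7, 8, 9});
        InMemoryInput in = open(info);
        System.out.println(in.length() + " bytes, first byte = " + in.readByte());
    }
}
```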

The other change (which led to removal of the virtual file wrapper) is to remove a duplicate clone() call on the index input. The file cache design calls for keeping the "origin" IndexInput instance in the cache and always returning clones. OnDemandBlockIndexInput was incorrectly duplicating this clone operation.
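
The "origin in the cache, clones to callers" rule can be sketched as follows. This is an illustration under stated assumptions, not the actual file cache: `CursorInput` and `OriginCache` are hypothetical names, and `copy()` stands in for `IndexInput.clone()`. The cache is the single place that clones, so callers must not clone again.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical input with its own read position; copy() plays the role of clone(). */
final class CursorInput {
    private final byte[] data;
    private int pos;

    CursorInput(byte[] data) {
        this.data = data;
    }

    byte readByte() {
        return data[pos++];
    }

    int position() {
        return pos;
    }

    /** Shares the underlying bytes but carries an independent position. */
    CursorInput copy() {
        CursorInput c = new CursorInput(data);
        c.pos = pos;
        return c;
    }
}

/** Hypothetical cache: stores the origin instance and only ever hands out copies. */
final class OriginCache {
    private final Map<String, CursorInput> origins = new HashMap<>();

    void put(String name, CursorInput origin) {
        origins.put(name, origin);
    }

    CursorInput get(String name) {
        CursorInput origin = origins.get(name);
        // The one and only clone: callers receive this copy and should use it as-is.
        return origin == null ? null : origin.copy();
    }
}
```

With this shape, a caller that clones the returned value a second time does redundant work, which is the kind of duplication the description refers to.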

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

A "virtual file" in this context is a small file whose entire contents
are stored in the snapshot metadata rather than in a discrete file in
the remote repository. OnDemandVirtualFileSnapshotIndexInput was a
complicated wrapper that ultimately returned a ByteArrayIndexInput
wrapping the file contents pulled from the metadata. This change
removes that wrapper and creates the ByteArrayIndexInput directly.

The other change (which led to removal of the virtual file) is to remove
a duplicate clone() call of the index input. The file cache design calls
for keeping the "origin" IndexInput instance in the cache and always
[returning clones][1]. OnDemandBlockIndexInput was incorrectly
duplicating this clone operation.

[1]: https://github.com/opensearch-project/OpenSearch/blob/0ca51a774211184835c4825dfeff38b23198352e/server/src/main/java/org/opensearch/index/store/remote/utils/TransferManager.java#L105

Signed-off-by: Andrew Ross <andrross@amazon.com>
@kotwanikunal kotwanikunal added the backport 2.x Backport to 2.x branch label Feb 16, 2023
@codecov-commenter

Codecov Report

Merging #6345 (bef7da9) into main (7914c04) will decrease coverage by 0.02%.
The diff coverage is 50.00%.

@@             Coverage Diff              @@
##               main    #6345      +/-   ##
============================================
- Coverage     70.73%   70.71%   -0.02%     
+ Complexity    59024    58995      -29     
============================================
  Files          4800     4799       -1     
  Lines        282453   282421      -32     
  Branches      40718    40717       -1     
============================================
- Hits         199799   199727      -72     
- Misses        66259    66316      +57     
+ Partials      16395    16378      -17     
| Impacted Files | Coverage Δ |
|---|---|
| ...tore/remote/directory/RemoteSnapshotDirectory.java | 3.12% <0.00%> (ø) |
| ...arch/index/store/remote/utils/TransferManager.java | 3.03% <ø> (ø) |
| ...dex/store/remote/file/OnDemandBlockIndexInput.java | 73.01% <100.00%> (ø) |
| ...adonly/AddIndexBlockClusterStateUpdateRequest.java | 0.00% <0.00%> (-75.00%) ⬇️ |
| ...pensearch/client/cluster/RemoteConnectionInfo.java | 0.00% <0.00%> (-73.18%) ⬇️ |
| ...a/org/opensearch/client/cluster/ProxyModeInfo.java | 0.00% <0.00%> (-60.00%) ⬇️ |
| ...a/org/opensearch/client/cluster/SniffModeInfo.java | 0.00% <0.00%> (-58.83%) ⬇️ |
| ...readonly/TransportVerifyShardIndexBlockAction.java | 9.75% <0.00%> (-58.54%) ⬇️ |
| .../java/org/opensearch/node/NodeClosedException.java | 50.00% <0.00%> (-50.00%) ⬇️ |
| ...regations/metrics/AbstractHyperLogLogPlusPlus.java | 51.72% <0.00%> (-44.83%) ⬇️ |

... and 479 more

andrross (Member, Author) commented Feb 16, 2023

WhiteSource check is failing for existing issues. No dependencies changed in this commit, so I'm going to merge.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 16, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 1cdff3b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@andrross andrross deleted the remove-virtual-file branch February 16, 2023 21:40
andrross pushed a commit that referenced this pull request Feb 16, 2023
(cherry picked from commit 1cdff3b)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross added a commit to andrross/OpenSearch that referenced this pull request Feb 18, 2023
In PR opensearch-project#6345 I removed a duplicate clone; however, this resulted in
cloning the IndexInput in the wrong place. When requesting a file that
needs to be downloaded, we have a mechanism to ensure that concurrent
calls do not end up duplicating the download, which results in multiple
threads being given the same instance. The clone must happen _after_
this point to ensure that each thread gets its own clone.

Signed-off-by: Andrew Ross <andrross@amazon.com>
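
A rough sketch of the ordering this follow-up commit describes, assuming a simplified model rather than the real TransferManager. `SharedInput`, `DedupFetcher`, and the `downloader` parameter are hypothetical; `ConcurrentHashMap.computeIfAbsent` is used here only as one convenient way to make concurrent callers share a single download. The point is that the copy is taken after the de-duplication step, so threads that were handed the same origin still end up with separate instances.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical input with a private read position; copy() stands in for clone(). */
final class SharedInput {
    private final byte[] data;
    private int pos;

    SharedInput(byte[] data) {
        this.data = data;
    }

    byte readByte() {
        return data[pos++];
    }

    int position() {
        return pos;
    }

    SharedInput copy() {
        SharedInput c = new SharedInput(data);
        c.pos = pos;
        return c;
    }
}

/** Hypothetical fetcher: de-duplicates downloads per file name, then clones per caller. */
final class DedupFetcher {
    private final ConcurrentMap<String, SharedInput> origins = new ConcurrentHashMap<>();
    final AtomicInteger downloads = new AtomicInteger();

    SharedInput fetch(String name, Supplier<byte[]> downloader) {
        // Step 1: concurrent callers for the same name all receive the SAME origin,
        // and the download function runs at most once per name.
        SharedInput origin = origins.computeIfAbsent(name, key -> {
            downloads.incrementAndGet();
            return new SharedInput(downloader.get());
        });
        // Step 2: copy AFTER de-duplication so every caller gets its own instance.
        // Copying inside the mapping function would hand one shared copy to all waiters.
        return origin.copy();
    }
}
```

The origin itself is never read from in this sketch, so its position stays at zero and copying it from several threads is safe.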
andrross added a commit to andrross/OpenSearch that referenced this pull request Feb 20, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
andrross added a commit that referenced this pull request Feb 20, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 20, 2023
Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit 5e5c83b)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross pushed a commit that referenced this pull request Feb 20, 2023
(cherry picked from commit 5e5c83b)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Labels
backport 2.x (Backport to 2.x branch), skip-changelog
3 participants