[Remote Store] [Repository Download Enhancement] Implement the enhanced download mechanism #9031

Closed
kotwanikunal opened this issue Aug 1, 2023 · 0 comments · Fixed by #9592 or #9694
Assignees
Labels
distributed framework enhancement Enhancement or improvement to existing feature or request

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

  1. The existing Repository API, readBlob(blobName, position, length) in BlobContainer (https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java#L80-L92), will be used to open multiple streams for a single blob file (segment or metadata).

  2. The download path will be updated to use an Observer/ActionListener pattern so that different parts of a blob can be fetched asynchronously without blocking.

    1. At the file level, the corresponding listener is notified once all streams for that file have completed (parallel streams per file).
    2. At the segment restore level, the restore process continues once all files have reported that their download is complete (parallel files).
  3. Before the streams are opened for a file, a metadata fetch will load or calculate the hash and the length of the file stored in the repository.

    1. Parallel streams for a file will each read data into a buffer and save it as one part of the local file in a temporary location.
    2. Once all the streams are marked as completed, the parts will be merged into a single file and stored in the segment directory.
  4. Vendor plugins will decide the individual part size for a given file size. An abstraction can take the part size that the plugin determines to be most suitable for a given file size and provide a list of the corresponding number of suppliers, similar to the uploads implementation.
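
To make the proposed flow concrete, here is a minimal sketch of steps 1 to 4 in plain Java. It is not the OpenSearch implementation. `RangeReader` is a hypothetical stand-in that only mirrors the `readBlob(blobName, position, length)` signature, `ParallelBlobDownloader`, `Part`, and `planParts` are illustrative names, and `CompletableFuture` is used in place of an `ActionListener`. The part size is passed in by the caller, standing in for the value a vendor plugin would choose. Hash verification from step 3 is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Hypothetical interface; mirrors only the shape of readBlob(blobName, position, length). */
interface RangeReader {
    InputStream readBlob(String blobName, long position, long length) throws IOException;
}

final class ParallelBlobDownloader {

    /** One contiguous byte range of the blob. */
    static final class Part {
        final int index;
        final long position;
        final long length;

        Part(int index, long position, long length) {
            this.index = index;
            this.position = position;
            this.length = length;
        }
    }

    /** Splits a blob of fileLength bytes into parts of at most partSize bytes. */
    static List<Part> planParts(long fileLength, long partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<Part> parts = new ArrayList<>();
        int index = 0;
        for (long pos = 0; pos < fileLength; pos += partSize) {
            parts.add(new Part(index++, pos, Math.min(partSize, fileLength - pos)));
        }
        return parts;
    }

    /** Fetches every part concurrently into tempDir, then merges the parts into target. */
    static CompletableFuture<Path> download(RangeReader reader, String blobName, long fileLength,
                                            long partSize, Path tempDir, Path target,
                                            ExecutorService executor) {
        List<CompletableFuture<Path>> partFutures = new ArrayList<>();
        for (Part part : planParts(fileLength, partSize)) {
            partFutures.add(CompletableFuture.supplyAsync(() -> {
                Path partFile = tempDir.resolve(blobName + ".part" + part.index);
                try (InputStream in = reader.readBlob(blobName, part.position, part.length)) {
                    // Each stream writes its own temporary part file.
                    Files.copy(in, partFile, StandardCopyOption.REPLACE_EXISTING);
                    return partFile;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, executor));
        }
        // File-level notification: runs only after every part stream has completed.
        return CompletableFuture.allOf(partFutures.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> {
                try (OutputStream out = Files.newOutputStream(target)) {
                    for (CompletableFuture<Path> future : partFutures) {
                        Path partFile = future.join(); // already complete, does not block
                        Files.copy(partFile, out);     // append parts in index order
                        Files.delete(partFile);
                    }
                    return target;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
    }
}
```

The segment-restore level from step 2 could be layered on top in the same way: collect the future returned by `download` for each file and continue the restore once `CompletableFuture.allOf(...)` over those futures completes. If any part fails, the file-level future completes exceptionally and no merge is attempted. Cleaning up leftover part files in that case is not handled here.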

Additional context
