[Remote Store] [Repository Download Enhancement] Implement the enhanced download mechanism #9031

kotwanikunal · 2023-08-01T17:21:09Z

Is your feature request related to a problem? Please describe.

Details are available in [Repository] Support multi-stream downloads within Repository #8596

Describe the solution you'd like

The existing Repository API (https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java#L80-L92), readBlob(blobName, position, length), will be used to open multiple streams for a single blob file (segment or metadata).
The download path will be updated to use an Observer/ActionListener pattern so that different parts of a blob can be fetched asynchronously, without blocking.
1. At the file level, the corresponding listener will be notified once all the streams for that file have completed. (parallel streams for a file)
2. At the segment restore level, the restore process will continue once every file has reported that its download is complete. (parallel files)
Before the streams are opened for a file, a metadata fetch will calculate or load the hash and the length of the file stored in the repository.
1. The parallel streams for a file will each read data into a buffer and save it as a part of the local file in a temporary location.
2. Once all the streams are marked as completed, the parts will be merged into a single file and stored in the segment directory.
Vendor plugins will decide the individual part size for a given file size. An abstraction can take the part size that the plugin considers most suitable for a given file size and provide a list with the corresponding number of suppliers, similar to the uploads implementation.
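The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the OpenSearch implementation: `RangeReader` is a hypothetical stand-in for `readBlob(blobName, position, length)`, `Listener` is a hypothetical callback loosely modeled on an ActionListener, and `Part`, `splitIntoParts`, and `download` are names invented for this example. The blob length is passed in directly, so the metadata fetch and hash check are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MultiStreamDownloadSketch {

    /** Hypothetical stand-in for readBlob(blobName, position, length). */
    interface RangeReader {
        InputStream readBlob(String blobName, long position, long length) throws IOException;
    }

    /** Hypothetical completion callback, loosely modeled on an ActionListener. */
    interface Listener<T> {
        void onResponse(T result);
        void onFailure(Exception e);
    }

    /** One contiguous byte range of the blob. */
    static final class Part {
        final int index; final long position; final long length;
        Part(int index, long position, long length) {
            this.index = index; this.position = position; this.length = length;
        }
    }

    /** Splits [0, blobLength) into ranges of at most partSize bytes (part size chosen by the plugin). */
    static List<Part> splitIntoParts(long blobLength, long partSize) {
        if (partSize <= 0) throw new IllegalArgumentException("partSize must be positive");
        List<Part> parts = new ArrayList<>();
        int index = 0;
        for (long pos = 0; pos < blobLength; pos += partSize) {
            parts.add(new Part(index++, pos, Math.min(partSize, blobLength - pos)));
        }
        return parts;
    }

    /** Fetches all parts in parallel into tempDir, merges them into target, then notifies the listener. */
    static void download(RangeReader reader, String blobName, long blobLength, long partSize,
                         Path tempDir, Path target, ExecutorService executor, Listener<Path> listener) {
        List<CompletableFuture<Path>> futures = new ArrayList<>();
        for (Part part : splitIntoParts(blobLength, partSize)) {
            futures.add(CompletableFuture.supplyAsync(() -> {
                Path partFile = tempDir.resolve(blobName + ".part" + part.index);
                try (InputStream in = reader.readBlob(blobName, part.position, part.length)) {
                    Files.copy(in, partFile, StandardCopyOption.REPLACE_EXISTING);
                    return partFile;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, executor));
        }
        // File-level completion: runs once every stream has finished or one has failed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).whenComplete((ignored, error) -> {
            if (error != null) {
                listener.onFailure(new IOException("part download failed", error));
                return;
            }
            try {
                try (OutputStream out = Files.newOutputStream(target)) {
                    for (CompletableFuture<Path> future : futures) { // futures are in part order
                        Path partFile = future.join();
                        Files.copy(partFile, out);
                        Files.delete(partFile);
                    }
                }
                listener.onResponse(target); // notify only after the merged file is closed
            } catch (IOException e) {
                listener.onFailure(e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        RangeReader reader = (name, pos, len) -> new ByteArrayInputStream(data, (int) pos, (int) len);
        Path tempDir = Files.createTempDirectory("parts");
        ExecutorService executor = Executors.newFixedThreadPool(4);
        CompletableFuture<Path> done = new CompletableFuture<>();
        download(reader, "segment_1", data.length, 4096, tempDir, tempDir.resolve("segment_1"), executor,
            new Listener<Path>() {
                public void onResponse(Path p) { done.complete(p); }
                public void onFailure(Exception e) { done.completeExceptionally(e); }
            });
        System.out.println("merged file matches source: " + Arrays.equals(data, Files.readAllBytes(done.get())));
        executor.shutdown();
    }
}
```

A segment-level restore could apply the same idea one level up, for example by collecting one future per file and continuing the restore once all of them have completed. The in-memory `RangeReader` in `main` exists only to make the sketch runnable; a real repository plugin would supply its own ranged reads.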

Additional context

[POC] Support multi-stream downloads within Repository #8729

kotwanikunal added the enhancement (Enhancement or improvement to existing feature or request) and untriaged labels Aug 1, 2023

kotwanikunal self-assigned this Aug 1, 2023

kotwanikunal added distributed framework v2.10.0 and removed untriaged labels Aug 1, 2023

kotwanikunal mentioned this issue Aug 8, 2023

Add interface changes for async repository downloads #9182

Closed

6 tasks

kotwanikunal added this to Segment Replication Aug 10, 2023

github-project-automation bot moved this to Todo in Segment Replication Aug 10, 2023

kotwanikunal moved this from Todo to In Progress in Segment Replication Aug 10, 2023

kotwanikunal mentioned this issue Aug 21, 2023

[Remote Store] [Repository Download Enhancement] Add async/listener pattern support for download path #8930

Closed

kotwanikunal mentioned this issue Aug 28, 2023

Add async blob read and download support using multiple streams #9592

Merged

6 tasks

kotwanikunal closed this as completed in #9592 Sep 1, 2023

github-project-automation bot moved this from In Progress to Done in Segment Replication Sep 1, 2023

kotwanikunal mentioned this issue Sep 1, 2023

Add async read support for S3 plugin #9694

Merged

6 tasks

kotwanikunal removed the v2.10.0 label Sep 7, 2023
