Optimised prefix pattern per shard for remote store data and metadata files for higher throughput #12567
Labels
enhancement
Enhancement or improvement to existing feature or request
Storage:Performance
Storage:Resiliency
Issues and PRs related to the storage resiliency
v2.14.0
Comments
ashking94
added
enhancement
Enhancement or improvement to existing feature or request
untriaged
labels
Mar 8, 2024
ashking94
changed the title
[Feature Request] Optimised prefix pattern per shard for remote store data and metadata files for higher throughput
Optimised prefix pattern per shard for remote store data and metadata files for higher throughput
Mar 8, 2024
ashking94
added
the
Storage:Resiliency
Issues and PRs related to the storage resiliency
label
Mar 18, 2024
This was referenced Mar 19, 2024
[Remote Store] Update remote Store flows to support any path type with backward compatibility
#12790
Closed
This was referenced Mar 30, 2024
Closed
This was referenced Apr 11, 2024
Marking this closed since the feature has been successfully completed.
github-project-automation
bot
moved this from 🏗 In progress
to ✅ Done
in Storage Project Board
May 2, 2024
Is your feature request related to a problem? Please describe
With the remote store feature, we upload two kinds of data to the remote store, data and metadata, for both translog and segments. We have #5854, which buffers requests and uploads them every 650ms (the default value). This works well in steady state. However, I have run into an issue while running a performance test with a single index and a high number of shards.
The current path structure looks like this ->
As we can see, the physical layout of the data is the same as its logical layout. This structure is subject to limits on the number of GETs, PUTs, DELETEs, and LISTs. These limits become a bottleneck when an index has too many shards.
Describe the solution you'd like
A prefix pattern that is accepted by multiple repository providers, such as AWS S3 and GCP storage. The general recommendation from the providers is to spread data across as many prefixes as possible, which allows them to scale better.
So, the proposed prefix pattern is ->
With the above prefix pattern, we ensure that the prefixes are random yet predictable. For each combination of translog-data, translog-metadata, segment-data, and segment-metadata, the path is fixed and remains the same throughout its life.
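To illustrate what "random yet predictable" could mean in practice, here is a minimal sketch that derives a prefix by hashing the shard's identifying fields. This is not the exact pattern proposed in this issue: the function names (`hashed_prefix`, `remote_path`), the choice of SHA-256, the 16-character prefix length, and the position of the hash at the front of the path are all assumptions made for illustration.

```python
import hashlib


def hashed_prefix(index_uuid, shard_id, data_category, data_type, length=16):
    """Return a deterministic, well-spread prefix for one shard path.

    Hypothetical helper: the hash function and prefix length are
    illustrative assumptions, not the implementation from this issue.
    """
    key = f"{index_uuid}/{shard_id}/{data_category}/{data_type}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return digest[:length]


def remote_path(base_path, index_uuid, shard_id, data_category, data_type):
    """Build a path whose leading component is the hashed prefix.

    Placing the hash first (an assumption) means that paths for different
    shards diverge at the very first path component.
    """
    prefix = hashed_prefix(index_uuid, shard_id, data_category, data_type)
    return f"{prefix}/{base_path}/{index_uuid}/{shard_id}/{data_category}/{data_type}/"


if __name__ == "__main__":
    # Same inputs always give the same path; different shards and
    # data kinds land under unrelated-looking prefixes.
    for shard in range(3):
        for category in ("segments", "translog"):
            for kind in ("data", "metadata"):
                print(remote_path("base", "example-index-uuid", shard, category, kind))
```

Because the prefix is a pure function of the index UUID, shard ID, data category, and data type, any node can recompute the path without a lookup, while the hash spreads the shards of a single index across many unrelated prefixes.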
We can also see this referred by multiple cloud providers below -
Related component
Storage:Performance
Describe alternatives you've considered
No response
Additional context
No response