
Prevent segment merges if the segments upload is lagging on the remote store #7477

Closed
ashking94 opened this issue May 9, 2023 · 9 comments
Labels
  enhancement - Enhancement or improvement to existing feature or request
  Performance - This is for any performance related enhancements or bugs
  Storage:Durability - Issues and PRs related to the durability framework
  v2.8.0 - Issues and PRs related to version v2.8.0

Comments

@ashking94 (Member)

As mentioned in #6851, section 6 - Execution plan, point 4, we can prevent segment merges to stop the segment backlog from growing if there is any issue in remote store interactions such as uploads.

Use case - Consider an ongoing issue with remote uploads. This has led to the remote store (and replicas) lagging behind the primary's local store by 1MB in terms of size. We are still accepting writes and backpressure is yet to kick in. Before the backpressure kicks in, a segment merge happens. All of a sudden, the size of the segments that need to be uploaded jumps from 1MB to 5GB. Once the issue recovers, the primary needs to upload 5GB worth of data, which the replica will ultimately need to download to get in sync with the primary in terms of search result freshness. Below we discuss the pros and cons of both approaches -> 1. prevent segment merges explicitly if lag is present, 2. implicit segment merges on account of backpressure.
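
To make the upload-volume trade-off concrete, here is a small back-of-the-envelope sketch in Java. The 1MB lag and 5GB merged-segment size are only the illustrative numbers from the example above, not measurements:

```java
// Back-of-the-envelope comparison of total upload volume for the two approaches.
// The sizes come from the illustrative example above.
public class UploadVolumeSketch {
    static final long MB = 1L << 20;
    static final long GB = 1L << 30;

    public static void main(String[] args) {
        long pendingLag = 1 * MB;       // un-merged segments not yet uploaded when the issue starts
        long mergedSegment = 5 * GB;    // size of the segment produced by the merge

        // Approach 1: the merge is held back until the lag drains, so the primary gets
        // in sync quickly (1MB), but the merged segment still has to be uploaded later.
        long explicitPrevention = pendingLag + mergedSegment;

        // Approach 2: the merge already happened, so only the merged segment
        // (which now contains the backlog's data) is uploaded, but getting in sync takes longer.
        long implicitViaBackpressure = mergedSegment;

        System.out.printf("explicit prevention, total eventual upload: %d bytes%n", explicitPrevention);
        System.out.printf("implicit via backpressure, total upload:    %d bytes%n", implicitViaBackpressure);
    }
}
```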

Prevent segment merges explicitly -

Pros

  1. Time to recovery is lower. However, we still need to solve for subsequent segment merges that will cause the backlog to increase, since there will be no backpressure when uploads are healthy and a segment merge happens.

Cons

  1. In total, more data needs to be uploaded to the remote store: 1MB + 5GB eventually.
  2. Larger resource footprint (e.g. open file handles, disk utilisation) and relatively less performant searches compared to the segment-merged state.
  3. Complex code change, as it touches background segment merges.

Implicit segment merges on account of backpressure -

Pros -

  1. Solving [Remote Segments] Segment merges creates high backlog of segments to upload to remote store #7474 alleviates the time to recovery here too.
  2. Eventual implicit segment merge prevention, since merges are triggered in the write path and we have backpressure on remote segment upload lag, and will have backpressure on remote translog too - Back pressure and shard failure for remote translog upload in sync/async durability mode #6852.
  3. Lower upload volume to reach the in-sync state if a segment merge were to happen during a remote store upload issue and before backpressure kicks in.
  4. Segment merges need not be disturbed.
  5. Relatively lower resource utilisation if a segment merge happens.

Cons -

  1. Relatively longer time to recover from the issue and get in sync.

Looking for suggestions from the community on which approach seems better.

ashking94 added the enhancement, Storage:Durability, Performance and v2.8.0 labels on May 9, 2023
@ashking94 (Member Author)

Tagging @Bukhtawar @sachinpkale @gbbafna @mch2 @andrross @linuxpi @itiyamas for your take on this.

@Bukhtawar (Collaborator)

Thanks @ashking94. I think we shouldn't restrict any background operations, since the eventual checkpoint transfer would be the merged segment; the lagging remote store could use the eventual view of segments and upload that, rather than further constraining the system to upload un-merged segments followed by merged ones post segment merge.
The backpressure should be for pushing back on the inflow of the user's requests, not on what the system is trying to internally transform (merge).

@shwetathareja (Member)

Thanks @ashking94. Segment merges should eventually move closer to the storage layer and be isolated from the indexing flow. +1 to @Bukhtawar's thoughts that we shouldn't impact background operations like merging.

A couple of clarifications:
What happens if the remote store is throwing errors like 429, as opposed to any issues with the data node? Will OpenSearch continue to reject the user requests? Are these remote store errors going to be tracked per shard/index to fail user requests? This shouldn't cause shard failure, as the issue is with the remote store and not the node itself.

@shwetathareja (Member)

Also, between segments and translog, would there be a preference during remote upload? A large segment upload could potentially impact translog upload, which in turn would impact request latencies.

@ashking94 (Member Author)

> A couple of clarifications: What happens if the remote store is throwing errors like 429, as opposed to any issues with the data node? Will OpenSearch continue to reject the user requests?

If there are issues with the remote store that cause the local store and remote store segment state to diverge, then, based on the remote segment upload backpressure logic, OpenSearch will start rejecting requests.

> Are these remote store errors going to be tracked per shard/index to fail user requests?

To start with, the rejections would be tracked against the shard. We will also be able to aggregate the errors at the index, repo, node or cluster level. The current backpressure tracks issues against each shard and then does the rejections.

> This shouldn't cause shard failure, as the issue is with the remote store and not the node itself.

Yes, that's true. The lag would increase until it goes beyond a dynamic limit. Beyond that point, requests would be rejected but the shard would not be failed. The existing mechanism for failing shards remains status quo.
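
To illustrate the kind of check described here, below is a minimal, hypothetical sketch of lag-based rejection against a dynamic limit. The class and member names (RemoteSegmentUploadRejector, bytesLagSupplier, dynamicBytesLagLimit) are made up for illustration and are not the actual OpenSearch backpressure implementation:

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: reject an indexing request when the remote segment
// upload lag (in bytes) for a shard exceeds a dynamically adjustable limit.
public class RemoteSegmentUploadRejector {
    private final LongSupplier bytesLagSupplier;   // current local-vs-remote lag for the shard
    private volatile long dynamicBytesLagLimit;    // limit, adjustable at runtime

    public RemoteSegmentUploadRejector(LongSupplier bytesLagSupplier, long initialLimit) {
        this.bytesLagSupplier = bytesLagSupplier;
        this.dynamicBytesLagLimit = initialLimit;
    }

    public void setDynamicBytesLagLimit(long newLimit) {
        this.dynamicBytesLagLimit = newLimit;
    }

    // Called on the write path before accepting an indexing request.
    public void validateWrite(String shardId) {
        long lag = bytesLagSupplier.getAsLong();
        if (lag > dynamicBytesLagLimit) {
            // Reject the request; the shard itself is not failed.
            throw new IllegalStateException(
                "rejecting write to " + shardId + ": remote segment upload lag "
                    + lag + " bytes exceeds limit " + dynamicBytesLagLimit);
        }
    }
}
```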

> Also, between segments and translog, would there be a preference during remote upload? A large segment upload could potentially impact translog upload, which in turn would impact request latencies.

This is planned to be handled using dedicated thread pools capped at a maximum size. The thread pool corresponding to translog uploads would be larger than the segment upload thread pool.
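
Conceptually, that separation looks like the sketch below. This uses plain java.util.concurrent rather than OpenSearch's own ThreadPool abstraction, and the pool names and sizes are purely illustrative assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Conceptual sketch only: two dedicated, size-capped pools so that large
// segment uploads cannot starve translog uploads.
public class RemoteUploadPools {
    // Larger pool for translog uploads, which sit on the request latency path.
    private final ExecutorService translogUploadPool = Executors.newFixedThreadPool(8);
    // Smaller pool for segment uploads, which are larger but latency-tolerant.
    private final ExecutorService segmentUploadPool = Executors.newFixedThreadPool(2);

    public void uploadTranslog(Runnable upload) {
        translogUploadPool.submit(upload);
    }

    public void uploadSegment(Runnable upload) {
        segmentUploadPool.submit(upload);
    }

    public void close() {
        translogUploadPool.shutdown();
        segmentUploadPool.shutdown();
    }
}
```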

@andrross (Member)

@ashking94 Can you explain exactly what "Implicit segment merges on account of backpressure" means? I agree with the feedback above that if the system isn't keeping up, it should apply backpressure on incoming requests and not on the internal operations (segment merges).

@ashking94 (Member Author)

> @ashking94 Can you explain exactly what "Implicit segment merges on account of backpressure" means? I agree with the feedback above that if the system isn't keeping up, it should apply backpressure on incoming requests and not on the internal operations (segment merges).

AFAIK, the background segment merges that happen today are actually triggered in the write path, when control reaches IndexWriter.addDocuments(docs). Once we start rejecting requests, the segment merges should stop on their own, as they will no longer be triggered.
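
As a minimal illustration of that trigger path (plain Lucene, assuming Lucene 9.x APIs, with arbitrary index contents and no OpenSearch specifics), indexing through IndexWriter is what gives the configured MergePolicy a chance to schedule background merges:

```java
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.ByteBuffersDirectory;

public class MergeTriggerDemo {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer())
            .setMergePolicy(new TieredMergePolicy()); // decides when segments get merged

        try (IndexWriter writer = new IndexWriter(new ByteBuffersDirectory(), config)) {
            Document doc = new Document();
            doc.add(new TextField("field", "some content", Field.Store.NO));
            // The write path: addDocuments/flush/refresh is where pending merges are
            // evaluated and kicked off; if writes are rejected upstream, these calls
            // stop happening and so do the background merges.
            writer.addDocuments(List.of(doc));
            writer.flush();
        }
    }
}
```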

@itiyama commented May 17, 2023

@ashking94 The background merges are triggered whenever there is a MergeTrigger. Merge triggers are either explicit merges or operations that result in the creation of segments, i.e. REFRESH, FLUSH or MERGE. Lucene figures out whether there is a pending merge based on the merge policy and then triggers a merge operation.
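
For reference, this is roughly where the "prevent merges explicitly" option from the issue description would hook in: a hypothetical wrapper MergePolicy that returns no merges while remote upload lag is present. This is only a sketch of that option (assuming Lucene 9.x's FilterMergePolicy and findMerges signature, with a made-up remoteUploadLagging check), not something the discussion above recommends shipping:

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;
import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentInfos;

// Hypothetical sketch of the "prevent segment merges explicitly" option.
// While the injected lag check reports a lag, no new merges are returned,
// regardless of which MergeTrigger (flush, explicit merge, etc.) asked for them.
public class RemoteLagAwareMergePolicy extends FilterMergePolicy {
    private final BooleanSupplier remoteUploadLagging; // injected check, made up for this sketch

    public RemoteLagAwareMergePolicy(MergePolicy delegate, BooleanSupplier remoteUploadLagging) {
        super(delegate);
        this.remoteUploadLagging = remoteUploadLagging;
    }

    @Override
    public MergePolicy.MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos segmentInfos,
                                                     MergePolicy.MergeContext mergeContext) throws IOException {
        if (remoteUploadLagging.getAsBoolean()) {
            return null; // null means "no merges necessary right now"
        }
        return super.findMerges(mergeTrigger, segmentInfos, mergeContext);
    }
}
```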

@itiyama commented May 17, 2023

I agree with the discussion in general. We should not prevent segment merges if segment upload is lagging. Instead, we should throttle requests so that merging eventually stops on its own.
