Prevent segment merges if the segments upload is lagging on the remote store #7477
Comments
Tagging @Bukhtawar @sachinpkale @gbbafna @mch2 @andrross @linuxpi @itiyamas for your take on this.
Thanks @ashking94. I think we shouldn't restrict any background operations. The checkpoint that eventually gets transferred would be the merged segment anyway, so a lagging remote store could upload the eventual view of segments, rather than further constraining the system to upload un-merged segments followed by the merged ones after the merge.
Thanks @ashking94. The segment merges should eventually move closer to the storage layer and be isolated from the indexing flow. +1 to @Bukhtawar's thoughts that we shouldn't impact background operations like merging. Couple of clarifications:
Also, between segments and translog, would there be a preference during remote upload? A large segment upload could potentially impact translog upload, which in turn would impact request latencies.
If issues with the remote store cause the local store and remote store segment state to diverge, then OpenSearch will start rejecting requests based on the remote segment upload backpressure logic.
To start with, the rejections would be tracked against the shard. We will also be able to aggregate the errors at the index, repo, node or cluster level. The current backpressure tracks issues against each shard and then does rejections.
Yes, that's true. The lag would increase until it goes beyond a dynamic limit. After that, requests would be rejected, but the shard would not be failed. The existing mechanism for failing shards remains as is.
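A minimal sketch of the per-shard rejection described above, assuming lag is measured in bytes and the dynamic limit is a single adjustable threshold. `RemoteUploadLagTracker` and its method names are hypothetical and are not OpenSearch APIs; the actual backpressure logic may track other signals as well.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names are OpenSearch classes or methods.
final class RemoteUploadLagTracker {
    // Bytes present in the local store but not yet uploaded, keyed by shard id.
    private final Map<String, Long> lagBytesByShard = new ConcurrentHashMap<>();
    // Assumed "dynamic limit": a threshold that can be changed at runtime.
    private volatile long lagLimitBytes;

    RemoteUploadLagTracker(long lagLimitBytes) {
        this.lagLimitBytes = lagLimitBytes;
    }

    void setLagLimitBytes(long lagLimitBytes) {
        this.lagLimitBytes = lagLimitBytes;
    }

    // Called when new segment bytes appear locally (refresh, flush or merge).
    void onLocalSegmentsCreated(String shardId, long bytes) {
        lagBytesByShard.merge(shardId, bytes, Long::sum);
    }

    // Called when an upload succeeds; the lag never drops below zero.
    void onSegmentsUploaded(String shardId, long bytes) {
        lagBytesByShard.computeIfPresent(shardId, (id, lag) -> Math.max(0L, lag - bytes));
    }

    long lagBytes(String shardId) {
        return lagBytesByShard.getOrDefault(shardId, 0L);
    }

    // Reject the write for this shard only; the shard itself is not failed.
    boolean shouldRejectWrite(String shardId) {
        return lagBytes(shardId) > lagLimitBytes;
    }
}
```

Because the map is keyed by shard, a lagging shard is rejected without affecting writes to other shards on the same node.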
This is planned to be handled using dedicated thread pools, each limited by a max size. The thread pool for translog upload would be sized higher than the thread pool for segment upload.
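A rough sketch of the two bounded pools, using only `java.util.concurrent`. The pool sizes (8 and 4) and the `RemoteUploadPools` name are assumptions for illustration; the thread only says that the translog pool would be the larger one.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the class name and sizes are illustrative assumptions.
final class RemoteUploadPools {
    static final int TRANSLOG_UPLOAD_THREADS = 8; // assumed value
    static final int SEGMENT_UPLOAD_THREADS = 4;  // assumed value, smaller than translog

    // A pool capped at maxThreads; extra tasks wait in the queue instead of
    // spawning more threads, so one upload type cannot starve the other.
    static ThreadPoolExecutor boundedPool(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads, 30L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // release idle threads after 30 seconds
        return pool;
    }
}
```

With separate pools, a large segment upload occupies only segment upload threads, so translog uploads on the request path keep their own capacity.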
@ashking94 Can you explain exactly what "Implicit segment merges on account of backpressure" means? I agree with the feedback above that if the system isn't keeping up, it should apply backpressure on incoming requests and not on the internal operations (segment merges).
AFAIK, the background segment merges that happen today are actually triggered in the write path when the control reaches the
@ashking94 The background merges are triggered whenever there is a MergeTrigger. Merge triggers are either external merges or operations that result in the creation of segments, which are REFRESH, FLUSH or MERGE. Lucene figures out whether there is a pending merge based on the merge policy and then triggers a merge operation.
I agree with the discussion in general. We should not prevent segment merges if segment upload is lagging. Instead, we should throttle requests so that merging will eventually be stopped.
As mentioned in #6851 (section 6 - Execution plan, point 4), we can prevent segment merges so that the segments backlog stops growing when there is an issue in remote store interaction, such as uploads.
Use case - Consider an ongoing issue with remote uploads. This has led the remote store (and replicas) to lag behind the primary's local store by 1MB in terms of size. We are still accepting writes and backpressure is yet to kick in. Before the backpressure kicks in, a segment merge happens. Now, all of a sudden, the size of the segments that need to be uploaded grows from 1MB to 5GB. Once the issue recovers, the primary needs to upload 5GB worth of data, which the replicas will ultimately need to download to get in sync with the primary in terms of search result freshness.
Below we discuss the pros and cons of both approaches: 1. Prevent segment merges explicitly if lag is present. 2. Implicit segment merges on account of backpressure.
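To make the 1MB to 5GB jump in the use case concrete, here is a minimal sketch that computes the upload backlog as the total size of local segments not yet present in the remote store. `UploadBacklog` and the segment names are hypothetical, and treating a merged segment as one brand-new file is a simplifying assumption.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: models segments as name -> size in bytes.
final class UploadBacklog {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Sum of the sizes of all local segments that the remote store lacks.
    static long pendingBytes(Map<String, Long> localSegments, Set<String> uploaded) {
        long pending = 0L;
        for (Map.Entry<String, Long> segment : localSegments.entrySet()) {
            if (!uploaded.contains(segment.getKey())) {
                pending += segment.getValue();
            }
        }
        return pending;
    }
}
```

Before the merge, a local store of `{_0: 5GB - 1MB, _1: 1MB}` with only `_0` uploaded has a backlog of 1MB. After `_0` and `_1` merge into a new segment `_2` of 5GB, nothing the remote store holds is reusable, so the backlog becomes the full 5GB.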
Prevent segment merges explicitly -
Pros
Cons
Implicit segment merges on account of backpressure -
Pros -
Cons -
Looking for suggestions from the community on which approach seems better.