Introduce new setting search.concurrent.max_slice to control the slice computation for concurrent segment search #8847
Conversation
…e computation for concurrent segment search. It uses lucene default mechanism if the setting value is <=0 otherwise uses custom max target slice mechanism Signed-off-by: Sorabh Hamirwasia <sohami.apache@gmail.com>
import org.opensearch.common.settings.Settings;

/**
 * Keeps track of all the search related node level settings which can be accessed via static methods
Please add @opensearch.internal
@@ -518,4 +537,19 @@ private boolean shouldReverseLeafReaderContexts() {
    }
    return false;
}

// package-private for testing
LeafSlice[] slicesInternal(List<LeafReaderContext> leaves, int target_max_slices) {
targetMaxSlices, to follow convention.
settings = openSearchSettings;
}

public static int getValueAsInt(String settingName, int defaultValue) {
This is a bit weird because any setting can be passed in here even if unrelated to search. I'd probably implement this as SearchBootstrapSettings.getTargetMaxSliceCount(). I'd consider using Optional&lt;Integer&gt; or a nullable Integer as the return type as opposed to using the magic value of -1.
- I thought about adding the getTargetMaxSliceCount method, but then this class would not stay generic so it could be utilized for other search settings; for each new setting we would need to add a specific method. I do see your point about being able to access any setting, but that is true for any access to ClusterSettings from different components as well. Let me know if you still prefer adding an explicit getTargetMaxSliceCount method here.
- -1 is the default value for this setting, as the setting doesn't allow null. When the setting is converted to dynamic, looking it up via clusterSettings.get(CONCURRENT_SEGMENT_SEARCH_TARGET_MAX_SLICE_COUNT_SETTING) will return the default value if it is not explicitly set. It will not return null, so the caller needs to handle this default of -1 to fall back to lucene behavior in that case. Hence keeping it as is without adding a nullability check for now. The cases will be:
  - Not set explicitly --> default value will be returned, so use lucene slice computation
  - Set explicitly to -1/0 --> use lucene slice computation
  - Set explicitly to >0 --> use custom slice computation
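To make the three cases above concrete, here is a minimal, self-contained sketch in plain Java (illustrative class and method names only; this is not the actual OpenSearch code and uses no Lucene types):

```java
// Illustrative sketch of the fallback logic for the search.concurrent.max_slice
// setting described above: <= 0 (including the -1 default) means "use the
// lucene default slice computation", > 0 means "use the custom mechanism".
public class SliceMechanismSketch {
    static final int DEFAULT_VALUE = -1; // default when the setting is not set explicitly

    static String chooseMechanism(int targetMaxSlices) {
        if (targetMaxSlices <= 0) {
            return "lucene"; // not set, or explicitly set to -1/0
        }
        return "custom";     // explicitly set to a positive slice count
    }

    public static void main(String[] args) {
        System.out.println(chooseMechanism(DEFAULT_VALUE)); // prints "lucene"
        System.out.println(chooseMechanism(0));             // prints "lucene"
        System.out.println(chooseMechanism(4));             // prints "custom"
    }
}
```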
 * Keeps track of all the search related node level settings which can be accessed via static methods
 */
public class SearchBootstrapSettings {
    // settings to configure maximum slice created per search request using OS custom slice computation mechanism. Default lucene
Is there any question as to whether we will switch away from the static method to the dynamic method when the next release of Lucene is available? If not, I'd go ahead and create a GitHub issue to track it and link the issue in a comment in the code where appropriate.
@sohami to this point, why is this setting static? As far as I can tell, it is used in a non-static context and could be implemented as a regular search-related setting.
@reta The slices method here is called from the constructor of the IndexSearcher in 9.7. Because of this, it can't be made configurable via any member variable of ContextIndexSearcher. This is changed in the lucene PR here and will be available in 9.8, which is when we can move to a dynamic setting.
That's interesting, do we usually update main with the lucene snapshot builds as well? I was wondering: if we take a dependency on a new change in an unreleased lucene version, and that change is then modified in the released lucene version, our change will break.
But nonetheless we can keep this change as is and backport it to 2.x until 9.8 is officially released, then add a follow-up commit to move it to a dynamic setting as part of #8870.
Since main has already moved to the 9.8.0-snapshot version of lucene, I will add a follow-up PR to clean this up in main and use a dynamic mechanism, but not backport that to 2.x.
I am trying to understand why you want to get this change into main? We know this is not the way to go forward, and we do have the solution; you would have to redo this work in two branches instead of just cherry-picking one simple change into 2.x.
@reta Let me try explaining :) I do get your solution and can use it, but I would first like to explain why I am trying to merge this in main as well.
As an example, in this PR we have the new class SearchBootstrapSettings, which provides static access to this new node setting. If I were to create a separate PR to make this setting dynamic, this class would not be needed.
- With approach 1, if I merge this PR (say pr_1) only in 2.x, this class will be present in 2.x only. The new PR (say pr_2) with the dynamic setting will be built independently of pr_1, will not know anything about this class, and will go to main. When we move pr_2 from main to 2.x, we will need to ensure the cleanup (like for SearchBootstrapSettings) and other conflicts are handled properly and not missed (which I was thinking can be messy, as these 2 PRs are not built on top of each other).
- With approach 2, if we merge pr_1 both in 2.x and main, then add pr_2 in main on top of pr_1, all the cleanup will be done as part of pr_2 itself and it can be merged the same way in 2.x as well.
I was thinking this backport to 2.x will be cleaner compared to approach 1, hence I was going with this approach. I may be missing something here, but thanks for the discussion!
> When we move this new PR (pr_2) from main to 2.x, we will need to ensure the cleanup like for SearchBootstrapSettings and other conflicts happens properly and not get missed (which I was thinking can be messy as these 2 PRs are not built on top of each other).

This is inevitable in any case - we will be backporting all changes related to Lucene 9.8.0 (as we did for all previous Apache Lucene versions). The argument here is: keep main clean by using the new Apache Lucene snapshots (this is why we've always done that, giving the feature a ride before the release - bugs do happen), and do the backport as a best effort.
I will close this PR and open a new one against 2.x, make the dynamic setting change a separate PR for the main branch, and create another tracking backport issue for merging the dynamic setting change to 2.x when lucene 9.8 is released. As part of that backport we can revert the commit for the static setting in 2.x and then apply the commit from main.
 * experiment results as shared in <a href=https://github.com/opensearch-project/OpenSearch/issues/7358>issue-7358</a>
 * we can see this mechanism helps to achieve better tail/median latency over default lucene slice computation.
 */
public class MaxTargetSliceSupplier {
Add @opensearch.internal
 */
public class MaxTargetSliceSupplier {

public static IndexSearcher.LeafSlice[] getSlices(List<LeafReaderContext> leaves, int target_max_slice) {
targetMaxSlice
}

// slice count should not exceed the segment count
int target_slice_count = Math.min(target_max_slice, leaves.size());
targetSliceCount
group = groupedLeaves.get(currentGroup);
group.add(sortedLeaves.get(idx));
The group local variable confused me when first reading this. Any reason not to just do groupedLeaves.get(currentGroup).add(sortedLeaves.get(idx)) instead?
IndexSearcher.LeafSlice[] slices = new IndexSearcher.LeafSlice[target_slice_count];
int upto = 0;
for (List<LeafReaderContext> currentLeaf : groupedLeaves) {
    slices[upto] = new IndexSearcher.LeafSlice(currentLeaf);
    ++upto;
}
return slices;
Can you replace this with the following?
return groupedLeaves.stream()
.map(IndexSearcher.LeafSlice::new)
.toArray(IndexSearcher.LeafSlice[]::new);
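For illustration, the grouping idea the diff hunks above describe (sort segments, cap the slice count at the segment count, distribute segments across groups) can be sketched on plain doc counts instead of LeafReaderContext objects. This is a hedged sketch, not the actual MaxTargetSliceSupplier code: the class and method names are invented, and the round-robin assignment is one plausible scheme; the exact assignment order in the PR may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of max-target-slice grouping on plain segment doc counts.
// The real code operates on List<LeafReaderContext> and returns LeafSlice[].
// Assumes targetMaxSlice >= 1 (the <= 0 case falls back to lucene's default).
public class MaxTargetSliceSketch {
    static List<List<Integer>> groupSegments(List<Integer> segmentDocCounts, int targetMaxSlice) {
        // slice count should not exceed the segment count
        int sliceCount = Math.min(targetMaxSlice, segmentDocCounts.size());
        List<Integer> sorted = new ArrayList<>(segmentDocCounts);
        sorted.sort(Comparator.reverseOrder()); // biggest segments first
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < sliceCount; i++) {
            groups.add(new ArrayList<>());
        }
        // round-robin assignment spreads the large segments across slices,
        // so each slice gets a roughly similar share of work
        for (int idx = 0; idx < sorted.size(); idx++) {
            groups.get(idx % sliceCount).add(sorted.get(idx));
        }
        return groups;
    }

    public static void main(String[] args) {
        // 5 segments dealt into 2 slices: [[100, 20, 5], [50, 10]]
        System.out.println(groupSegments(List.of(100, 10, 50, 20, 5), 2));
    }
}
```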
// By default, for tests we will put the target slice count of 2. This will increase the probability of having multiple slices
// when tests are run with concurrent segment search enabled. When concurrent segment search is disabled then it's a no-op as
// slices are not used
return Settings.builder().put(SearchBootstrapSettings.CONCURRENT_SEGMENT_SEARCH_TARGET_MAX_SLICE_COUNT_KEY, 2).build();
Maybe move this to newNode() where the other defaults are defined and leave this implementation as returning EMPTY?
@@ -211,7 +212,10 @@ protected final Collection<Class<? extends Plugin>> pluginList(Class<? extends P

/** Additional settings to add when creating the node. Also allows overriding the default settings. */
protected Settings nodeSettings() {
    return Settings.EMPTY;
    // By default, for tests we will put the target slice count of 2. This will increase the probability of having multiple slices
Do you think there is still value in testing the concurrent segment search case with 1 slice?
I was thinking that the code path is mostly the same between 1 and more slices, like reducing and creating the final results, so it is more a subset of the slice > 1 case. Also, a max slice of 2 will still have coverage for a single slice if the number of segments created by the test is 1.
leafSlices = super.slices(leaves);
logger.debug("Slice count using lucene default [{}]", leafSlices.length);
} else {
    // use the custom slice calculation based on target_max_slices. It will sort
This comment looks cut off
public static final Setting<Integer> CONCURRENT_SEGMENT_SEARCH_TARGET_MAX_SLICE_COUNT_SETTING = Setting.intSetting(
    CONCURRENT_SEGMENT_SEARCH_TARGET_MAX_SLICE_COUNT_KEY,
    CONCURRENT_SEGMENT_SEARCH_TARGET_MAX_SLICE_COUNT_DEFAULT_VALUE,
    Setting.Property.NodeScope
);
I think we should use the intSetting overload with minValue (and maybe maxValue):
OpenSearch/server/src/main/java/org/opensearch/common/settings/Setting.java, lines 1492 to 1494 in 0ea7210:
public static Setting<Integer> intSetting(String key, int defaultValue, int minValue, int maxValue, Property... properties) {
    return intSetting(key, defaultValue, minValue, maxValue, v -> {}, properties);
}
For max value, it is naturally bounded by the number of segments, which shouldn't grow unbounded due to Lucene merges so I'm inclined to say that is not necessary.
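As a hedged illustration of what the suggested minValue bound would do, here is the bounds check sketched in plain Java (the real validation lives in OpenSearch's Setting class; the class and method names below are invented for this sketch): values outside [minValue, maxValue] are rejected at settings-parse time instead of silently falling through to fallback behavior.

```java
// Sketch of the bounds check that Setting.intSetting(key, default, min, max, ...)
// is described as applying; not the actual OpenSearch implementation.
public class BoundedIntSettingSketch {
    static int parseBoundedInt(String key, int value, int minValue, int maxValue) {
        if (value < minValue) {
            throw new IllegalArgumentException(
                "value [" + value + "] for setting [" + key + "] must be >= " + minValue);
        }
        if (value > maxValue) {
            throw new IllegalArgumentException(
                "value [" + value + "] for setting [" + key + "] must be <= " + maxValue);
        }
        return value;
    }

    public static void main(String[] args) {
        // a valid value passes through unchanged
        System.out.println(parseBoundedInt("search.concurrent.max_slice", 4, 0, Integer.MAX_VALUE));
    }
}
```

With a minValue of 0, a nonsensical value like -200 would be rejected up front, which is the behavior being argued for in the comments below.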
I don't see this as a must-have, since there are no true bounds we are enforcing for now. Anything <= 0 is treated as "use the lucene mechanism" rather than the custom mechanism, and if a large positive value is set, it will be tuned down to the segment count in the slice computation logic.
Thinking about this some more, I agree that this is not a must-have from a functional perspective since the <0 case is handled in slicesInternal, but since there's no valid use case for anyone to set, for example, -200 for this setting (and I don't foresee this becoming a valid use case in the future either), it's better to just disallow it entirely. Agree that a max value is not necessary though.
> but since there's no valid use case for anyone to set, for example -200 for this setting

This setting is used in 2 ways. If the value is > 0, use the custom slice mechanism with that value as the target max slice. If the value is <= 0, use the lucene slice mechanism. So the actual negative value or 0 is not relevant beyond meaning "fall back to lucene behavior"; it is used for enabling/disabling the feature (custom slicer vs lucene slicer). I would prefer a min/max range for settings where a clear range is defined, whereas here <= 0 is used as a boolean flag to choose one mechanism over the other. Also, I didn't want to add a new setting to control that, as I expect, based on testing, we will default to one behavior.
Closing this PR as discussed here and will address the feedback in the new PR against 2.x.
Description
Introduces a static setting to control slice computation using lucene default or max target slice mechanism for concurrent segment search.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
#7358
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.