Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17151 Check limits between calls to components in SearchHandler #2801

Merged
merged 6 commits into from
Nov 10, 2024

Conversation

gus-asf
Copy link
Contributor

@gus-asf gus-asf commented Oct 27, 2024

https://issues.apache.org/jira/browse/SOLR-17151

This change seeks to ensure that a request that has exceeded a query limit will not continue processing (potentially expensive) components before returning partial results in response to a query. At the point of the creation of this PR, there's a couple failing tests to track down, but there's a good chance those will just be some places where the tests expect book-keeping in debugMap and shardsInfo that has now been short-circuited (and may or may not actually need to be retained). Happy to accept feedback while I track those down.

Provides additional detail message from first failing shard
Tests for this feature pass, still debugging 3 failing tests relating to debugMap
and shardsInfo, so not quite ready yet.
@gus-asf gus-asf requested a review from sigram November 7, 2024 17:37
Copy link
Contributor

@sigram sigram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -615,17 +615,23 @@ protected int groupedDistributedProcess(ResponseBuilder rb) {
}

protected int regularDistributedProcess(ResponseBuilder rb) {
if (rb.stage < ResponseBuilder.STAGE_PARSE_QUERY) return ResponseBuilder.STAGE_PARSE_QUERY;
if (rb.stage < ResponseBuilder.STAGE_PARSE_QUERY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I never liked these one-liners.

shardInfo.add(shardInfoName, nl);
rsp.getValues().add(ShardParams.SHARDS_INFO, shardInfo);
private static String stageInEnglish(int nextStage) {
// This should probably be a enum, but that change should be its own ticket.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to both.

+ " in "
+ components.stream()
.map(SearchComponent::getName)
.collect(Collectors.toList()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should build a list of all component names in initComponents() and keep it around?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The supplier is only evaluated if the exception is thrown.

@@ -70,9 +70,12 @@ public void testPrefixQuery() throws Exception {

// this time we should get a query cache hit and hopefully no exception? this may change in the
// future if time checks are put into other places.
assertJQ(req("q", q, "timeAllowed", "1", "sleep", sleep), assertionString);
// 2024-4-15: it did change..., and now this fails with 1 or 2 ms and passes with 3ms... I see
// no way this won't be terribly brittle. Maybe TestInjection of some sort to bring this back?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use CallerSpecificQueryLimit (example in TestQueryLimits.testQueryLimits().

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that helps. The test wants to verify that the result was cached and is hoping that the request won't time out because of that, but that only worked because timeAllowed was only measured/triggered in a small area of code sensitive to the availability of the cache. Now we check it all over the place. If we raise the limit too much it's not a test, and if this test gets starved for resources it will fail spuriously.

Copy link

@fsparv fsparv Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we really need for this test is a cache admin command that can list the contents of the cache (and one would expect a manual clear capability too, though that's not needed here) One might be able to get by with a metrics call, looking for an increment in cache hits, but that's pretty heavy reliance on coincidence rather than explicit testing of the actual desired behavior.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at the ability to check directly. There is some hope since we could

    SolrCores solrCores = ExitableDirectoryReaderTest.h.getCoreContainer().solrCores;
    List<SolrCore> cores = solrCores.getCores();
    for (SolrCore core : cores) {
      // identify core we need here...
    }

but assuming we can identify the appropriate core, there isn't even an internal api to check caches. Would look for something steming from core.getSearcher().get().??? but we would need to add an accessor for the queryCache and then on SolrCache a way to inspect the keys of the cache... that seems way beyond what this change should attempt.

throw new SolrServerException(
"Time allowed to handle this request exceeded" + suffix, previousEx);
"The processing limits for to this request were exceeded, see cause for details",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

@gus-asf
Copy link
Contributor Author

gus-asf commented Nov 10, 2024

Crave seems to be stalled entirely. Tests pass locally, merging

@gus-asf gus-asf merged commit 125fc3f into apache:main Nov 10, 2024
3 of 4 checks passed
gus-asf added a commit to gus-asf/solr that referenced this pull request Nov 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants