Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the "highlight.max_analyzer_offset" request parameter with "plain" highlighter #10919

Conversation

drunken-monkey
Copy link
Contributor

@drunken-monkey drunken-monkey commented Oct 25, 2023

Description

Currently, the highlight.max_analyzer_offset query parameter is ignored when using the plain highlighter.

Note: At the moment, this includes only a failing test, no actual fix for the problem.

Related Issues

Resolves #10702.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the bug Something isn't working label Oct 25, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Oct 25, 2023

Compatibility status:

Checks if related components are compatible with change e469e62

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@dblock
Copy link
Member

dblock commented Oct 30, 2023

Thanks @drunken-monkey! Looks like it failed in https://build.ci.opensearch.org/job/gradle-check/28988/. You're one step closer to a fix ;) Did it fail as expected? Aka "The length of [field1] field of [1] doc of [test1] index has exceeded [10] - maximum allowed to be analyzed for highlighting ....".

@drunken-monkey
Copy link
Contributor Author

@dblock Yes, it failed as expected. So you’re right, good first step.
Since I’m completely new to this project, I doubt I’d be able to provide an actual fix, though.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Dec 8, 2023
@ticheng-aws
Copy link
Contributor

Hi @drunken-monkey, Is this being worked upon? Pls free to reach out to maintainers for further reviews.

@drunken-monkey
Copy link
Contributor Author

Hi @drunken-monkey, Is this being worked upon? Pls free to reach out to maintainers for further reviews.

As written above, I’m completely new to this project and doubt I’d be able to provide an actual fix. Adding the test was easy enough, since it was just one self-explanatory YAML file, but I’d have no idea where to start for a fix. So, no, not being worked on, would be great if someone else would find the time.

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jan 6, 2024
@github-actions github-actions bot added the Search Search query, autocomplete ...etc label Jan 28, 2024
Copy link
Contributor

❌ Gradle check result for 12adef4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@drunken-monkey
Copy link
Contributor Author

I think I managed to resolve this problem. Please review!

The previous test failure was this:

org.opensearch.test.rest.ClientYamlTestSuiteIT.test {p0=search.highlight/30_max_analyzed_offset/Plain highlighter on a field WITHOUT OFFSETS using max_analyzer_offset should SUCCEED}
java.lang.AssertionError: Failure at [search.highlight/30_max_analyzed_offset:92]: hits.hits.0.highlight.field1.0 didn't match expected value:
hits.hits.0.highlight.field1.0: expected String [The <em>quick</em> brown fox went to the forest and saw another fox.] but was String [The <em>quick</em> ]

However, when setting max_analyzer_offset: 10 I’d say it’s expected that the highlighted text will cut off after 10 characters. I therefore changed the test accordingly. The Unified highlighter seems to still return the whole string, just only with highlighting in the first 10 characters (I assume), but as that’s just working differently I hope it’s OK if the plain highlighter doesn’t replicate that behavior.

Copy link
Contributor

❌ Gradle check result for 48b1dcd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@drunken-monkey drunken-monkey force-pushed the 10702-max_analyzer_offset-ignored-for-plain-highlighter branch from 48b1dcd to 1529dfd Compare February 1, 2024 10:27
@drunken-monkey
Copy link
Contributor Author

@andrross Done.

Copy link
Contributor

github-actions bot commented Feb 1, 2024

❌ Gradle check result for 1529dfd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Feb 1, 2024

✅ Gradle check result for 640f8b5: SUCCESS

Copy link

codecov bot commented Feb 1, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (03e415e) 71.45% compared to head (e469e62) 71.41%.

Files Patch % Lines
...rch/fetch/subphase/highlight/PlainHighlighter.java 14.28% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10919      +/-   ##
============================================
- Coverage     71.45%   71.41%   -0.05%     
- Complexity    59831    59835       +4     
============================================
  Files          4959     4959              
  Lines        281129   281135       +6     
  Branches      40857    40859       +2     
============================================
- Hits         200892   200771     -121     
- Misses        63580    63730     +150     
+ Partials      16657    16634      -23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…h plain highlighter.

Signed-off-by: Thomas Seidl <remus@gmx.net>
…ghter.

Signed-off-by: Thomas Seidl <remus@gmx.net>
Signed-off-by: Thomas Seidl <remus@gmx.net>
Signed-off-by: Thomas Seidl <remus@gmx.net>
@andrross andrross force-pushed the 10702-max_analyzer_offset-ignored-for-plain-highlighter branch from 640f8b5 to e469e62 Compare February 15, 2024 21:59
@andrross
Copy link
Member

@drunken-monkey Apologies for the delay and thanks for sticking with this issue. The PR looks good to me. I just rebased and fixed the conflicts.

Copy link
Contributor

❕ Gradle check result for e469e62: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
      1 org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@kotwanikunal kotwanikunal merged commit 5d327a2 into opensearch-project:main Feb 17, 2024
30 of 31 checks passed
@andrross andrross added the backport 2.x Backport to 2.x branch label Feb 17, 2024
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-10919-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5d327a2f48f4c0fcd8a443fa6675dad8b18b5cb5
# Push it to GitHub
git push --set-upstream origin backport/backport-10919-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-10919-to-2.x.

andrross pushed a commit to andrross/OpenSearch that referenced this pull request Feb 17, 2024
andrross pushed a commit to andrross/OpenSearch that referenced this pull request Feb 17, 2024
…" highlighter (opensearch-project#10919)

(cherry picked from commit 5d327a2)
Signed-off-by: Andrew Ross <andrross@amazon.com>
drunken-monkey added a commit to drunken-monkey/OpenSearch that referenced this pull request Feb 17, 2024
@drunken-monkey drunken-monkey deleted the 10702-max_analyzer_offset-ignored-for-plain-highlighter branch February 17, 2024 17:02
reta pushed a commit that referenced this pull request Feb 19, 2024
…ter with "plain… (#12357)

* Fix the "highlight.max_analyzer_offset" request parameter with "plain" highlighter (#10919)

(cherry picked from commit 5d327a2)
Signed-off-by: Andrew Ross <andrross@amazon.com>

* Change bwc test to properly reflect version

Signed-off-by: Andrew Ross <andrross@amazon.com>

---------

Signed-off-by: Andrew Ross <andrross@amazon.com>
Co-authored-by: Thomas Seidl <drunken-monkey@users.noreply.github.com>
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…" highlighter (opensearch-project#10919)

Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed bug Something isn't working Search Search query, autocomplete ...etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] "max_analyzer_offset" highlight query setting not working for plain highlighter
5 participants