Improve explanation in rescore #30629
Currently, in a rescore request where window_size is smaller than the number of top documents returned (N=size), the score explanation can be incorrect for documents that are part of the top N but were not rescored. This PR corrects this by saving in RescoreContext the docIDs of the documents to which rescoring was applied, and adding the rescoring explanation only for those docIDs. Closes elastic#28725
@elasticmachine run gradle build tests
Pinging @elastic/es-search-aggs
Thanks @mayya-sharipova
I left a small comment regarding the creation of the rescore explanation.
@@ -92,7 +97,7 @@ public Explanation explain(int topLevelDocId, IndexSearcher searcher, RescoreCon

         // NOTE: we don't use Lucene's Rescorer.explain because we want to insert our own description with which ScoreMode was used. Maybe
         // we should add QueryRescorer.explainCombine to Lucene?
-        if (rescoreExplain != null && rescoreExplain.isMatch()) {
+        if (rescoreContext.isRescored(topLevelDocId) && rescoreExplain != null && rescoreExplain.isMatch()) {
We always build the rescore explanation, even when rescoreContext.isRescored(topLevelDocId) is false. Could we avoid this by building the primary explanation first and returning it directly if rescoreContext.isRescored(topLevelDocId) is false?
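To make the suggested ordering concrete, here is a minimal, self-contained sketch. It does not use the real Lucene or Elasticsearch classes: `SketchExplanation`, `ExplainSketch`, `explainPrimary` and `explainRescore` are hypothetical stand-ins, and combining the two scores as a plain sum is an assumption made only for illustration (the real combination depends on the configured score mode).

```java
import java.util.Set;

// Hypothetical stand-in for Lucene's Explanation; holds only what the sketch needs.
final class SketchExplanation {
    final float value;
    final String description;

    SketchExplanation(float value, String description) {
        this.value = value;
        this.description = description;
    }
}

final class ExplainSketch {
    // Counts how often the rescore explanation is built, to show the saving.
    static int rescoreExplainCalls = 0;

    // Hypothetical: explanation of the original (first-pass) query score.
    static SketchExplanation explainPrimary(int docId) {
        return new SketchExplanation(1.0f, "primary score of doc " + docId);
    }

    // Hypothetical: explanation of the rescore query score.
    static SketchExplanation explainRescore(int docId) {
        rescoreExplainCalls++;
        return new SketchExplanation(2.0f, "rescore of doc " + docId);
    }

    static SketchExplanation explain(int docId, Set<Integer> rescoredDocs) {
        // Build the primary explanation first.
        SketchExplanation primary = explainPrimary(docId);
        // Early return: the doc was outside the rescore window, so the
        // rescore explanation is never built for it.
        if (!rescoredDocs.contains(docId)) {
            return primary;
        }
        SketchExplanation rescore = explainRescore(docId);
        // Assumption: scores are combined as a plain sum.
        return new SketchExplanation(
            primary.value + rescore.value,
            "sum of: [" + primary.description + "], [" + rescore.description + "]");
    }
}
```

With this ordering, documents outside the window cost only one explanation, and the `isRescored` check no longer needs to be repeated in the later `if`.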
LGTM
@elasticmachine run gradle build tests
@@ -27,6 +29,7 @@
 public class RescoreContext {
     private final int windowSize;
     private final Rescorer rescorer;
+    private Set<Integer> recroredDocs; //doc Ids for which rescoring was applied
s/recrored/rescored/
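For readers following along, a rough sketch of how the context could track rescored documents, using the corrected field name. This is a simplified stand-in, not the actual class: `RescoreContextSketch` and `markRescored` are hypothetical names, the `Rescorer` field is omitted, and only `windowSize` and `isRescored` are taken from the diffs in this thread.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for RescoreContext; the Rescorer field is omitted.
final class RescoreContextSketch {
    private final int windowSize;
    // Doc IDs for which rescoring was applied; empty until rescoring runs.
    private Set<Integer> rescoredDocs = Collections.emptySet();

    RescoreContextSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    int getWindowSize() {
        return windowSize;
    }

    // Hypothetical helper: records the first windowSize hits of the
    // first-pass results as rescored.
    void markRescored(int[] topDocIds) {
        Set<Integer> docs = new HashSet<>();
        int limit = Math.min(windowSize, topDocIds.length);
        for (int i = 0; i < limit; i++) {
            docs.add(topDocIds[i]);
        }
        rescoredDocs = docs;
    }

    // True only for documents that fell inside the rescore window.
    boolean isRescored(int docId) {
        return rescoredDocs.contains(docId);
    }
}
```

For example, with a window size of 2 and five hits returned, only the first two doc IDs would report `isRescored(...) == true`, so only those two would get a rescoring explanation.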