Expose Lucene's FeatureField. #30618
Lucene has a new `FeatureField` that records numeric features as term frequencies. Its main benefit is that queries can be boosted by the values of these features while non-competitive documents are skipped efficiently, using block-max WAND and indexed impacts.
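To make the skipping idea concrete, here is a toy sketch in Python. It is not Lucene's block-max WAND implementation: the block layout, the helper names (`build_blocks`, `top_k_with_block_skipping`) and the default `w` and `a` values are assumptions for illustration only. Each block stores the maximum feature value seen at index time, which gives an upper bound on the score of any document in the block, so a block whose bound cannot beat the current k-th best score is skipped without scoring its documents.

```python
import heapq
import math


def build_blocks(docs, block_size):
    """Group (doc_id, feature_value) pairs into blocks and record each
    block's maximum feature value, standing in for an indexed impact."""
    blocks = []
    for i in range(0, len(docs), block_size):
        chunk = docs[i:i + block_size]
        blocks.append((max(value for _, value in chunk), chunk))
    return blocks


def top_k_with_block_skipping(blocks, k, w=1.0, a=4.0):
    """Return (top-k hits sorted by descending score, number of blocks skipped).

    Scores use w * log(a + S), a monotonically increasing function of the
    feature value S, so the block maximum yields a valid score upper bound.
    """
    heap = []  # min-heap of (score, doc_id); heap[0] is the weakest kept hit
    skipped = 0
    for block_max, chunk in blocks:
        bound = w * math.log(a + block_max)
        if len(heap) == k and bound <= heap[0][0]:
            skipped += 1  # no document in this block can be competitive
            continue
        for doc_id, value in chunk:
            score = w * math.log(a + value)
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), skipped


# Hypothetical data: three blocks of two documents each.
docs = [(1, 10.0), (2, 50.0), (3, 1.0), (4, 2.0), (5, 100.0), (6, 3.0)]
hits, skipped = top_k_with_block_skipping(build_blocks(docs, block_size=2), k=1)
```

In this example the middle block is never scored, because its maximum feature value is below that of the best hit found so far. Skipping of this kind is only valid when an exact total hit count is not required.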
Pinging @elastic/es-search-aggs
@jpountz Thanks Adrien, this is a very interesting and necessary feature. Excited to have it in Elasticsearch! I am wondering whether there is an intention to index multiple values for a feature. With your current PR, if I try to index multiple values:
I am getting the following in the explanation of query score (looks like multiple values got converted to the max float value):
"_explanation": {
  "value": 88.72284,
  "description": "Log function on the _feature field for the pagerank feature, computed as w * log(a + S) from:",
  "details": [
    {
      "value": 1.0,
      "description": "w, weight of this function",
      "details": []
    },
    {
      "value": 4.0,
      "description": "a, scaling factor",
      "details": []
    },
    {
      "value": 3.4028235E38,
      "description": "S, feature value",
      "details": []
    }
  ]
}
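As a sanity check, the reported score can be reproduced from the three values in the explanation. The sketch below assumes that `log` is the natural logarithm, which is consistent with the reported value, and that `S` is the largest finite 32-bit float; the function name `log_feature_score` is made up for this example.

```python
import math
import struct

# Largest finite IEEE 754 single-precision value (about 3.4028235E38).
FLOAT32_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]


def log_feature_score(w, a, s):
    """Score as described in the explanation: w * log(a + S)."""
    return w * math.log(a + s)


# w, a and S taken from the explanation output above.
score = log_feature_score(1.0, 4.0, FLOAT32_MAX)
print(f"{score:.4f}")
```

The result agrees with the `88.72284` in the explanation to within single-precision rounding, which supports the reading that the multiple values were collapsed to the maximum float value.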
Thanks @mayya-sharipova, this is a very good observation, we should reject multi-valued fields explicitly!
LGTM
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is set to `false`. Speedups may be
spectacular.
🥇
@jpountz Thanks. Just one thing I want to clarify for myself. What does this phrase mean in the explanation below? What is
Thanks for testing @mayya-sharipova. It should be
Actually I read too quickly, the current explanation is correct: the score is computed as
@jpountz Thanks for the detailed explanation
* master:
  [DOCS] Fixes typos in security settings
  Fix GeoShapeQueryBuilder serialization after backport
  [DOCS] Splits auditing.asciidoc into smaller files
  Reintroduce mandatory http pipelining support (#30820)
  Painless: Types Section Clean Up (#30283)
  Add support for indexed shape routing in geo_shape query (#30760)
  [test] java tests for archive packaging (#30734)
  Revert "Make http pipelining support mandatory (#30695)" (#30813)
  [DOCS] Fix more edit URLs in Stack Overview (#30704)
  Use correct cluster state version for node fault detection (#30810)
  Change serialization version of doc-value fields.
  [DOCS] Fixes broken link for native realm
  [DOCS] Clarified audit.index.client.hosts (#30797)
  [TEST] Don't expect acks when isolating nodes
  Add a `format` option to `docvalue_fields`. (#29639)
  Fixes UpdateSettingsRequestStreamableTests mutate bug
  Mustes {p0=snapshot.get_repository/10_basic/*} YAML test
  Revert "Mutes MachineLearningTests.testNoAttributes_givenSameAndMlEnabled"
  Only allow x-pack metadata if all nodes are ready (#30743)
  Mutes MachineLearningTests.testNoAttributes_givenSameAndMlEnabled
  Use original settings on full-cluster restart (#30780)
  Only ack cluster state updates successfully applied on all nodes (#30672)
  Expose Lucene's FeatureField. (#30618)
  Fix a grammatical error in the 'search types' documentation.
  Remove http pipelining from integration test case (#30788)