Merge branch 'main' into feature/rank-vectors-plugin
benwtrent authored Dec 18, 2024
2 parents 239564d + 8c5f0d6 commit 996c6ec
Showing 31 changed files with 1,330 additions and 111 deletions.
7 changes: 7 additions & 0 deletions docs/changelog/118585.yaml
@@ -0,0 +1,7 @@
pr: 118585
summary: Add a generic `rescorer` retriever based on the search request's rescore
functionality
area: Ranking
type: feature
issues:
- 118327
6 changes: 6 additions & 0 deletions docs/changelog/119007.yaml
@@ -0,0 +1,6 @@
pr: 119007
summary: Block-writes cannot be added after read-only
area: Data streams
type: bug
issues:
- 119002
121 changes: 120 additions & 1 deletion docs/reference/search/retriever.asciidoc
@@ -22,6 +22,9 @@ A <<standard-retriever, retriever>> that replaces the functionality of a traditi
`knn`::
A <<knn-retriever, retriever>> that replaces the functionality of a <<search-api-knn, knn search>>.

`rescorer`::
A <<rescorer-retriever, retriever>> that replaces the functionality of the <<rescore, query rescorer>>.

`rrf`::
A <<rrf-retriever, retriever>> that produces top documents from <<rrf, reciprocal rank fusion (RRF)>>.

@@ -371,6 +374,122 @@ GET movies/_search
----
// TEST[skip:uses ELSER]

[[rescorer-retriever]]
==== Rescorer Retriever

The `rescorer` retriever re-scores only the results produced by its child retriever.
For the `standard` and `knn` retrievers, the `window_size` parameter specifies the number of documents examined per shard.

For compound retrievers like `rrf`, the `window_size` parameter defines the total number of documents examined globally.

When using the `rescorer`, an error is returned if the following conditions are not met:

* The minimum configured rescore's `window_size` is:
** Greater than or equal to the `size` of the parent retriever for nested `rescorer` setups.
** Greater than or equal to the `size` of the search request when used as the primary retriever in the tree.

* And the maximum rescore's `window_size` is:
** Smaller than or equal to the `size` or `rank_window_size` of the child retriever.
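
The two conditions above can be sketched in Java as follows. This is an illustrative sketch only, not the validation code from this commit: the class and method names (`RescorerWindowCheck`, `violation`) are hypothetical, `parentSize` stands for the parent retriever's `size` (or the search request's `size` at the root), and `childWindow` stands for the child retriever's `size` or `rank_window_size`.

```java
import java.util.stream.IntStream;

// Hypothetical helper, not Elasticsearch code: mirrors the two documented window_size rules.
class RescorerWindowCheck {

    /**
     * @param parentSize  size of the parent retriever, or of the search request at the root
     * @param childWindow size or rank_window_size of the child retriever
     * @param windowSizes window_size of each configured rescorer (at least one is required)
     * @return null when both rules hold, otherwise a description of the violated rule
     */
    static String violation(int parentSize, int childWindow, int... windowSizes) {
        int min = IntStream.of(windowSizes).min().orElseThrow();
        int max = IntStream.of(windowSizes).max().orElseThrow();
        if (min < parentSize) {
            // Rule 1: the smallest window must cover everything the parent asks for.
            return "minimum window_size [" + min + "] is smaller than the parent size [" + parentSize + "]";
        }
        if (max > childWindow) {
            // Rule 2: the largest window cannot exceed what the child retriever returns.
            return "maximum window_size [" + max + "] is larger than the child window [" + childWindow + "]";
        }
        return null;
    }

    public static void main(String[] args) {
        // size 10, rank_window_size 100, window_size 50: the values used in this section's rrf example.
        String ok = violation(10, 100, 50);
        System.out.println(ok == null ? "valid" : ok);
        // A window smaller than the requested size violates rule 1.
        System.out.println(violation(10, 100, 5));
        // With several rescorers, the largest window is checked against the child (rule 2).
        System.out.println(violation(10, 100, 50, 200));
    }
}
```

The check uses the minimum and maximum over all rescorers because the `rescore` parameter accepts an array of rescorer definitions, each with its own `window_size`.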

[discrete]
[[rescorer-retriever-parameters]]
===== Parameters

`rescore`::
(Required. <<rescore, A rescorer definition or an array of rescorer definitions>>)
+
Defines the <<rescore, rescorers>> applied sequentially to the top documents returned by the child retriever.

`retriever`::
(Required. <<retriever, retriever>>)
+
Specifies the child retriever responsible for generating the initial set of top documents to be re-ranked.

`filter`::
(Optional. <<query-dsl, query object or list of query objects>>)
+
Applies a <<query-dsl-bool-query, boolean query filter>> to the retriever, ensuring that all documents match the filter criteria without affecting their scores.

[discrete]
[[rescorer-retriever-example]]
==== Example

The `rescorer` retriever can be placed at any level within the retriever tree.
The following example demonstrates a `rescorer` applied to the results produced by an `rrf` retriever:

[source,console]
----
GET movies/_search
{
"size": 10, <1>
"retriever": {
"rescorer": { <2>
"rescore": {
"query": { <3>
"window_size": 50, <4>
"rescore_query": {
"script_score": {
"script": {
"source": "cosineSimilarity(params.queryVector, 'product-vector_final_stage') + 1.0",
"params": {
"queryVector": [-0.5, 90.0, -10, 14.8, -156.0]
}
}
}
}
}
},
"retriever": { <5>
"rrf": {
"rank_window_size": 100, <6>
"retrievers": [
{
"standard": {
"query": {
"sparse_vector": {
"field": "plot_embedding",
"inference_id": "my-elser-model",
"query": "films that explore psychological depths"
}
}
}
},
{
"standard": {
"query": {
"multi_match": {
"query": "crime",
"fields": [
"plot",
"title"
]
}
}
}
},
{
"knn": {
"field": "vector",
"query_vector": [10, 22, 77],
"k": 10,
"num_candidates": 10
}
}
]
}
}
}
}
}
----
// TEST[skip:uses ELSER]
<1> Specifies the number of top documents to return in the final response.
<2> A `rescorer` retriever applied as the final step.
<3> The definition of the `query` rescorer.
<4> Defines the number of documents to rescore from the child retriever.
<5> Specifies the child retriever definition.
<6> Defines the number of documents returned by the `rrf` retriever, which limits the available documents to

[[text-similarity-reranker-retriever]]
==== Text Similarity Re-ranker Retriever

@@ -777,4 +896,4 @@ When a retriever is specified as part of a search, the following elements are no
* <<search-after, `search_after`>>
* <<request-body-search-terminate-after, `terminate_after`>>
* <<search-sort-param, `sort`>>
- * <<rescore, `rescore`>>
+ * <<rescore, `rescore`>> use a <<rescorer-retriever, rescorer retriever>> instead
@@ -53,6 +53,7 @@
public class EntitlementInitialization {

private static final String POLICY_FILE_NAME = "entitlement-policy.yaml";
private static final Module ENTITLEMENTS_MODULE = PolicyManager.class.getModule();

private static ElasticsearchEntitlementChecker manager;

@@ -92,7 +93,7 @@ private static PolicyManager createPolicyManager() throws IOException {
"server",
List.of(new Scope("org.elasticsearch.server", List.of(new ExitVMEntitlement(), new CreateClassLoaderEntitlement())))
);
- return new PolicyManager(serverPolicy, pluginPolicies, EntitlementBootstrap.bootstrapArgs().pluginResolver());
+ return new PolicyManager(serverPolicy, pluginPolicies, EntitlementBootstrap.bootstrapArgs().pluginResolver(), ENTITLEMENTS_MODULE);
}

private static Map<String, Policy> createPluginPolicies(Collection<EntitlementBootstrap.PluginData> pluginData) throws IOException {
@@ -15,6 +15,7 @@
import org.elasticsearch.logging.LogManager;
import org.elasticsearch.logging.Logger;

import java.lang.StackWalker.StackFrame;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.util.ArrayList;
@@ -29,6 +30,10 @@
import java.util.stream.Collectors;
import java.util.stream.Stream;

import static java.lang.StackWalker.Option.RETAIN_CLASS_REFERENCE;
import static java.util.Objects.requireNonNull;
import static java.util.function.Predicate.not;

public class PolicyManager {
private static final Logger logger = LogManager.getLogger(ElasticsearchEntitlementChecker.class);

@@ -63,6 +68,11 @@ public <E extends Entitlement> Stream<E> getEntitlements(Class<E> entitlementCla

private static final Set<Module> systemModules = findSystemModules();

/**
* Frames originating from this module are ignored in the permission logic.
*/
private final Module entitlementsModule;

private static Set<Module> findSystemModules() {
var systemModulesDescriptors = ModuleFinder.ofSystem()
.findAll()
@@ -77,13 +87,18 @@ private static Set<Module> findSystemModules() {
.collect(Collectors.toUnmodifiableSet());
}

- public PolicyManager(Policy defaultPolicy, Map<String, Policy> pluginPolicies, Function<Class<?>, String> pluginResolver) {
- this.serverEntitlements = buildScopeEntitlementsMap(Objects.requireNonNull(defaultPolicy));
- this.pluginsEntitlements = Objects.requireNonNull(pluginPolicies)
- .entrySet()
+ public PolicyManager(
+ Policy defaultPolicy,
+ Map<String, Policy> pluginPolicies,
+ Function<Class<?>, String> pluginResolver,
+ Module entitlementsModule
+ ) {
+ this.serverEntitlements = buildScopeEntitlementsMap(requireNonNull(defaultPolicy));
+ this.pluginsEntitlements = requireNonNull(pluginPolicies).entrySet()
.stream()
.collect(Collectors.toUnmodifiableMap(Map.Entry::getKey, e -> buildScopeEntitlementsMap(e.getValue())));
this.pluginResolver = pluginResolver;
+ this.entitlementsModule = entitlementsModule;
}

private static Map<String, List<Entitlement>> buildScopeEntitlementsMap(Policy policy) {
@@ -185,29 +200,51 @@ private static boolean isServerModule(Module requestingModule) {
return requestingModule.isNamed() && requestingModule.getLayer() == ModuleLayer.boot();
}

- private static Module requestingModule(Class<?> callerClass) {
+ /**
+ * Walks the stack to determine which module's entitlements should be checked.
+ *
+ * @param callerClass when non-null will be used if its module is suitable;
+ * this is a fast-path check that can avoid the stack walk
+ * in cases where the caller class is available.
+ * @return the requesting module, or {@code null} if the entire call stack
+ * comes from modules that are trusted.
+ */
+ Module requestingModule(Class<?> callerClass) {
if (callerClass != null) {
Module callerModule = callerClass.getModule();
if (systemModules.contains(callerModule) == false) {
// fast path
return callerModule;
}
}
- int framesToSkip = 1 // getCallingClass (this method)
- + 1 // the checkXxx method
- + 1 // the runtime config method
- + 1 // the instrumented method
- ;
- Optional<Module> module = StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE)
- .walk(
- s -> s.skip(framesToSkip)
- .map(f -> f.getDeclaringClass().getModule())
- .filter(m -> systemModules.contains(m) == false)
- .findFirst()
- );
+ Optional<Module> module = StackWalker.getInstance(RETAIN_CLASS_REFERENCE)
+ .walk(frames -> findRequestingModule(frames.map(StackFrame::getDeclaringClass)));
return module.orElse(null);
}

/**
* Given a stream of classes corresponding to the frames from a {@link StackWalker},
* returns the module whose entitlements should be checked.
*
* @throws NullPointerException if the requesting module is {@code null}
*/
Optional<Module> findRequestingModule(Stream<Class<?>> classes) {
return classes.map(Objects::requireNonNull)
.map(PolicyManager::moduleOf)
.filter(m -> m != entitlementsModule) // Ignore the entitlements library itself
.filter(not(systemModules::contains)) // Skip trusted JDK modules
.findFirst();
}

private static Module moduleOf(Class<?> c) {
var result = c.getModule();
if (result == null) {
throw new NullPointerException("Entitlements system does not support non-modular class [" + c.getName() + "]");
} else {
return result;
}
}

private static boolean isTriviallyAllowed(Module requestingModule) {
if (requestingModule == null) {
logger.debug("Entitlement trivially allowed: entire call stack is in composed of classes in system modules");
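
The module-resolution change in this hunk replaces a fixed `framesToSkip` count with two filters: frames from the entitlements module itself and frames from trusted system modules are skipped, and the first remaining frame's module is the requester. A minimal standalone sketch of that idea follows. It is not the `PolicyManager` code: the class name `RequestingModuleSketch` and the `ignored`/`trusted` parameters are hypothetical stand-ins for `entitlementsModule` and `systemModules`, and the trusted set here contains only `java.base` for illustration.

```java
import java.util.Optional;
import java.util.Set;
import java.util.stream.Stream;

import static java.util.Objects.requireNonNull;
import static java.util.function.Predicate.not;

// Hypothetical sketch, not the PolicyManager implementation.
class RequestingModuleSketch {

    /**
     * Returns the module of the first class that is neither in the ignored
     * module nor in a trusted module, or empty if every class is skipped.
     */
    static Optional<Module> findRequestingModule(Stream<Class<?>> classes, Module ignored, Set<Module> trusted) {
        return classes.map(c -> requireNonNull(c).getModule())
            .filter(m -> m != ignored)       // skip the checking library itself
            .filter(not(trusted::contains))  // skip trusted (for example JDK) modules
            .findFirst();
    }

    public static void main(String[] args) {
        // Assumption for illustration: only java.base is treated as trusted.
        Set<Module> trusted = Set.of(Object.class.getModule());

        // Deterministic input: String is in java.base, so the sketch class's module is selected.
        Optional<Module> fromClasses = findRequestingModule(
            Stream.<Class<?>>of(String.class, RequestingModuleSketch.class), null, trusted);
        System.out.println("from classes: " + fromClasses);

        // Same logic fed by a real stack walk, as in the hunk above.
        Optional<Module> fromStack = StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE)
            .walk(frames -> findRequestingModule(frames.map(StackWalker.StackFrame::getDeclaringClass), null, trusted));
        System.out.println("from stack: " + fromStack);
    }
}
```

Taking a `Stream<Class<?>>` rather than walking the stack inside the method means the selection logic can be exercised with a hand-built list of classes, which is presumably why the commit splits `findRequestingModule` out of `requestingModule`.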