Support for deep paging for Facets #321

Merged
merged 25 commits into from
Jul 27, 2023
10 changes: 10 additions & 0 deletions docs/sorting.md
Expand Up @@ -96,6 +96,16 @@ With the combination of `ISearchResult.Skip` and `maxResults`, we can tell Lucen
* Skip over a certain number of results without allocating them and tell Lucene
* only allocate a certain number of results after skipping

### Deep Paging
When using Lucene.NET as the Examine provider, deep paging can be performed more efficiently by searching after the last document of the previous page.
Steps:
1. Build and execute your query as normal.
2. Cast the `ISearchResults` returned by `IQueryExecutor.Execute` to `ILuceneSearchResults`.
3. Store `ILuceneSearchResults.SearchAfter` (a `SearchAfterOptions`) for the next page.
4. Create the same query as in the previous request.
5. When calling `IQueryExecutor.Execute`, pass in `new LuceneQueryOptions(skip, take, searchAfterOptions)`. `skip` is ignored; the next `take` documents after the document identified by the `SearchAfterOptions` are retrieved.
6. Repeat steps 2-5 for each page.

### Example

```cs
Expand Down
12 changes: 12 additions & 0 deletions docs/v2/articles/sorting.md
Expand Up @@ -77,3 +77,15 @@ var takeSevenHundredResults = searcher
```

By default when using [`Execute()`](xref:Examine.Search.IQueryExecutor#Examine_Search_IQueryExecutor_Execute_Examine_Search_QueryOptions_) or `Execute(QueryOptions.SkipTake(0))` where no take parameter is provided the take of the search will be set to [`QueryOptions.DefaultMaxResults`](xref:Examine.Search.QueryOptions#Examine_Search_QueryOptions_DefaultMaxResults) (500).

## Deep Paging

When using Lucene.NET as the Examine provider, deep paging can be performed more efficiently by searching after the last document of the previous page.
Steps:

1. Build and execute your query as normal.
2. Cast the `ISearchResults` returned by `IQueryExecutor.Execute` to `ILuceneSearchResults`.
3. Store `ILuceneSearchResults.SearchAfter` (a `SearchAfterOptions`) for the next page.
4. Create the same query as in the previous request.
5. When calling `IQueryExecutor.Execute`, pass in `new LuceneQueryOptions(skip, take, searchAfterOptions)`. `skip` is ignored; the next `take` documents after the document identified by the `SearchAfterOptions` are retrieved.
6. Repeat steps 2-5 for each page.
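
The steps above can be sketched as a paging loop. This is a minimal sketch under stated assumptions, not a definitive implementation: `DeepPagingSketch`, `ReadAllPages` and the `buildQuery` delegate are hypothetical names, and the caller is assumed to rebuild the same query with the same sort order on every call. It uses the `ExecuteWithLucene` extension from this pull request in place of the manual cast in step 2.

```cs
using System;
using System.Collections.Generic;
using Examine;
using Examine.Lucene.Search;
using Examine.Search;

public static class DeepPagingSketch
{
    // buildQuery is a hypothetical, caller-supplied delegate that recreates
    // the same query (including sorting) for every page (step 4).
    public static IEnumerable<ISearchResult> ReadAllPages(Func<IQueryExecutor> buildQuery, int pageSize)
    {
        // null on the first page, so the first request is a normal skip/take search.
        SearchAfterOptions searchAfter = null;

        while (true)
        {
            // skip is ignored once searchAfter is supplied, so 0 is passed throughout (step 5).
            var options = new LuceneQueryOptions(0, pageSize, searchAfter);
            ILuceneSearchResults page = buildQuery().ExecuteWithLucene(options);

            var count = 0;
            foreach (ISearchResult result in page)
            {
                count++;
                yield return result;
            }

            // An empty page, or a page without a cursor, means there is nothing left to read.
            if (count == 0 || page.SearchAfter == null)
            {
                yield break;
            }

            // Keep the cursor for the next page (step 3).
            searchAfter = page.SearchAfter;
        }
    }
}
```

In a web application the cursor would typically be stored between requests rather than held in a loop; how to persist `SearchAfterOptions` is left open here.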
4 changes: 2 additions & 2 deletions src/Examine.Core/Search/QueryOptions.cs
Expand Up @@ -43,12 +43,12 @@ public QueryOptions(int skip, int? take = null)
}

/// <summary>
/// The ammount of items to skip
/// The number of documents to skip in the result set.
/// </summary>
public int Skip { get; }

/// <summary>
/// The ammount of items to take
/// The number of documents to take in the result set.
/// </summary>
public int Take { get; }
}
Expand Down
19 changes: 19 additions & 0 deletions src/Examine.Lucene/Search/ILuceneSearchResults.cs
@@ -0,0 +1,19 @@
namespace Examine.Lucene.Search
{
/// <summary>
/// Lucene.NET Search Results
/// </summary>
public interface ILuceneSearchResults : ISearchResults
{
/// <summary>
/// Options for searching after the last document of this result set. Used for efficient deep paging.
/// </summary>
SearchAfterOptions SearchAfter { get; }

/// <summary>
/// Returns the maximum score value encountered. Note that in case
/// scores are not tracked, this returns <see cref="float.NaN"/>.
/// </summary>
float MaxScore { get; }
}
}
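
Because `MaxScore` is documented to return `float.NaN` when scores are not tracked, callers need `float.IsNaN` to detect that case; a comparison such as `maxScore == float.NaN` is always false. A minimal sketch, where `MaxScoreSketch` and `ToNullableMaxScore` are hypothetical helper names and not part of Examine:

```cs
public static class MaxScoreSketch
{
    // Maps the "scores not tracked" marker (NaN) to null so callers
    // cannot accidentally compare or sort on it.
    public static float? ToNullableMaxScore(float maxScore)
        => float.IsNaN(maxScore) ? (float?)null : maxScore;
}
```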
41 changes: 41 additions & 0 deletions src/Examine.Lucene/Search/LuceneQueryOptions.cs
@@ -0,0 +1,41 @@
using Examine.Search;

namespace Examine.Lucene.Search
{
/// <summary>
/// Lucene.NET specific query options
/// </summary>
public class LuceneQueryOptions : QueryOptions
{
/// <summary>
/// Constructor
/// </summary>
/// <param name="skip">Number of result documents to skip.</param>
/// <param name="take">Optional number of result documents to take.</param>
/// <param name="searchAfter">Optionally skip to results after the results from the previous search execution. Used for efficient deep paging.</param>
/// <param name="trackDocumentScores">Whether to track document scores. For best performance, if not needed, leave false.</param>
/// <param name="trackDocumentMaxScore">Whether to track the maximum document score. For best performance, if not needed, leave false.</param>
public LuceneQueryOptions(int skip, int? take = null, SearchAfterOptions searchAfter = null, bool trackDocumentScores = false, bool trackDocumentMaxScore = false)
: base(skip, take)
{
TrackDocumentScores = trackDocumentScores;
TrackDocumentMaxScore = trackDocumentMaxScore;
SearchAfter = searchAfter;
}

/// <summary>
/// Whether to track document scores. For best performance, if not needed, leave false.
/// </summary>
public bool TrackDocumentScores { get; }

/// <summary>
/// Whether to track the maximum document score. For best performance, if not needed, leave false.
/// </summary>
public bool TrackDocumentMaxScore { get; }

/// <summary>
/// Options for searching after the last document of a previous search. Used for efficient deep paging.
/// </summary>
public SearchAfterOptions SearchAfter { get; }
}
}
144 changes: 113 additions & 31 deletions src/Examine.Lucene/Search/LuceneSearchExecutor.cs
Expand Up @@ -18,6 +18,7 @@ namespace Examine.Lucene.Search
public class LuceneSearchExecutor
{
private readonly QueryOptions _options;
private readonly LuceneQueryOptions _luceneQueryOptions;
private readonly IEnumerable<SortField> _sortField;
private readonly ISearchContext _searchContext;
private readonly Query _luceneQuery;
Expand All @@ -29,6 +30,7 @@ public class LuceneSearchExecutor
internal LuceneSearchExecutor(QueryOptions? options, Query query, IEnumerable<SortField> sortField, ISearchContext searchContext, ISet<string>? fieldsToLoad, IEnumerable<IFacetField> facetFields, FacetsConfig facetsConfig)
{
_options = options ?? QueryOptions.Default;
_luceneQueryOptions = _options as LuceneQueryOptions;
_luceneQuery = query ?? throw new ArgumentNullException(nameof(query));
_fieldsToLoad = fieldsToLoad;
_sortField = sortField ?? throw new ArgumentNullException(nameof(sortField));
Expand Down Expand Up @@ -89,59 +91,139 @@ public ISearchResults Execute()

var maxResults = Math.Min((_options.Skip + 1) * _options.Take, MaxDoc);
maxResults = maxResults >= 1 ? maxResults : QueryOptions.DefaultMaxResults;
int numHits = maxResults;

ICollector topDocsCollector;
SortField[] sortFields = _sortField as SortField[] ?? _sortField.ToArray();
if (sortFields.Length > 0)
{
topDocsCollector = TopFieldCollector.Create(
new Sort(sortFields), maxResults, false, false, false, false);
}
else
{
topDocsCollector = TopScoreDocCollector.Create(maxResults, true);
}
Sort sort = null;
FieldDoc scoreDocAfter = null;
Filter filter = null;

using (ISearcherReference searcher = _searchContext.GetSearcher())
{
FacetsCollector? facetsCollector;
if(_facetFields.Any())
if (sortFields.Length > 0)
{
facetsCollector = new FacetsCollector();
searcher.IndexSearcher.Search(_luceneQuery, MultiCollector.Wrap(topDocsCollector, facetsCollector));
sort = new Sort(sortFields);
sort.Rewrite(searcher.IndexSearcher);
}
else
if (_luceneQueryOptions != null && _luceneQueryOptions.SearchAfter != null)
{
facetsCollector = null;
searcher.IndexSearcher.Search(_luceneQuery, topDocsCollector);
//The document to find results after.
scoreDocAfter = GetScoreDocAfter(_luceneQueryOptions);

// Collect only the number of hits we want to take after the last document. We don't need to collect all previous/next docs.
numHits = _options.Take >= 1 ? _options.Take : QueryOptions.DefaultMaxResults;
}

TopDocs topDocs;
ICollector topDocsCollector;
bool trackMaxScore = _luceneQueryOptions == null ? false : _luceneQueryOptions.TrackDocumentMaxScore;
bool trackDocScores = _luceneQueryOptions == null ? false : _luceneQueryOptions.TrackDocumentScores;

if (sortFields.Length > 0)
{
topDocs = ((TopFieldCollector)topDocsCollector).GetTopDocs(_options.Skip, _options.Take);
bool fillFields = true;
topDocsCollector = TopFieldCollector.Create(sort, numHits, scoreDocAfter, fillFields, trackDocScores, trackMaxScore, false);
}
else
{
topDocs = ((TopScoreDocCollector)topDocsCollector).GetTopDocs(_options.Skip, _options.Take);
topDocsCollector = TopScoreDocCollector.Create(numHits, scoreDocAfter, true);
}
FacetsCollector facetsCollector = null;
if (_facetFields.Any())
{
facetsCollector = new FacetsCollector();
}

if (scoreDocAfter != null && sort != null)
{
if (facetsCollector != null)
{
topDocs = FacetsCollector.SearchAfter(searcher.IndexSearcher, scoreDocAfter, _luceneQuery, filter, _options.Take, sort, MultiCollector.Wrap(topDocsCollector, facetsCollector));
}
else
{
topDocs = searcher.IndexSearcher.SearchAfter(scoreDocAfter, _luceneQuery, filter, _options.Take, sort, trackDocScores, trackMaxScore);
}
}
else if (scoreDocAfter != null && sort == null)
{
if (facetsCollector != null)
{
topDocs = FacetsCollector.SearchAfter(searcher.IndexSearcher, scoreDocAfter, _luceneQuery, _options.Take, MultiCollector.Wrap(topDocsCollector, facetsCollector));
}
else
{
topDocs = searcher.IndexSearcher.SearchAfter(scoreDocAfter, _luceneQuery, _options.Take);
}
}
else
{
searcher.IndexSearcher.Search(_luceneQuery, MultiCollector.Wrap(topDocsCollector, facetsCollector));
if (sortFields.Length > 0)
{
topDocs = ((TopFieldCollector)topDocsCollector).GetTopDocs(_options.Skip, _options.Take);
}
else
{
topDocs = ((TopScoreDocCollector)topDocsCollector).GetTopDocs(_options.Skip, _options.Take);
}
}

var totalItemCount = topDocs.TotalHits;

var results = new List<ISearchResult>();
var results = new List<ISearchResult>(topDocs.ScoreDocs.Length);
for (int i = 0; i < topDocs.ScoreDocs.Length; i++)
{
var result = GetSearchResult(i, topDocs, searcher.IndexSearcher);
if(result != null)
if (result != null)
{
results.Add(result);
}
}

var searchAfterOptions = GetSearchAfterOptions(topDocs);
float maxScore = topDocs.MaxScore;
var facets = ExtractFacets(facetsCollector, searcher);

return new LuceneSearchResults(results, totalItemCount, facets);
return new LuceneSearchResults(results, totalItemCount, facets, maxScore, searchAfterOptions);
}
}

private static FieldDoc GetScoreDocAfter(LuceneQueryOptions luceneQueryOptions)
{
FieldDoc scoreDocAfter;
var searchAfter = luceneQueryOptions.SearchAfter;

object[] searchAfterSortFields = new object[0];
if (luceneQueryOptions.SearchAfter.Fields != null && luceneQueryOptions.SearchAfter.Fields.Length > 0)
{
searchAfterSortFields = luceneQueryOptions.SearchAfter.Fields;
}
if (searchAfter.ShardIndex != null)
{
scoreDocAfter = new FieldDoc(searchAfter.DocumentId, searchAfter.DocumentScore, searchAfterSortFields, searchAfter.ShardIndex.Value);
}
else
{
scoreDocAfter = new FieldDoc(searchAfter.DocumentId, searchAfter.DocumentScore, searchAfterSortFields);
}

return scoreDocAfter;
}

private static SearchAfterOptions GetSearchAfterOptions(TopDocs topDocs)
{
if (topDocs.TotalHits > 0)
{
if (topDocs.ScoreDocs.LastOrDefault() is FieldDoc lastFieldDoc && lastFieldDoc != null)
{
return new SearchAfterOptions(lastFieldDoc.Doc, lastFieldDoc.Score, lastFieldDoc.Fields?.ToArray(), lastFieldDoc.ShardIndex);
}
if (topDocs.ScoreDocs.LastOrDefault() is ScoreDoc scoreDoc && scoreDoc != null)
{
return new SearchAfterOptions(scoreDoc.Doc, scoreDoc.Score, new object[0], scoreDoc.ShardIndex);
}
}
return null;
}

private IReadOnlyDictionary<string, IFacetResult> ExtractFacets(FacetsCollector? facetsCollector, ISearcherReference searcher)
Expand All @@ -156,15 +238,15 @@ private IReadOnlyDictionary<string, IFacetResult> ExtractFacets(FacetsCollector?

SortedSetDocValuesReaderState? sortedSetReaderState = null;

foreach(var field in facetFields)
foreach (var field in facetFields)
{
var valueType = _searchContext.GetFieldValueType(field.Field);
if(valueType is IIndexFacetValueType facetValueType)
if (valueType is IIndexFacetValueType facetValueType)
{
var facetExtractionContext = new LuceneFacetExtractionContext(facetsCollector, searcher, _facetsConfig);

var fieldFacets = facetValueType.ExtractFacets(facetExtractionContext, field);
foreach(var fieldFacet in fieldFacets)
foreach (var fieldFacet in fieldFacets)
{
// overwrite if necessary (no exceptions thrown in case of collision)
facets[fieldFacet.Key] = fieldFacet.Value;
Expand Down Expand Up @@ -198,8 +280,8 @@ private IReadOnlyDictionary<string, IFacetResult> ExtractFacets(FacetsCollector?
doc = luceneSearcher.Doc(docId);
}
var score = scoreDoc.Score;
var result = CreateSearchResult(doc, score);

var shardIndex = scoreDoc.ShardIndex;
var result = CreateSearchResult(doc, score, shardIndex);
return result;
}

Expand All @@ -209,7 +291,7 @@ private IReadOnlyDictionary<string, IFacetResult> ExtractFacets(FacetsCollector?
/// <param name="doc">The doc to convert.</param>
/// <param name="score">The score.</param>
/// <returns>A populated search result object</returns>
private ISearchResult CreateSearchResult(Document doc, float score)
private LuceneSearchResult CreateSearchResult(Document doc, float score, int shardIndex)
{
var id = doc.Get("id");

Expand All @@ -218,7 +300,7 @@ private ISearchResult CreateSearchResult(Document doc, float score)
id = doc.Get(ExamineFieldNames.ItemIdFieldName);
}

var searchResult = new SearchResult(id, score, () =>
var searchResult = new LuceneSearchResult(id, score, () =>
{
//we can use lucene to find out the fields which have been stored for this particular document
var fields = doc.Fields;
Expand Down Expand Up @@ -247,7 +329,7 @@ private ISearchResult CreateSearchResult(Document doc, float score)
}

return resultVals;
});
}, shardIndex);

return searchResult;
}
Expand Down
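
To illustrate why the search-after path is cheaper, the sketch below mirrors the two `numHits` expressions from this diff: the skip/take path sizes the collector to `Math.Min((Skip + 1) * Take, MaxDoc)`, while the search-after path sizes it to `Take`. `CollectorSizeSketch` and its method names are hypothetical, and the value 500 is taken from the documented `QueryOptions.DefaultMaxResults`.

```cs
using System;

public static class CollectorSizeSketch
{
    // Assumption: mirrors QueryOptions.DefaultMaxResults (500) as stated in the docs.
    private const int DefaultMaxResults = 500;

    // Number of hits the collector is sized for on the skip/take path.
    public static int SkipTakeHits(int skip, int take, int maxDoc)
    {
        var maxResults = Math.Min((skip + 1) * take, maxDoc);
        return maxResults >= 1 ? maxResults : DefaultMaxResults;
    }

    // Number of hits the collector is sized for when a search-after cursor is supplied.
    public static int SearchAfterHits(int take)
        => take >= 1 ? take : DefaultMaxResults;
}
```

With `skip = 1000`, `take = 20` and one million documents, the first method yields 20020 while the second yields 20, regardless of how deep the page is.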
14 changes: 13 additions & 1 deletion src/Examine.Lucene/Search/LuceneSearchExtensions.cs
@@ -1,4 +1,4 @@
using System;
using System;
using Examine.Search;
using Lucene.Net.Search;

Expand Down Expand Up @@ -50,5 +50,17 @@ public static BooleanOperation ToBooleanOperation(this Occur o)
return BooleanOperation.Or;
}
}
/// <summary>
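/// Executes the query and returns the results as <see cref="ILuceneSearchResults"/>.
/// Throws <see cref="NotSupportedException"/> if the executor is not backed by Lucene.NET.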
/// </summary>
public static ILuceneSearchResults ExecuteWithLucene(this IQueryExecutor queryExecutor, QueryOptions options = null)
{
var results = queryExecutor.Execute(options);
if (results is ILuceneSearchResults luceneSearchResults)
{
return luceneSearchResults;
}
throw new NotSupportedException("QueryExecutor is not Lucene.NET");
}
}
}
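
`ExecuteWithLucene` throws when the results are not Lucene results. Where code has to work with other providers as well, a non-throwing variant of the same type check may be preferable. A minimal sketch, assuming only the types shown in this pull request; `CursorSketch` and `GetCursorOrNull` are hypothetical names:

```cs
using Examine;
using Examine.Lucene.Search;
using Examine.Search;

public static class CursorSketch
{
    // Returns the search-after cursor when the results came from the
    // Lucene.NET provider, otherwise null (no deep-paging support).
    public static SearchAfterOptions GetCursorOrNull(ISearchResults results)
        => results is ILuceneSearchResults lucene ? lucene.SearchAfter : null;
}
```

A null return then means either that the provider is not Lucene.NET or that there is no further page to search after.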
19 changes: 19 additions & 0 deletions src/Examine.Lucene/Search/LuceneSearchResult.cs
@@ -0,0 +1,19 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Examine.Lucene.Search
{
public class LuceneSearchResult : SearchResult, ISearchResult
{
public LuceneSearchResult(string id, float score, Func<IDictionary<string, List<string>>> lazyFieldVals, int shardId)
: base(id, score, lazyFieldVals)
{
ShardIndex = shardId;
}

public int ShardIndex { get; }
}
}