
Adding option to codec to disable patching in Lucene's PFOR encoding #12696

Closed
slow-J opened this issue Oct 18, 2023 · 24 comments · Fixed by #12741

Comments

@slow-J
Contributor

slow-J commented Oct 18, 2023

Description

Background: In https://github.com/Tony-X/search-benchmark-game we were comparing the performance of Tantivy and Lucene. "One difference between Lucene and Tantivy is Lucene uses the "patch" FOR, meaning the large values in a block are held out as exceptions so that the remaining values can use a smaller number of bits to encode, a tradeoff of CPU for lower storage space." In Tony-X/search-benchmark-game#46, I disabled patching in Lucene to match how Tantivy encodes, and ran the search-benchmark-game to test the change.
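
To make the tradeoff concrete, here is a minimal sketch of how patching lowers a block's bits-per-value (illustrative Java only, not Lucene's actual PForUtil; the 128-value block size and the 7-exception cap follow what is discussed later in this thread):

```java
// Illustrative sketch: how patching lowers bits-per-value (bpv) for a block.
// Not Lucene's actual PForUtil; 128-value blocks and a 7-exception cap assumed.
import java.util.Arrays;

final class PforTradeoffSketch {
  static final int BLOCK_SIZE = 128;
  static final int MAX_EXCEPTIONS = 7;

  static int bitsRequired(long v) {
    return v == 0 ? 0 : 64 - Long.numberOfLeadingZeros(v);
  }

  /** Plain FOR: every value, including outliers, must fit in bpv bits. */
  static int forBpv(long[] block) {
    long max = 0;
    for (long v : block) max = Math.max(max, v);
    return bitsRequired(max);
  }

  /**
   * Patched FOR: the 7 largest values are stored separately as exceptions,
   * so bpv only needs to cover the remaining 121 values.
   */
  static int patchedForBpv(long[] block) {
    long[] sorted = block.clone();
    Arrays.sort(sorted);
    return bitsRequired(sorted[BLOCK_SIZE - 1 - MAX_EXCEPTIONS]);
  }
}
```

A single large gap in an otherwise small-gap block forces plain FOR to pay the wide bpv for all 128 values (the index-size cost measured below), while patching avoids that at the cost of an exception fix-up step during decoding.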

Lucene modifications for testing: I cloned PForUtil and removed all logic related to patching the exceptions, then modified the Lucene90PostingsReader + Writer to use the util with no patching logic; see sample code slow-J@83ec5a8
Hardware used: EC2 Graviton2 instance, m6g.4xlarge

Results from the search-benchmark-game: Tony-X/search-benchmark-game#46 (comment)
We saw Lucene's latency improve: -2% in COUNT, -2% in TOP_10_COUNT, -2.07% in TOP_100.

I then ran a Lucene benchmark with luceneutil: python3 src/python/localrun.py -source wikimediumall -r
Hardware used: EC2 instance, m5.12xlarge

Posting results below

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
            BrowseDateSSDVFacets        0.95      (4.5%)        0.95      (5.9%)   -1.0% ( -10% -    9%) 0.566
               HighTermMonthSort     2558.85      (2.6%)     2539.66      (4.6%)   -0.7% (  -7% -    6%) 0.526
     BrowseRandomLabelSSDVFacets        2.48      (4.3%)        2.47      (1.4%)   -0.7% (  -6% -    5%) 0.486
                         Prefix3      139.77      (1.7%)      139.41      (3.1%)   -0.3% (  -4% -    4%) 0.735
     BrowseRandomLabelTaxoFacets        3.38      (1.5%)        3.37      (1.8%)   -0.2% (  -3% -    3%) 0.689
                          Fuzzy2       50.20      (2.0%)       50.11      (2.1%)   -0.2% (  -4% -    3%) 0.776
                         MedTerm      439.13      (3.9%)      438.94      (4.3%)   -0.0% (  -7% -    8%) 0.974
           BrowseMonthSSDVFacets        3.48      (1.8%)        3.48      (1.5%)    0.0% (  -3% -    3%) 0.949
                         Respell       37.97      (2.9%)       37.99      (2.7%)    0.1% (  -5% -    5%) 0.945
                         LowTerm      323.73      (4.2%)      323.93      (3.9%)    0.1% (  -7% -    8%) 0.960
                          Fuzzy1       45.01      (2.5%)       45.06      (2.6%)    0.1% (  -4% -    5%) 0.896
                        PKLookup      154.24      (2.6%)      154.41      (2.1%)    0.1% (  -4% -    4%) 0.886
                        Wildcard       71.96      (1.4%)       72.04      (1.9%)    0.1% (  -3% -    3%) 0.823
       BrowseDayOfYearSSDVFacets        3.32      (1.8%)        3.33      (1.9%)    0.2% (  -3% -    4%) 0.789
                          IntNRQ       27.77      (8.5%)       27.84      (9.6%)    0.3% ( -16% -   19%) 0.929
                    OrHighNotLow      183.77      (6.9%)      184.66      (5.4%)    0.5% ( -11% -   13%) 0.805
            HighTermTitleBDVSort        4.81      (3.0%)        4.83      (3.0%)    0.5% (  -5% -    6%) 0.566
                        HighTerm      496.37      (5.5%)      499.12      (5.2%)    0.6% (  -9% -   11%) 0.745
                      TermDTSort      109.66      (1.4%)      110.31      (1.3%)    0.6% (  -1% -    3%) 0.155
                HighSloppyPhrase       23.39      (4.2%)       23.56      (3.8%)    0.7% (  -7% -    9%) 0.580
           BrowseMonthTaxoFacets        3.88      (2.5%)        3.91      (1.7%)    0.8% (  -3% -    5%) 0.261
       BrowseDayOfYearTaxoFacets        3.89      (2.5%)        3.92      (0.9%)    0.8% (  -2% -    4%) 0.171
            BrowseDateTaxoFacets        3.88      (2.4%)        3.91      (0.8%)    0.9% (  -2% -    4%) 0.117
                   OrHighNotHigh      270.96      (5.3%)      273.88      (4.2%)    1.1% (  -7% -   11%) 0.474
                    OrHighNotMed      202.84      (6.3%)      205.09      (4.7%)    1.1% (  -9% -   12%) 0.531
            HighIntervalsOrdered        4.36      (4.2%)        4.41      (4.3%)    1.1% (  -7% -   10%) 0.399
               HighTermTitleSort       20.12      (5.5%)       20.35      (5.6%)    1.2% (  -9% -   12%) 0.511
            MedTermDayTaxoFacets        9.42      (3.2%)        9.53      (4.3%)    1.2% (  -6% -    8%) 0.325
                      OrHighHigh       12.07      (4.9%)       12.22      (4.9%)    1.3% (  -8% -   11%) 0.415
                    HighSpanNear        9.92      (3.5%)       10.06      (3.3%)    1.3% (  -5% -    8%) 0.212
                   OrNotHighHigh      356.64      (4.5%)      361.84      (3.3%)    1.5% (  -6% -    9%) 0.242
             MedIntervalsOrdered        3.09      (2.9%)        3.14      (3.0%)    1.5% (  -4% -    7%) 0.119
          OrHighMedDayTaxoFacets        1.54      (4.6%)        1.56      (5.3%)    1.5% (  -8% -   11%) 0.338
                    OrNotHighMed      346.18      (3.3%)      351.65      (3.6%)    1.6% (  -5% -    8%) 0.144
                 LowSloppyPhrase       21.58      (2.6%)       21.94      (2.0%)    1.7% (  -2% -    6%) 0.025
                       OrHighMed       48.51      (3.4%)       49.50      (3.5%)    2.0% (  -4% -    9%) 0.062
                     MedSpanNear       26.76      (1.4%)       27.35      (2.2%)    2.2% (  -1% -    5%) 0.000
           HighTermDayOfYearSort      223.99      (2.7%)      228.97      (2.1%)    2.2% (  -2% -    7%) 0.004
                       OrHighLow      208.96      (3.2%)      213.65      (3.0%)    2.2% (  -3% -    8%) 0.022
                     AndHighHigh       13.93      (3.6%)       14.25      (3.7%)    2.4% (  -4% -   10%) 0.042
                 MedSloppyPhrase       19.45      (2.5%)       19.92      (2.0%)    2.4% (  -1% -    7%) 0.001
        AndHighHighDayTaxoFacets        2.12      (3.9%)        2.17      (5.2%)    2.4% (  -6% -   11%) 0.094
                     LowSpanNear        8.99      (1.6%)        9.24      (1.4%)    2.7% (   0% -    5%) 0.000
             LowIntervalsOrdered        7.46      (2.6%)        7.67      (3.1%)    2.8% (  -2% -    8%) 0.002
                      HighPhrase       12.54      (2.8%)       12.91      (3.3%)    2.9% (  -3% -    9%) 0.003
                       MedPhrase       25.45      (1.8%)       26.22      (2.1%)    3.0% (   0% -    6%) 0.000
         AndHighMedDayTaxoFacets       16.49      (1.7%)       17.08      (2.3%)    3.6% (   0% -    7%) 0.000
                      AndHighMed       91.80      (2.2%)       95.26      (2.3%)    3.8% (   0% -    8%) 0.000
                       LowPhrase      144.04      (1.5%)      150.03      (2.0%)    4.2% (   0% -    7%) 0.000
                    OrNotHighLow      315.83      (1.8%)      330.04      (2.6%)    4.5% (   0% -    9%) 0.000
                      AndHighLow      352.57      (2.7%)      379.44      (3.7%)    7.6% (   1% -   14%) 0.000

The tasks at the bottom of the table show the largest QPS improvements.
AndHighLow has a QPS improvement of +7.6%!

Size of test candidate index: 18.401 GiB total
Size of test baseline index: 17.626 GiB total

This change would bring a ~4.4% increase in index size.

Proposal

I propose adding an option to Lucene's codec that allows users to disable patching in the PFOR encoding, letting them trade an increase in index size for the performance benefits observed here.
I would appreciate any feedback and further evaluation of this idea by the Lucene community.

@gsmiller
Contributor

These results are really interesting! As another option, I wonder if it's worth thinking about this problem as a new codec (sandbox module to start?) that biases towards query speed instead of index size? There may be other decisions we would make differently in a codec biasing in that direction beyond FOR patching. I dunno... maybe that's a terrible idea with a "slippery slope" problem. But I also worry a bit about adding configuration switches to a default codec that need to be maintained. One-size-fits-all solutions are indeed challenging...

@mikemccand
Member

That's a neat idea (a separate codec that trades off index size for faster search performance). Maybe it could also fold in the fully in-RAM FST term dictionary that @Tony-X is working on, if that is a nice speedup.

But some of our Codec formats, e.g. for stored fields, have two options to make the tradeoff an explicit choice by the user (BEST_COMPRESSION vs BEST_SPEED). Maybe, if this new sandbox Codec works out, some of its tradeoffs could be folded into the default Codec with similar constants to make the tradeoff explicit.

Also, "no patching" is something we already support at read-time since some blocks today will legitimately have no patching.

@mikemccand
Member

Posting results below

The results are impressive! Conjunctive(-like) queries see sizable gains.

Did you turn off patching for all encoded int[] blocks (docs, freqs, positions)?

This change would bring a 4.39691% increase in index size.

I'm curious: did you just force no patching at write time, but still write a header into each block saying "there are 0 patches"? If so, we could save a bit of space by removing that header entirely (since it'll always be 0), and perhaps gain a bit of performance by not having to check that header at read time.

@mikemccand
Member

Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this).

Today, our skipper is forced to align to block boundaries, so when we skip to a given docid, we go to the block that may contain this docid, decode all 128 ints, then linearly scan within those 128 ints. This is quite a bit of overhead for each skip request!

If we could lower that linear scan cost to maybe 16 or 8 or something, the conjunctive queries should get even faster. But perhaps it becomes trickier to take advantage of SIMD optimizations if we are decoding a subset of ints, not sure.
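
Hypothetically, with a fixed bpv and no exceptions, the bit offset of every value in a block is computable, so a reader could decode just a small sub-chunk around the skip target. A sketch of the idea (illustrative only, not Lucene code, and ignoring Lucene's actual SIMD-friendly packed layout):

```java
// Hypothetical sketch of within-block skipping over a patch-less FOR block.
// With a fixed bitsPerValue and no exceptions, value i always starts at bit
// offset i * bitsPerValue, so we can decode only the 16-int chunk we need.
final class WithinBlockSkipSketch {
  static final int CHUNK = 16;

  /** Decode only the 16-int chunk containing index i; assumes bitsPerValue <= 32. */
  static void decodeChunk(long[] packed, int bitsPerValue, int i, int[] out) {
    int start = (i / CHUNK) * CHUNK;
    long mask = (1L << bitsPerValue) - 1;
    for (int j = 0; j < CHUNK; j++) {
      long bitPos = (long) (start + j) * bitsPerValue;
      int word = (int) (bitPos >>> 6);   // which long holds the low bits
      int shift = (int) (bitPos & 63);
      long v = packed[word] >>> shift;
      if (shift + bitsPerValue > 64) {   // value straddles two longs
        v |= packed[word + 1] << (64 - shift);
      }
      out[j] = (int) (v & mask);
    }
  }
}
```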

@slow-J
Contributor Author

slow-J commented Oct 19, 2023

Did you turn off patching for all encoded int[] blocks (docs, freqs, positions)?

Yes, I think so. All uses of PForUtil in the postings reader and writer were replaced with the no-patching util.

I'm curious: did you just force no patching at write time, but still write a header into each block saying "there are 0 patches"? If so, we could save a bit of space by removing that header entirely since it'll always be 0), and perhaps gain a bit of performance by not having to check that header at read time.

I essentially set the number of exceptions to 0 and removed the related logic, still keeping the header with 0 patches. I'll iterate and remove this to see the difference.
Since the number of exceptions is 0, the exceptions footer is not written (it has length 0); this is confirmed by the fact that we never decode it, since that logic was removed.

@rmuir
Member

rmuir commented Oct 21, 2023

Should we just do more tests and start writing indexes without patching? Only a 4 percent disk savings? It is a lot of complexity, especially to vectorize. A runtime option is more expensive because then we have to make sure indexes encoded both ways can be read; it only adds more complexity imo

@mikemccand
Member

Should we just do more tests and start writing indexes without patching? Only a 4 percent disk savings? It is a lot of complexity, especially to vectorize. A runtime option is more expensive because then we have to make sure indexes encoded both ways can be read; it only adds more complexity imo

+1 to remove patching entirely!

@slow-J
Contributor Author

slow-J commented Oct 23, 2023

If we want to remove the patching entirely, which Lucene version (and which Codec) should we implement this in? Would this be a potential change for Lucene 9.9 or perhaps 10.0?

Are there any additional corpora that we should also test this with?

@mikemccand
Member

Are there any additional corpora that we should also test this with?

Maybe the NYC taxis? That corpus is sparser, with tiny docs (vs the dense, medium/large docs in enwiki). The tooling for indexing the NYC taxis corpus is already in luceneutil (it runs nightly: https://home.apache.org/~mikemccand/lucenebench/sparseResults.html). This is a nice counter-point to enwiki.

Would this be a potential change for Lucene 9.9 or perhaps 10.0?

That's a good question. It is a very low-level index format change, with no API change. It would be fully back-compat whether we release it in 9.9 vs 10.0. I don't see why we should withhold the change until 10.0, so maybe 9.9?

@Tony-X
Contributor

Tony-X commented Oct 23, 2023

It would be fully back-compat whether we release in 9.9 vs 10.0.

Hmm, can you elaborate on how it can be fully backwards-compatible with the indexes that have patching?

Is the assumption that we will introduce an option to disable patching? I thought we were thinking of removing it entirely. Maybe I missed something...

@Tony-X
Contributor

Tony-X commented Oct 23, 2023

It is a lot of complexity, especially to vectorize.

+1. I recall that @gsmiller was playing with some SIMD algos for decoding blocks of delta-encoded ints. Even if that is fruitful, it'd be tricky to apply because of the patching.

@msokolov
Contributor

Hmm, can you elaborate on how it can be fully backwards-compatible with the indexes that have patching?

I think the idea is that because we always maintain readers that can read prior index versions, we will be able to read the old patched indexes, but we would only write new ones lacking patching (which might not be readable by old readers, i.e. not forwards-compatible, if we changed the metadata).

@gsmiller
Contributor

gsmiller commented Oct 23, 2023

I like the idea of removing the complexity associated with patching if we're convinced it's the right trade-off (and +1 to the pain of vectorizing with patching going away).

Also +1 to releasing with 9.9 and not waiting. To address the back-compat questions a little bit, along with some of Mike's earlier questions: the number of patches is written into the same byte used to encode the number of bits-per-value for the FOR block. We only need 5 bits to encode the b.p.v., and we reserve the remaining 3 to encode the number of patches (which is why we currently have an upper bound of 7 patches). So we can always "write" 0 patches in this byte and remain fully backwards compatible on read, which is great (but it also means we can't claw back some savings by getting rid of the space in the index needed to encode the patches... to answer Mike's earlier question a bit).
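
Concretely, that shared header byte would look something like this (a sketch of the layout described above; the names and exact bit order are illustrative, not Lucene's actual code):

```java
// Sketch of the shared per-block header byte gsmiller describes:
// low 5 bits = bits-per-value, high 3 bits = patch count (0..7).
// Layout and names are illustrative, not Lucene's actual code.
final class BlockHeaderSketch {
  static byte encode(int bitsPerValue, int numPatches) {
    assert bitsPerValue < 32 && numPatches <= 7;
    return (byte) ((numPatches << 5) | bitsPerValue);
  }

  static int bitsPerValue(byte header) { return header & 0x1F; }

  static int numPatches(byte header) { return (header >>> 5) & 0x07; }
  // A patch-less writer always emits numPatches == 0, so older readers
  // (which understand patches) can still decode the new blocks.
}
```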

So I think the roll out might look something like this:

  1. In 9.9 and later, always write zero patches when creating the index. Leave the code in place to handle patches if present.
  2. In 10.x, continue to support patches on read (as in 9.x) so 10.x remains compatible with 9.x indices.
  3. In 11.0, remove all patching logic which will, a) simplify the code a bit, and b) remove the (likely minor) overhead on read of looking up the number of patches in a block, which is always 0.

This strategy gives us the added benefit of keeping the patching code around for a little while in case we get some unexpected flood of user complaints after patching is removed (which would make it easier to reverse the decision). I doubt that will happen, but it's a nice side effect.

@gsmiller
Contributor

+1. I recall that @gsmiller was playing with some SIMD algos for decoding blocks of delta-encoded ints. Even if that is fruitful, it'd be tricky to apply because of the patching.

Yes, that's right. There was some experimentation with this algorithm for vectorized prefix sum.
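
For context, the decode step in question turns delta-encoded gaps back into absolute docids via a prefix sum. A scalar sketch of the operation a SIMD version would vectorize (illustrative only):

```java
// Scalar sketch: decoding a block of delta-encoded docids is a prefix sum.
// A vectorized version (e.g. with jdk.incubator.vector) parallelizes these
// additions; interleaving a patch/exception fix-up step makes that harder,
// which is why patching complicates SIMD decoding.
final class DeltaDecodeSketch {
  /** docs[i] = base + deltas[0] + ... + deltas[i]. */
  static void prefixSum(int[] deltas, int[] docs, int base) {
    int acc = base; // last docid of the previous block
    for (int i = 0; i < deltas.length; i++) {
      acc += deltas[i];
      docs[i] = acc;
    }
  }
}
```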

@Tony-X
Contributor

Tony-X commented Oct 23, 2023

In 11.0, remove all patching logic which will, a) simplify the code a bit, and b) remove the (likely minor) overhead on read of looking up the number of patches in a block, which is always 0.

I like to think of it as an index-level property rather than a block-level property. We can just check once instead of doing it per-block.

Maybe write something in the index header to indicate whether patching is present (defaulting to yes in 9.x). Then new indexes would write an additional header to indicate there is no patching in the whole index (10.x behavior). Then remove the header, as well as the patching support, altogether in 11.x.

@gsmiller
Contributor

Maybe write something in the index header to indicate whether patching is present (defaulting to yes in 9.x). Then new indexes would write an additional header to indicate there is no patching in the whole index (10.x behavior). Then remove the header, as well as the patching support, altogether in 11.x.

@Tony-X would the goal here be to eliminate the overhead of having to read the number of patches when decoding each block? Or is there more to it than that?

@Tony-X
Contributor

Tony-X commented Oct 23, 2023

would the goal here be to eliminate the overhead of having to read the number of patches when decoding each block?

Yes. This means we could know upfront, at segment-opening time, which code to use (FOR or PatchedFOR), as opposed to using the PatchedFOR code to deal with blocks that don't have patches.
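
In other words, the dispatch could move from the per-block hot loop up to segment open. A hypothetical sketch (names and the header flag are illustrative, not Lucene's API):

```java
// Hypothetical sketch: pick the block decoder once per segment, based on an
// index-level "has patching" flag, instead of checking patches per block.
interface BlockDecoder {
  void decode(long[] packed, int bitsPerValue, int[] out);
}

final class ForDecoder implements BlockDecoder {
  public void decode(long[] packed, int bitsPerValue, int[] out) {
    // plain FOR unpack: no exception handling at all
  }
}

final class PatchedForDecoder implements BlockDecoder {
  public void decode(long[] packed, int bitsPerValue, int[] out) {
    // FOR unpack, then apply the block's recorded exceptions
  }
}

final class SegmentOpenDispatch {
  static BlockDecoder openDecoder(boolean indexHasPatching) {
    // Decided once at segment-opening time; the per-block read path never
    // asks "how many patches?" when the whole index is patch-less.
    return indexHasPatching ? new PatchedForDecoder() : new ForDecoder();
  }
}
```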

@jpountz
Contributor

jpountz commented Oct 25, 2023

For reference, Lucene used FOR for postings and PFOR for positions in 8.x. This was changed in 9.0 via #69 to use PFOR for both postings and positions. That PR says it made the index 3% smaller with no performance impact, but I can believe that we are noticing an impact now, as many things have changed in the meantime. I'm +1 on switching back to FOR if it yields better performance.

I have a preference for keeping PFOR for positions and only moving postings to FOR (essentially reverting #69). The benchmark in this issue description used wikimedium, which by design doesn't have much position data since all documents are truncated. Using PFOR for positions and FOR for postings sounds like a good trade-off to me as positions are less important for performance typically. And if someone wants better performance for their phrase queries, it would likely be a better idea to use a CommonGramsFilter than to switch positions from PFOR to FOR?

I remember we observed a 15% reduction in the size of our inverted indexes for logs when Lucene moved from FOR to PFOR at Elastic, but I don't think that should block this change; Elasticsearch can maintain its own postings format that uses PFOR for postings. I'm just mentioning it to highlight that I expect some users will observe a disk-usage increase of more than 3%.

Regarding backward compatibility, let's do it with codecs as usual: fork Lucene90PostingsFormat into a new Lucene99PostingsFormat that uses FOR for postings. Then the codec infrastructure will make sure to keep using the old postings format for existing segments and the new postings format for new segments (including merged ones).

@jpountz
Contributor

jpountz commented Oct 27, 2023

FWIW I could reproduce the speedup from disabling patching locally on wikibigall:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                HighSloppyPhrase        1.58      (4.9%)        1.55      (9.5%)   -1.9% ( -15% -   13%) 0.441
                       CountTerm    13139.34      (3.0%)    12937.65      (3.1%)   -1.5% (  -7% -    4%) 0.111
               HighTermMonthSort     4393.07      (1.5%)     4371.13      (2.1%)   -0.5% (  -4% -    3%) 0.396
                         Prefix3      270.78      (3.4%)      271.17      (3.2%)    0.1% (  -6% -    7%) 0.889
                          Fuzzy1      100.16      (1.1%)      100.42      (0.9%)    0.3% (  -1% -    2%) 0.420
                          Fuzzy2       70.62      (1.1%)       70.84      (1.0%)    0.3% (  -1% -    2%) 0.340
                      HighPhrase       16.41      (3.8%)       16.49      (5.3%)    0.5% (  -8% -    9%) 0.738
                         Respell       53.35      (1.8%)       53.62      (1.6%)    0.5% (  -2% -    4%) 0.351
                        HighTerm      418.58      (9.3%)      421.91      (7.7%)    0.8% ( -14% -   19%) 0.770
                 LowSloppyPhrase        9.87      (2.5%)        9.96      (5.9%)    0.9% (  -7% -    9%) 0.544
                        Wildcard       94.17      (2.8%)       95.04      (3.3%)    0.9% (  -5% -    7%) 0.341
                         MedTerm      553.18      (8.3%)      559.16      (6.9%)    1.1% ( -13% -   17%) 0.656
                         LowTerm      784.38      (7.0%)      793.23      (5.7%)    1.1% ( -10% -   14%) 0.575
                        PKLookup      264.32      (2.8%)      267.44      (2.2%)    1.2% (  -3% -    6%) 0.138
                    HighSpanNear        5.37      (3.2%)        5.44      (3.3%)    1.3% (  -5% -    8%) 0.213
                       OrHighLow      590.78      (3.0%)      598.60      (2.6%)    1.3% (  -4% -    7%) 0.132
                       LowPhrase       27.97      (3.7%)       28.42      (4.9%)    1.6% (  -6% -   10%) 0.245
                     LowSpanNear       11.09      (2.1%)       11.30      (2.2%)    1.8% (  -2% -    6%) 0.007
                     MedSpanNear        7.39      (3.3%)        7.53      (3.5%)    1.9% (  -4% -    8%) 0.079
                      AndHighLow      819.00      (2.9%)      835.98      (2.4%)    2.1% (  -3% -    7%) 0.015
                       MedPhrase       91.27      (3.4%)       93.33      (4.6%)    2.3% (  -5% -   10%) 0.078
           HighTermDayOfYearSort      447.47      (1.8%)      457.76      (1.8%)    2.3% (  -1% -    5%) 0.000
                 MedSloppyPhrase       16.79      (2.4%)       17.25      (4.6%)    2.7% (  -4% -    9%) 0.017
                       OrHighMed      157.99      (2.2%)      162.38      (2.3%)    2.8% (  -1% -    7%) 0.000
                      OrHighHigh       67.77      (1.7%)       69.71      (1.9%)    2.9% (   0% -    6%) 0.000
                     AndHighHigh       48.92      (1.8%)       50.67      (2.1%)    3.6% (   0% -    7%) 0.000
                      AndHighMed      174.71      (2.3%)      181.03      (2.6%)    3.6% (  -1% -    8%) 0.000
                CountAndHighHigh       38.11      (4.4%)       39.65      (5.1%)    4.1% (  -5% -   14%) 0.007
                 CountOrHighHigh       56.07     (16.2%)       58.48     (18.5%)    4.3% ( -26% -   46%) 0.435
                  CountOrHighMed       86.95     (16.0%)       90.94     (18.3%)    4.6% ( -25% -   46%) 0.398
                 CountAndHighMed      116.35      (3.4%)      121.78      (4.8%)    4.7% (  -3% -   13%) 0.000
                     CountPhrase        3.26     (11.2%)        3.43     (13.1%)    5.1% ( -17% -   33%) 0.187
                          IntNRQ      146.09     (35.1%)      171.55     (36.7%)   17.4% ( -40% -  137%) 0.124

.doc files were 12% larger overall (2.64GB to 2.96GB), .pos files were 11% larger (11.03GB to 12.24GB), and the index was 9.7% larger (15.66GB to 17.18GB).

@mikemccand
Member

Thanks for testing @jpountz.

I think at some point we also enabled patching for the freq blocks inside the .doc file?

Normally the IntNRQ (1D points numeric range query) is very noisy, but maybe this gain is real? p-value seems to think it could be close to real?

The conjunctive and disjunctive gains are awesome.

Regarding backward compatibility, let's do it with codecs as usual: fork Lucene90PostingsFormat into a new Lucene99PostingsFormat that uses FOR for postings. Then the codec infrastructure will make sure to keep using the old postings format for existing segments and the new postings format for new segments (including merged ones).

+1

I have a preference for keeping PFOR for positions and only moving postings to FOR (essentially reverting #69).

+1

This change might make SIMD decoding more palatable for the unpatched int[] blocks (doc, freq).

@jpountz
Contributor

jpountz commented Oct 31, 2023

Normally the IntNRQ (1D points numeric range query) is very noisy, but maybe this gain is real? p-value seems to think it could be close to real?

I'm not sure how it could not be noise, since it never needs to decode postings?

Maybe we should also look into the trade-off of keeping freqs on PFOR vs. switching them to FOR, given that:

  • Non-scoring queries never need freqs, e.g. the counting queries would still see the same speedups.
  • Conjunctions only need to decode freq blocks when all clauses agree on a doc.
  • Both disjunctions and conjunctions can skip decoding freq blocks if max scores are not high enough for a hit to be competitive.

@mikemccand
Copy link
Member

This change would also make it easier to skip within blocks, which might be a sizable win for conjunctions and NOT clauses.

I'm not sure how it could not be noise, since it never needs to decode postings?

Oh, you are right! It's entirely handled by the BKD tree. Phew. This query is indeed noisy! Hmm, how do we encode the int[] blocks in BKD ...

Maybe we should also look into the trade-off of keeping freqs on PFOR vs. switching them to FOR, given that:

+1 -- maybe we leave freqs with patching, and only remove patching for doc blocks.

@slow-J
Contributor Author

slow-J commented Oct 31, 2023

Thanks all for the feedback. I will proceed with removing patching only for doc blocks (reverting some of #69).

All the changes needed to create a Lucene99PostingsFormat have made the PR quite large, so I am starting it as a draft PR: #12741

@slow-J
Contributor Author

slow-J commented Nov 2, 2023

Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this).

Today, our skipper is forced to align to block boundaries, so when we skip to a given docid, we go to the block that may contain this docid, decode all 128 ints, then linearly scan within those 128 ints. This is quite a bit of overhead for each skip request!

If we could lower that linear scan cost to maybe 16 or 8 or something, the conjunctive queries should get even faster. But perhaps it becomes trickier to take advantage of SIMD optimizations if we are decoding a subset of ints, not sure.

Thanks @mikemccand, I have created #12749 to explore this idea!
