Adding option to codec to disable patching in Lucene's PFOR encoding #12696
Comments
These results are really interesting! As another option, I wonder if it's worth thinking about this problem as a new codec (sandbox module to start?) that biases towards query speed instead of index size? There may be other decisions we would make differently in a codec biasing in that direction beyond FOR patching. I dunno... maybe that's a terrible idea with a "slippery slope" problem. But I also worry a bit about adding configuration switches to a default codec that need to be maintained. One-size-fits-all solutions are indeed challenging...
That's a neat idea (a separate codec that trades off index size for faster search performance). Maybe it could also fold in the fully in-RAM FST term dictionary that @Tony-X is working on, if that is a nice speedup. But some of our Codec formats, e.g. for stored fields, have two options to make the tradeoff an explicit choice by the user (BEST_SPEED vs. BEST_COMPRESSION). Also, "no patching" is something we already support at read-time, since some blocks today will legitimately have no patching.
The results are impressive! Conjunctive(-like) queries see sizable gains. Did you turn off patching for all encoded blocks?
I'm curious: did you just force no patching at write time, but still write a header into each block saying "there are 0 patches"? If so, we could save a bit of space by removing that header entirely (since it'll always be 0), and perhaps gain a bit of performance by not having to check that header at read time.
Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this). Today, our skipper is forced to align to block boundaries, so when we skip to a given target, we must decode the whole block and linearly scan to it. If we could lower that linear scan cost to maybe 16 or 8 or something, the conjunctive queries should get even faster. But perhaps it becomes trickier to take advantage of SIMD optimizations if we are decoding a subset of ints, not sure.
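The within-block skipping idea above can be sketched roughly as follows. None of these names exist in Lucene; this is a hypothetical illustration of skipping inside a 128-int block in chunks of 16 (e.g. using each chunk's last docID as a mini skip entry), versus the linear scan a reader effectively does today after decoding a block.

```java
// Hypothetical sketch of within-block skipping; not Lucene's actual API.
class WithinBlockSkipSketch {
  static final int CHUNK = 16;

  // Linear scan: roughly what happens today after decoding a full 128-int block.
  static int linearSeek(int[] docs, int target) {
    for (int i = 0; i < docs.length; i++) {
      if (docs[i] >= target) return i;
    }
    return docs.length; // target is beyond this block
  }

  // Chunked seek: skip whole 16-doc chunks by peeking at each chunk's last
  // docID, then scan only inside the surviving chunk (at most 16 comparisons).
  static int chunkedSeek(int[] docs, int target) {
    int chunk = 0;
    while (chunk + CHUNK < docs.length && docs[chunk + CHUNK - 1] < target) {
      chunk += CHUNK;
    }
    for (int i = chunk; i < docs.length; i++) {
      if (docs[i] >= target) return i;
    }
    return docs.length;
  }
}
```

In a real patch-less format the chunk boundaries would also let the reader decode only the needed sub-block, which is the part that may conflict with SIMD decoding of the full block.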
Yes, I think so. All uses of
Should we just do more tests and start writing indexes without patching? Only a 4 percent disk savings? It is a lot of complexity, especially to vectorize. A runtime option is more expensive because then we have to make sure indexes encoded both ways can be read; it only adds more complexity IMO.
+1 to remove patching entirely!
If we want to remove the patching entirely, which Lucene version (and which Codec) should we implement this in? Would this be a potential change for Lucene 9.9 or perhaps 10.0? Are there any additional corpora that we should also test this with?
Maybe the NYC taxis? That corpus is more sparse, with tiny docs (vs. the dense and medium/large docs in the wiki corpora).
That's a good question. It is a very low level index format change, and no API change. It would be fully back-compat whether we release in 9.9 vs 10.0. I don't see why we should withhold the change until 10.0, so maybe 9.9?
Hmm, can you elaborate how it can be fully backwards-compatible with the indexes that have patching? Is the assumption that we will introduce an option to disable patching? I thought we were thinking of removing it entirely. Maybe I missed something...
+1. I recall that @gsmiller was playing with some SIMD algos for decoding blocks of delta-encoded ints. Even if that is fruitful, it'd be tricky to apply because of the patching.
I think the idea is that because we always maintain readers that can read prior index versions, we will be able to read the old patched indexes, but we would only write new ones lacking patching (which might not be readable by old readers, i.e. not forwards-compatible, if we changed the metadata).
I like the idea of removing the complexity associated with patching if we're convinced it's the right trade-off (and +1 to the pain of vectorizing going away along with patching). Also +1 to releasing with 9.9 and not waiting.

To address the back-compat questions a little bit, along with some of Mike's earlier questions: the number of patches is written into the same byte used to encode the number of bits-per-value used for the FOR block. We only need 5 bits to encode the b.p.v., and we reserve the remaining 3 to encode the number of patches (which is why we currently have an upper bound of 7 patches). So we can always "write" 0 patches in this byte and remain fully backwards compatible on read, which is great (but it also means we can't claw back savings by getting rid of the space in the index needed to encode the patches, to answer Mike's earlier question a bit). So I think the rollout might look something like this:
This strategy gives us the added benefit of keeping the patching code around for a little while in case we get some unexpected flood of user complaints after patching is removed (would make it easier to reverse the decision). I doubt that will happen, but it's a nice side effect.
Yes, that's right. There was some experimentation with this algorithm for vectorized prefix sum.
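The 5-bit/3-bit split described above can be sketched as a simple token pack/unpack. The bit layout here (low 5 bits for bits-per-value, high 3 bits for the patch count) is assumed from the description, not copied from Lucene's PForUtil, whose exact ordering may differ.

```java
// Sketch of the per-block header byte: 5 bits of bits-per-value (0..31)
// plus 3 bits of patch count (0..7), packed into one byte-sized token.
// Bit order is an assumption for illustration only.
class PforHeaderSketch {
  static int encode(int bitsPerValue, int numPatches) {
    // Callers must keep bitsPerValue <= 31 and numPatches <= 7.
    return (numPatches << 5) | bitsPerValue;
  }

  static int bitsPerValue(int token) {
    return token & 0x1F; // low 5 bits
  }

  static int numPatches(int token) {
    return token >>> 5; // high 3 bits
  }
}
```

Writing 0 in the high 3 bits is exactly the "always write 0 patches" trick: old readers still decode a valid token, which is why the change stays backwards compatible on read.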
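For context on why patching gets in the way here: doc IDs are stored as deltas, so decoding a block ends in a prefix sum. A scalar version is below; vectorized variants (e.g. with jdk.incubator.vector) compute the same result, and they are easiest to write when every value in the block has the same bit width, i.e. when there are no patched exceptions. This is an illustrative sketch, not Lucene's decoder.

```java
// Delta decoding as an in-place prefix sum: deltas[i] becomes the absolute
// docID, accumulating from the last docID of the previous block.
class PrefixSumSketch {
  static void deltaDecode(int[] deltas, int previousDoc) {
    int acc = previousDoc;
    for (int i = 0; i < deltas.length; i++) {
      acc += deltas[i];
      deltas[i] = acc;
    }
  }
}
```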
I like to think of it as an index-level property rather than a block-level property. We can just check once instead of doing it per-block. Maybe write something in the index header to indicate whether patching is there (defaulting to yes in 9.x). Then new indexes would write an additional header entry to indicate there is no patching in the whole index (10.x behavior). Then remove the header, as well as the patching support, altogether in 11.x.
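The suggestion above (an index-level flag checked once) could be sketched like this. All names are hypothetical; the point is only that the decoder choice happens once at segment-open time rather than per block.

```java
// Hypothetical sketch: pick the block decoder once, based on an index-level
// "no patching" flag, instead of branching on a patch count for every block.
class DecoderChoiceSketch {
  interface BlockDecoder {
    String name(); // stand-in for the real decode method
  }

  static BlockDecoder open(boolean headerSaysNoPatching) {
    // One check at open time, not one per 128-int block.
    return headerSaysNoPatching
        ? () -> "FOR"         // plain FOR path, no patch handling at all
        : () -> "PatchedFOR"; // legacy path for older indexes
  }
}
```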
@Tony-X would the goal here be to eliminate the overhead of having to read the number of patches when decoding each block? Or is there more to it than that?
Yes. This means we could know upfront, at segment opening time, which code to use (FOR or PatchedFOR), as opposed to using the PatchedFOR code to deal with blocks that don't have patches.
For reference, Lucene used to use FOR for postings and PFOR for positions in 8.x. This was changed in 9.0 via #69 to use PFOR for both postings and positions. That PR says it made the index 3% smaller with no performance impact, but I can believe that we are noticing an impact now, as many things have changed in the meantime.

I'm +1 to switching back to FOR if it yields better performance. I have a preference for keeping PFOR for positions and only moving postings to FOR (essentially reverting #69). The benchmark in this issue description used wikimedium, which by design doesn't have much position data since all documents are truncated. Using PFOR for positions and FOR for postings sounds like a good trade-off to me, as positions are typically less important for performance. And if someone wants better performance for their phrase queries, it would likely be a better idea to use a

I remember we observed a 15% reduction of our inverted indexes for logs when Lucene moved from FOR to PFOR at Elastic, but I don't think it should block this change; Elasticsearch can maintain its own postings format that uses PFOR for postings. I'm just mentioning it as a way to highlight that I expect some users will observe an increased disk usage that is more than 3%.

Regarding backward compatibility, let's do it with codecs as usual: fork
FWIW I could reproduce the speedup from disabling patching locally on wikibigall.
Thanks for testing @jpountz. I think at some point we also enabled patching for the freq blocks, too. The conjunctive and disjunctive gains are awesome.
+1
+1. This change might make SIMD decoding more palatable for the unpatched encoding.
I'm not sure how it could not be noise, since it never needs to decode postings? Maybe we should also look into the trade-off of keeping freqs on PFOR vs. switching them to FOR, given that:
This change would also make it easier to skip within blocks, which might be a sizable win for conjunctions and NOT clauses.
Oh, you are right! It's entirely handled by the BKD tree. Phew. This query is indeed noisy! Hmm, how do we encode the freq blocks?
+1 -- maybe we leave freqs with patching, and only remove patching for doc blocks.
Thanks @mikemccand, I have created #12749 to explore this idea!
Description
Background: In https://github.com/Tony-X/search-benchmark-game we were comparing performance of Tantivy and Lucene. "One difference between Lucene and Tantivy is Lucene uses the "patch" FOR, meaning the large values in a block are held out as exceptions so that the remaining values can use a smaller number of bits to encode, a tradeoff of CPU for lower storage space." In Tony-X/search-benchmark-game#46, I disabled the patching in Lucene to match how Tantivy encodes, and ran the search-benchmark-game to test the change.
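The quoted trade-off can be illustrated with a quick bit-width calculation: a FOR block's bits-per-value is set by its largest value, so holding the few largest values out as "exceptions" lets the rest of the block pack tighter. This is an illustrative sketch, not Lucene's actual encoder.

```java
// Why patching saves space: compare the bits-per-value needed to encode a
// whole block (FOR) vs. a block with its largest values patched out (PFOR).
class PatchingSketch {
  static int bitsRequired(long v) {
    return v == 0 ? 0 : 64 - Long.numberOfLeadingZeros(v);
  }

  // Bits-per-value if every value in the block must fit.
  static int forBits(long[] block) {
    long max = 0;
    for (long v : block) max = Math.max(max, v);
    return bitsRequired(max);
  }

  // Bits-per-value if the `patches` largest values are stored separately
  // as exceptions, so only the remaining values set the width.
  static int pforBits(long[] block, int patches) {
    long[] copy = block.clone();
    java.util.Arrays.sort(copy);
    return bitsRequired(copy[copy.length - 1 - patches]);
  }
}
```

For a block like {3, 5, 2, 250, 4, 7, 1, 6}, FOR needs 8 bits per value because of the single outlier, while patching out that one value drops the block to 3 bits per value, at the CPU cost of reapplying the exception at decode time.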
Lucene modifications for testing: I cloned PForUtil and removed all logic related to patching the exceptions. I modified the Lucene90PostingsReader and Writer to use the util with no patching logic; see sample code slow-J@83ec5a8
Hardware used: EC2 Graviton2 instance, m6g.4xlarge
Results from the search-benchmark-game: Tony-X/search-benchmark-game#46 (comment)
We saw Lucene's latency improve: -2% in COUNT, -2% in TOP_10_COUNT, -2.07% in TOP_100.
I then ran a Lucene benchmark with luceneutil
python3 src/python/localrun.py -source wikimediumall -r
Hardware used: EC2 instance, m5.12xlarge
Posting results below
The tasks at the bottom of the table had the larger QPS improvement.
AndHighLow has a QPS improvement of +7.6%!
Size of test candidate index (no patching): 18.401 GiB total
Size of test baseline index (with patching): 17.626 GiB total
This change would bring a 4.39691% increase in index size.
Proposal
I propose adding an option to Lucene's codec that allows users to disable patching in the PFOR encoding, letting them trade a modest increase in index size for the performance benefits observed here.
I would appreciate all feedback and further evaluation of this idea by the Lucene community.