Too many commits #1

Open
fulmicoton opened this issue Mar 8, 2020 · 2 comments
@fulmicoton

Tantivy is not meant to commit after every document.

It will get a tad better in the next version of tantivy, but it still won't be a viable solution.
Ideally pallet should accept a short lag (100ms) between the moment a document is inserted and the moment it becomes available for search.
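To make the suggested lag concrete, here is a minimal sketch of a wrapper that commits at most once per 100 ms instead of once per document. It uses only the standard library: `DocWriter`, `LaggedWriter`, and `CountingWriter` are hypothetical names invented for this illustration, not tantivy or pallet APIs, and time is passed in explicitly so the behavior is easy to follow.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an index writer; NOT tantivy's API.
pub trait DocWriter {
    fn add(&mut self, doc: String);
    fn commit(&mut self);
}

/// Wraps a writer and commits at most once per `max_lag`.
pub struct LaggedWriter<W: DocWriter> {
    inner: W,
    max_lag: Duration,
    last_commit: Instant,
    pending: usize,
}

impl<W: DocWriter> LaggedWriter<W> {
    pub fn new(inner: W, max_lag: Duration, now: Instant) -> Self {
        Self { inner, max_lag, last_commit: now, pending: 0 }
    }

    /// Stages a document, then commits only if `max_lag` has elapsed
    /// since the last commit. Returns true if a commit happened.
    pub fn add(&mut self, doc: String, now: Instant) -> bool {
        self.inner.add(doc);
        self.pending += 1;
        self.flush_if_due(now)
    }

    /// Meant to be called periodically (for example from a timer) so that
    /// the last documents of a burst are not left uncommitted.
    pub fn flush_if_due(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_commit);
        if self.pending > 0 && elapsed >= self.max_lag {
            self.inner.commit();
            self.pending = 0;
            self.last_commit = now;
            true
        } else {
            false
        }
    }

    pub fn inner(&self) -> &W {
        &self.inner
    }
}

/// Counts calls so the batching effect is observable.
#[derive(Default)]
pub struct CountingWriter {
    pub added: usize,
    pub commits: usize,
}

impl DocWriter for CountingWriter {
    fn add(&mut self, _doc: String) {
        self.added += 1;
    }
    fn commit(&mut self) {
        self.commits += 1;
    }
}
```

With a 100 ms lag, ten documents arriving 10 ms apart produce a single commit once the timer fires, rather than ten commits.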

@kardeiz
Owner

kardeiz commented Mar 10, 2020

@fulmicoton Good to know; thanks for the issue!

Is the issue committing frequently, or not reusing the IndexWriter? It doesn't look like calling commit itself is too costly.

For example, if I were to put a global IndexWriter into a Mutex<IndexWriter> on the pallet::search::Index object, and then used that in the database operations (e.g. create/update), calling commit at the end of each, would that resolve the issue?
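The shape of that proposal can be sketched with the standard library alone. `StubWriter` and `SharedIndex` below are hypothetical stand-ins, not tantivy's `IndexWriter` or pallet's `pallet::search::Index`; the point is only to show one long-lived writer behind a `Mutex`, with a commit at the end of every operation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a search index writer; NOT tantivy's API.
#[derive(Default)]
pub struct StubWriter {
    pub staged: Vec<String>,
    pub committed: Vec<String>,
    pub commits: usize,
}

impl StubWriter {
    pub fn add(&mut self, doc: String) {
        self.staged.push(doc);
    }

    /// Moves every staged document into the committed set.
    pub fn commit(&mut self) {
        self.committed.append(&mut self.staged);
        self.commits += 1;
    }
}

/// Hypothetical index handle that owns one long-lived writer.
#[derive(Clone, Default)]
pub struct SharedIndex {
    writer: Arc<Mutex<StubWriter>>,
}

impl SharedIndex {
    /// A create/update operation: lock the shared writer, add, commit.
    pub fn create(&self, doc: String) {
        let mut w = self.writer.lock().expect("writer mutex poisoned");
        w.add(doc);
        w.commit();
    }

    /// Returns (committed documents, number of commits).
    pub fn stats(&self) -> (usize, usize) {
        let w = self.writer.lock().expect("writer mutex poisoned");
        (w.committed.len(), w.commits)
    }
}

/// Runs `threads` workers that each perform `per_thread` creates.
pub fn run_concurrent_creates(index: &SharedIndex, threads: usize, per_thread: usize) {
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let index = index.clone();
            thread::spawn(move || {
                for i in 0..per_thread {
                    index.create(format!("doc {}-{}", t, i));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().expect("worker panicked");
    }
}
```

Four workers doing 25 creates each end with `stats()` returning `(100, 100)`. The writer is reused, but the sketch still commits once per document and serializes every operation on the lock.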

Also, it looks like calling IndexWriter::commit joins on all the worker threads, so it shouldn't be necessary to wait for a document to be available, should it?

I'd like to be able to provide some level of consistency between the database and the index, by calling commit inside the sled transactions in the database operations, but I'll need to look into this more.

Thanks for your help!

@tv42

tv42 commented Jul 15, 2021

> by calling commit inside the sled transactions in the database operations

That means you've reduced sled write performance to the level of Tantivy, which is designed for large batch updates, not quick commits. That's not ideal; it'd be better to run Tantivy in "catch-up mode", where in practice newly-added data is found by search, but database updates are not delayed to make that a guarantee.
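One way to picture such a "catch-up mode" is a background indexer fed by a channel: the database write returns immediately, and the indexer commits once per batch of whatever has queued up. This is a minimal standard-library sketch under that assumption; `spawn_catch_up_indexer`, `db_insert`, and `IndexerReport` are hypothetical names, and the actual sled and tantivy calls are left as comments.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// What the background indexer reports when it shuts down.
#[derive(Debug, PartialEq)]
pub struct IndexerReport {
    pub indexed: usize,
    pub commits: usize,
}

/// Spawns a hypothetical catch-up indexer and returns the sending half
/// of its queue plus a handle to join it.
pub fn spawn_catch_up_indexer() -> (Sender<String>, JoinHandle<IndexerReport>) {
    let (tx, rx): (Sender<String>, Receiver<String>) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut report = IndexerReport { indexed: 0, commits: 0 };
        // Block until a document arrives; the loop ends once every
        // sender has been dropped and the queue is empty.
        while let Ok(first) = rx.recv() {
            let mut batch = vec![first];
            // Drain whatever else is already queued, without blocking.
            while let Ok(doc) = rx.try_recv() {
                batch.push(doc);
            }
            // A real indexer would add `batch` to the search index here
            // and commit once for the whole batch.
            report.indexed += batch.len();
            report.commits += 1;
        }
        report
    });
    (tx, handle)
}

/// Hypothetical database write: persist first (elided), then notify the
/// indexer without waiting for the document to become searchable.
pub fn db_insert(indexer: &Sender<String>, doc: &str) {
    // ... the database write would happen here ...
    // A send error only means the indexer has stopped; the write itself
    // is unaffected in this sketch.
    let _ = indexer.send(doc.to_string());
}
```

The number of commits depends on how fast writes arrive relative to the indexer, but it never exceeds the number of documents, and the write path never waits on a commit. A real implementation would also need a way to re-index documents written before a crash, which this sketch does not cover.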

Here's a blog post from the author of Tantivy (EDIT: who I now realize is the person who created this issue!) that talks about what happens with small commits: https://fulmicoton.com/posts/behold-tantivy-part2/
