Bulk indexing #62
Comments
I'd like us to do some digging on exactly what Elasticsearch tells us when a bulk request is only partially indexed. I haven't run into a case of partial index failure yet. I'm going to split the max POST size/max amount of data problem out into a separate issue. I think we can focus initially on building a bulk index with a fairly conservative default number of posts per request, customizable to allow smaller chunks if syncing fails.
Yeah, I'd love to know how much of a performance difference we are talking about, since the max POST size problem introduces some serious complexities.
Let's focus this issue on simply implementing a conservative default bulk index. #66 can focus on choosing the bulk index size more intelligently.
200 sounds like a pretty good conservative number to me. Still, some benchmarking information would be useful.
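To make the "conservative default, customizable chunk size" idea concrete, here is a minimal sketch in Python. It is not the project's implementation; the names `batches` and `DEFAULT_BATCH_SIZE` are hypothetical, and 200 is simply the number floated in this thread.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumption: 200 is the conservative default suggested in this thread.
DEFAULT_BATCH_SIZE = 200


def batches(items: Iterable[T], size: int = DEFAULT_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    # Emit whatever is left over as a final, smaller batch.
    if batch:
        yield batch
```

If a sync fails, the caller could retry with a smaller `size`, for example `batches(posts, size=50)`.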
Rather than submit posts for indexing one at a time, it would be better to use the Elasticsearch bulk index functionality.
We will have to consider error handling when using the bulk API, because it's possible that some posts get indexed and others don't. Elasticsearch will tell us what happened, but we will have to resubmit the posts that failed.
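As a rough illustration of what "Elasticsearch will tell us what happened" could look like in practice: the bulk response carries a top-level `errors` flag and an `items` array with one entry per action, each holding a `status` and, on failure, an `error` object. The sketch below (Python, with a hypothetical `failed_ids` helper) collects the IDs that would need resubmitting. Treating any status of 300 or above as a failure is an assumption made here, not something settled in this issue.

```python
from typing import Any, Dict, List


def failed_ids(bulk_response: Dict[str, Any]) -> List[Any]:
    """Return the document IDs whose bulk action failed (hypothetical helper)."""
    # When "errors" is false, every action in the request succeeded.
    if not bulk_response.get("errors"):
        return []
    failed = []
    for item in bulk_response.get("items", []):
        # Each item has a single key naming the action, e.g. "index".
        for result in item.values():
            # Assumption: a missing status or any status >= 300 is a failure.
            if "error" in result or result.get("status", 500) >= 300:
                failed.append(result.get("_id"))
    return failed
```

The returned IDs can then be fed into another bulk request, ideally with a retry limit so a permanently bad document does not loop forever.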
Additionally, we'll need to be aware of the max POST size on the server and the maximum amount of data Elasticsearch/Java can receive.
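One possible way to respect a size limit is to cap each request body by bytes rather than by post count. The sketch below is in Python and rests on several assumptions: `bulk_bodies` is a hypothetical helper, each post is a dict with an `id` key, and `max_bytes` is a value we would have to discover or configure. The bulk body format itself (an action line followed by a source line, each newline-terminated) is standard; older Elasticsearch versions may also expect a `_type` in the action line.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def bulk_bodies(
    posts: Iterable[Dict[str, Any]], index_name: str, max_bytes: int
) -> Iterator[bytes]:
    """Yield newline-delimited bulk bodies of roughly at most max_bytes each."""
    current: List[bytes] = []
    current_size = 0
    for post in posts:
        # Assumption: every post dict carries its ID under the "id" key.
        action = json.dumps({"index": {"_index": index_name, "_id": post["id"]}})
        source = json.dumps(post)
        chunk = (action + "\n" + source + "\n").encode("utf-8")
        # Start a new body if adding this post would exceed the limit.
        if current and current_size + len(chunk) > max_bytes:
            yield b"".join(current)
            current, current_size = [], 0
        current.append(chunk)
        current_size += len(chunk)
    if current:
        yield b"".join(current)
```

Note that a single post larger than `max_bytes` is still sent on its own, so the cap is best-effort; handling that case properly belongs with the smarter sizing discussion.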