
Bulk indexing #62

Closed
mattonomics opened this issue Aug 6, 2014 · 5 comments

@mattonomics (Contributor)

Rather than submit posts for indexing one at a time, it would be better to use the Elasticsearch bulk index functionality.

We will have to consider error handling when using the bulk API, because it's possible that some posts get indexed while others don't. Elasticsearch reports what happened for each item, but we will have to resubmit the failed posts for indexing.

Additionally, we'll need to be aware of the maximum POST size on the server and the maximum amount of data Elasticsearch/Java can receive.
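To make the partial-failure concern concrete, here is a minimal Python sketch (illustrative only; the field names of the example posts are made up). It builds the NDJSON body the `_bulk` endpoint expects and picks out the failed items from a bulk response, using the response shape Elasticsearch actually returns: a top-level `errors` flag plus a per-item `status`.

```python
import json

def build_bulk_body(posts):
    """Build an NDJSON body for the Elasticsearch /_bulk endpoint.

    Each document gets an action line followed by its source line.
    """
    lines = []
    for post in posts:
        lines.append(json.dumps({"index": {"_id": post["id"]}}))
        lines.append(json.dumps(post))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

def failed_ids(bulk_response):
    """Return the _id of every item that failed in a bulk response.

    The bulk API returns one entry per action; an item failed if its
    status is not 2xx. These are the posts we'd resubmit.
    """
    if not bulk_response.get("errors"):
        return []
    ids = []
    for item in bulk_response.get("items", []):
        result = item.get("index", {})
        if not 200 <= result.get("status", 0) < 300:
            ids.append(result.get("_id"))
    return ids

# Simulated partial-failure response: post "2" was rejected.
response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception"}}},
    ],
}
print(failed_ids(response))  # → ['2']
```

The key point: a 200 response from `/_bulk` does not mean every document was indexed, so the caller must walk `items` and retry the failures.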

@AaronHolbrook (Contributor)

I'd like us to do some digging into exactly what Elasticsearch tells us on partial indexing. I haven't run into a case of partial index failure.

Going to split the max POST size/max data problem out into a separate issue. I think we can focus initially on building a bulk index with a fairly conservative, customizable initial batch size, allowing smaller chunks if syncing fails.

@tlovett1 (Member)

Yea, I'd love to know how much of a performance difference we are talking about, since the max POST size problem introduces some serious complexities.

@AaronHolbrook (Contributor)

Let's focus this issue on simply implementing a conservative default bulk index.

#66 can focus on implementing a more intelligent bulk index size.

@AaronHolbrook (Contributor)

@tlovett1 was mentioning the default bulk index size over in #69; bringing that conversation over here.

Unless someone has another idea, how about 200 as the default chunk size?
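Splitting the sync queue into chunks of that size is straightforward; a minimal sketch (the `size=200` default is just the conservative number proposed here, and the knob for customizing it is hypothetical):

```python
def chunk_posts(posts, size=200):
    """Split a list of posts into batches of at most `size`.

    Each batch would become one bulk request; a smaller `size` could be
    passed in when syncing fails, as discussed above.
    """
    for start in range(0, len(posts), size):
        yield posts[start:start + size]

# 500 posts with the default chunk size of 200:
batches = list(chunk_posts(list(range(500))))
print([len(b) for b in batches])  # → [200, 200, 100]
```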

@tlovett1 (Member)

200 sounds like a pretty good conservative number to me. Still, some benchmarking information would be useful.


3 participants