Build index in parallel #458

casey · 2022-09-02T20:57:13Z

The ordinal index is currently single-threaded. It would be nice to figure out how to build it in parallel. One thing that will limit parallelism is that redb writers must be serialized.

The two sources of parallelism I can think of are:

Process blocks in parallel
Process transactions in parallel

Straw man proposal:

Create a queue of transactions to process
Create a pool of workers that grab transactions from the queue and figure out which ranges are in the inputs and which ranges are in the outputs
Workers then either commit those changes to the database, or communicate them to a writer thread that writes them to the database
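A rough sketch of what the worker pool plus single writer variant could look like, using only the standard library. Everything here is hypothetical and is not the actual index code: `Tx`, `assign_ranges`, and `build_index` are made-up names, the `BTreeMap` stands in for the database, and the sketch assumes each transaction's input ranges are already known. In the real index those ranges come from earlier transactions' outputs, which is the hard part and is not addressed here. Ranges left over after the last output (the fee) are simply dropped.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a transaction whose input ranges are already resolved.
#[derive(Clone, Debug)]
pub struct Tx {
    pub id: u64,
    /// Half-open ordinal ranges [start, end) carried by the inputs, in input order.
    pub input_ranges: Vec<(u64, u64)>,
    /// Value of each output, in output order.
    pub output_values: Vec<u64>,
}

/// Assigns input ranges to outputs first-in-first-out, splitting a range
/// when an output boundary falls inside it.
pub fn assign_ranges(tx: &Tx) -> Vec<Vec<(u64, u64)>> {
    let mut inputs: VecDeque<(u64, u64)> = tx.input_ranges.iter().copied().collect();
    let mut outputs = Vec::new();
    for &value in &tx.output_values {
        let mut remaining = value;
        let mut ranges = Vec::new();
        while remaining > 0 {
            let (start, end) = match inputs.pop_front() {
                Some(range) => range,
                None => break, // inputs exhausted
            };
            let len = end - start;
            if len <= remaining {
                ranges.push((start, end));
                remaining -= len;
            } else {
                let mid = start + remaining;
                ranges.push((start, mid));
                inputs.push_front((mid, end));
                remaining = 0;
            }
        }
        outputs.push(ranges);
    }
    outputs
}

/// Workers pull transactions from a shared queue and send results to a
/// single writer thread, so all writes stay serialized.
pub fn build_index(txs: Vec<Tx>, workers: usize) -> BTreeMap<(u64, usize), Vec<(u64, u64)>> {
    let (job_tx, job_rx) = mpsc::channel::<Tx>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (result_tx, result_rx) = mpsc::channel::<(u64, Vec<Vec<(u64, u64)>>)>();

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement,
                // before the transaction is processed.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(tx) => result_tx.send((tx.id, assign_ranges(&tx))).unwrap(),
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();
    // Drop the original sender so the writer stops once all workers finish.
    drop(result_tx);

    // The only thread that touches the "database".
    let writer = thread::spawn(move || {
        let mut index = BTreeMap::new();
        for (id, outputs) in result_rx {
            for (vout, ranges) in outputs.into_iter().enumerate() {
                index.insert((id, vout), ranges);
            }
        }
        index
    });

    for tx in txs {
        job_tx.send(tx).unwrap();
    }
    drop(job_tx); // close the queue

    for handle in handles {
        handle.join().unwrap();
    }
    writer.join().unwrap()
}
```

Calling `build_index(txs, 4)` would fan the transactions out to four workers and return a map keyed by `(tx id, output index)`. Funneling everything through one writer keeps writes serialized, but the workers only help if the per-transaction work outweighs the channel overhead, which is something only benchmarks can tell.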

This is blocked on #111, benchmarks, since you can't optimize what you can't benchmark.

veryordinally · 2022-10-29T09:41:40Z

Started looking at the literature after our discussion last night. Found an interesting PhD thesis: https://tuprints.ulb.tu-darmstadt.de/19668/1/Identification_of_Suitable_Parallelization_Patterns_for_Sequential_Programs_Ul_Huda.pdf

casey · 2022-11-15T20:56:54Z

I think this is either impossible or will complicate the codebase enormously, so closing for now.

casey added good first issue labels Sep 2, 2022

casey closed this as completed Nov 15, 2022
