Use iteration instead of indexing in `Threads.@threads` #40788

sostock · 2021-05-11T11:33:28Z

The code created by `Threads.@threads for x = y; ...; end` currently indexes into `y`. Therefore, it does not work for iterators which do not support indexing. This PR changes that by using `Iterators.drop` and `Iterators.take` to pick the correct chunk of `y` for each thread.

Fixes #40704.
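As a rough illustration of the idea (not the PR's actual diff), chunk selection by `Iterators.drop` and `Iterators.take` could look like the sketch below. The helper name `chunk` and the way the remainder is distributed are assumptions made for this example.

```julia
# Hypothetical helper, not the code from this PR: return the `k`-th of `n`
# contiguous chunks of `itr`, using only `length` and the iteration protocol.
function chunk(itr, k::Integer, n::Integer)
    q, r = divrem(length(itr), n)
    # The first `r` chunks get one extra element each.
    start = (k - 1) * q + min(k - 1, r)   # number of elements to skip
    cnt = q + (k <= r ? 1 : 0)            # number of elements in this chunk
    return Iterators.take(Iterators.drop(itr, start), cnt)
end

# A Set cannot be indexed, but it has a length and can be iterated.
s = Set(1:10)
chunks = [collect(chunk(s, k, 3)) for k in 1:3]
```

Each chunk skips its prefix by iterating over it, which is the cost discussed in the replies.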

simeonschaub · 2021-05-11T17:19:19Z

This will iterate through the entire vector O(nthreads) times, which will have a large penalty. If we wanted to support `Threads.@threads` for more iterators, we would want some kind of generic API for splitting collections. There is SplittablesBase which has something like this. Perhaps something like it could be moved into Base, but that would have to be well thought through.
It doesn't make much sense in my mind to try to support any kind of iterator, since this will inherently never work for stateful iterators and will be very slow for iterators where `iterate` is expensive.
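To make the stateful-iterator problem concrete, here is a small sequential sketch using `Iterators.Stateful`. It only mimics what two chunks would do one after the other; real threads would consume the iterator concurrently, which is no better.

```julia
# A stateful iterator remembers how far it has been consumed.
itr = Iterators.Stateful(1:6)

# "Thread 1" takes the first three elements.
first_half = collect(Iterators.take(Iterators.drop(itr, 0), 3))

# "Thread 2" tries to skip three elements and take the next three, but the
# skip starts from wherever the first chunk left off, not from the beginning.
second_half = collect(Iterators.take(Iterators.drop(itr, 3), 3))

# first_half is [1, 2, 3]; second_half is not the intended [4, 5, 6].
```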

sostock · 2021-05-12T05:37:28Z

We could use indexing for `AbstractArray`s to get the old performance back and only iterate over the collection as a fallback for non-`AbstractArray` iterators. Then it will at least work for generic non-stateful iterators, despite being slower in those cases.

I think that in typical applications the actual computation (i.e., the loop body) will be more expensive than the iteration itself, so iterating over the collection `nthreads` times does not seem like a bad tradeoff for supporting arbitrary (non-stateful) iterators.
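The split between an indexed fast path and an iteration fallback could be expressed through dispatch, roughly as follows. The names `chunk_bounds` and `chunk` are hypothetical and only illustrate the proposal; they are not taken from the commits.

```julia
# Hypothetical helper: element offsets (start, stop) of chunk `k` out of `n`
# for a collection of length `len`; the chunk covers offsets start+1 .. stop.
function chunk_bounds(len::Integer, k::Integer, n::Integer)
    q, r = divrem(len, n)
    start = (k - 1) * q + min(k - 1, r)
    stop = start + q + (k <= r ? 1 : 0)
    return start, stop
end

# Fast path: arrays are sliced with a view over linear indices, so nothing
# is iterated more than once.
function chunk(A::AbstractArray, k::Integer, n::Integer)
    start, stop = chunk_bounds(length(A), k, n)
    i0 = firstindex(A)
    return view(A, (i0 + start):(i0 + stop - 1))
end

# Fallback: any other iterable with a known length skips its prefix by iterating.
function chunk(itr, k::Integer, n::Integer)
    start, stop = chunk_bounds(length(itr), k, n)
    return Iterators.take(Iterators.drop(itr, start), stop - start)
end

A = [10, 20, 30, 40, 50]
gen = (x^2 for x in 1:5)   # a Generator: has a length, but is not an array
```

With this layout, `chunk(A, 2, 2)` returns a `SubArray`, while `chunk(gen, 2, 2)` returns a lazy `take` of a `drop`.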

tkf · 2021-05-12T08:24:01Z

While I understand the desire to make `@threads` more usable, I strongly suggest avoiding this approach. This `Iterators.drop`-based approach incurs `O(length(input) * nthreads())` overhead which is not negligible for a large class of collections including `Iterators.map`, `Iterators.filter`, strings, hash tables, trees, lists, and so on.
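One way to see this overhead is to count how many elements the underlying collection has to produce. The `Counting` wrapper below is a hypothetical instrument written for this example; the chunk sizes are hard-coded for four equal chunks.

```julia
# Hypothetical wrapper that counts how many elements the inner iterator produces.
mutable struct Counting{I}
    itr::I
    calls::Int
end
Counting(itr) = Counting(itr, 0)

Base.length(c::Counting) = length(c.itr)

function Base.iterate(c::Counting, state...)
    y = iterate(c.itr, state...)
    y === nothing || (c.calls += 1)   # count only successfully produced elements
    return y
end

len, nchunks = 1000, 4
c = Counting(1:len)
for k in 1:nchunks
    start = (k - 1) * (len ÷ nchunks)
    # Each chunk has to walk over its whole prefix before doing useful work.
    foreach(identity, Iterators.take(Iterators.drop(c, start), len ÷ nchunks))
end
calls = c.calls
```

A single pass would produce `len` elements; here the total is well above that and grows with the number of chunks, because later chunks re-walk everything before them.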

SplittablesBase takes a simple approach but it turned out to be very effective. Although I still want to explore better APIs for handling important non-standard cases, I believe something like this combined with loop composition based on higher-order functions is a much better approach for providing extensible parallel loop infrastructure.
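A splitting-based design can be sketched without any package as follows. The local `halve` function is a stand-in defined only for this example, and `pmapreduce` is a hypothetical name; neither is claimed to match the SplittablesBase implementation.

```julia
# Stand-in for a splitting API, defined locally so the sketch is self-contained.
function halve(xs::AbstractVector)
    mid = (firstindex(xs) + lastindex(xs)) ÷ 2
    return view(xs, firstindex(xs):mid), view(xs, (mid + 1):lastindex(xs))
end

# Hypothetical divide-and-conquer reduction: split until a piece is small
# enough, reduce each piece sequentially, then combine the partial results.
function pmapreduce(f, op, xs; basesize::Int = 1024)
    length(xs) <= max(basesize, 1) && return mapreduce(f, op, xs)
    left, right = halve(xs)
    task = Threads.@spawn pmapreduce(f, op, right; basesize = basesize)
    l = pmapreduce(f, op, left; basesize = basesize)
    return op(l, fetch(task))
end

result = pmapreduce(x -> x^2, +, 1:10_000; basesize = 100)
```

Supporting a new collection type then only requires a `halve` method for it, and no element is visited more than once.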

Seelengrab · 2021-05-13T06:03:44Z

I think a good solution would be to have the `@threads` macro support work stealing or lazy iteration for non-indexables. It has supported switching the scheduling strategy since 1.5, but only `:static` scheduling is available right now. That would also prevent redundant consumption of the given iterable (which may have side effects anyway). This sadly would make it much more complicated :/
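Lazy iteration of this kind can be approximated today outside of `@threads`, for example with a `Channel` that is filled by a single producer and drained by several tasks. The function name `foreach_lazy` and the `ntasks` keyword are assumptions for this sketch; it is not how `@threads` works.

```julia
# Hypothetical helper: consume `itr` exactly once and hand its elements to
# `ntasks` worker tasks through a buffered channel.
function foreach_lazy(f, itr; ntasks::Int = Threads.nthreads())
    # The producer task iterates `itr` a single time; the channel is closed
    # automatically when the producer finishes.
    ch = Channel{Any}(ntasks) do c
        for x in itr
            put!(c, x)
        end
    end
    @sync for _ in 1:ntasks
        # Each worker takes elements until the channel is closed and empty.
        Threads.@spawn for x in ch
            f(x)
        end
    end
    return nothing
end

# Works for an iterator with no indexing and no known length.
total = Threads.Atomic{Int}(0)
foreach_lazy(x -> Threads.atomic_add!(total, x), Iterators.filter(isodd, 1:100))
```

The channel adds per-element synchronization, so this only pays off when the loop body is noticeably more expensive than a `put!`/`take!` pair.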

tkf · 2021-05-13T08:29:24Z

For the class of parallel code that includes `for` loops, it's already possible to implement even continuation stealing (the "better" strategy of work-stealing) in Julia. It also already supports generic collections that implement the SplittablesBase interface.

Use iteration instead of indexing in at-threads

6275c50

sostock mentioned this pull request May 11, 2021

Add firstindex method for Enumerator objects #40772

Open

Use view for AbstractArrays

8ca2d2c

JeffBezanson closed this May 14, 2021

aminnj mentioned this pull request Sep 3, 2021

make branch basket buffer threadlocal JuliaHEP/UnROOT.jl#76

Merged
