Support "worker pool" pattern in actor builder and other related operators #172

elizarov · 2017-11-29T13:41:50Z

The actor builder should natively support the "worker pool" pattern via an additional optional parameter ~~parallelism~~ concurrency that defaults to 1, so that if you have a list of requests, they can all be processed concurrently, with a specified concurrency limit, using simple code like this:

val reqs: List<Request> = ...
val workers = actor(concurrency = n) {
    for (it in channel) processRequest(it)
}

This particular pattern seems to be quite common, with requests either stored in a list or received from some other channel, so the proposal is to add concurrency to map and consumeEach, too, to be able to write something like:

incomingRequests.consumeEach(concurrency = n) { processRequest(it) }

UPDATE: We will consistently call it concurrency here. We can have dozens of concurrent coroutines which run on a single CPU core. We will reserve the name parallelism to denote limits on the number of CPU cores that are used.
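
For comparison, here is a minimal sketch of how the same worker pool can be written by hand with the existing channel API. The helper name consumeEachConcurrently is hypothetical and not part of kotlinx.coroutines; it only assumes that iterating one channel from several coroutines delivers each element to exactly one of them (fan-out).

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper (not a library function): starts `concurrency` workers
// that all receive from the same channel until it is closed and drained.
suspend fun <E> ReceiveChannel<E>.consumeEachConcurrently(
    concurrency: Int,
    action: suspend (E) -> Unit
) {
    require(concurrency > 0) { "concurrency must be positive" }
    coroutineScope {
        repeat(concurrency) {
            launch {
                // Fan-out: each element goes to exactly one of the workers.
                for (element in this@consumeEachConcurrently) action(element)
            }
        }
    } // coroutineScope returns only after every worker has finished
}

// Usage sketch: sum the numbers 1..100 with four concurrent workers.
suspend fun workerPoolDemo(): Int = coroutineScope {
    val requests = Channel<Int>()
    launch {
        for (i in 1..100) requests.send(i)
        requests.close() // lets the workers' for-loops terminate
    }
    val sum = AtomicInteger(0)
    requests.consumeEachConcurrently(concurrency = 4) { sum.addAndGet(it) }
    sum.get()
}
```

The workers are concurrent coroutines in the sense of the UPDATE above: how many CPU cores they actually use depends on the dispatcher of the calling context, not on the concurrency argument.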


SolomonSun2010 · 2018-02-11T02:23:30Z

Oops, this title made me think of Kotlin/Native "worker pool" support (-: See also:
https://github.com/JetBrains/kotlin-native/blob/master/samples/workers/README.md
https://blog.jetbrains.com/kotlin/2018/02/application-development-in-kotlinnative/

enleur · 2018-03-07T21:28:01Z

Is this too naive an implementation of map?

fun <E, R> ReceiveChannel<E>.map(
    context: CoroutineContext = kotlinx.coroutines.experimental.Unconfined,
    parallelism: Int = 1,
    transform: suspend (E) -> R
): ReceiveChannel<R> = produce(context, capacity = parallelism) {
    (0 until parallelism).map {
        launch(context) {
            consumeEach {
                send(transform(it))
            }
        }
    }.forEach { it.join() }
}

elizarov · 2018-03-09T14:14:27Z

@enleur This is close. However, I'd like to have a slightly more efficient implementation that launches up to n coroutines only as they are needed, so that it starts up efficiently even for very large values of n.
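
One possible way to get the "only as needed" behaviour is sketched below. It is an assumption-laden illustration, not the implementation the library would use: instead of reusing n long-lived workers it launches one short-lived coroutine per element and bounds the number in flight with a Semaphore from kotlinx.coroutines.sync. The function name mapConcurrently is hypothetical, and the output order is not preserved.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*
import kotlinx.coroutines.sync.Semaphore

// Hypothetical helper: coroutines are launched per element, so a huge
// `concurrency` costs nothing up front when only a few elements arrive.
fun <E, R> CoroutineScope.mapConcurrently(
    source: ReceiveChannel<E>,
    concurrency: Int,
    transform: suspend (E) -> R
): ReceiveChannel<R> {
    val output = Channel<R>()
    val permits = Semaphore(concurrency)
    launch {
        try {
            coroutineScope {
                for (element in source) {
                    // Suspends while `concurrency` transforms are in flight.
                    permits.acquire()
                    launch {
                        try {
                            output.send(transform(element))
                        } finally {
                            permits.release()
                        }
                    }
                }
            } // waits for all in-flight transforms
            output.close()
        } catch (e: Throwable) {
            output.close(e) // propagate the failure to the consumer
            throw e
        }
    }
    return output
}

// Usage sketch: a very large limit, but only 20 coroutines are ever launched.
suspend fun mapDemo(): List<Int> = coroutineScope {
    val source = Channel<Int>()
    launch {
        for (i in 1..20) source.send(i)
        source.close()
    }
    mapConcurrently(source, concurrency = 1_000_000) { it * 2 }.toList().sorted()
}
```

The trade-off is one coroutine allocation per element rather than per worker, which is usually cheap but is a different design from a fixed pool that grows on demand.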

dobriy-eeh · 2018-03-15T21:04:25Z

As a proposal for an alternative implementation:

suspend fun <T> forkJoin(
        context: CoroutineContext = DefaultDispatcher,
        start: CoroutineStart = CoroutineStart.DEFAULT,
        outerBlock: (fork: (suspend () -> T) -> Unit) -> Unit
): List<T> {
    val deferreds = ArrayList<Deferred<T>>()
    outerBlock({ deferreds.add(async(context, start) { it() }) })
    return deferreds.map { it.await() }
}

Usage example 1:

val stream = listOf(1, 2, 3).stream()
val results = forkJoin<Int> { fork ->
    stream.forEach { fork { suspendFunc(it) } }
}

Usage example 2:

val results = forkJoin<Int> { fork ->
    for (i in 1..5) {
        if (i % 2 == 0)
            continue

        fork { suspendFunc(i) }
    }
}

The main advantage: this is quite flexible with respect to the outer "looping" code.
You are not limited to some strict interface for outgoing data, for example stream only or channel only.
You can use any language features to organize the fork loop: for, if, streams and so on.

Also, you are not limited to exactly one 'request' parameter for the processing function; you may use a function with any number of parameters.

fvasco · 2018-07-18T08:24:32Z

Does concurrent map preserve the order?

Should we introduce an optional parameter preserveOrder: Boolean = true for some operators (i.e. map, filter, ...)?

elizarov · 2018-07-18T11:27:55Z

Sometimes you need the order preserved, sometimes you do not. I wonder what the default should be, and whether it should be controlled by a boolean or whether there should be separate operators.
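
To make the ordering question concrete, here is a small sketch of an order-preserving variant for collections. The name mapConcurrentlyOrdered is hypothetical; the sketch relies on awaitAll returning results in the order of the deferred list, regardless of which transformation finishes first, and uses a Semaphore to cap concurrency.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Hypothetical helper: at most `concurrency` transforms run at once,
// and the result list follows the input order.
suspend fun <T, R> Iterable<T>.mapConcurrentlyOrdered(
    concurrency: Int,
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    val permits = Semaphore(concurrency)
    this@mapConcurrentlyOrdered
        .map { item -> async { permits.withPermit { transform(item) } } }
        .awaitAll() // results are returned in the order of the deferreds
}

// Usage sketch: later items finish earlier, yet the order is kept.
suspend fun orderedDemo(): List<Long> =
    listOf(30L, 20L, 10L).mapConcurrentlyOrdered(concurrency = 3) { millis ->
        delay(millis)
        millis
    }
```

An unordered variant would instead forward each result as soon as it is ready, which lowers latency for the first results but gives up this guarantee.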

elizarov · 2018-07-20T10:26:41Z

Note that an alternative design approach to solve the use-case of parallel processing is to introduce a dedicated parallel (?) combinator, so that channel.parallel().map { transform(it) } would perform transform in parallel for all incoming elements without preserving the order.

fvasco · 2018-07-20T11:07:23Z

I am considering the following signature; it encapsulates the parallel blocks and allows reusing all current operators.

suspend fun <E, R> ReceiveChannel<E>.parallel(
        parallelism: Int,
        block: suspend ProducerScope<R>.(ReceiveChannel<E>) -> Unit
): ReceiveChannel<R>

or

suspend fun <E, R> ReceiveChannel<E>.parallel(
        parallelism: Int,
        block: suspend ReceiveChannel<E>.() -> ReceiveChannel<R>
): ReceiveChannel<R>

fvasco · 2018-07-21T06:10:08Z

I took some time to expand on my previous message.

The idea behind it is to use a regular fork/join strategy. Forking and joining with channels is pretty easy, so it is possible to use parallel pipelines to process items.

Multiple coroutines receive items from a single source ReceiveChannel and send results to the output channel.

suspend fun <E, R> ReceiveChannel<E>.pipelines(
        parallelism: Int,
        block: suspend ReceiveChannel<E>.() -> ReceiveChannel<R>
): ReceiveChannel<R>


val ids: ReceiveChannel<Int> = loadIds()
val largeItem = ids
        .pipelines(5) {
            map { loadItem(it) }
                    .filter { it.active }
        }
        .maxBy { it.size }

Unfortunately, with this syntax it is difficult to consume data in parallel, i.e. consumeEach.

So an alternative syntax can be:

suspend fun <E, R> ReceiveChannel<E>.fork(
        parallelism: Int,
        block: suspend (ReceiveChannel<E>) -> R
): List<R>


val largeItem = ids
        .fork(5) {
            it.map { loadItem(it) }
                    .filter { it.active }
                    .maxBy { it.size }
        }
        .filterNotNull()
        .maxBy { it.size }

Obviously, consuming items in the fork function produces a List<Unit> and does not require the join phase.

I suspect that both operators are useful.
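
A minimal sketch of how the fork signature above could be implemented with structured concurrency follows. It is only one possible reading of the proposal: each of the parallelism coroutines runs block against the same source channel, and the partial results are joined with awaitAll. The demo function and its values are illustrative assumptions.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*

// Sketch of the proposed operator: `parallelism` coroutines share one source
// channel, and each produces one partial result.
suspend fun <E, R> ReceiveChannel<E>.fork(
    parallelism: Int,
    block: suspend (ReceiveChannel<E>) -> R
): List<R> = coroutineScope {
    List(parallelism) { async { block(this@fork) } }.awaitAll()
}

// Usage sketch: each branch sums the ids it happened to receive,
// and the join phase adds up the partial sums.
suspend fun forkDemo(): Int = coroutineScope {
    val ids = Channel<Int>()
    launch {
        for (i in 1..10) ids.send(i)
        ids.close()
    }
    val partialSums = ids.fork(3) { channel ->
        var sum = 0
        for (id in channel) sum += id
        sum
    }
    partialSums.sum()
}
```

Which elements each branch sees is not deterministic, so the join step has to be insensitive to how the work was split.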

gildor · 2018-10-23T02:09:16Z

I want to bump this issue.
This pattern is so common: I see questions about its implementation at least every week in the #coroutines channel on Kotlin Slack. Also, quick ad-hoc implementations often have problems (we had a similar problem before the awaitAll extensions, when simple extension functions just used map { it.await() }, which leaks coroutines in case of error).

rnett · 2019-01-27T09:04:56Z

A potential implementation of consumeEach with the spin up:

suspend inline fun <E> ReceiveChannel<E>.consumeEach(
    maxConcurrency: Int,
    initialConcurrency: Int = 10,
    coroutineContext: CoroutineContext = EmptyCoroutineContext,
    crossinline action: suspend (E) -> Unit
) =
    withContext(coroutineContext) {

        // maxConcurrency <= 0 is treated as "no upper limit" by the spin-up loop below
        if (initialConcurrency < 0)
            throw IllegalArgumentException("Can not have a negative initialConcurrency")
        if (maxConcurrency > 0 && initialConcurrency > maxConcurrency)
            throw IllegalArgumentException("initialConcurrency must be less than or equal to maxConcurrency")


        val busy = AtomicInteger(0)

        val workers = MutableList(min(maxConcurrency, initialConcurrency)) {
            launch {
                while (isActive && !(isClosedForReceive && isEmpty)) {
                    busy.incrementAndGet()
                    action(this@consumeEach.receive())
                    busy.decrementAndGet()
                }
            }
        }

        if (maxConcurrency > initialConcurrency || maxConcurrency <= 0) {
            while (isActive && !(isClosedForReceive && isEmpty) && (workers.size < maxConcurrency || maxConcurrency <= 0)) {
                if (busy.get() == workers.size) {
                    val received = receive()

                    workers += launch {
                        busy.incrementAndGet()
                        action(received)
                        busy.decrementAndGet()

                        while (isActive && !(isClosedForReceive && isEmpty)) {
                            busy.incrementAndGet()
                            action(this@consumeEach.receive())
                            busy.decrementAndGet()
                        }
                    }
                }
                delay(10)
            }
        }
        
        workers.joinAll()
    }

I really dislike that while loop to check sizes. It may be possible to do some kind of fake-observable on busy and only launch a watcher coroutine when it hits max (and cancel it when it drops down).

Either way, it shouldn't be too terrible, as it quits once the spin-up is done, and will often be waiting on receive().

I'm also not sure if the joinAll() at the end is necessary, as afaik the coroutineScope should do any clean up, but I'm not sure enough to leave it off.

rnett · 2019-01-27T09:08:18Z

This pattern is common enough even outside of actors (e.g. making a lot of web requests, but only having 10 going at a time) that it seems like it might be worth having a separate API for launching n coroutines, and using that here, rather than vice versa. At the very least there should be something similar for produce.

Something like:

coroutineScope{
    limitedConcurrency(concurrency = 10){
        (1..100).forEach{
            launch{ doThing() }
        }
    }
}

Only 10 doThings would be executing at any given time.

Any launches would either be redirected to a worker thread, be forced to be lazy and started once there is room, or have the block held until there is room and then launched.

ultra-taco · 2019-02-16T01:35:06Z

I agree, coming from RxJava I really wish there was something like flatMap() with maxConcurrency without requiring channels
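
For readers arriving later: the Flow API that was added to kotlinx.coroutines after this comment has a flatMapMerge operator with a concurrency parameter, which is close to RxJava's flatMap with maxConcurrency. The sketch below assumes a library version that ships Flow; the operator is marked as preview or experimental there, hence the opt-in. The helper name mapWithFlow is hypothetical, and the result order is not guaranteed.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Hypothetical helper built on Flow.flatMapMerge: at most `concurrency`
// inner flows (one per item) are collected at the same time.
@OptIn(ExperimentalCoroutinesApi::class, FlowPreview::class)
suspend fun <T, R> Iterable<T>.mapWithFlow(
    concurrency: Int,
    transform: suspend (T) -> R
): List<R> =
    asFlow()
        .flatMapMerge(concurrency) { item -> flow { emit(transform(item)) } }
        .toList() // order of results may differ from the input order

// Usage sketch: square five numbers, two at a time.
suspend fun flowDemo(): List<Int> =
    listOf(1, 2, 3, 4, 5).mapWithFlow(concurrency = 2) {
        delay(10)
        it * it
    }
```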

LDVSOFT · 2023-07-29T08:28:42Z

It's been quite long without updates, is it somewhere on the roadmap?

singhmanu · 2024-09-07T19:19:42Z

How does this handle errors? If one of the transformations fails, does there need to be a means of stopping the other transformations, and if so, how? Or maybe this does not matter, since that should be part of the transformation block.
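
As a partial answer in terms of the existing structured-concurrency primitives (none of the operators proposed in this issue exist, so the helper names below are hypothetical): inside coroutineScope, the first failing child cancels its siblings and the exception is rethrown to the caller, while supervisorScope lets the other children keep running so that each failure can be captured individually.

```kotlin
import kotlinx.coroutines.*

// Fail-fast: the first failing transform cancels the remaining ones,
// and its exception is rethrown from this function.
suspend fun <T, R> List<T>.mapFailFast(transform: suspend (T) -> R): List<R> =
    coroutineScope {
        this@mapFailFast.map { async { transform(it) } }.awaitAll()
    }

// Isolated: a failure in one transform does not cancel the others;
// each outcome is reported as a Result in input order.
suspend fun <T, R> List<T>.mapIsolated(transform: suspend (T) -> R): List<Result<R>> =
    supervisorScope {
        val deferreds = this@mapIsolated.map { async { transform(it) } }
        deferreds.map { deferred ->
            try {
                Result.success(deferred.await())
            } catch (e: CancellationException) {
                throw e // keep cancellation of the caller working
            } catch (e: Exception) {
                Result.failure(e)
            }
        }
    }
```

Handling errors inside the transformation block itself, as the question suggests, remains possible with either variant; the difference above only concerns what happens to the sibling transformations.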

elizarov changed the title ~~Suppor "worker pool" pattern in actor builder and related operators~~ Support "worker pool" pattern in actor builder and other related operators Nov 29, 2017

elizarov added the enhancement label Nov 29, 2017

elizarov mentioned this issue Nov 30, 2017

Introduce awaitAll and joinAll extensions for collections for Deferreds/Jobs #171

Closed

This was referenced Mar 16, 2018

Problems with flatMap function on ReceiveChannel<T> #180

Closed

Provide abstraction for cold streams #254

Closed

qwwdfsad mentioned this issue Mar 26, 2018

How to do conditional async execution? #304

Closed

qwwdfsad mentioned this issue Apr 12, 2018

[Critical] All Channel implementations leak every passed object #326

Closed

fvasco mentioned this issue Jul 17, 2018

awaitFirst function #424

Closed

elizarov mentioned this issue Jul 20, 2018

Multiple, parallel Channel consumers #441

Closed

elizarov mentioned this issue Mar 1, 2019

Need way to use parallel decomposition of list without saturating dispatcher queue #1022

Open

elizarov mentioned this issue Apr 12, 2019

Semaphore #1088

Closed

elizarov mentioned this issue Apr 26, 2019

Parallel flow processing #1147

Open

qwwdfsad mentioned this issue Aug 7, 2019

#1044 follow up for CompletableDeferred #1092

Closed

elizarov mentioned this issue Oct 3, 2019

How to distribute work among limited coroutines in a most efficient way? #1594

Closed

Minirogue mentioned this issue Nov 2, 2020

Extension functions for bulk collection operations #2357

Open

qwwdfsad mentioned this issue Jun 11, 2021

Propose shortcut for Iterable<Deferred<T>> → Flow<T> e.g. map { it::await.asFlow() }.merge() #2752

Open
