
fix #1177 Operators to capture last visible Scheduler and go back to it #1644

Closed
wants to merge 3 commits

Conversation

simonbasle
Member

No description provided.

@simonbasle simonbasle changed the title wip #1177 Operators to capture last visible Scheduler and go back to it fix #1177 Operators to capture last visible Scheduler and go back to it Apr 3, 2019
@simonbasle simonbasle added this to the 3.3.0.RELEASE milestone Apr 3, 2019
@simonbasle
Member Author

cc @vy

@codecov-io

codecov-io commented Apr 3, 2019

Codecov Report

Merging #1644 into master will increase coverage by 2.4%.
The diff coverage is 86.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##             master    #1644     +/-   ##
===========================================
+ Coverage     81.87%   84.27%   +2.4%     
- Complexity     3794     3939    +145     
===========================================
  Files           362      363      +1     
  Lines         30037    30070     +33     
  Branches       5592     5592             
===========================================
+ Hits          24592    25342    +750     
+ Misses         3899     3095    -804     
- Partials       1546     1633     +87
Impacted Files Coverage Δ Complexity Δ
...ore/src/main/java/reactor/core/publisher/Flux.java 98.25% <100%> (ø) 513 <2> (+2) ⬆️
...ain/java/reactor/core/publisher/FluxPublishOn.java 86.63% <81.25%> (+0.24%) 7 <0> (+1) ⬆️
.../reactor/core/publisher/FluxPublishOnCaptured.java 86.95% <86.95%> (ø) 2 <2> (?)
...a/reactor/core/publisher/FluxSchedulerCapture.java 89.47% <89.47%> (ø) 6 <6> (?)
...main/java/reactor/core/publisher/FluxGenerate.java 69.32% <0%> (-11.66%) 7% <0%> (ø)
...n/java/reactor/core/publisher/MonoCollectList.java 86.36% <0%> (-8.72%) 2% <0%> (ø)
...va/reactor/core/publisher/FluxRepeatPredicate.java 91.66% <0%> (-5.56%) 2% <0%> (ø)
...eactor/core/publisher/MonoToCompletableFuture.java 50% <0%> (-4.17%) 5% <0%> (-1%)
...re/src/main/java/reactor/util/function/Tuple4.java 95.83% <0%> (-4.17%) 17% <0%> (-1%)
...ain/java/reactor/core/publisher/MonoPublishOn.java 85.71% <0%> (-3.18%) 3% <0%> (ø)
... and 54 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb72730...a886878. Read the comment docs.

@vy

vy commented Apr 4, 2019

Great work @simonbasle! Thanks so much for taking time to work on the issue!

@simonbasle simonbasle force-pushed the 1177-captureAndReturnScheduler branch from 6973b12 to a886878 on April 11, 2019 at 13:55
@simonbasle simonbasle modified the milestones: 3.3.0.RELEASE, 3.3.0.M1 Apr 16, 2019
@simonbasle simonbasle modified the milestones: 3.3.0.M1, Backlog May 3, 2019
@smaldini smaldini added the status/on-hold This was set aside or cannot be worked on yet label May 3, 2019
@smaldini
Contributor

smaldini commented May 3, 2019

I'm not too sure about the current form of this and we will surely evaluate it further during the 3.3 cycle. Being explicit about the scheduler switch is also useful. Loom would obviously provide us a better option here, but I'd also love to see a current use case for this that can't be achieved with subscribeOn+publishOn.

@vy

vy commented May 3, 2019

@smaldini, the use case where subscribeOn+publishOn falls short was detailed in #1177. In a nutshell, you flatMap to a library function which subscribes using its own scheduler but needs to publish back to your scheduler so that you don't keep occupying its dedicated scheduler. The problem is 1) as a caller, you don't know that the library is switching schedulers, and 2) as a callee, the library doesn't know how to release its thread and let the execution continue on the caller's scheduler. Does this help?

@bsideup
Contributor

bsideup commented May 4, 2019

@vy

needs to publish back to your scheduler so that you wouldn't be occupying its dedicated scheduler

There are two scenarios:

  1. you need to do some blocking/CPU-heavy call after it.
    In this case, you would use publishOn and friends anyways
  2. you need to perform some non-blocking, non-CPU-heavy call.
    Here, IMO, it does not make any sense to move it to a separate scheduler.

Keep in mind that publishOn is an expensive operation.

  1. as a caller, you don't know that the library is switching schedulers

You don't need to, really.

  2. as a callee, the library doesn't know how to release its thread and let the execution continue on the caller's scheduler.

You shouldn't care. The only thing you care about is thread starvation (caused by either I/O or CPU-heavy tasks), but there is a common consumer's rule: if you need to perform some heavy operation, explicitly move it to a dedicated scheduler.

@vy

vy commented May 4, 2019

@bsideup Thanks for taking the time to check the issue.

you need to do some blocking/CPU-heavy call after it.
In this case, you would use publishOn and friends anyways

How would you use publishOn and friends given you don't know the scheduler of the caller or the callee?

Let me explain the situation using a concrete example:

  1. The service receives HTTP requests, in particular of type assortment. HTTP requests are handled by the Netty I/O loop. In order not to block it, you pass the received HTTP request to assortment-scheduler. (Assortment is not the only handler in that I/O loop, hence the dedicated scheduler.)
  2. Later on you need to fetch a row from the database. You call db.find(assortment). Since many other components of the application reach out to db, it strives to do the following:
    1. Execute the operation on its own dedicated jdbc-scheduler so as not to block the caller's scheduler.
    2. Once the result is ready, pass the execution back to the caller's scheduler so as not to occupy itself and starve requests coming from other components of the application.

Here the last step is not possible since db doesn't have access to the caller's scheduler.
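The hand-off in steps 2.1/2.2 can be sketched with plain CompletableFuture (a stdlib analogue, not Reactor; Db, find and the scheduler names are hypothetical). Lacking an operator that captures the caller's scheduler, the caller has to pass its executor in explicitly:

```java
import java.util.concurrent.*;

// Hypothetical sketch (plain CompletableFuture, not Reactor): DbHandOff, find
// and the scheduler names are made up for illustration. Without an operator
// that captures the caller's scheduler, the caller must hand its executor over.
class DbHandOff {
    static final ExecutorService jdbcScheduler =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "jdbc-scheduler");
                t.setDaemon(true);
                return t;
            });

    // step 2.1: run the query on jdbc-scheduler
    // step 2.2: publish the result back on the caller's scheduler
    static CompletableFuture<String> find(String key, Executor callerScheduler) {
        return CompletableFuture
                .supplyAsync(() -> "row-for-" + key + " on " + Thread.currentThread().getName(),
                             jdbcScheduler)
                .thenApplyAsync(row -> row + ", resumed on " + Thread.currentThread().getName(),
                                callerScheduler);
    }

    public static void main(String[] args) {
        ExecutorService assortmentScheduler =
                Executors.newSingleThreadExecutor(r -> {
                    Thread t = new Thread(r, "assortment-scheduler");
                    t.setDaemon(true);
                    return t;
                });
        // prints: row-for-assortment on jdbc-scheduler, resumed on assortment-scheduler
        System.out.println(find("assortment", assortmentScheduler).join());
        jdbcScheduler.shutdown();
        assortmentScheduler.shutdown();
    }
}
```

Reactor's publishOn would play the role of the final thenApplyAsync, but it equally needs the caller's Scheduler to be passed in, which is the gap the capture operators in this PR aim to close.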

Does this make more sense? Is my reasoning incorrect?

@bsideup
Contributor

bsideup commented May 4, 2019

Once the result is ready, pass the execution back to the caller scheduler to not occupy itself for other requests coming from other components of the application.

to not occupy

Why would you "occupy" the thread if the only next thing you will do is a non-blocking, lightweight operation?

Reactive programming abstracts the threading away, and almost the only reason you need to care about the thread is blocking I/O and heavy computations.

Consider the following example:

doSomethingInDB().flatMap(user -> fetchFriends(user))

user will be emitted on the DB's thread and the flatMap mapper will be executed on that thread, but since the mapper will only push to a queue and maybe trigger queue draining, there is no good reason to switch back to Netty's scheduler. In fact, doing so would only increase the amount of work to do, and the amount of CPU consumed.
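This point has a stdlib sketch (plain CompletableFuture, not Reactor; all names are illustrative): a non-*Async continuation runs on whatever thread completes the upstream stage, just as a flatMap mapper runs on the emitting thread, so a lightweight mapper costs no extra hop:

```java
import java.util.concurrent.*;

// Stdlib analogue (CompletableFuture, not Reactor): thenApply without an
// executor runs the mapper on the thread that completes the upstream future,
// so a lightweight continuation needs no thread hop. Names are illustrative.
class NoHopNeeded {
    static String mapperThread() throws InterruptedException {
        ExecutorService dbPool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "db-pool");
            t.setDaemon(true);
            return t;
        });
        CountDownLatch attached = new CountDownLatch(1);
        CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> {
            try {
                attached.await(); // don't "emit" until the mapper is attached
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return "user";
        }, dbPool);
        // thenApply (no executor): the mapper runs on the completing thread
        CompletableFuture<String> mapped =
                user.thenApply(u -> Thread.currentThread().getName());
        attached.countDown();
        String name = mapped.join();
        dbPool.shutdown();
        return name;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("mapper ran on: " + mapperThread()); // db-pool
    }
}
```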

Another example:

doSomethingInDB().publishOn(blockingFriendlyScheduler).map(user -> fetchFriendsSync(user))

Here we know that fetchFriendsSync is a blocking operation, so we make sure the processing is done on a thread pool dedicated to blocking operations. We only care about the rule "do not run blocking/heavyweight operations on non-blocking threads".
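The explicit-hop example has a stdlib analogue too (again plain CompletableFuture, not Reactor; names are illustrative): the *Async variant with an explicit executor plays the role of publishOn(blockingFriendlyScheduler), moving the blocking mapper onto a dedicated pool:

```java
import java.util.concurrent.*;

// Stdlib analogue of publishOn(blockingFriendlyScheduler): thenApplyAsync with
// an explicit executor moves the (potentially blocking) mapper onto a
// dedicated pool. Pool and thread names are illustrative.
class ExplicitHop {
    static String blockingMapperThread() {
        ExecutorService blockingFriendly = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r, "blocking-friendly");
            t.setDaemon(true);
            return t;
        });
        String name = CompletableFuture
                .supplyAsync(() -> "user")                             // some upstream thread
                .thenApplyAsync(u -> Thread.currentThread().getName(), // blocking work goes here
                                blockingFriendly)
                .join();
        blockingFriendly.shutdown();
        return name;
    }

    public static void main(String[] args) {
        System.out.println("blocking mapper ran on: " + blockingMapperThread());
    }
}
```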

@vy

vy commented May 4, 2019

Consider the following: .flatMap(db::fetchUsingDedicatedSched).map(bar::blockingCall). Wouldn't bar::blockingCall be occupying the dedicated db scheduler? That is, its resources would be used by a blockingCall which has nothing to do with db. How can db opt out of this?

@bsideup
Contributor

bsideup commented May 4, 2019

Consider the following: .flatMap(db::fetchUsingDedicatedSched).map(bar::blockingCall)

I would say this code smells. Blocking calls should not be made without strict control of the schedulers, and here you implicitly inherit the dedicated scheduler.

@vy

vy commented May 4, 2019

I think this is a pretty reasonable case, even though it conflicts with your sense of good smell. Now replace db::fetchUsingDedicatedSched with library::doSth and there is your perfectly valid case for needing to return to the caller's scheduler. The problem is that the library does not know the context it has been called from and does not want to spend its scheduler time on irrelevant tasks.

@bsideup
Contributor

bsideup commented May 4, 2019

I think this is a pretty reasonable case, even though it conflicts with your sense of good smell.

Well, this is not my personal opinion. Running blocking operations on non-blocking threads is a dangerous, performance-critical mistake. This is why we created BlockHound, which helps detect such calls. Every blocking call should be explicitly redirected to a dedicated scheduler.

Now replace db::fetchUsingDedicatedSched with library::doSth and there is your perfectly valid case for needing to return to the caller's scheduler

Sorry, I still don't see why we need to return to the caller scheduler here.

The problem is that the library does not know the context it has been called from and does not want to spend its scheduler time on irrelevant tasks.

This is not the library's concern, really. As long as the library's threads are not blocked, we're good. You can have tens of schedulers but the CPU count remains static, and it makes little to no sense to move a non-blocking, lightweight operation to another pool that will run on the same CPUs.

@ttddyy
Contributor

ttddyy commented May 9, 2019

Hi, I just stumbled upon this use case and thought about this ticket.

I am writing a library which we intend to use in both spring-mvc and spring-webflux environments.
We have a spring-security AuthenticationManager bean. To allow reusing this bean in a reactive environment, spring-security provides ReactiveAuthenticationManagerAdapter.
The problem(?) is that this auth-manager-adapter performs the authentication logic on a separate scheduler (elastic, line 43).
This means: every web request coming to the server is initially handled by reactor-http-nio-* threads.
However, most requests go through the auth flow, eventually switch to the elastic scheduler, and then the rest of the controller/service logic is performed on the elastic scheduler.

I'm not sure whether, after the auth logic, control should go back to the reactor-http-nio-* threads to process the rest of the controller/service logic.
But if we had a notion of the original scheduler and a way to go back to it, then the controller/service logic could be processed on the original scheduler.

@bsideup
Contributor

bsideup commented May 9, 2019

@ttddyy

I'm not sure after auth logic, whether the control should go back to reactor-http-nio-* threads to process the rest of controller/service logic or not.

It shouldn't. Spring Security stores the auth info in the context, not the thread.
For the other things, see my previous comments in this issue :)

Contributor

@smaldini smaldini left a comment


For now we have work-stealing everywhere by design, and blocking paths are exceptional, which argues against coming back to the original thread. To re-evaluate with Loom / Kotlin Coroutines integration?

@simonbasle simonbasle added the status/declined We feel we shouldn't currently apply this change/suggestion label Nov 27, 2019
@simonbasle simonbasle removed this from the Backlog milestone Nov 27, 2019
@simonbasle simonbasle closed this Nov 27, 2019
@chemicL chemicL deleted the 1177-captureAndReturnScheduler branch November 30, 2023 16:00