
fix #1177 Operators to capture last visible Scheduler and go back to it #1644

Closed
wants to merge 3 commits

Conversation

simonbasle
Member

No description provided.

@simonbasle simonbasle changed the title wip #1177 Operators to capture last visible Scheduler and go back to it fix #1177 Operators to capture last visible Scheduler and go back to it Apr 3, 2019
@simonbasle simonbasle added this to the 3.3.0.RELEASE milestone Apr 3, 2019
@simonbasle
Member Author

cc @vy

@codecov-io

codecov-io commented Apr 3, 2019

Codecov Report

Merging #1644 into master will increase coverage by 2.4%.
The diff coverage is 86.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##             master    #1644     +/-   ##
===========================================
+ Coverage     81.87%   84.27%   +2.4%     
- Complexity     3794     3939    +145     
===========================================
  Files           362      363      +1     
  Lines         30037    30070     +33     
  Branches       5592     5592             
===========================================
+ Hits          24592    25342    +750     
+ Misses         3899     3095    -804     
- Partials       1546     1633     +87
Impacted Files Coverage Δ Complexity Δ
...ore/src/main/java/reactor/core/publisher/Flux.java 98.25% <100%> (ø) 513 <2> (+2) ⬆️
...ain/java/reactor/core/publisher/FluxPublishOn.java 86.63% <81.25%> (+0.24%) 7 <0> (+1) ⬆️
.../reactor/core/publisher/FluxPublishOnCaptured.java 86.95% <86.95%> (ø) 2 <2> (?)
...a/reactor/core/publisher/FluxSchedulerCapture.java 89.47% <89.47%> (ø) 6 <6> (?)
...main/java/reactor/core/publisher/FluxGenerate.java 69.32% <0%> (-11.66%) 7% <0%> (ø)
...n/java/reactor/core/publisher/MonoCollectList.java 86.36% <0%> (-8.72%) 2% <0%> (ø)
...va/reactor/core/publisher/FluxRepeatPredicate.java 91.66% <0%> (-5.56%) 2% <0%> (ø)
...eactor/core/publisher/MonoToCompletableFuture.java 50% <0%> (-4.17%) 5% <0%> (-1%)
...re/src/main/java/reactor/util/function/Tuple4.java 95.83% <0%> (-4.17%) 17% <0%> (-1%)
...ain/java/reactor/core/publisher/MonoPublishOn.java 85.71% <0%> (-3.18%) 3% <0%> (ø)
... and 54 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb72730...a886878. Read the comment docs.

@vy

vy commented Apr 4, 2019

Great work @simonbasle! Thanks so much for taking time to work on the issue!

@simonbasle simonbasle force-pushed the 1177-captureAndReturnScheduler branch from 6973b12 to a886878 on April 11, 2019 at 13:55
@simonbasle simonbasle modified the milestones: 3.3.0.RELEASE, 3.3.0.M1 Apr 16, 2019
@simonbasle simonbasle modified the milestones: 3.3.0.M1, Backlog May 3, 2019
@smaldini smaldini added the status/on-hold This was set aside or cannot be worked on yet label May 3, 2019
@smaldini
Contributor

smaldini commented May 3, 2019

I'm not too sure about the current form of this and we will surely evaluate it further during the 3.3 cycle. Being explicit about the scheduler switch is also useful. Loom would obviously provide us a better option here, but I'd also love to see a current use case for this that can't be achieved with subscribeOn+publishOn.

@vy

vy commented May 3, 2019

@smaldini, the use case where subscribeOn+publishOn falls short was detailed in #1177. In a nutshell, you flatMap to a library function which subscribes using its own scheduler but needs to publish back to your scheduler so that you don't keep occupying its dedicated scheduler. The problem is 1) as a caller, you don't know that the library is switching schedulers, and 2) as a callee, the library doesn't know how to release its thread and let the execution continue on the caller's scheduler. Does this help?

@bsideup
Contributor

bsideup commented May 4, 2019

@vy

needs to publish back to your scheduler so that you wouldn't be occupying its dedicated scheduler

There are two scenarios:

  1. you need to do some blocking/CPU-heavy call after it.
    In this case, you would use publishOn and friends anyways
  2. you need to perform some non-blocking, non-CPU-heavy call.
    Here, IMO, it does not make any sense to move it to a separate scheduler.

Keep in mind that publishOn is an expensive operation.

  1. as a caller, you don't know that the library is switching schedulers

You don't need to, really.

  2. as a callee, the library doesn't know how to release its thread and let the execution continue on the caller's scheduler.

You shouldn't care. The only thing you care about is thread starvation (caused by either I/O or CPU-heavy tasks), but there is a common consumer's rule: if you need to perform some heavy operation, explicitly move it to a dedicated scheduler.

@vy

vy commented May 4, 2019

@bsideup Thanks for taking the time to check the issue.

you need to do some blocking/CPU-heavy call after it.
In this case, you would use publishOn and friends anyways

How would you use publishOn and friends given you don't know the scheduler of the caller or the callee?

Let me explain the situation using a concrete example:

  1. The service receives HTTP requests, in particular of type assortment. HTTP requests are handled by the Netty I/O loop. In order not to block it, you pass the received HTTP request to assortment-scheduler. (Assortment is not the only handler in that I/O loop, hence the dedicated scheduler.)
  2. Later on you need to fetch a row from the database. You call db.find(assortment). Since many other components of the application reach out to db, it strives to do the following:
    1. Execute the operation on its own dedicated jdbc-scheduler so as not to block the caller's scheduler.
    2. Once the result is ready, pass the execution back to the caller's scheduler so as not to occupy itself and starve requests coming from other components of the application.

Here the last step is not possible since db doesn't have access to the caller's scheduler.
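The hand-off in steps 2.1/2.2 can be sketched with plain CompletableFuture (a stdlib analogue, not Reactor; Db, find and the scheduler names are hypothetical). Lacking an operator that captures the caller's scheduler, the caller has to pass its executor in explicitly:

```java
import java.util.concurrent.*;

// Hypothetical sketch (plain CompletableFuture, not Reactor): DbHandOff, find
// and the scheduler names are made up for illustration. Without an operator
// that captures the caller's scheduler, the caller must hand its executor over.
class DbHandOff {
    static final ExecutorService jdbcScheduler =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "jdbc-scheduler");
                t.setDaemon(true);
                return t;
            });

    // step 2.1: run the query on jdbc-scheduler
    // step 2.2: publish the result back on the caller's scheduler
    static CompletableFuture<String> find(String key, Executor callerScheduler) {
        return CompletableFuture
                .supplyAsync(() -> "row-for-" + key + " on " + Thread.currentThread().getName(),
                             jdbcScheduler)
                .thenApplyAsync(row -> row + ", resumed on " + Thread.currentThread().getName(),
                                callerScheduler);
    }

    public static void main(String[] args) {
        ExecutorService assortmentScheduler =
                Executors.newSingleThreadExecutor(r -> {
                    Thread t = new Thread(r, "assortment-scheduler");
                    t.setDaemon(true);
                    return t;
                });
        // prints: row-for-assortment on jdbc-scheduler, resumed on assortment-scheduler
        System.out.println(find("assortment", assortmentScheduler).join());
        jdbcScheduler.shutdown();
        assortmentScheduler.shutdown();
    }
}
```

Reactor's publishOn would play the role of the final thenApplyAsync, but it equally needs the caller's Scheduler to be passed in, which is the gap the capture operators in this PR aim to close.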

Does this make more sense? Is my reasoning incorrect?

@bsideup
Contributor

bsideup commented May 4, 2019

Once the result is ready, pass the execution back to the caller scheduler to not occupy itself for other requests coming from other components of the application.

to not occupy

Why would you "occupy" the thread if the only next thing you will do is a non-blocking, lightweight operation?

Reactive programming abstracts the threading away, and almost the only reason you need to care about the thread is blocking I/O and heavy computations.

Consider the following example:

doSomethingInDB().flatMap(user -> fetchFriends(user))

user will be emitted on the DB's thread and the flatMap mapper will be executed on that thread, but since the mapper will only push to a queue and maybe trigger queue draining, there is no good reason to switch back to Netty's scheduler. In fact, doing so would only increase the amount of work to do, and the amount of CPU consumed.
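This point has a stdlib sketch (plain CompletableFuture, not Reactor; all names are illustrative): a non-*Async continuation runs on whatever thread completes the upstream stage, just as a flatMap mapper runs on the emitting thread, so a lightweight mapper costs no extra hop:

```java
import java.util.concurrent.*;

// Stdlib analogue (CompletableFuture, not Reactor): thenApply without an
// executor runs the mapper on the thread that completes the upstream future,
// so a lightweight continuation needs no thread hop. Names are illustrative.
class NoHopNeeded {
    static String mapperThread() throws InterruptedException {
        ExecutorService dbPool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "db-pool");
            t.setDaemon(true);
            return t;
        });
        CountDownLatch attached = new CountDownLatch(1);
        CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> {
            try {
                attached.await(); // don't "emit" until the mapper is attached
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return "user";
        }, dbPool);
        // thenApply (no executor): the mapper runs on the completing thread
        CompletableFuture<String> mapped =
                user.thenApply(u -> Thread.currentThread().getName());
        attached.countDown();
        String name = mapped.join();
        dbPool.shutdown();
        return name;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("mapper ran on: " + mapperThread()); // db-pool
    }
}
```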

Another example:

doSomethingInDB().publishOn(blockingFriendlyScheduler).map(user -> fetchFriendsSync(user))

Here we know that fetchFriendsSync is a blocking operation, so we make sure the processing is done on a thread pool dedicated to blocking operations. We only care about the rule "do not run blocking/heavyweight operations on non-blocking threads".
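The explicit-hop example has a stdlib analogue too (again plain CompletableFuture, not Reactor; names are illustrative): the *Async variant with an explicit executor plays the role of publishOn(blockingFriendlyScheduler), moving the blocking mapper onto a dedicated pool:

```java
import java.util.concurrent.*;

// Stdlib analogue of publishOn(blockingFriendlyScheduler): thenApplyAsync with
// an explicit executor moves the (potentially blocking) mapper onto a
// dedicated pool. Pool and thread names are illustrative.
class ExplicitHop {
    static String blockingMapperThread() {
        ExecutorService blockingFriendly = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r, "blocking-friendly");
            t.setDaemon(true);
            return t;
        });
        String name = CompletableFuture
                .supplyAsync(() -> "user")                             // some upstream thread
                .thenApplyAsync(u -> Thread.currentThread().getName(), // blocking work goes here
                                blockingFriendly)
                .join();
        blockingFriendly.shutdown();
        return name;
    }

    public static void main(String[] args) {
        System.out.println("blocking mapper ran on: " + blockingMapperThread());
    }
}
```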

@vy

vy commented May 4, 2019

Consider the following: .flatMap(db::fetchUsingDedicatedSched).map(bar::blockingCall). Wouldn't bar::blockingCall be occupying the dedicated db scheduler? That is, its resources would be used by a blockingCall which has nothing to do with db. How can db opt out of this?

@bsideup
Contributor

bsideup commented May 4, 2019

Consider the following: .flatMap(db::fetchUsingDedicatedSched).map(bar::blockingCall)

I would say this code smells. Blocking calls should not be made without strict control of the schedulers, and here you implicitly inherit the dedicated scheduler.

@vy

vy commented May 4, 2019

I think this is a pretty reasonable case, even though it conflicts with your sense of good smell. Now replace db::fetchUsingDedicatedSched with library::doSth and there is your perfectly valid case for needing to return to the caller's scheduler. The problem is that the library does not know the context it has been called from and does not want to spend its scheduler time on irrelevant tasks.

@bsideup
Contributor

bsideup commented May 4, 2019

I think this is a pretty reasonable case, even though it conflicts with your sense of good smell.

Well, this is not my personal opinion. Running blocking operations on non-blocking threads is a dangerous, performance-critical mistake. This is why we created BlockHound, which helps detect such calls. Every blocking call should be explicitly redirected to a dedicated scheduler.

Now replace db::fetchUsingDedicatedSched with library::doSth and there is your perfectly valid case for needing to return to the caller's scheduler

Sorry, I still don't see why we need to return to the caller scheduler here.

The problem is that the library does not know the context it has been called from and does not want to spend its scheduler time on irrelevant tasks.

This is not the library's concern, really. As long as the library's threads are not blocked, we're good. You can have tens of schedulers but the CPU count remains static, and it makes little to no sense to move a non-blocking, lightweight operation to another pool that will run on the same CPUs.

@ttddyy
Contributor

ttddyy commented May 9, 2019

Hi, I just stumbled upon this use case and thought about this ticket.

I am writing a library which we intend to use in both spring-mvc and spring-webflux environments.
We have a spring-security AuthenticationManager bean. To allow reusing this bean in a reactive environment, spring-security provides ReactiveAuthenticationManagerAdapter.
The problem(?) is that this auth-manager-adapter performs the authentication logic on a separate scheduler (elastic, line 43).
This means: every web request coming to the server is initially handled by reactor-http-nio-* threads.
However, most requests go through the auth flow, eventually switch to the elastic scheduler, and then the rest of the controller/service logic is performed on the elastic scheduler.

I'm not sure whether, after the auth logic, control should go back to the reactor-http-nio-* threads to process the rest of the controller/service logic.
But if we had a notion of the original scheduler and a way to go back to it, then the controller/service logic could be processed on the original scheduler.

@bsideup
Contributor

bsideup commented May 9, 2019

@ttddyy

I'm not sure after auth logic, whether the control should go back to reactor-http-nio-* threads to process the rest of controller/service logic or not.

It shouldn't. Spring Security stores the auth info in the context, not the thread.
For the other things, see my previous comments in this issue :)

Contributor

@smaldini smaldini left a comment


For now we have work-stealing everywhere by design, and blocking paths are exceptional, which argues against coming back to the original thread. To re-evaluate with Loom / Kotlin Coroutines integration?

@simonbasle simonbasle added the status/declined We feel we shouldn't currently apply this change/suggestion label Nov 27, 2019
@simonbasle simonbasle removed this from the Backlog milestone Nov 27, 2019
@simonbasle simonbasle closed this Nov 27, 2019
@chemicL chemicL deleted the 1177-captureAndReturnScheduler branch November 30, 2023 16:00