Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HystrixRequestContext in case of async requests #895

Closed
amitcse opened this issue Sep 13, 2015 · 31 comments
Closed

HystrixRequestContext in case of async requests #895

amitcse opened this issue Sep 13, 2015 · 31 comments

Comments

@amitcse
Copy link

amitcse commented Sep 13, 2015

Hi,
In case when HystrixObservableCommand is used to wrap a non-blocking observable, and the non-blocking request is done on ExecutorService, (the subscription is also done on the executor service) will the HystrixRequestContext have to be explicitly copied to the newly created threads for the HystrixObservableCollapser to work ??
The HystrixObservableCollapser is used in Rx composition (flatMap).

@restfulhead
Copy link
Contributor

Hi Amit,

I have a similar question and found your post.

From what I understand, the request context is not automatically copied or inherited by child threads. If you're scheduling with RxJava (e.g. hystrixObservableCommand.toObservable().subscribeOn(Schedulers.io()).subscribe()) then you can do the following:

  • Implement a RxJavaSchedulersHook
  • Overwrite onSchedule() and wrap the action with your own action
  • In your action, make use of HystrixRequestContext.setContextOnCurrentThread before delegating to the original action

Now, the problem that I have with this is that it does not copy anything; It simply references the parent request context. And now comes my question:

We initialize the hystrix request context with a servlet filter as mentioned in the wiki. Using the above method works fine, if all child threads complete before the filter calls HystrixRequestContext.shutdown() at the end. However, if we just asynchronously subscribe to an observable and walk away, then the filter might shutdown the context before the asynchronous completed. The action then fails, because the context is gone.

Can anyone from the Netflix team advise on how to use the context with asynchronous actions? Is there a way to copy the context? Or is there a way to "remember" that there are still actions using the context and only close it when all actions completed?

@amitcse
Copy link
Author

amitcse commented Sep 16, 2015

Hi ruhkopf,

I believe I am also facing similar issue, not sure though. You can find the details of the issue I am facing here. https://groups.google.com/d/msg/hystrixoss/-bJmnhgmJwU/ZX2-8EPrBQAJ
Just to understand your case, how are you making sure that the servlet filter calls shutdown after all the async threads are done with processing ? In my case, I am blocking the server request thread by calling observable.toBlocking()

@restfulhead
Copy link
Contributor

how are you making sure that the servlet filter calls shutdown after all the async threads are done with processing

That is the question I was hoping to get an answer for. :-) I'm currently playing around with a map that remembers every async thread per request context and only really shuts down the context if there are no more threads running. It looks like it might work, but it adds overhead.

In my case, I am blocking the server request thread by calling observable.toBlocking()

Yes, blocking at the end of the request works. However, what if we don't want to do this? In my case I have a task that takes some time. I don't want to block the request until it's finished...

@mattrjacobs
Copy link
Contributor

@amitcse / @ruhkopf Sorry for the delay in answering. I still need to work up an example for HystrixObservableCollapser, but here's what happens for HystrixCollapser. Note that you must not be using a globally-scoped collapser if you want a HystrixRequestContext.

Each invocation of a HystrixCollapser puts the arg in a batch (com.netflix.hystrix.RequestCollapser.CollapsedTask). The first item in a batch copies the HystrixRequestContext onto the contained Callable. See: https://github.com/Netflix/Hystrix/blob/master/hystrix-core/src/main/java/com/netflix/hystrix/collapser/RequestCollapser.java#L137

Once the timer ticks, it executes this task, which then has the proper HystrixRequestContext.

This should work in the same way for HystrixObservableCollapser - I will write a unit test to confirm (or find a bug).

@mattrjacobs
Copy link
Contributor

Also, for the question of how to fit the async model into a blocking model (like Servlets), the general solution is to represent the entire "work" of the request as an Observable<Response> by using RxJava to compose all of the individual work you need to complete.

Then, as you put that work on non-Tomcat threadpools, you can block the Tomcat thread via responseObservable.toBlocking().single(), or something equivalent.

Here's one of @benjchristensen's presentations that goes into more detail on this topic:
https://speakerdeck.com/benjchristensen/applying-reactive-programming-with-rxjava-at-goto-chicago-2015

@restfulhead
Copy link
Contributor

@amitcse I've created a simple unit test that demonstrates the ObservableCollapser. It also shows that a RxJavaPlugin is necessary when scheduling asynchronously. I hope it's useful. It seems to work well: https://github.com/ruhkopf/ExampleAsyncHystrixObservableCollapserExample

@mattrjacobs Thanks for providing some details around the request collapser and the link to Ben's presentation. It's a great intro!

About blocking: That's what we're pretty much doing right now. Our service layer is based on Observables and then we block in our REST layer. This works fine.

However, I now have the use case were a (synchronous) web service call should subscribe to an Observable asynchronously and then walk away (i.e. never blocking/waiting). I'm not sure how/if I could still leverage the benefits of the request context (e.g. request collapsing). I'm going to add another unit test soon, that demonstrates this.

@amitcse
Copy link
Author

amitcse commented Sep 18, 2015

@ruhkopf Thanks for the test. I also did something similar, wherein I copy the HystrixRequestContext to the new thread from my executor service, on which the async request happens. I am not seeing any arbitrary exceptions about RequestContext now.

@mattrjacobs This HystrixRequestContext for async requests seems to be solved by copying the HystrixRequestContext on the new thread/scheduler, but now I am facing an issue with the semaphores wherein semaphores are not being released in case of exceptions properly. I will create a new thread for that. Need your help there.

@mattrjacobs
Copy link
Contributor

I was able to recreate the situation you described by adding logging to an existing unit test: mattrjacobs@c6ffedb

The output of HystrixObservableCollapserTest.testTwoRequestsWhichShouldEachEmitTwice is:
Thread which calls observe() on collapser instances is [main] and has ReqCtx : com.netflix.hystrix.strategy.concurrency.HystrixRequestContext@5f70bea5
add listener: com.netflix.hystrix.collapser.RequestCollapser$CollapsedTask@44d3db51
ExecutionCount 0 => Time: 10 Delay: 10
Executing task ...
Thread creating batch command is [main] and has ReqCtx : com.netflix.hystrix.strategy.concurrency.HystrixRequestContext@5f70bea5
executionCount: 1
Thread constructing batch HystrixObservableCommand is [hystrix-OWNER_ONE-1] and has ReqCtx : com.netflix.hystrix.strategy.concurrency.HystrixRequestContext@5f70bea5
Thread executing batch HystrixObservableCommand is [RxComputationThreadPool-1] and has ReqCtx : null
tasks: [com.netflix.hystrix.HystrixCollapserTest$TestCollapserTimer$ATask@13b06041]
**** clear TimerListener: tasks.size => 1

Which matches your description exactly. I need to think about how best to handle this. The workaround described by @ruhkopf absolutely works, but I'd like to see if Hystrix can handle this case better. Will update this thread as I make progress on that.

@restfulhead
Copy link
Contributor

Thanks for looking into this @mattrjacobs. It would be great to find a better way to handle this, because unfortunately the workaround does not work for all scenarios: Since I pass the request context from the calling thread to the scheduled thread, it breaks if the scheduled action runs after the caller has already completed. (In my specified case, the caller is a http servlet request and the Hystrix servlet filter then closes the Hystrix Context at the end).

I've added another test that demonstrates this. Please see the test demonstrateSyncRequestThatSchedulesAnAsyncTask in IssueWithBackgroundTasks.

The observable that runs in the background fails with:

java.lang.IllegalStateException: HystrixRequestContext.initializeContext() must be called at the beginning of each request before RequestVariable functionality can be used.
    at com.netflix.hystrix.strategy.concurrency.HystrixRequestVariableDefault.get(HystrixRequestVariableDefault.java:76)
    at com.netflix.hystrix.strategy.concurrency.HystrixRequestVariableHolder.get(HystrixRequestVariableHolder.java:68)
    at com.netflix.hystrix.collapser.RequestCollapserFactory.getCollapserForUserRequest(RequestCollapserFactory.java:133)
    at com.netflix.hystrix.collapser.RequestCollapserFactory.getRequestCollapser(RequestCollapserFactory.java:88)
    at com.netflix.hystrix.HystrixObservableCollapser.toObservable(HystrixObservableCollapser.java:422)
    at com.netflix.hystrix.HystrixObservableCollapser.toObservable(HystrixObservableCollapser.java:398)
    at name.ruhkopf.rx.hystrix.example.IssueWithBackgroundTasks.lambda$1(IssueWithBackgroundTasks.java:56)
    at name.ruhkopf.rx.hystrix.example.IssueWithBackgroundTasks$$Lambda$1/1844169442.call(Unknown Source)

I understand why it happens. Because we don't block and wait for the background task to finish, the parent completes first and removes the context. Then the background task runs and can't find the context (it was already removed). It all makes sense, but I don't know how to solve it. If only there was a way to identify actions that should run in the background vs. actions that must complete in one request. Then it would be easy to just create a new request context for the background actions and inherit the current one for foreground actions. Do you know what I mean?

@mattrjacobs
Copy link
Contributor

A quick note (don't have a full solution yet): A class already exists (com.netflix.hystrix.strategy.concurrency.HystrixContextScheduler), which allows you to pass a HystrixRequestContext through a scheduling event. This is how Hystrix manages executing work in Hystrix thread pools with the appropriate context. This is likely a good alternative to affecting all Rx scheduling in your application.

@mattrjacobs
Copy link
Contributor

My comment above (#895 (comment)) was based on a unrealistic unit test.

I changed the unit test in #902 in the following ways:

  • Run it on a real HystrixTimer (Which uses a HystrixTimer thread for command execution)
  • Configure the batch command to be Semaphore-isolated
  • Don't use subscribeOn(Schedulers.computation()) at the end of the construct() method.

When those are put together, the execution of the batch command is semaphore-isolated, and so runs on the calling thread. The HystrixTimer thread executes the command, so the batch command runs there. No thread-hopping is needed in user-specified code, and Hystrix manages all of the passing around of HystrixRequestContext, such that it exists everywhere.

I'll now play around with the tests provided by @ruhkopf to understand those

@mattrjacobs
Copy link
Contributor

@ruhkopf Take a look at https://github.com/mattrjacobs/ExampleAsyncHystrixObservableCollapserExample/commit/3032249a88580304ef7bf2377fcf043e1ee2baaa.

Here, the unit test is passing without the use of RxJavaPlugins.

If you use that code as-is, then the HystrixObservableCommand is executed on the HystrixTimer thread. Without thread-hopping, the subscribe in collapser.observe().subscribe() executes in the main thread. This is generally fine, as the OnSubscribe here is just putting arguments into a batch.

Please let me know if this helps, or if there are any other cases I haven't considered. Having your repo to code against is very helpful, so feel free to illustrate with examples. Also, once you're satisfied with how things look, would you consider submitting some of those examples to hystrix-examples? It's been on my TODO list for awhile to get HystrixObservableCollapser examples there.

@mattrjacobs
Copy link
Contributor

@ruhkopf Here's a different solution to the lack of context.

https://github.com/mattrjacobs/ExampleAsyncHystrixObservableCollapserExample/commit/e1fad9dd49fc3c3ff6e71812b947802717c85816

In this case, it uses a HystrixContextScheduler, and applies it via `collapser.observe().subscribeOn(HystrixContextScheduler).subscribe().

@amitcse
Copy link
Author

amitcse commented Sep 19, 2015

@mattrjacobs I like the idea of executing the HystrixObservableCommand on the HystrixTimer thread. It gets me into thinking that maybe I don't need to call Jersey Reactive Client with a custom ExecutorService, and maybe simple Jersey client will be enough. I will look into it.
One question: Is the default isolation strategy for `HystrixObservableCommand' SEMAPHORE ? According to Ben's comment, thread isolation is not even necessary here, so is semaphore isolation the default strategy ?

@mattrjacobs
Copy link
Contributor

@amitcse Yes, the defaults are for batch commands (either HystrixCommand or HystrixObservableCommand) to be executed from the HystrixTimer thread.

HystrixObservableCommand does default to SEMAPHORE-isolation. See https://github.com/Netflix/Hystrix/blob/master/hystrix-core/src/main/java/com/netflix/hystrix/HystrixObservableCommand.java#L194.

Part of my confusion was that both of these were being overridden in the unit test I added logging to.

@restfulhead
Copy link
Contributor

Thanks for your help @mattrjacobs The example that shows how to use the HystrixContextScheduler is very useful. With this, the RX plugin is no longer necessary.

I still have an issue when scheduling a background task. It's more like an architecture issue/question. Could you please have a look at the other test class (IssueWithBackgoundTasks)? I'm interested to hear your thoughts. I guess the background task should rather use the global context instead of a request context...

I'd be happy to add the examples. Just let me know how (fork and pull request?)

@mattrjacobs
Copy link
Contributor

@ruhkopf Will look at that example on Monday and give you feedback.

For contributing, fork and pull request is exactly right

@restfulhead
Copy link
Contributor

Thanks, what branch should I base my changes on?

@mattrjacobs
Copy link
Contributor

1.4.x. I'll forward-port to master

On Mon, Sep 21, 2015 at 3:29 PM, Patrick Ruhkopf notifications@github.com
wrote:

Thanks, what branch should I base my changes on?


Reply to this email directly or view it on GitHub
#895 (comment).

restfulhead added a commit to restfulhead/Hystrix that referenced this issue Sep 22, 2015
This was based on the discussion in Netflix#895. I've modified the original example to be compatible with JDK 6, removed the dependency to ICU and refactored the test class to be part of the command class to match the pattern used for other examples.
@restfulhead
Copy link
Contributor

Done: #905

Any thoughts yet on my other question? Let me add some more background information. Here's what I currently have:

  • an API (let's say a product service) that exposes Observables
  • an implementation of the API based on HystrixObservableCollapser / HystrixObservableCommand
    • these commands by default operate on the request context (for collapsing and the request log)
  • the Hystrix servlet filter that initializes and shuts down the context

All is fine when the work is completed inside one request (i.e. block at the end). However, now let's say we add auditing, for example, when someone accesses a product, we also want to store this request in the database. This should happen in the background, i.e. we don't want to wait until this action completes. Instead we return the product immediately and asynchronously subscribe to the auditing service. Without modifying any code, we then run into an issue, because the Hystrix servlet shuts down the context and then the asynchronous command to the auditing service fails (Hystrix context not initialized). This can be seen in IssueWithBackgroundTasks.demonstrateSyncRequestThatSchedulesAnAsyncTask.

One solution would be to change the collapser mode to global instead of request (and disable request log) for this command instance. However, from a design perspective this is not great, because it's invasive. My service layer at the moment does not expose any commands, only observables. That makes it impossible to change any settings of one command instance, without modifying the signature of the API. I am wondering if there is a way to a) make Hystrix request log/collapser smarter to detect this (I'm not sure how) or use a specific scheduler that would provide the request context for this case...

@mattrjacobs
Copy link
Contributor

My first thought is that converting the background task from collapser -> command should work. Collapsing requires a HystrixRequestContext, but a HystrixCommand/HystrixObservableCommand does not. Since you're explicitly trying to do work outside of a request scope, semantically having no request scope makes sense.

Let me see if I can get a working code example that follows this approach.

@mattrjacobs
Copy link
Contributor

Here's a working example where I replaced the background task with a HystrixObservableCommand instead of a HystrixObservableCollapser. https://github.com/mattrjacobs/ExampleAsyncHystrixObservableCollapserExample/commit/76040a71e4a64c12fedbfb06646909caf27d9d69. This works because the HystrixObservableCommand does not require a HystrixRequestContext.

A different approach would be to continue to use the HystrixObservableCollapser, but make it GLOBAL-scoped (as you mentioned above).

Putting it differently, you can't use a REQUEST-scoped HystrixObservableCollapser outside of a request scope, so you have to change that code somehow.

@restfulhead
Copy link
Contributor

Ok, thanks for getting back. I was afraid this would be the answer. ;-)

Unfortunately this means I will have to change my API. Currently, I only expose Observable. In fact, the caller is not even aware that Hystrix is used in the implementation. But in order to change the command settings (GLOBAL) or to use an entire differently command (HystrixObservableCommand instead of HystrixObservableCollapser), I need to either expose those, or perhaps add a flag boolean useRequestScope or similar to the method signature. (To be clear: I don't want to change the command default settings; 99% of the cases we don't use it in the background.)

I noticed another thing, too: I was surprised to see my real world code fail, even when not using the collapser. I compared it with your example (thanks, btw, for providing it) and could not find a difference. Then I started debugging and saw the following code in AbstractCommand:

if (concurrencyStrategy instanceof HystrixConcurrencyStrategyDefault) {
 // if we're using the default we support only optionally using a request context
 if (HystrixRequestContext.isCurrentThreadInitialized()) {
   currentRequestLog = HystrixRequestLog.getCurrentRequest(concurrencyStrategy);
 } else {
   currentRequestLog = null;
 }
} else {
 // if it's a custom strategy it must ensure the context is initialized
 if (HystrixRequestLog.getCurrentRequest(concurrencyStrategy) != null) {
   currentRequestLog = HystrixRequestLog.getCurrentRequest(concurrencyStrategy);
 } else {
   currentRequestLog = null;
 }
}

I am using a custom HystrixConcurrencyStrategy. (My project has it's own version of a context, that we need to propagate). So the command fails, because it always tries to use the request log. In the source code it says:

// if it's a custom strategy it must ensure the context is initialized

I'm confused by this comment. Why must it be ensured? Why not optionally? Unforuntately this breaks my code. Do you think this is something that could be changed?

@restfulhead
Copy link
Contributor

Or maybe at least allow to extend from HystrixConcurrencyStrategyDefault?

@amitcse
Copy link
Author

amitcse commented Sep 24, 2015

I'm confused by this comment. Why must it be ensured? Why not optionally? Unforuntately this breaks my code. Do you think this is something that could be changed?

@ruhkopf Refer to this comment from Ben. Also, you can modify your custom strategy as suggested by pbetkier on the same thread.

@restfulhead
Copy link
Contributor

Thanks @amitcse, that works.

@mattrjacobs
Copy link
Contributor

Thanks @amitcse for that pointer. That's exactly right.

@ruhkopf - I will add this to the plugin docs, so that others don't get bitten by this.

Regarding IssueWithBackgroundTasks, since HystrixObservableCommand<T> and HystrixObservableCollapser<T> are both capable of returning Observable<T>, is there any way to create an interface with just Observables in the signatures, and then have concrete implementations use commands/collapsers underneath. Just a thought - I'm sure your real usecase is much more complex than the IssueWithBackgroundTasks example.

@restfulhead
Copy link
Contributor

FYI, I actually needed to implement initialValue and shutdown, too. Here's the snippet:

@Override
    public <T> HystrixRequestVariable<T> getRequestVariable(final HystrixRequestVariableLifecycle<T> rv)
    {
        return new HystrixRequestVariableDefault<T>()
        {
            @Override
            public T initialValue()
            {
                return rv.initialValue();
            }

            @Override
            public void shutdown(final T value)
            {
                rv.shutdown(value);
            }

            @Override
            public T get()
            {
                if (!HystrixRequestContext.isCurrentThreadInitialized())
                {
                    return null;
                }
                return super.get();
            }
        };
    }

Re use case: My API has a function like Observable<Response> deleteItem(String id). We are able to batch multiple delete requests. Most consumers of the API call and block, so we can make use of the request context. Other consumers might not want to block. At the moment the API implementation does not know how it's used. It therefore can't make the decision of whether to use the Collapser or not. I could change this to Observable<Response> deleteItem(String id, boolean enableRequestCtx) to make it explicit, but was hoping to be able to avoid this.

mattrjacobs pushed a commit to mattrjacobs/Hystrix that referenced this issue Sep 27, 2015
This was based on the discussion in Netflix#895. I've modified the original example to be compatible with JDK 6, removed the dependency to ICU and refactored the test class to be part of the command class to match the pattern used for other examples.
@mattrjacobs
Copy link
Contributor

Given the work done in #951, I think this is better-handled now. OK to close this, @ruhkopf ?

@restfulhead
Copy link
Contributor

Yep, I would say so. Thanks again for your help.

@mattrjacobs
Copy link
Contributor

Great, thanks for following up. Closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants