ExecutionIsolationStrategy.SEMAPHORE and interrupt-on-timeout #835

Illapikov · 2015-07-20T09:54:10Z

Hello,

Agenda:
I have Hystrix command, which shouldn't allow more than 10 concurrent requests. At the same time, I want to add timeout to that requests.

Issue:
If I have property (executionTimeoutInMilliseconds) set up to 500 and request lasts 5000 - I will have execution lasts more than 5000. And only then request will be marked as timed out.

Question:
Is there any other ways to do all that (limit concurrent requests and have a timeout interruption at same time) ?

Thanks, Sergey

mattrjacobs · 2015-07-22T20:56:36Z

In production at Netflix, almost all commands run using THREAD isolation: https://github.com/Netflix/Hystrix/wiki/Configuration#thread-or-semaphore.

This allows for both concurrent restrictions (via sizing the thread pool appropriately) and the ability to interrupt the thread on timeout. Then it's up to the implementation of the HystrixCommand to respect the interruption appropriately.

rikcarve · 2015-07-28T10:04:13Z

The above answer does not satisfy our needs. We do have the requirement of using semaphore and having a timeout set. But the timeout (as stated in the issue) does not interrupt the execute() command and therefore does not work for us. I'm wondering if this is something that is going to be implemented?

"Timeouts now apply to semaphore-isolated commands as well as thread-isolated commands. Before 1.4.x, semaphore-isolated commands could not timeout. They now have a timeout registered on another (HystrixTimer) thread, which triggers the timeout flow. If you use semaphore-isolated commands, they will now see timeouts"

This apparently isn't fully correct...

mattrjacobs · 2015-08-03T17:27:11Z

@rikcarve If you're using a semaphore-isolated command, then the caller thread runs the HystrixCommand/HystrixObservableCommand. Since this thread is doing arbitrary work outside of Hystrix, interrupting it has consequences outside of Hystrix (which we don't want).

mdjnewman · 2015-08-27T06:34:25Z

@mattrjacobs that makes sense, but does this mean the paragraph that @rikcarve posted above (from the change log and migration guide) is incorrect?

I see in logs that the fallback is called after the execution timeoutInMilliseconds is reached, but commands that are triggered with .execute() don't return at that point. Does the above paragraph only refer to the fallback being called and the timeout being recorded in metrics calculation?

mattrjacobs · 2015-08-28T23:40:35Z

I wrote a unit test that I believe demonstrates the case you're asking about here: mattrjacobs@de6a194.

In this unit test, the SEMAPHORE command sleeps for 2000ms, and has a timeout configured for 100ms. The test asserts that the timeout fires, the command hits fallback and returns the fallback result

mdjnewman · 2015-08-29T05:34:59Z

Thanks for looking into this Matt.

I think the command in that test is running with THREAD isolation - after the change in mdjnewman@83e42b3209052c1e29119e076484668b1ecbbbff, the test fails with

Unsuccessful Execution and fallback took : 2036

java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertTrue(Assert.java:54)
    at com.netflix.hystrix.HystrixCommandTest.testSemaphoreExecutionWithTimeoutAndFallback(HystrixCommandTest.java:3552)
    ...

This makes sense given that Hystrix won't interrupt threads it doesn't control, but I think the confusion (or my confusion at least) was caused by the wording in the documentation.

Sorry if I've missed anything, I'm relatively new to the internals of Hystrix.

mattrjacobs · 2015-08-29T15:47:03Z

Yeah, my mistake on the unit tests - thanks for pointing that out. The confusion lies in the distinction between HystrixCommand and HystrixObservableCommand.

In a HystrixCommand, a semaphore-isolated command runs directly on the caller thread, and so there is, as you correctly pointed out, no good way to interrupt that work without possibly disrupting other application-level work.

When that same HystrixCommand uses thread-isolation instead, that thread may be interrupted without any danger to the application threads, so a timeout attempts a forcible interrupt (which is configurable).

On the other hand, when a HystrixObservableCommand is used, an Observable is provided in the construct() method. This implies that the work if there is I/O work, it's already happening somewhere else (and not in the thread running the construct() method).

In this case, there's not much point to using thread-isolation (see #805 (comment)). When using semaphore-isolation here, it is OK to attempt to unsubscribe from the work being done. See HystrixObservableCommandTest.testSemaphoreIsolatedObserveTimeoutWithSuccessAndFallback for an example of how this works.

I agree that the docs don't have this level of nuance. I'll take a pass and get that cleaned up.

billyy · 2016-09-07T16:17:17Z

Matt,

I just saw this thread and confirm my finding in a separate issue. If I would replay it, it means that if the timeout is set to 1 second and the call actually took 5 seconds. The time for the call to return is really 5 second, not 1 second. I understand what you stated about not wanting to interrupt the caller thread. So can we just simply state that there is no thread interrupt for semaphore?

mattrjacobs · 2016-09-07T19:56:33Z

Yep - you got it :)

bhuvangu · 2019-11-18T18:33:54Z

If one want to interupt the user thread and want to use semaphore, then i am using Timer to register a timer task, which calls the interupt and if actual task gets finish i call the cancel on the returned TimerTask
is it a good idea? what can be the hidden caveats?

public TimerTask registerInterupt(Thread currentThread, int executionTimeoutInMilliseconds) {
        TimerTask tmerTask = new TimerTask() {
            @Override
            public void run() {
                if(!currentThread.isInterrupted()){
                    currentThread.interrupt();
                }
            }
        };
        timer.schedule(tmerTask, executionTimeoutInMilliseconds);
        return tmerTask;
    }

mattrjacobs closed this as completed Nov 13, 2015

ryanjbaxter mentioned this issue Mar 22, 2017

Zuul won't stop retry after hystrix is timed out spring-cloud/spring-cloud-netflix#1772

Closed

RobWin mentioned this issue May 23, 2017

Timeout Decorator resilience4j/resilience4j#142

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ExecutionIsolationStrategy.SEMAPHORE and interrupt-on-timeout #835

ExecutionIsolationStrategy.SEMAPHORE and interrupt-on-timeout #835

Illapikov commented Jul 20, 2015

mattrjacobs commented Jul 22, 2015

rikcarve commented Jul 28, 2015

mattrjacobs commented Aug 3, 2015

mdjnewman commented Aug 27, 2015

mattrjacobs commented Aug 28, 2015

mdjnewman commented Aug 29, 2015

mattrjacobs commented Aug 29, 2015

billyy commented Sep 7, 2016

mattrjacobs commented Sep 7, 2016

bhuvangu commented Nov 18, 2019

ExecutionIsolationStrategy.SEMAPHORE and interrupt-on-timeout #835

ExecutionIsolationStrategy.SEMAPHORE and interrupt-on-timeout #835

Comments

Illapikov commented Jul 20, 2015

mattrjacobs commented Jul 22, 2015

rikcarve commented Jul 28, 2015

mattrjacobs commented Aug 3, 2015

mdjnewman commented Aug 27, 2015

mattrjacobs commented Aug 28, 2015

mdjnewman commented Aug 29, 2015

mattrjacobs commented Aug 29, 2015

billyy commented Sep 7, 2016

mattrjacobs commented Sep 7, 2016

bhuvangu commented Nov 18, 2019