Failsafe is a lightweight, zero-dependency library for handling failures in Java 8+, with a concise API for handling everyday use cases and the flexibility to handle everything else. It works by wrapping executable logic with one or more resilience policies, which can be combined and composed as needed. These policies include:
- Retries
- Circuit breakers
- Fallbacks
It also provides features that allow you to integrate with various scenarios, including:
- Configurable schedulers
- Event listeners and Execution context
- Strong typing
- Asynchronous API integration
- CompletionStage and functional interface integration
- Execution tracking
- Policy SPI
Add the latest Failsafe Maven dependency to your project.
Failsafe 2.0 has API and behavior changes from 1.x. See the CHANGES doc for more details.
To start, we'll create a RetryPolicy that defines which failures should be handled and when retries should be performed:
RetryPolicy<Object> retryPolicy = new RetryPolicy<>()
.handle(ConnectException.class)
.withDelay(Duration.ofSeconds(1))
.withMaxRetries(3);
We can then execute a Runnable or Supplier with retries:
// Run with retries
Failsafe.with(retryPolicy).run(() -> connect());
// Get with retries
Connection connection = Failsafe.with(retryPolicy).get(() -> connect());
We can also execute a Runnable or Supplier asynchronously with retries:
// Run with retries asynchronously
CompletableFuture<Void> future = Failsafe.with(retryPolicy).runAsync(() -> connect());
// Get with retries asynchronously
CompletableFuture<Connection> connectionFuture = Failsafe.with(retryPolicy).getAsync(() -> connect());
Multiple policies can be arbitrarily composed to add additional layers of resilience or to handle different failures in different ways:
CircuitBreaker<Object> circuitBreaker = new CircuitBreaker<>();
Fallback<Object> fallback = Fallback.of(this::connectToBackup);
Failsafe.with(fallback, retryPolicy, circuitBreaker).get(this::connect);
Order does matter when composing policies. See the section below for more details.
Policy compositions can also be saved for later use via a FailsafeExecutor:
FailsafeExecutor<Object> executor = Failsafe.with(fallback, retryPolicy, circuitBreaker);
executor.run(this::connect);
Failsafe uses policies to handle failures. By default, policies treat any Exception as a failure. But policies can also be configured to handle more specific failures or conditions:
policy
.handle(ConnectException.class, SocketException.class)
.handleIf(failure -> failure instanceof ConnectException);
They can also be configured to handle specific results or result conditions:
policy
.handleResult(null)
.handleResultIf(result -> result == null);
Retry policies express when retries should be performed for an execution failure.
By default, a RetryPolicy will perform a maximum of 3 execution attempts. You can configure a max number of attempts or retries:
retryPolicy.withMaxAttempts(3);
And a delay between attempts:
retryPolicy.withDelay(Duration.ofSeconds(1));
You can add delay that backs off exponentially:
retryPolicy.withBackoff(1, 30, ChronoUnit.SECONDS);
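To make the backoff schedule concrete, here is a minimal sketch that computes the delay before each retry. It assumes the delay is multiplied by a fixed factor after every attempt and capped at the maximum; the class name, method name, and the factor of 2 used below are assumptions for illustration, not Failsafe's implementation.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: BackoffSketch is a hypothetical helper, not part of Failsafe.
final class BackoffSketch {
  // Returns the delay to wait before each of the first `retries` retries,
  // multiplying by `factor` each time and never exceeding `max`.
  static List<Duration> delays(Duration initial, Duration max, double factor, int retries) {
    List<Duration> result = new ArrayList<>();
    double nextMillis = initial.toMillis();
    for (int i = 0; i < retries; i++) {
      long capped = (long) Math.min(nextMillis, max.toMillis());
      result.add(Duration.ofMillis(capped));
      nextMillis *= factor;
    }
    return result;
  }
}
```

With a 1 second initial delay, a 30 second cap, and a factor of 2, the first six delays in this sketch are 1, 2, 4, 8, 16, and 30 seconds.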
A random delay for some range:
retryPolicy.withDelay(1, 10, ChronoUnit.SECONDS);
Or a computed delay based on an execution. You can add a random jitter factor to a delay:
retryPolicy.withJitter(.1);
Or a time based jitter:
retryPolicy.withJitter(Duration.ofMillis(100));
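The two kinds of jitter can be pictured with the sketch below. It assumes jitter is applied symmetrically, so a factor of .1 varies a delay by up to plus or minus 10%, and a time-based jitter of 100 milliseconds varies it by up to plus or minus 100 milliseconds. JitterSketch and its methods are hypothetical names, and the arithmetic is an illustration rather than Failsafe's own code.

```java
import java.time.Duration;
import java.util.Random;

// Illustrative only: JitterSketch is a hypothetical helper, not part of Failsafe.
final class JitterSketch {
  // Varies `delay` by up to +/- (delay * factor); a factor of 0.1 means +/- 10%.
  static Duration withFactor(Duration delay, double factor, Random random) {
    double offset = (random.nextDouble() * 2 - 1) * factor; // in [-factor, +factor)
    return Duration.ofMillis(Math.round(delay.toMillis() * (1 + offset)));
  }

  // Varies `delay` by up to +/- `jitter`, never going below zero.
  static Duration withDuration(Duration delay, Duration jitter, Random random) {
    double offset = (random.nextDouble() * 2 - 1) * jitter.toMillis();
    return Duration.ofMillis(Math.max(0, Math.round(delay.toMillis() + offset)));
  }
}
```

Spreading delays this way helps keep many clients that failed at the same moment from retrying in lockstep.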
You can add a max retry duration:
retryPolicy.withMaxDuration(Duration.ofMinutes(5));
You can specify which results, failures or conditions to abort retries on:
retryPolicy
.abortWhen(true)
.abortOn(NoRouteToHostException.class)
.abortIf(result -> result == true);
And of course you can arbitrarily combine any of these things into a single policy.
Circuit breakers allow you to create systems that fail-fast by temporarily disabling execution as a way of preventing system overload. Creating a CircuitBreaker is straightforward:
CircuitBreaker<Object> breaker = new CircuitBreaker<>()
.handle(ConnectException.class)
.withFailureThreshold(3, 10)
.withSuccessThreshold(5)
.withDelay(Duration.ofMinutes(1));
When a configured threshold of execution failures occurs on a circuit breaker, the circuit is opened and further execution requests fail with CircuitBreakerOpenException. After a delay, the circuit is half-opened and trial executions are attempted to determine whether the circuit should be closed or opened again. If the trial executions meet a success threshold, the breaker is closed again and executions will proceed as normal.
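The closed, open, and half-open transitions described above can be sketched as a small state machine. This is a simplified illustration that only counts consecutive results and re-opens on the first failed trial execution; BreakerSketch, its members, and the injected clock are hypothetical and are not Failsafe's CircuitBreaker.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Illustrative only: BreakerSketch and its members are hypothetical names, not Failsafe's API.
final class BreakerSketch {
  enum State { CLOSED, OPEN, HALF_OPEN }

  private final int failureThreshold;   // consecutive failures that open the circuit
  private final int successThreshold;   // consecutive trial successes that close it again
  private final long delayNanos;        // how long to stay open before half-opening
  private final LongSupplier nanoClock; // injected so elapsed time can be simulated
  private State state = State.CLOSED;
  private int failures;
  private int successes;
  private long openedAt;

  BreakerSketch(int failureThreshold, int successThreshold, Duration delay, LongSupplier nanoClock) {
    this.failureThreshold = failureThreshold;
    this.successThreshold = successThreshold;
    this.delayNanos = delay.toNanos();
    this.nanoClock = nanoClock;
  }

  synchronized State state() {
    return state;
  }

  // Returns true if an execution may proceed; half-opens the circuit once the delay has elapsed.
  synchronized boolean allowsExecution() {
    if (state == State.OPEN && nanoClock.getAsLong() - openedAt >= delayNanos) {
      state = State.HALF_OPEN;
      successes = 0;
    }
    return state != State.OPEN;
  }

  synchronized void recordSuccess() {
    failures = 0;
    if (state == State.HALF_OPEN && ++successes >= successThreshold)
      state = State.CLOSED;
  }

  synchronized void recordFailure() {
    if (state == State.OPEN)
      return;
    // In this sketch, any failed trial execution re-opens a half-open circuit immediately.
    if (state == State.HALF_OPEN || ++failures >= failureThreshold) {
      state = State.OPEN;
      failures = 0;
      openedAt = nanoClock.getAsLong();
    }
  }
}
```

Passing `System::nanoTime` as the clock gives real-time behavior, while a test can pass a supplier backed by a variable to step through the delay without sleeping.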
Circuit breakers can be flexibly configured to express when the circuit should be opened or closed.
A circuit breaker can be configured to open when a successive number of executions have failed:
breaker.withFailureThreshold(5);
Or when, for example, the last 3 out of 5 executions have failed:
breaker.withFailureThreshold(3, 5);
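One way to picture a "3 out of 5" threshold is a sliding window over the most recent results. The sketch below assumes the threshold is met as soon as enough failures are present in the window, even if fewer than 5 executions have been recorded yet; WindowThresholdSketch is a hypothetical class, and Failsafe's own bookkeeping may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: WindowThresholdSketch is a hypothetical helper, not part of Failsafe.
final class WindowThresholdSketch {
  private final int threshold;
  private final int windowSize;
  private final Deque<Boolean> window = new ArrayDeque<>(); // true marks a failed execution
  private int failuresInWindow;

  WindowThresholdSketch(int threshold, int windowSize) {
    this.threshold = threshold;
    this.windowSize = windowSize;
  }

  // Records one result and returns true when at least `threshold` of the
  // most recent `windowSize` executions have failed.
  boolean record(boolean failed) {
    window.addLast(failed);
    if (failed)
      failuresInWindow++;
    // Evict the oldest result once the window is over capacity.
    if (window.size() > windowSize && window.removeFirst())
      failuresInWindow--;
    return failuresInWindow >= threshold;
  }
}
```

Unlike a purely consecutive count, a window like this tolerates occasional successes between failures and still trips when the recent failure rate is high.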
After opening, a breaker will delay for 1 minute by default before attempting to close again, or you can configure a specific delay:
breaker.withDelay(Duration.ofSeconds(30));
The breaker can be configured to close again if a number of trial executions succeed, else it will re-open:
breaker.withSuccessThreshold(5);
The breaker can also be configured to close again if, for example, the last 3 out of 5 executions succeed, else it will re-open:
breaker.withSuccessThreshold(3, 5);
And the breaker can be configured to recognize executions that exceed a certain timeout as failures:
breaker.withTimeout(Duration.ofSeconds(10));
CircuitBreaker can provide metrics regarding the number of recorded successes or failures in the current state.
A circuit breaker can and should be shared across code that accesses inter-dependent system components that fail together. This ensures that if the circuit is opened, executions against one component that rely on another component will not be allowed until the circuit is closed again. For example, if multiple connections or requests are made to the same external server, typically they should all go through the same circuit breaker.
A CircuitBreaker can also be manually operated in a standalone way:
breaker.open();
breaker.halfOpen();
breaker.close();
if (breaker.allowsExecution()) {
  try {
    breaker.preExecute();
    doSomething();
    breaker.recordSuccess();
  } catch (Exception e) {
    breaker.recordFailure(e);
  }
}
Fallbacks allow you to provide an alternative result for a failed execution. They can also be used to suppress exceptions and provide a default result:
Fallback<Object> fallback = Fallback.of(null);
Throw a custom exception:
Fallback<Object> fallback = Fallback.of(failure -> { throw new CustomException(failure); });
Or compute an alternative result such as from a backup resource:
Fallback<Object> fallback = Fallback.of(this::connectToBackup);
For computations that block, a Fallback can be configured to run asynchronously:
Fallback<Object> fallback = Fallback.ofAsync(this::blockingCall);
Policies can be composed in any way desired, including multiple policies of the same type. Policies handle execution results in reverse order, similar to the way that function composition works. For example, consider:
Failsafe.with(fallback, retryPolicy, circuitBreaker).get(supplier);
This results in the following internal composition when executing the supplier and handling its result:
Fallback(RetryPolicy(CircuitBreaker(Supplier)))
This means the CircuitBreaker is first to evaluate the Supplier's result, then the RetryPolicy, then the Fallback. Each policy makes its own determination as to whether the result represents a failure. This allows different policies to be used for handling different types of failures.
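The evaluation order can be demonstrated with plain function composition. In the sketch below each "policy" is just a wrapper around a Supplier that logs its name after the inner supplier returns; CompositionSketch and its methods are hypothetical and only model the nesting, not Failsafe's internals.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Illustrative only: models policies as plain Supplier wrappers, not Failsafe's internals.
final class CompositionSketch {
  // Wraps `inner` so that `name` is logged once the inner supplier has produced its result.
  static <T> UnaryOperator<Supplier<T>> policy(String name, List<String> log) {
    return inner -> () -> {
      T result = inner.get();
      log.add(name);
      return result;
    };
  }

  // Composes the wrappers so the first one listed is outermost: First(Second(Third(supplier))).
  @SafeVarargs
  static <T> Supplier<T> compose(Supplier<T> supplier, UnaryOperator<Supplier<T>>... outerToInner) {
    Supplier<T> composed = supplier;
    for (int i = outerToInner.length - 1; i >= 0; i--)
      composed = outerToInner[i].apply(composed);
    return composed;
  }
}
```

Composing wrappers named Fallback, RetryPolicy, and CircuitBreaker in that order and calling the result logs CircuitBreaker, then RetryPolicy, then Fallback, which mirrors the reverse-order handling described above.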
A typical Failsafe configuration that uses multiple policies will place a Fallback as the outer-most policy, followed by a RetryPolicy, and a CircuitBreaker as the inner-most policy:
Failsafe.with(fallback, retryPolicy, circuitBreaker)
That said, it really depends on how the policies are being used, and different compositions make sense for different use cases.
By default, Failsafe uses the ForkJoinPool's common pool to perform async executions, but you can also configure a specific ScheduledExecutorService, custom Scheduler, or ExecutorService to use:
Failsafe.with(scheduler).getAsync(this::connect);
Failsafe supports event listeners, both in the top level Failsafe API, and in the different Policy implementations.
At the top level, it can notify you when an execution completes for all policies:
Failsafe.with(retryPolicy, circuitBreaker)
  .onComplete(e -> {
    if (e.getResult() != null)
      log.info("Connected to {}", e.getResult());
    else if (e.getFailure() != null)
      log.error("Failed to create connection", e.getFailure());
  })
  .get(this::connect);
It can notify you when an execution completes successfully for all policies:
Failsafe.with(retryPolicy, circuitBreaker)
.onSuccess(e -> log.info("Connected to {}", e.getResult()))
.get(this::connect);
Or when an execution fails for any policy:
Failsafe.with(retryPolicy, circuitBreaker)
.onFailure(e -> log.error("Failed to create connection", e.getFailure()))
.get(this::connect);
At the policy level, it can notify you when an execution succeeds or fails for a particular policy:
policy
  .onSuccess(e -> log.info("Connected to {}", e.getResult()))
  .onFailure(e -> log.error("Failed to create connection", e.getFailure()));
When an execution attempt fails and before a retry is performed for a RetryPolicy:
retryPolicy
.onFailedAttempt(e -> log.error("Connection attempt failed", e.getLastFailure()))
.onRetry(e -> log.warn("Failure #{}. Retrying.", e.getAttemptCount()));
Or when an execution fails and the max retries are exceeded for a RetryPolicy:
retryPolicy.onRetriesExceeded(e -> log.warn("Failed to connect. Max retries exceeded."));
For CircuitBreakers, Failsafe can notify you when the state changes:
circuitBreaker
  .onClose(() -> log.info("The circuit breaker was closed"))
  .onOpen(() -> log.info("The circuit breaker was opened"))
  .onHalfOpen(() -> log.info("The circuit breaker was half-opened"));
Failsafe can provide an ExecutionContext containing execution related information such as the number of execution attempts as well as start and elapsed times:
Failsafe.with(retryPolicy).run(ctx -> {
  log.debug("Connection attempt #{}", ctx.getAttemptCount());
  connect();
});
Failsafe Policies are typed based on the expected result. For generic policies that are used for various executions, the result type may just be Object:
RetryPolicy<Object> retryPolicy = new RetryPolicy<>();
But for other policies we may declare a more specific result type:
RetryPolicy<HttpResponse> retryPolicy = new RetryPolicy<HttpResponse>()
.handleResultIf(response -> response.getStatusCode() == 500)
.onFailedAttempt(e -> log.warn("Failed attempt: {}", e.getLastResult().getStatusCode()));
This allows Failsafe to ensure that the same result type used for the policy is returned by the execution:
HttpResponse response = Failsafe.with(retryPolicy)
.onSuccess(e -> log.info("Success: {}", e.getResult().getStatusCode()))
.get(this::sendHttpRequest);
Failsafe can be integrated with asynchronous code that reports completion via callbacks. The runAsyncExecution, getAsyncExecution and futureAsyncExecution methods provide an AsyncExecution reference that can be used to manually schedule retries or complete the execution from inside asynchronous callbacks:
Failsafe.with(retryPolicy)
  .getAsyncExecution(execution -> service.connect().whenComplete((result, failure) -> {
    if (execution.complete(result, failure))
      log.info("Connected");
    else if (!execution.retry())
      log.error("Connection attempts failed", failure);
  }));
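To show what scheduling retries from inside an asynchronous callback involves, here is a minimal sketch built only on the JDK. It assumes a fixed delay and a fixed retry budget; AsyncRetrySketch and getWithRetries are hypothetical names, and this is an illustration of the general idea rather than how Failsafe implements AsyncExecution.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative only: AsyncRetrySketch is a hypothetical helper, not Failsafe's implementation.
final class AsyncRetrySketch {
  // Invokes `call` and, if its stage fails, schedules another attempt after `delayMillis`
  // until `maxRetries` retries have been used. The returned future completes with the
  // first successful result or with the last failure.
  static <T> CompletableFuture<T> getWithRetries(Supplier<CompletionStage<T>> call, int maxRetries,
      long delayMillis, ScheduledExecutorService scheduler) {
    CompletableFuture<T> promise = new CompletableFuture<>();
    attempt(call, maxRetries, delayMillis, scheduler, promise);
    return promise;
  }

  private static <T> void attempt(Supplier<CompletionStage<T>> call, int retriesLeft,
      long delayMillis, ScheduledExecutorService scheduler, CompletableFuture<T> promise) {
    CompletionStage<T> stage;
    try {
      stage = call.get();
    } catch (RuntimeException e) {
      // Treat a synchronous exception like a failed stage so it is retried too.
      CompletableFuture<T> failed = new CompletableFuture<>();
      failed.completeExceptionally(e);
      stage = failed;
    }
    stage.whenComplete((result, failure) -> {
      if (failure == null)
        promise.complete(result);
      else if (retriesLeft > 0)
        scheduler.schedule(() -> attempt(call, retriesLeft - 1, delayMillis, scheduler, promise),
            delayMillis, TimeUnit.MILLISECONDS);
      else
        promise.completeExceptionally(failure);
    });
  }
}
```

The caller never blocks a thread while waiting between attempts, because each retry is scheduled from the completion callback of the previous attempt.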
Failsafe can also perform asynchronous executions and retries on 3rd party schedulers via the Scheduler interface. See the Vert.x example for a more detailed implementation.
Failsafe can accept a CompletionStage and return a new CompletableFuture with failure handling built-in:
Failsafe.with(retryPolicy)
.getStageAsync(this::connectAsync)
.thenApplyAsync(value -> value + "bar")
.thenAccept(System.out::println);
Failsafe can be used to create resilient functional interfaces:
Function<String, Connection> connect = address -> Failsafe.with(retryPolicy).get(() -> connect(address));
We can wrap Streams:
Failsafe.with(retryPolicy).run(() -> Stream.of("foo").map(value -> value + "bar"));
Individual Stream operations:
Stream.of("foo").map(value -> Failsafe.with(retryPolicy).get(() -> value + "bar"));
Or individual CompletableFuture stages:
CompletableFuture.supplyAsync(() -> Failsafe.with(retryPolicy).get(() -> "foo"))
.thenApplyAsync(value -> Failsafe.with(retryPolicy).get(() -> value + "bar"));
In addition to automatically performing retries, Failsafe can be used to track executions for you, allowing you to manually retry as needed:
Execution execution = new Execution(retryPolicy);
while (!execution.isComplete()) {
  try {
    doSomething();
    execution.complete();
  } catch (ConnectException e) {
    execution.recordFailure(e);
  }
}
Execution tracking is also useful for integrating with APIs that have their own retry mechanism:
Execution execution = new Execution(retryPolicy);
// On failure
if (execution.canRetryOn(someFailure))
service.scheduleRetry(execution.getWaitTime().toNanos(), TimeUnit.NANOSECONDS);
See the RxJava example for a more detailed implementation.
Failsafe provides an SPI that allows you to implement your own Policy and plug it into Failsafe. Each Policy implementation must return a PolicyExecutor which is responsible for performing synchronous or asynchronous execution, handling pre-execution requests, or handling post-execution results. The existing PolicyExecutor implementations are a good reference for creating additional implementations.
For library and public API developers, Failsafe integrates nicely into existing APIs, allowing your users to configure retry policies for different operations. One integration approach is to subclass the RetryPolicy class and expose that as part of your API while the rest of Failsafe remains internal. Another approach is to use something like the Maven shade plugin to rename and relocate Failsafe classes into your project's package structure as desired.
Failsafe is a volunteer effort. If you use it and you like it, let us know, and also help by spreading the word!
Copyright 2015-2019 Jonathan Halterman and friends. Released under the Apache 2.0 license.