
Use HashedWheelTimer for command expiration management to reduce thread context switches and improve performance #2773

Closed
yangty89 opened this issue Feb 29, 2024 · 12 comments
Labels
type: feature A new feature
Milestone

Comments

@yangty89
Contributor

yangty89 commented Feb 29, 2024

Feature Request

Hi @mp911de, I'm recently working on our discussion about Lettuce command expiration improvement :)

Is your feature request related to a problem? Please describe

While analyzing the performance of Lettuce using flame graphs, I found that a portion of time was "consumed" by command expiration management, both in the client threads and in the I/O thread.

Flame graph of client thread:
[image: caller-thread-fg]

Flame graph of I/O thread:
[image: io-thread-fg]

We can see from the two graphs above that calls to the signalNotEmpty method of LinkedBlockingQueue "occupied" a relatively large portion of time. In my opinion, the calls to signalNotEmpty were not time-consuming in themselves; however, they woke up the executor thread and caused the current thread to be swapped out.

Describe the solution you'd like

I tried using netty's HashedWheelTimer (defined in DefaultClientResources) instead of the original DefaultEventExecutorGroup for command expiration management and was able to gain a performance boost of about 20% to 60% (varying with the number of computation threads used by the original approach) in the RedisClientBenchmark#syncSet benchmark.

Original approach:

    private void potentiallyExpire(RedisCommand<?, ?, ?> command, ScheduledExecutorService executors) {

        long timeout = applyConnectionTimeout ? this.timeout : source.getTimeout(command);

        if (timeout <= 0) {
            return;
        }

        ScheduledFuture<?> schedule = executors.schedule(() -> {
            if (!command.isDone()) {
                command.completeExceptionally(
                        ExceptionFactory.createTimeoutException(Duration.ofNanos(timeUnit.toNanos(timeout))));
            }
        }, timeout, timeUnit);

        if (command instanceof CompleteableCommand) {
            ((CompleteableCommand) command).onComplete((o, o2) -> {
                if (!schedule.isDone()) {
                    schedule.cancel(false);
                }
            });
        }

    }

Hashed wheel timer approach:

    private void potentiallyExpire(RedisCommand<?, ?, ?> command, ScheduledExecutorService executors) {

        long timeout = applyConnectionTimeout ? this.timeout : source.getTimeout(command);

        if (timeout <= 0) {
            return;
        }

        Timeout commandTimeout = timer.newTimeout(t -> {
            if (!command.isDone()) {
                command.completeExceptionally(
                        ExceptionFactory.createTimeoutException(Duration.ofNanos(timeUnit.toNanos(timeout))));
            }
        }, timeout, timeUnit);

        if (command instanceof CompleteableCommand) {
            ((CompleteableCommand) command).onComplete((o, o2) -> commandTimeout.cancel());
        }

    }

As HashedWheelTimer uses lock-free queues for adding and cancelling timeout events, it avoids thread context switches for both the client threads and the I/O thread, and thus performs better under heavy load than the LinkedBlockingQueue-backed approach.

I've run some benchmarks of syncSet on my 4-core notebook and got the results below:

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @Threads(100)
    @Warmup(iterations = 1, time = 30, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 1, time = 30, timeUnit = TimeUnit.SECONDS)
    public void syncSet() {
        connection.sync().set(KEY, KEY);
    }
  • Original approach with default computation thread pool size (8 threads in my case): about 7k ops
  • Original approach with minimum computation thread pool size (2 threads according to MIN_COMPUTATION_THREADS): about 9k ops
  • Hashed wheel timer approach: about 11k ops (a performance boost of about 20% to 60% compared to original approaches)

Describe alternatives you've considered

According to the benchmarks above, restricting the computation thread pool to a small size might also improve performance under heavy load, though using a hashed wheel timer seems the better solution to me.

Teachability, Documentation, Adoption, Migration Strategy

NA

I'll prepare a pull request if the project owners agree with this modification; please let me know if more work should be done on this feature :)

@mp911de
Collaborator

mp911de commented Feb 29, 2024

Thanks a lot for looking into this. What do you think about sending the actual cancellation onto DefaultEventExecutorGroup? I like the approach of using the timer for scheduling, however, the EventExecutor should be the actual worker and expiry should be only an exception, not the regular case.

Also, depending on what is going to happen upon cancellation, the timer thread might be kept busy with cancellation and cannot perform tasks like reconnect in time.

Let me know what you think.

@yangty89
Contributor Author

yangty89 commented Mar 1, 2024

Thanks for your reply and suggestion :) I'm taking a deeper look into HashedWheelTimer and the components of Lettuce that use the timer.

As for sending the actual timeout cancellation to DefaultEventExecutorGroup, I wonder if it means that we submit the cancellation via the executor service, like this:

    private void potentiallyExpire(RedisCommand<?, ?, ?> command, ScheduledExecutorService executors) {

        //...

        Timeout commandTimeout = timer.newTimeout(t -> {
            if (!command.isDone()) {
                command.completeExceptionally(
                        ExceptionFactory.createTimeoutException(Duration.ofNanos(timeUnit.toNanos(timeout))));
            }
        }, timeout, timeUnit);

        if (command instanceof CompleteableCommand) {
            ((CompleteableCommand) command).onComplete((o, o2) -> executors.submit(commandTimeout::cancel));
        }

    }

However, the cancel method of the Timeout simply puts the object itself into a lock-free queue (and it's the timer's worker thread that processes the actual cancellation), so it seems fine to call the cancel method directly.

As for the actual timeout cancellation processed by the timer's worker thread: although removing a Timeout object from the timer's internal linked list is an O(1) operation, it might delay other important tasks under heavy load, such as reconnection, as you pointed out. I searched the project and found that the timer is used in the ConnectionWatchdog handler for reconnection:

            this.reconnectScheduleTimeout = timer.newTimeout(it -> {

                reconnectScheduleTimeout = null;

                if (!isEventLoopGroupActive()) {
                    logger.warn("Cannot execute scheduled reconnect timer, reconnect workers are terminated");
                    return;
                }

                reconnectWorkers.submit(() -> {
                    ConnectionWatchdog.this.run(attempt, delay);
                    return null;
                });
            }, timeout, TimeUnit.MILLISECONDS);

Here we submit the reconnection action to an executor, which seems reasonable to me. However, I wonder if we could schedule the action via the executor directly, like this:

            reconnectWorkers.schedule(() -> {

                //...

                ConnectionWatchdog.this.run(attempt, delay);
            }, timeout, TimeUnit.MILLISECONDS);

In my opinion, we might consider scheduling important tasks (e.g. reconnection) via DefaultEventExecutorGroup and other timeout events via the HashedWheelTimer, so as to maintain both the robustness and the high performance of the Lettuce client. Alternatively, we might consider reducing the tickDuration of the timer to execute tasks with more timing accuracy.

As I'm not yet familiar with the whole code base and might have made some mistakes, I would like to know your opinion on this, thanks :)

@mp911de
Collaborator

mp911de commented Mar 1, 2024

ConnectionWatchdog already follows the scheme of efficient scheduling and running the nested task on a different thread, so I think we could apply the same pattern to command expiration. Cancelling the timeout can remain a much simpler implementation:

((CompleteableCommand) command).onComplete((o, o2) -> timeoutTask.cancel());

In fact, we only cancel the timeout task as an optimization. The underlying future no-ops if it was already completed.
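A small stdlib sketch (using a plain CompletableFuture as a stand-in for a Lettuce command, not Lettuce's actual command type) illustrates why the cancellation is only an optimization:

```java
import java.util.concurrent.CompletableFuture;

public class CompleteNoOp {

    public static void main(String[] args) {
        CompletableFuture<String> cmd = new CompletableFuture<>();
        cmd.complete("OK"); // the command finished normally

        // A late timeout firing is harmless: completeExceptionally on an
        // already-completed future returns false and changes nothing, which
        // is why cancelling the Timeout is merely an optimization.
        boolean applied = cmd.completeExceptionally(new RuntimeException("timed out"));

        System.out.println(applied);    // false
        System.out.println(cmd.join()); // OK
    }
}
```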

Let me know what details you need from my side to continue here.

@mp911de mp911de added the type: feature A new feature label Mar 1, 2024
@yangty89
Contributor Author

yangty89 commented Mar 1, 2024

Thanks for the further information, I understand the point better now :)

I'd like to follow the scheme mentioned and the code should be as follows, if I'm not mistaken:

    private void potentiallyExpire(RedisCommand<?, ?, ?> command, ScheduledExecutorService executors) {

        long timeout = applyConnectionTimeout ? this.timeout : source.getTimeout(command);

        if (timeout <= 0) {
            return;
        }

        Timeout commandTimeout = timer.newTimeout(t -> {
            if (!command.isDone()) {
                executors.submit(() -> command.completeExceptionally(
                        ExceptionFactory.createTimeoutException(Duration.ofNanos(timeUnit.toNanos(timeout)))));
            }
        }, timeout, timeUnit);

        if (command instanceof CompleteableCommand) {
            ((CompleteableCommand) command).onComplete((o, o2) -> commandTimeout.cancel());
        }

    }

I'll prepare a pull request if it's ok with you :)

@yangty89
Contributor Author

yangty89 commented Mar 1, 2024

Also, I noticed that the default timeout duration of a command is 60 seconds, while the default tick duration and wheel size of the timer are 100 ms and 512 respectively, resulting in a round duration of 51.2 seconds. This means that with these default configurations, an expiring command will be processed twice by the timer (remainingRounds is decreased to 0 during the first round, and the timeout is finally expired during the second round).

In the worst case, where a large number of commands expire after 60 seconds, all of them would be processed by the timer twice, which might degrade the timer's performance. So I think it might be a good idea either to set a default command timeout of less than 51.2 seconds or to increase the wheel size to 1024, so that a command is processed by the timer only once even if it expires. I would like to know your opinion on this, thanks :)
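To make the arithmetic concrete, here is a rough sketch of the rounds calculation. This is a simplification of netty's actual remainingRounds formula and assumes the timeout is scheduled at tick 0; the method name is mine, not netty's:

```java
public class WheelRounds {

    // How many full wheel rotations a timeout waits in its bucket before it
    // fires (simplified: the timeout is assumed to be scheduled at tick 0).
    static long remainingRounds(long timeoutMs, long tickMs, int wheelSize) {
        long deadlineTicks = timeoutMs / tickMs; // e.g. 60_000 / 100 = 600
        return deadlineTicks / wheelSize;        // full rounds still to wait
    }

    public static void main(String[] args) {
        // Default config: 100 ms ticks, 512 buckets -> round = 51.2 s.
        // A 60 s timeout waits one extra round: the worker visits the bucket
        // once to decrement the counter and again, a round later, to expire it.
        System.out.println(remainingRounds(60_000, 100, 512));  // 1

        // A 1024-bucket wheel (round = 102.4 s) expires it on the first visit.
        System.out.println(remainingRounds(60_000, 100, 1024)); // 0
    }
}
```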

@mp911de
Collaborator

mp911de commented Mar 4, 2024

What do you mean that a command will be processed twice? The timer runnable is called only once, right?

We do not want to update the default timeout, as that would create surprises. I also think that any timer optimizations should happen in the library that is using Lettuce. Timeouts are in most cases subject to customization: commands can have different timeouts, so we cannot make too many assumptions about the later runtime arrangement.

@yangty89
Contributor Author

yangty89 commented Mar 4, 2024

I'm sorry for not explaining the point clearly. By saying that a command will be processed twice by the timer, I mean that, given the default configurations of a Lettuce command and HashedWheelTimer, it will take two rounds for the timer to finally remove the command if it expires.

        public void expireTimeouts(long deadline) {
            HashedWheelTimeout timeout = head;
            // process all timeouts
            while (timeout != null) {
                HashedWheelTimeout next = timeout.next;
                if (timeout.remainingRounds <= 0) {
                    next = remove(timeout); // <-- second round
                    if (timeout.deadline <= deadline) {
                        timeout.expire(); // execute command expiration behavior
                    } else {
                        // The timeout was placed into a wrong slot. This should never happen.
                        throw new IllegalStateException(String.format(
                                "timeout.deadline (%d) > deadline (%d)", timeout.deadline, deadline));
                    }
                } else if (timeout.isCancelled()) {
                    next = remove(timeout);
                } else {
                    timeout.remainingRounds --; // <-- first round
                }
                timeout = next;
            }
        }

As the default timeout duration of a command is 60 seconds and the default round duration of the timer is 51.2 seconds (512 buckets * a tick duration of 100 ms), the remainingRounds property of the HashedWheelTimeout object (which represents a command expiration task) will be set to 1 (calculated in the transferTimeoutsToBuckets method) when the object is added to a specific HashedWheelBucket of the timer. So if a command expires after 60 seconds, during the first round in which the timer's worker thread reaches the HashedWheelTimeout object in its bucket, it only decreases remainingRounds by 1; only during the second round is the timeout object removed from the timer and the command's expiration behavior executed, as shown in the code above.

However, that is a worst-case scenario, and visiting the timed-out commands twice might not cause a significant performance decline as long as there isn't an excessive number of them in the timer... I also agree that changing the default settings might bring about unexpected problems for users, so maybe we'd better leave them unchanged as you say :)

@mp911de
Collaborator

mp911de commented Mar 4, 2024

Thanks a lot for the detail. I think that any mechanism can be overloaded so that scheduled jobs won't be processed in time. Therefore, with the code sample above we've got the best of both worlds: efficient scheduling and offloading the actual work onto an ExecutorService. Feel free to submit your changes as a pull request.

@yangty89
Contributor Author

yangty89 commented Mar 4, 2024

Got it, thanks. I'll submit a pull request within these days :)

@yangty89
Contributor Author

yangty89 commented Mar 6, 2024

Hi Mark, I've created a pull request #2774, please have a look, thanks :)

mp911de pushed a commit that referenced this issue Mar 7, 2024
…d context switches and improve performance #2773

Original pull request: #2774
mp911de added a commit that referenced this issue Mar 7, 2024
Reformat code. Resolve warnings.

Original pull request: #2774
@mp911de mp911de added this to the 6.3.2.RELEASE milestone Mar 7, 2024
@mp911de mp911de closed this as completed Mar 7, 2024
@mp911de
Collaborator

mp911de commented Mar 7, 2024

Thanks a lot, this was a fun ticket.

@yangty89
Contributor Author

yangty89 commented Mar 7, 2024

Thanks, I'm happy to contribute to the Lettuce project :)
