Bubble-up exceptions from scheduler #38317

henningandersen · 2019-02-04T10:28:10Z

Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.

This is a continuation of #38014

Primary review target is Scheduler.SafeScheduledThreadPoolExecutor.afterExecute
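
For readers unfamiliar with the mechanism, here is a minimal sketch of rethrowing from afterExecute using only JDK classes. RethrowingScheduler and runFailingTask are hypothetical names for illustration, not the Elasticsearch implementation, and the wrapping in IllegalStateException is an assumption of this sketch.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a scheduler that rethrows task failures; not the Elasticsearch class.
class RethrowingScheduler extends ScheduledThreadPoolExecutor {

    RethrowingScheduler(ThreadFactory threadFactory) {
        super(1, threadFactory);
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        if (t != null) {
            return; // an exception is already propagating; do not wrap it again
        }
        // schedule() and submit() wrap the task in a Future that captures the exception,
        // so t is null here and the Future has to be inspected instead.
        if (r instanceof Future<?> && ((Future<?>) r).isDone()) {
            try {
                ((Future<?>) r).get();
            } catch (CancellationException e) {
                // a cancelled task is not a failure
            } catch (ExecutionException e) {
                throw new IllegalStateException("scheduled task failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Demo helper (hypothetical): runs one failing task and returns what the
    // uncaught exception handler of the scheduler thread received.
    static Throwable runFailingTask() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch handled = new CountDownLatch(1);
        RethrowingScheduler scheduler = new RethrowingScheduler(runnable -> {
            Thread thread = new Thread(runnable, "scheduler");
            thread.setUncaughtExceptionHandler((th, e) -> {
                seen.set(e);
                handled.countDown();
            });
            return thread;
        });
        Runnable failing = () -> { throw new IllegalArgumentException("boom"); };
        scheduler.schedule(failing, 10, TimeUnit.MILLISECONDS);
        try {
            handled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runFailingTask());
    }
}
```

The isDone() check matters for periodic tasks: after a successful run their Future is reset rather than completed, so calling get() unconditionally would block the scheduler thread.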

elasticmachine · 2019-02-04T10:28:11Z

Pinging @elastic/es-core-infra

henningandersen · 2019-02-04T16:37:17Z

@elasticmachine please run elasticsearch-ci/2

ywelsch

I've left one question.

ywelsch · 2019-02-04T17:57:42Z

server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java

@@ -354,8 +354,11 @@ public ScheduledCancellable schedule(Runnable command, TimeValue delay, String e
    }

    public void scheduleUnlessShuttingDown(TimeValue delay, String executor, Runnable command) {
+        if (!Names.SAME.equals(executor)) {
+            command = new ThreadedRunnableAllowShutdown(command, executor(executor));


What's the reason for this change? It's unrelated to this PR, no?

It is related to this PR. If you schedule something on a non-SAME executor and then shut down the thread pool before the delay has passed, the execute call on the target executor fails with an exception. This exception is thrown on the scheduler thread. So far this caused no issues, since it was ignored (and, more recently, logged). With the changes in this PR, however, the exception bubbled up to the uncaught exception handler, causing tests to fail.

The change above fixes that, so that scheduleUnlessShuttingDown tolerates a shutdown both in the schedule call itself and in the subsequent execute on the executor.
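
A minimal sketch of that behavior, assuming a wrapper that runs on the scheduler thread and forwards the command to the target executor. The class name ForkIgnoringShutdown is hypothetical; the PR's own wrapper is ThreadedRunnableAllowShutdown, whose body is not shown in this thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical wrapper for illustration only; not the class added by this PR.
class ForkIgnoringShutdown implements Runnable {
    private final Runnable command;
    private final ExecutorService executor;

    ForkIgnoringShutdown(Runnable command, ExecutorService executor) {
        this.command = command;
        this.executor = executor;
    }

    @Override
    public void run() {
        try {
            // Runs on the scheduler thread once the delay has elapsed.
            executor.execute(command);
        } catch (RejectedExecutionException e) {
            if (executor.isShutdown() == false) {
                throw e; // rejected for another reason, for example a full queue
            }
            // The target executor was shut down in the meantime: drop the task silently.
        }
    }
}
```

Rejections that are not caused by shutdown are still rethrown, so they would continue to reach the uncaught exception handler.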

Will the regular .schedule method not suffer from the same issue?

Yes, it will potentially, though none of the CI runs ran into it.

I guess we could argue that before the warning/exception changes, this would go unnoticed for both schedule and scheduleUnlessShuttingDown and therefore we should silently ignore this in both cases?

I think the shutdown procedure is relevant to this question. ThreadPool.terminate does a regular shutdown first on both scheduler and pool executors. This allows any scheduled tasks on the ScheduledThreadPoolExecutor to run to completion but does not allow executing them on the pool executor.

So jobs scheduled on the SAME executor with a delay that extends past shutdown will complete; jobs scheduled on any other executor will not.

Given that we have both methods, I think the right solution is to make sure that tasks scheduled to run within the terminate timeout will run. I think this belongs in a follow-up PR. Two options:

1. During terminate, we can do awaitTermination of the scheduler before calling shutdown on the rest of the executors.

2. If a scheduled job fails to call execute on the executor, we simply call it in the current thread (i.e. one of the scheduler threads).
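
The first option can be sketched with plain JDK executors. OrderedTerminate and its terminate method are hypothetical names, not ThreadPool.terminate, and the sketch assumes the scheduler keeps the JDK default of running already-scheduled delayed tasks after shutdown(); the thread does not say how the Elasticsearch scheduler is configured in that respect.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper sketching the shutdown ordering; not ThreadPool.terminate itself.
final class OrderedTerminate {

    static boolean terminate(ScheduledExecutorService scheduler, List<ExecutorService> pools,
                             long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        // Stop accepting new scheduled work, then wait so that pending delayed tasks
        // can fire and hand off to the pools while those still accept work.
        scheduler.shutdown();
        boolean done = scheduler.awaitTermination(timeout, unit);
        // Only now shut down the target executors.
        for (ExecutorService pool : pools) {
            pool.shutdown();
        }
        for (ExecutorService pool : pools) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            done &= pool.awaitTermination(remaining, TimeUnit.NANOSECONDS);
        }
        return done;
    }
}
```

With this ordering, a task whose delay expires within the timeout is still accepted by its target executor. The cost is that terminate can wait for the longest pending delay, up to the timeout.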

Discussed this on another channel. The conclusion is to silently ignore scheduled tasks that fail to re-execute on the target executor if the target executor is shut down (i.e. as it was before all our changes).

scheduleUnlessShuttingDown was primarily added to avoid having to catch and handle the rejected exception everywhere. Down the road we should likely merge the two methods into one after analyzing all usages.

henningandersen · 2019-02-04T19:03:36Z

Thanks @ywelsch, have responded to your question above.

henningandersen · 2019-02-05T10:08:11Z

@elasticmachine run elasticsearch-ci/1
@elasticmachine run elasticsearch-ci/2

ywelsch

LGTM

henningandersen · 2019-02-05T13:51:34Z

@elasticmachine run elasticsearch-ci/1
@elasticmachine run elasticsearch-ci/2
@elasticmachine run elasticsearch-ci/default-distro

henningandersen · 2019-02-05T15:01:49Z

@elasticmachine run elasticsearch-ci/default-distro

henningandersen · 2019-02-05T15:07:46Z

@elasticmachine run elasticsearch-ci/1

henningandersen · 2019-02-05T16:21:20Z

@elasticmachine run elasticsearch-ci/1

henningandersen · 2019-02-05T18:10:49Z

@elasticmachine run elasticsearch-ci/packaging-sample

henningandersen · 2019-02-05T18:16:36Z

@elasticmachine run elasticsearch-ci/packaging-sample

Instead of logging warnings we now rethrow exceptions thrown inside scheduled/submitted tasks. This will still log them as warnings in production but has the added benefit that if they are thrown during unit/integration test runs, the test will be flagged as an error.

Fixed NPE in GlobalCheckPointListeners that caused CCR tests (IndexFollowingIT and likely others) to fail.

This is a continuation of #38014

Backports #38317

henningandersen added the >bug, :Core/Infra/Core, v7.0.0, and v6.7.0 labels Feb 4, 2019

henningandersen added 6 commits February 4, 2019 13:54

NPE fix in GlobalCheckPointListeners

c75b7d1

Fixed NPE that caused CCR tests (IndexFollowingIT and likely others) to fail.

Fix ThreadPool.scheduleUnlessShuttingDown

5f851e7

scheduleUnlessShuttingDown could bubble rejected exception to uncaught exception handler when not using SAME executor. Now ignore rejected exception if executor is shutdown.

Fix ThreadPool.scheduleUnlessShuttingDown

3e14992

scheduleUnlessShuttingDown could bubble rejected exception to uncaught exception handler when not using SAME executor. Now ignore rejected exception if executor is shutdown.

Fix ThreadPool.scheduleUnlessShuttingDown

8ed0a92

Fixed test failure.

Fix ThreadPool.scheduleUnlessShuttingDown

7348731

Fixed test failure.

Fix ThreadPool.scheduleUnlessShuttingDown

f44bdc8

Checkstyle fix.

henningandersen requested a review from ywelsch February 4, 2019 17:14

ywelsch suggested changes Feb 4, 2019

ThreadPool.schedule disregard shutdown

2518d7e

Like scheduleUnlessShuttingDown, we want to silently ignore exceptions thrown during execute on target executor. Once ThreadPool.terminate() has done shutdown of thread pools, all scheduled tasks not using SAME executor will not run.

henningandersen requested a review from ywelsch February 5, 2019 12:16

ywelsch approved these changes Feb 5, 2019

henningandersen mentioned this pull request Feb 5, 2019

Bubble-up exceptions from scheduler #38441

Merged

henningandersen merged commit 20c66c5 into elastic:master Feb 5, 2019

colings86 added the v7.0.0-beta1 label Feb 7, 2019

colings86 removed the v7.0.0 label Feb 7, 2019
