-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bubble-up exceptions from scheduler #38317
Bubble-up exceptions from scheduler #38317
Conversation
Instead of logging warnings we now rethrow exceptions thrown inside scheduled/submitted tasks. This will still log them as warnings in production but has the added benefit that if they are thrown during unit/integration test runs, the test will be flagged as an error. This is a continuation of elastic#38014
Pinging @elastic/es-core-infra |
Fixed NPE that caused CCR tests (IndexFollowingIT and likely others) to fail.
scheduleUnlessShuttingDown could bubble rejected exception to uncaught exception handler when not using SAME executor. Now ignore rejected exception if executor is shutdown.
scheduleUnlessShuttingDown could bubble rejected exception to uncaught exception handler when not using SAME executor. Now ignore rejected exception if executor is shutdown.
Fixed test failure.
Fixed test failure.
Checkstyle fix.
@elasticmachine please run elasticsearch-ci/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left one question.
@@ -354,8 +354,11 @@ public ScheduledCancellable schedule(Runnable command, TimeValue delay, String e | |||
} | |||
|
|||
public void scheduleUnlessShuttingDown(TimeValue delay, String executor, Runnable command) { | |||
if (!Names.SAME.equals(executor)) { | |||
command = new ThreadedRunnableAllowShutdown(command, executor(executor)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reason for this change? It's unrelated to this PR, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is related to this PR. If you schedule something on a non-SAME executor and then shutdown the threadpool afterwards (before the delay is passed), the execute on the executor will fail, causing an exception. This exception is thrown on the scheduler thread. So far this caused no issues, since it was ignored (and recently logged). But with the changes in this PR, the exception was bubbled to the uncaught exception handler causing tests to fail.
Above fixes that such that scheduleUnlessShuttingDown
allows both the schedule call itself to pass and the subsequent execute
on the executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the regular .schedule
method not suffer from the same issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it will potentially, though none of the CI runs ran into it.
I guess we could argue that before the warning/exception changes, this would go unnoticed for both schedule
and scheduleUnlessShuttingDown
and therefore we should silently ignore this in both cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the shutdown procedure is relevant to this question. ThreadPool.terminate
does a regular shutdown first on both scheduler and pool executors. This allows any scheduled tasks on the ScheduledThreadPoolExecutor
to run to completion but does not allow executing them on the pool executor.
So scheduling jobs using SAME executor with delay until after shutdown will complete, using any other executor will not.
Given that we have both methods, I think the right solution is to make sure that tasks scheduled to run within the terminate timeout will run. I think this belongs in a follow-up PR. Two options:
- During terminate, we can do awaitTermination of the scheduler before calling shutdown on the rest of the executors.
- If a scheduled job fails to call execute on the executor, we simply call it in current thread (ie. one of the scheduler threads).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this on another channel. Conclusion is to silently ignore scheduled tasks that fail to re-execute on the target executor if the target executor is shutdown (ie. like it was before all our changes).
scheduleUnlessShuttingDown
was primarily added to avoid having to catch and handle the rejected exception everywhere. Down the road we should likely merge the two methods into one after analyzing all usages.
Thanks @ywelsch, have responded to your question above. |
@elasticmachine run elasticsearch-ci/1 |
Like scheduleUnlessShuttingDown, we want to silently ignore exceptions thrown during execute on target executor. Once ThreadPool.terminate() has done shutdown of thread pools, all scheduled tasks not using SAME executor will not run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine run elasticsearch-ci/1 |
@elasticmachine run elasticsearch-ci/default-distro |
@elasticmachine run elasticsearch-ci/1 |
1 similar comment
@elasticmachine run elasticsearch-ci/1 |
@elasticmachine run elasticsearch-ci/packaging-sample |
1 similar comment
@elasticmachine run elasticsearch-ci/packaging-sample |
Instead of logging warnings we now rethrow exceptions thrown inside scheduled/submitted tasks. This will still log them as warnings in production but has the added benefit that if they are thrown during unit/integration test runs, the test will be flagged as an error. Fixed NPE in GlobalCheckPointListeners that caused CCR tests (IndexFollowingIT and likely others) to fail. This is a continuation of #38014 Backports #38317
* master: Add an authentication cache for API keys (elastic#38469) Fix exit code in certutil packaging test (elastic#38393) Enable logs for intermittent test failure (elastic#38426) Disable BWC to backport recovering retention leases (elastic#38477) Enable bwc tests now that elastic#38443 is backported. (elastic#38462) Fix Master Failover and DataNode Leave Blocking Snapshot (elastic#38460) Recover retention leases during peer recovery (elastic#38435) Set update mappings mater node timeout to 30 min (elastic#38439) Assert job is not null in FullClusterRestartIT (elastic#38218) Update ilm-api.asciidoc, point to REMOVE policy (elastic#38235) (elastic#38463) SQL: Fix esType for DATETIME/DATE and INTERVALS (elastic#38179) Handle deprecation header-AbstractUpgradeTestCase (elastic#38396) XPack: core/ccr/Security-cli migration to java-time (elastic#38415) Disable bwc tests for elastic#38443 (elastic#38456) Bubble-up exceptions from scheduler (elastic#38317) Re-enable TasksClientDocumentationIT.testCancelTasks (elastic#38234) Allow custom authorization with an authorization engine (elastic#38358) CRUDDocumentationIT fix documentation references Remove support for internal versioning for concurrency control (elastic#38254)
Instead of logging warnings we now rethrow exceptions thrown inside
scheduled/submitted tasks. This will still log them as warnings in
production but has the added benefit that if they are thrown during
unit/integration test runs, the test will be flagged as an error.
This is a continuation of #38014
Primary review target is
Scheduler.SafeScheduledThreadPoolExecutor.afterExecute