Recurring Task getting stuck #115
Comments
It sounds to me as though the task (your code) is still running? Take a threaddump of the jvm to find out what it is doing.
From https://github.com/kagkarlsson/db-scheduler#things-to-note--gotchas:
If you want guarantees that all Instants in a schedule is executed, you need another type of task, for example a recurring task that "spawns" one-time executions for all upcoming Instants regularly
The exact same thing @RLHawk1 is reporting just happened to me. A recurring task got stuck 3 days ago, with the last_heartbeat column still being updated regularly. My guess is that db-scheduler somehow wasn't able to update the db row when the task ended, maybe due to a network issue, which caused the task to never be freed. I'm using db-scheduler version 6.8 with MySQL 5.6.
Hmm, curious. Are you able to locate any logging that might be related to this?
If someone encounters this, would you please take a threaddump?
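For anyone unsure how to capture one: running `jstack <pid>` against the process is the usual route. As a rough in-process alternative, the sketch below prints the state and stack trace of every live thread whose name starts with a given prefix. The class name `ThreadDumpSketch` and the prefix you pass in are illustrative assumptions, not part of db-scheduler, so check the actual thread names in your JVM first.

```java
import java.util.Map;

final class ThreadDumpSketch {
    // Returns a jstack-like text dump of all live threads whose name
    // starts with namePrefix (hypothetical helper, for illustration only).
    static String dumpThreads(String namePrefix) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().startsWith(namePrefix)) {
                continue;
            }
            sb.append('"').append(thread.getName()).append("\" state=")
              .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

A thread that is blocked in a network read would typically show socket-related frames at the top of its stack, which is the kind of evidence that helps here.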
When this happened, the task being executed failed due to an apparent network issue. The exception was caught and the task returned as having succeeded, but db-scheduler didn't mark it as complete, which is why I suggest there may have been a problem updating the scheduler db row due to a short network glitch. I don't have any db-scheduler-specific log messages. As @RLHawk1 reported, setting the picked_by column to null and picked to 0 brought it back to normal, with no additional actions necessary.
If the
The code handling it looks like:
    currentlyProcessing.put(pickedExecution.get(), new CurrentlyExecuting(pickedExecution.get(), clock));
    try {
        statsRegistry.register(StatsRegistry.CandidateStatsEvent.EXECUTED);
        executePickedExecution(pickedExecution.get());
    } finally {
        if (currentlyProcessing.remove(pickedExecution.get()) == null) {
            // May happen in rare circumstances (typically concurrency tests)
            LOG.warn("Released execution was not found in collection of executions currently being processed. Should never happen.");
        }
        addedDueExecutionsBatch.oneExecutionDone(() -> triggerCheckForDueExecutions());
    }
So there is a
Do you have the jvm in which the error occurred running right now? (and has it not been restarted?) @cbarbosa2
I do.
Are you able to get a threaddump of it? And extract the state (stacktrace) of the db-scheduler threads? Additionally, do you have a socket-timeout set on the
@cbarbosa2 Your saying that your issue came after a network request made me look into that code more, as mine also comes after a network request. It looks like this is an issue on my end. I did not realize that the default for an Http(s)URLConnection is to never time out. So I think my request was simply hanging indefinitely, leading to the execution never finishing, as @kagkarlsson suggested. Thank you for your help tracking this down and for such an excellent library!
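A minimal sketch of setting explicit timeouts on the connection, so that a stalled request fails with an exception instead of blocking the task forever. The helper name `openWithTimeouts` and the durations used are assumptions for illustration. `setConnectTimeout` and `setReadTimeout` are the standard `java.net.URLConnection` methods, where a value of 0 means wait indefinitely.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.time.Duration;

final class TimeoutExample {
    // Opens (but does not yet connect) an HTTP(S) connection with explicit
    // connect and read timeouts. Hypothetical helper, not from the thread.
    static HttpURLConnection openWithTimeouts(URL url, Duration connect, Duration read)
            throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        // Fail if the TCP connection cannot be established in time.
        connection.setConnectTimeout(Math.toIntExact(connect.toMillis()));
        // Fail if the server stops sending data for longer than this.
        connection.setReadTimeout(Math.toIntExact(read.toMillis()));
        return connection;
    }
}
```

With both values set, a hung request surfaces as a `java.net.SocketTimeoutException`, which the task can handle or let fail so the next scheduled run proceeds.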
Indeed you nailed it @kagkarlsson! I was misled by a log message from the same period that showed an HTTP error, but by doing the thread dump as you suggested I realized it was indeed stuck in the HTTP call:
Thanks so much for your quick and eye-opening suggestions! Your lib rocks! I'm so glad I got rid of Quartz and replaced it with yours, which is way easier to handle!
Sorry for commenting here, but I can't find any way to contact you directly. @kagkarlsson, do you accept donations? This was such excellent support on an open source project and I see you've gone above and beyond by creating #116 to help catch and track down any similar issues in the future. I can't do much but I would love to do something to support your efforts!
Very good! I was worried there a bit when we had two similar issues :) @cbarbosa2 Thanks for the feedback! Might I quote you on that if I were to put up some sort of "user testimonials" on the front page? @RLHawk1 Thanks! I haven't thought about donations, maybe I should look into it and add a link. I mostly do it because I enjoy building something that is proving useful to others :). For now, spreading the word and if possible adding yourself as users to Who uses db-scheduler? would be helpful. It would also be helpful to have a "user testimonial" on how your experience with db-scheduler has been :)
Feel free to quote me at will. I work for Becker Professional Education, and we replaced Quartz with db-scheduler a month ago in production on our main product, and so far I'm more than glad we did. It's way simpler and provides all the capabilities we need.
I have a recurring task that's supposed to run more or less every 30 seconds. It's a polling service that handles tasks as they come in which can take longer to run if there's anything to process.
I keep having an issue though where it gets "stuck". The DB record shows it was picked and which machine it was picked by indefinitely and doesn't keep executing. The last_heartbeat column gets updated regularly. But the last_success column stays stale. The only way I've found to get it back running again is to set the picked_by column to null and set picked to 0.
Do you have any advice for how I can make this more robust? Some way of just resuming the usual schedule if it ever gets stuck for more than 10 minutes or something like that? I am also trying to figure out how/why it's getting stuck to begin with, obviously, but I'm assuming that's some issue on my end and not related to db-scheduler.
I'm currently using db-scheduler version 6.7.
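On the question of bounding how long a run can stay stuck, one generic option (plain `java.util.concurrent`, not a db-scheduler feature) is to run the task body on a helper thread and give up after a deadline. The sketch below assumes a hypothetical helper named `callWithTimeout`. Note that cancelling only interrupts the worker thread, and blocking socket reads typically do not respond to interrupts, so this is a backstop rather than a substitute for proper socket timeouts.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedWork {
    // Runs work on a daemon helper thread and waits at most `limit` for it.
    // Throws TimeoutException if the deadline passes (hypothetical helper).
    static <T> T callWithTimeout(Callable<T> work, Duration limit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-work");
            thread.setDaemon(true); // a hung worker must not keep the JVM alive
            return thread;
        });
        Future<T> future = executor.submit(work);
        try {
            return future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Inside the task's handler this could be used as `callWithTimeout(() -> pollOnce(), Duration.ofMinutes(10))`, where `pollOnce` stands in for your own polling code. The caller then returns or fails after the limit, so the execution can complete and be rescheduled.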