diff --git a/documentation/zenodo_integration/delayed_jobs.md b/documentation/zenodo_integration/delayed_jobs.md
index bfef722ecc..f3e7a73d6f 100644
--- a/documentation/zenodo_integration/delayed_jobs.md
+++ b/documentation/zenodo_integration/delayed_jobs.md
@@ -1,5 +1,41 @@
-ActiveJob / Delayed Job
-=========================
+# Zenodo extra copies
+
+## It's not processing? Why? Check these things
+- (On 2c) Check the daemon with `~/init.d/delayed_job.dryad status`, or start it with `~/init.d/delayed_job.dryad start`.
+- Delayed Job keeps a bare work queue in the `delayed_jobs` table. A job stays there until it is processed successfully, and is deleted once it succeeds.
+- The application state of the Zenodo jobs is kept in `stash_engine_zenodo_copies`, which also holds important info such as the deposition id (Zenodo's internal id).
+- Most errors and stack traces should be saved into the error field of that table, so we don't have to spend hours digging through logs to figure out problems.
+
+## We're about to have maintenance, a shutdown or a restart. What do I do?
+- (On 2c) "Pause" or "drain" the jobs at least a few hours ahead, so they don't get cut off in the middle of something that takes many hours to process.
+- Check status with `~/init.d/long_jobs.dryad status`.
+- Let jobs drain out with `~/init.d/long_jobs.dryad drain`. This really just touches two files in the `~/app/ui/releases` directory: `defer_jobs.txt` (Zenodo replication) and `hold-submissions.txt` (Merritt submissions). While these files are present, a job whose turn comes up is not run; its state is set to `deferred` (Zenodo) or `rejected-shutting-down` (Merritt) instead. These states live in their own tables rather than in the Delayed Job work queue, because that queue is very simple and dumb.
+
+## We just finished maintenance, what do I do?
+- (On 2c) Run `~/init.d/long_jobs.dryad restart`. This takes the jobs that were rejected/deferred, resets their status to `enqueued` and re-inserts them into the work queue.
+- The deferred (Zenodo) jobs only run on one server (2c), but the Merritt submission queue currently runs on both servers, inside the web server processes, so it is more complicated.
+- For Merritt queues:
+  - Be sure the `~/app/ui/releases/hold-submissions.txt` files are deleted.
+  - Look at the Merritt queue page in the UI and note which server you're accessing.
+  - For the 'rejected shutting down' jobs listed under the other server (in `stash_engine_repo_queue_states`), update the database to set the latest state's hostname to the server you're on.
+  - Click "Restart submissions which were shut down gracefully". It should re-enqueue them on your current server.
+
+## Example of manually submitting a 3rd copy from the console
+
+```
+RAILS_ENV=local_dev bin/delayed_job start
+```
+
+Then, from a `RAILS_ENV=local_dev rails console`:
+```
+resource = StashEngine::Resource.find() # pass the id of the resource to copy
+resource.send_to_zenodo
+```
+
+You can now check the `stash_engine_zenodo_copies` and `delayed_jobs` tables for status,
+or look at the item on Zenodo (the table has Zenodo's deposition id).
+
+## ActiveJob / Delayed Job Background
 
 The Rails framework ActiveJob libraries are meant to address these
 issues and are a common standard way to address background processing
@@ -33,11 +69,7 @@
 could run on any server that has access to the database and doesn't
 even need to run on a UI server and probably wouldn't use most of the
 application code.
-I've made a PR with the configuration and a simple example of using
-the queue to write to a file text to a file to see how things happen.
-
-Some notes & common commands to manually try out delayed job/ActiveJob
-on this branch.
+Some background info on Delayed Job:
 
 ```
 https://github.com/collectiveidea/delayed_job
@@ -46,72 +78,16 @@
 https://github.com/collectiveidea/delayed_job/issues/776
 https://github.com/collectiveidea/delayed_job/wiki/Delayed-job-command-details
 https://guides.rubyonrails.org/v4.2/active_job_basics.html
-
+# to start and stop locally
 RAILS_ENV=local_dev bin/delayed_job start
 RAILS_ENV=local_dev bin/delayed_job stop
 
 -n, --number_of_workers=workers
-
-StashEngine::ZenodoCopyJob.perform_later('my cat has fleas')
-StashEngine::ZenodoCopyJob.perform_later('my dog has fleas')
-StashEngine::ZenodoCopyJob.perform_later('my rat has fleas')
 ```
 
-We will need to add additional states and other thing to our database
-for tracking since ActiveJob is really just a work queue and doesn't
-automatically track application states.
+ActiveJob is really just a work queue and doesn't automatically track application states.
 
 We really may want to move our Merritt submissions to use something
 like this rather than the expansion I made to David's home-baked
 queueing system which still runs inside the UI server processes and
-can have problems if the UI server goes down at a bad time.
-
-
-To test jobs go through and get processed with ActiveJob
---------------------------------------------------------
-
-```
-RAILS_ENV=local_dev bin/delayed_job start
-```
-
-from RAILS_ENV=local_dev rails console:
-```
-resource = StashEngine::Resource.find()
-resource.send_to_zenodo
-```
-
-You can now check the stash_engine_zenodo_copies and delayed_jobs tables for status
-or if you want to look at the item on zenodo (it has their id in the table).
-
-
-To test draining long running jobs before a restart
----------------------------------------------------
-
-In the directory above the application root. (We don't want this in
-the application root since it gets changed with a new deploy on the
-servers.)
-```
-touch defer_jobs.txt
-```
- Send another Zenodo job as documented above for testing jobs go
-through.
-
-Wait for it to run and see that it didn't run and status was changed
-to deferred in the zenodo_copies.state field.
-
-
-To test sending through again later
------------------------------------
-
-remove the defer file
-```
-rm defer_jobs.txt
-```
-
-in rails console
-```
-StashEngine::ZenodoCopyJob.enqueue_deferred
-```
+can have problems if the UI server goes down at an inopportune time.
-
-Now wait for it to be enqueued and run and check the tables as above
-for status to see it go through.
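The drain/restart cycle in the runbook above can be modeled as a tiny state machine. This is a toy, in-memory Ruby sketch, not the Dryad code: `Job`, `drain!`, `run_turn`, `restart!` and the `'done'` state are hypothetical. Only the two file names and the `deferred` / `rejected-shutting-down` states come from the runbook. Having restart also remove the drain files is an assumption, based on the runbook telling you to make sure `hold-submissions.txt` is deleted.

```ruby
require 'fileutils'
require 'tmpdir'

# Toy model of the drain/restart cycle, NOT the Dryad implementation.
# Job, drain!, run_turn, restart! and the 'done' state are hypothetical;
# the file names and held states are the ones the runbook mentions.
Job = Struct.new(:id, :kind, :state)

# Drain files that live in the releases directory, per kind of job.
DRAIN_FILES = {
  zenodo: 'defer_jobs.txt',       # Zenodo replication
  merritt: 'hold-submissions.txt' # Merritt submissions
}.freeze

# State recorded when a job comes up while its drain file exists.
HELD_STATES = {
  zenodo: 'deferred',
  merritt: 'rejected-shutting-down'
}.freeze

# Like `long_jobs.dryad drain`: it only touches the files.
def drain!(releases_dir)
  DRAIN_FILES.each_value { |name| FileUtils.touch(File.join(releases_dir, name)) }
end

# Called when a job reaches the front of the work queue. A held job is not
# run; its state records why. 'done' is a placeholder for doing the real work.
def run_turn(job, releases_dir)
  held = File.exist?(File.join(releases_dir, DRAIN_FILES.fetch(job.kind)))
  job.state = held ? HELD_STATES.fetch(job.kind) : 'done'
  job
end

# Rough model of the restart steps, ASSUMING the drain files are removed:
# reset every held job to 'enqueued' and push it back onto the queue.
def restart!(releases_dir, jobs, queue)
  DRAIN_FILES.each_value { |name| FileUtils.rm_f(File.join(releases_dir, name)) }
  jobs.select { |j| HELD_STATES.value?(j.state) }.each do |j|
    j.state = 'enqueued'
    queue << j
  end
  queue
end

Dir.mktmpdir do |releases|
  jobs = [Job.new(1, :zenodo, 'enqueued'), Job.new(2, :merritt, 'enqueued')]
  drain!(releases)
  jobs.each { |j| run_turn(j, releases) }
  p jobs.map(&:state)                      # prints ["deferred", "rejected-shutting-down"]
  p restart!(releases, jobs, []).map(&:id) # prints [1, 2]
end
```

In the sketch, a held job exists only as a record with a state; nothing puts it back on the queue until `restart!` runs. This matches the runbook's point that the held states live outside the Delayed Job queue, which only holds pending work.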
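The Merritt step of re-pointing 'rejected shutting down' submissions at the server you're on comes down to a small selection rule. For each resource, look only at its latest queue state. If that state is a graceful-shutdown rejection recorded under another host, rewrite the hostname. Below is an in-memory Ruby sketch of that rule only. The `QueueState` fields (`resource_id`, `state`, `hostname`, `created_at`), the stored spelling `'rejected-shutting-down'` and the `repoint_rejected` helper are all assumptions. Check the real `stash_engine_repo_queue_states` columns before writing any UPDATE.

```ruby
require 'socket'

# Hypothetical row shape; the real table's columns may differ.
QueueState = Struct.new(:resource_id, :state, :hostname, :created_at)

# Assumed spelling of the stored state value.
REJECTED = 'rejected-shutting-down'

# For each resource, take its latest state row; if it is a graceful-shutdown
# rejection recorded under another host, rewrite its hostname to current_host.
# Returns the rows that were changed.
def repoint_rejected(rows, current_host = Socket.gethostname)
  latest = rows.group_by(&:resource_id).values.map { |rs| rs.max_by(&:created_at) }
  latest.select { |r| r.state == REJECTED && r.hostname != current_host }
        .each { |r| r.hostname = current_host }
end

# Example with made-up hostnames ('ui-a', 'ui-b') and placeholder states.
rows = [
  QueueState.new(1, 'enqueued', 'ui-a', Time.at(0)),
  QueueState.new(1, REJECTED, 'ui-a', Time.at(10)),
  QueueState.new(2, REJECTED, 'ui-b', Time.at(5)),
  QueueState.new(3, REJECTED, 'ui-a', Time.at(1)),
  QueueState.new(3, 'processing', 'ui-b', Time.at(2))
]
p repoint_rejected(rows, 'ui-b').map(&:resource_id) # prints [1]
```

Only the latest row per resource is touched, so older history rows keep their original hostname. That follows the runbook's instruction to change "the latest state's hostname".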