From 9bfc462b671c2c9dddd41748c07aae15db5cc589 Mon Sep 17 00:00:00 2001
From: Scott F <Scott.Fisher@ucop.edu>
Date: Fri, 22 May 2020 14:55:23 -0700
Subject: [PATCH] Updating with info about how it works

---
 .../zenodo_integration/delayed_jobs.md        | 108 +++++++-----------
 1 file changed, 42 insertions(+), 66 deletions(-)
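
Below is a minimal plain-Ruby sketch of how the drain and restart steps in `delayed_jobs.md` fit together. It assumes the behaviour the doc describes: the work queue holds bare entries that disappear once a job returns, and each copy's state (including `deferred` and any error text) lives in its own table. All names here are hypothetical:

- `CopyState` stands in for a `stash_engine_zenodo_copies` row.
- An Array stands in for the `delayed_jobs` table.
- The `replicating`, `finished` and `error` state names are illustrative.
- `enqueue_deferred` only borrows its name from `StashEngine::ZenodoCopyJob.enqueue_deferred` in the previous version of the doc.

It is not the app's real job code.

```ruby
require 'tmpdir'

# Hypothetical stand-in for one row of stash_engine_zenodo_copies.
CopyState = Struct.new(:id, :state, :error)

# Sketch of a deferrable job. Class, method and state names are
# illustrative assumptions, not the app's real ZenodoCopyJob code.
class DeferrableCopyJob
  def initialize(releases_dir, queue)
    @defer_file = File.join(releases_dir, 'defer_jobs.txt')
    @queue = queue # plain Array standing in for the delayed_jobs table
  end

  # Record the state in our own table and add a bare entry to the work queue.
  def enqueue(copy)
    copy.state = 'enqueued'
    @queue << copy.id
  end

  # Runs when a worker takes the entry off the queue. If the drain flag
  # file exists, park the copy as 'deferred' instead of doing the work.
  def perform(copy)
    if File.exist?(@defer_file)
      copy.state = 'deferred'
      return
    end
    copy.state = 'replicating'
    yield copy
    copy.state = 'finished'
  rescue StandardError => e
    # Save the failure on the state row so it is visible without the logs.
    copy.state = 'error'
    copy.error = "#{e.class}: #{e.message}"
  end

  # After maintenance: put every parked copy back on the work queue.
  def enqueue_deferred(copies)
    copies.select { |c| c.state == 'deferred' }.each { |c| enqueue(c) }
  end
end

Dir.mktmpdir do |releases|
  queue = []
  job = DeferrableCopyJob.new(releases, queue)
  copy = CopyState.new(1, nil, nil)
  job.enqueue(copy)

  # Roughly what the drain step does: touch the flag file.
  File.write(File.join(releases, 'defer_jobs.txt'), '')
  job.perform(copy) { raise 'should not run while draining' }
  queue.shift # the entry goes away because perform returned normally
  puts copy.state # => deferred

  # Roughly what the restart step does: clear the flag and re-enqueue.
  File.delete(File.join(releases, 'defer_jobs.txt'))
  job.enqueue_deferred([copy])
  job.perform(copy) { |c| c } # pretend the Zenodo upload succeeded
  queue.shift
  puts copy.state # => finished
end
```

Running it prints `deferred` and then `finished`. In this sketch a drained job returns normally after parking itself as `deferred`. Delayed Job therefore deletes the queue entry instead of retrying it, and only the restart step puts the copy back. The sketch also records upload errors without re-raising them; the doc doesn't say whether the real job re-raises so that Delayed Job retries it.
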
diff --git a/documentation/zenodo_integration/delayed_jobs.md b/documentation/zenodo_integration/delayed_jobs.md
index bfef722ecc..f3e7a73d6f 100644
--- a/documentation/zenodo_integration/delayed_jobs.md
+++ b/documentation/zenodo_integration/delayed_jobs.md
@@ -1,5 +1,41 @@
-ActiveJob / Delayed Job
-=========================
+# Zenodo extra copies
+
+## Jobs aren't processing? Look at these things
+- (On 2c) Check the daemon with `~/init.d/delayed_job.dryad status` or start it with `~/init.d/delayed_job.dryad start`.
+- Delayed Job keeps a bare work queue in the `delayed_jobs` table. Jobs stay there until they are processed successfully and are deleted on success.
+- The application state of the Zenodo jobs is kept in `stash_engine_zenodo_copies`, which also holds important info such as the deposition ID (Zenodo's internal ID).
+- Most errors and stack traces should be saved in the error field of that table, so we don't have to spend hours digging through logs to figure out problems.
+
+## We're about to have maintenance, a shutdown or a restart. What do I do?
+- (On 2c) "Pause" or "drain" the jobs at least a few hours ahead so they don't get cut off in the middle of something that takes many hours to process.
+- Check status with `~/init.d/long_jobs.dryad status`
+- Let jobs drain out with `~/init.d/long_jobs.dryad drain`. This just touches two files in the `~/app/ui/releases` directory (outside the application root, so a new deploy doesn't change them): `defer_jobs.txt` (Zenodo replication) and `hold-submissions.txt` (Merritt submissions). While these files are present, each job is put into a `deferred` (Zenodo) or `rejected-shutting-down` (Merritt) state when its turn comes to run, instead of being processed. These states live in their own tables rather than in the delayed_job work queue, because the delayed_job queue is very simple and doesn't track application state.
+
+## We just finished maintenance, what do I do?
+- (On 2c) Run `~/init.d/long_jobs.dryad restart`. This takes the jobs that were rejected or deferred, resets their status to 'enqueued', and re-inserts them into the work queue.
+- The deferred Zenodo jobs only run on one server (2c), but the Merritt submission queue currently runs inside the web server processes on both servers, so it is more complicated.
+- For Merritt queues:
+  - Be sure the `~/app/ui/releases/hold-submissions.txt` file is deleted on both servers.
+  - Look at the Merritt queue page in the UI and note which server you're accessing.
+  - For the 'rejected shutting down' jobs that show the other server (in `stash_engine_repo_queue_states`), update the database so the latest state's hostname is the server you're on.
+  - Click "Restart submissions which were shut down gracefully".  It should re-enqueue them on your current server.
+
+## Example of manually submitting a 3rd copy from the console
+
+```
+RAILS_ENV=local_dev bin/delayed_job start
+```
+
+Then, from a `RAILS_ENV=local_dev rails console`:
+```
+resource = StashEngine::Resource.find(<id>)
+resource.send_to_zenodo
+```
+
+You can now check the `stash_engine_zenodo_copies` and `delayed_jobs` tables for status,
+or look up the item on Zenodo using the deposition ID stored in `stash_engine_zenodo_copies`.
+
+## ActiveJob / Delayed Job Background
 
 The Rails framework ActiveJob libraries are meant to address these
 issues and are a common standard way to address background processing
@@ -33,11 +69,7 @@ could run on any server that has access to the database and doesn't
 even need to run on a UI server and probably wouldn't use most of the
 application code.
 
-I've made a PR with the configuration and a simple example of using
-the queue to write to a file text to a file to see how things happen.
-
-Some notes & common commands to manually try out delayed job/ActiveJob
-on this branch.
+Some background info on Delayed Job:
 
 ```
 https://github.com/collectiveidea/delayed_job
@@ -46,72 +78,16 @@ https://github.com/collectiveidea/delayed_job/issues/776
 https://github.com/collectiveidea/delayed_job/wiki/Delayed-job-command-details
 https://guides.rubyonrails.org/v4.2/active_job_basics.html
 
-
+# To start and stop the workers locally
 RAILS_ENV=local_dev bin/delayed_job start
 RAILS_ENV=local_dev bin/delayed_job stop
 -n, --number_of_workers=workers
-
-StashEngine::ZenodoCopyJob.perform_later('my cat has fleas')
-StashEngine::ZenodoCopyJob.perform_later('my dog has fleas')
-StashEngine::ZenodoCopyJob.perform_later('my rat has fleas')
 ```
 
-We will need to add additional states and other thing to our database
-for tracking since ActiveJob is really just a work queue and doesn't
-automatically track application states.
+ActiveJob is really just a work queue and doesn't automatically track application state, which is why the Zenodo job state is kept in `stash_engine_zenodo_copies`.
 
 We really may want to move our Merritt submissions to use something
 like this rather than the expansion I made to David's home-baked
 queueing system which still runs inside the UI server processes and
-can have problems if the UI server goes down at a bad time.
-
-
-To test jobs go through and get processed with ActiveJob
---------------------------------------------------------
-
-```
-RAILS_ENV=local_dev bin/delayed_job start
-```
-
-from RAILS_ENV=local_dev rails console:
-```
-resource = StashEngine::Resource.find(<id>)
-resource.send_to_zenodo
-```
-
-You can now check the stash_engine_zenodo_copies and delayed_jobs tables for status
-or if you want to look at the item on zenodo (it has their id in the table).
-
-
-To test draining long running jobs before a restart
----------------------------------------------------
-
-In the directory above the application root.  (We don't want this in
-the application root since it gets changed with a new deploy on the
-servers.)
-```
-touch defer_jobs.txt
-```
-
-Send another Zenodo job as documented above for testing jobs go
-through.
-
-Wait for it to run and see that it didn't run and status was changed
-to deferred in the zenodo_copies.state field.
-
-
-To test sending through again later
------------------------------------
-
-remove the defer file
-```
-rm defer_jobs.txt
-```
-
-in rails console
-```
-StashEngine::ZenodoCopyJob.enqueue_deferred
-```
+can have problems if the UI server goes down at an inopportune time.
 
-Now wait for it to be enqueued and run and check the tables as above
-for status to see it go through.