Commit 9bfc462 (parent 024228a), "Updating with info about how it works", authored by sfisher on May 22, 2020. It changes `documentation/zenodo_integration/delayed_jobs.md` (42 additions, 66 deletions).

# Zenodo extra copies

## It's not processing? Why? Look at this stuff
- (On 2c) Check `~/init.d/delayed_job.dryad status` for status or start the daemon with `~/init.d/delayed_job.dryad start`
- Delayed Job keeps a bare work queue in the `delayed_jobs` table. Jobs stay there until they are processed successfully, and are deleted on success.
- The application state of the Zenodo jobs is kept in `stash_engine_zenodo_copies`, which also holds important info such as the deposition id (Zenodo's internal id).
- Most errors and stack traces should be saved into the error field of that table, so we don't have to spend hours digging through logs to figure out problems.
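To make the split between the bare queue and the state table concrete, here is a minimal plain-Ruby sketch. `ToyQueue` and `CopyState` are hypothetical stand-ins for the `delayed_jobs` table and a `stash_engine_zenodo_copies` row (not the real classes), the `finished`/`error` state names are illustrative, and the sketch ignores delayed_job's retry limits and scheduling.

```ruby
# Hypothetical stand-in for a row in stash_engine_zenodo_copies.
CopyState = Struct.new(:resource_id, :state, :error)

# Hypothetical stand-in for the delayed_jobs table: a bare list of pending work.
class ToyQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  # Record the application state separately, then add the work to the queue.
  def enqueue(copy, &work)
    copy.state = 'enqueued'
    @jobs << [copy, work]
  end

  # Run each pending job once. A job that succeeds is deleted from the queue;
  # a job that fails stays queued, and its error and backtrace are saved on
  # the state record instead of only going to the logs.
  # (Illustrative state names; real delayed_job also limits retries.)
  def work_off
    @jobs.reject! do |copy, work|
      begin
        work.call(copy)
        copy.state = 'finished'
        true
      rescue StandardError => e
        copy.state = 'error'
        copy.error = ([e.message] + Array(e.backtrace)).join("\n")
        false
      end
    end
  end
end
```

After `work_off`, only a failed job is left in the queue, and its state record holds the error message and backtrace for later inspection.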

## We're about to have maintenance, a shutdown or a restart, what do I do?
- (On 2c) "Pause" or "drain" the jobs at least a few hours ahead, so they don't get cut off in the middle of something that takes many hours to process.
- Check status with `~/init.d/long_jobs.dryad status`.
- Let jobs drain out with `~/init.d/long_jobs.dryad drain`. This really just touches two files in the `~/app/ui/releases` directory: `defer_jobs.txt` (Zenodo replication) and `hold-submissions.txt` (Merritt submissions). While these files are present, a job whose turn comes to run is not processed; instead its state is set to `deferred` (Zenodo) or `rejected-shutting-down` (Merritt). These states are kept in their own tables rather than in the delayed_job work queue, because the delayed_job queue is very simple and dumb.
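A rough plain-Ruby sketch of the drain check, assuming the flag files sit one directory above the app root (the `releases` directory) and that a job looks for them when its turn comes. `flag_dir` and `next_state_for` are hypothetical names; the real job code may check the files differently.

```ruby
# Flag file names from the notes above.
DEFER_FILE = 'defer_jobs.txt'        # pauses Zenodo replication
HOLD_FILE  = 'hold-submissions.txt'  # pauses Merritt submissions

# Hypothetical helper: the flag files live one directory above the app root,
# so a new deploy (which replaces the app root) doesn't remove them.
def flag_dir(app_root)
  File.expand_path('..', app_root)
end

# Hypothetical helper: decide what a job should do when its turn comes.
# Returns 'run' when no flag file is present, otherwise the parked state.
def next_state_for(kind, app_root)
  dir = flag_dir(app_root)
  case kind
  when :zenodo
    File.exist?(File.join(dir, DEFER_FILE)) ? 'deferred' : 'run'
  when :merritt
    File.exist?(File.join(dir, HOLD_FILE)) ? 'rejected-shutting-down' : 'run'
  else
    raise ArgumentError, "unknown job kind: #{kind}"
  end
end
```

Because the check only looks for a file, draining needs no change to the queue itself: touching or deleting the file is enough.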

## We just finished maintenance, what do I do?
- (On 2c) Run `~/init.d/long_jobs.dryad restart`. This takes the jobs that were rejected or deferred, resets their status to 'enqueued', and re-inserts them into the work queue.
- The deferred Zenodo jobs only run on one server (2c), but the Merritt submission queue currently runs on both servers, inside the web server processes, so it is more complicated.
- For Merritt queues:
  - Be sure the `~/app/ui/releases/hold-submissions.txt` files are deleted.
  - Look at the Merritt queue page in the UI and note which server you're accessing.
  - For the 'rejected shutting down' jobs showing the other server (in `stash_engine_repo_queue_states`), manipulate the database to set the latest state's hostname to the server you're on.
  - Click "Restart submissions which were shut down gracefully". It should re-enqueue them on your current server.
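A simplified plain-Ruby sketch of what the restart steps amount to. It is hypothetical: each job is a single `Job` struct rather than a history of state rows in the database, and `restart_deferred` and `claim_rejected` are illustrative names, not methods in the codebase.

```ruby
# Hypothetical job record; the real state lives in database tables.
Job = Struct.new(:id, :kind, :state, :hostname)

# Zenodo: deferred jobs are reset to 'enqueued' and put back on the work queue.
def restart_deferred(jobs, work_queue)
  jobs.each do |j|
    next unless j.kind == :zenodo && j.state == 'deferred'

    j.state = 'enqueued'
    work_queue << j.id
  end
end

# Merritt: jobs rejected while shutting down are reassigned to the current
# server, so the UI's restart button can re-enqueue them there.
def claim_rejected(jobs, current_host)
  jobs.each do |j|
    next unless j.kind == :merritt && j.state == 'rejected-shutting-down'

    j.hostname = current_host
  end
end
```

Jobs in any other state (for example ones that already finished) are left alone by both steps.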

## Example of manually submitting a 3rd copy from the console

```
RAILS_ENV=local_dev bin/delayed_job start
```

From a `RAILS_ENV=local_dev rails console`:
```
resource = StashEngine::Resource.find(<id>)
resource.send_to_zenodo
```

You can now check the `stash_engine_zenodo_copies` and `delayed_jobs` tables for status,
or look at the item on Zenodo (its Zenodo id is in the table).

## ActiveJob / Delayed Job Background

The Rails framework ActiveJob libraries are meant to address these
issues and are a common standard way to address background processing

[…]

could run on any server that has access to the database and doesn't
even need to run on a UI server and probably wouldn't use most of the
application code.

I've made a PR with the configuration and a simple example of using
the queue to write text to a file, to see how things happen.

Some notes and common commands for manually trying out delayed_job/ActiveJob
on this branch, plus some background info on delayed_job:

```
# background info on delayed_job and ActiveJob
https://github.com/collectiveidea/delayed_job
…
https://github.com/collectiveidea/delayed_job/issues/776
https://github.com/collectiveidea/delayed_job/wiki/Delayed-job-command-details
https://guides.rubyonrails.org/v4.2/active_job_basics.html

# to start and stop locally
RAILS_ENV=local_dev bin/delayed_job start
RAILS_ENV=local_dev bin/delayed_job stop

# to run several workers, add -n, --number_of_workers=workers
RAILS_ENV=local_dev bin/delayed_job -n 2 start

# from a rails console, enqueue a few test jobs
StashEngine::ZenodoCopyJob.perform_later('my cat has fleas')
StashEngine::ZenodoCopyJob.perform_later('my dog has fleas')
StashEngine::ZenodoCopyJob.perform_later('my rat has fleas')
```

We will need to add additional states and other things to our database
for tracking, since ActiveJob is really just a work queue and doesn't
automatically track application states.
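To show what "just a work queue" means, here is a hypothetical plain-Ruby sketch of the enqueue/perform round trip, loosely like ActiveJob's approach of serializing a job class name and its arguments. `ToyJob`, `FleaJob`, `QUEUE` and `work_off_one` are made up for illustration and are not the ActiveJob or delayed_job API.

```ruby
require 'json'

# Hypothetical stand-in for the work queue table: it only ever holds payloads.
QUEUE = []

# Hypothetical base class: perform_later stores only the job class name and
# its arguments, serialized, loosely like ActiveJob does.
class ToyJob
  def self.perform_later(*args)
    QUEUE << JSON.generate({ 'job_class' => name, 'arguments' => args })
  end
end

# Hypothetical job that just records its message, like the flea test jobs.
class FleaJob < ToyJob
  LOG = []

  def perform(message)
    LOG << message
  end
end

# A worker pops one payload, rebuilds the job from the class name and runs it.
# Nothing here records whether the work succeeded.
def work_off_one
  payload = JSON.parse(QUEUE.shift)
  Object.const_get(payload['job_class']).new.perform(*payload['arguments'])
end
```

Nothing in the payload says whether a copy succeeded, was deferred, or errored, which is why that application state needs its own tables.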

We really may want to move our Merritt submissions to use something
like this rather than the expansion I made to David's home-baked
queueing system, which still runs inside the UI server processes and
can have problems if the UI server goes down at an inopportune time.


## To test draining long-running jobs before a restart

In the directory above the application root, create the defer file. (We
don't want it in the application root, since that changes with each new
deploy on the servers.)
```
touch defer_jobs.txt
```

Send another Zenodo job, as in the console example above.

Wait for its turn to run, then check that it didn't run and that its
status was changed to `deferred` in the `state` field of `stash_engine_zenodo_copies`.


## To test sending it through again later

Remove the defer file:
```
rm defer_jobs.txt
```

In a rails console:
```
StashEngine::ZenodoCopyJob.enqueue_deferred
```

Now wait for it to be enqueued and run, then check the tables as above
to see that it went through.
