diff --git a/documentation/zenodo_integration/delayed_jobs.md b/documentation/zenodo_integration/delayed_jobs.md
index bfef722ecc..f3e7a73d6f 100644
--- a/documentation/zenodo_integration/delayed_jobs.md
+++ b/documentation/zenodo_integration/delayed_jobs.md
@@ -1,5 +1,41 @@
-ActiveJob / Delayed Job
-=========================
+# Zenodo extra copies
+
+## It's not processing? Why? Look at this stuff
+- (On 2c) Check status with `~/init.d/delayed_job.dryad status`, or start the daemon with `~/init.d/delayed_job.dryad start`.
+- Delayed Job keeps a bare work queue in the `delayed_jobs` table. Jobs appear there until they are processed and are deleted on success.
+- The application state of the zenodo jobs is maintained in `stash_engine_zenodo_copies`, which also contains important info such as the deposition id (zenodo's internal id).
+- Most errors and stack traces should be saved into the error field of that table so we don't have to spend hours digging through logs to figure out problems.
+
+## We're about to have maintenance, a shutdown or a restart. What do I do?
+- (On 2c) "pause" or "drain" the jobs at least a few hours ahead so they don't get cut off in the middle of something that takes many hours to process.
+- Check status with `~/init.d/long_jobs.dryad status`
+- Let jobs drain out with `~/init.d/long_jobs.dryad drain`. This really just touches two files in the `~/app/ui/releases` directory: `defer_jobs.txt` (zenodo replication) and `hold-submissions.txt` (Merritt submissions). When these files are present, a job's internal state is set to `deferred` or `rejected-shutting-down` when its turn to run comes. These states live in their own tables and not in the delayed_job work queue because the delayed_job queue is very simple and dumb.
+
+## We just finished maintenance, what do I do?
+- (On 2c) `~/init.d/long_jobs.dryad restart`.
+  This takes the jobs that were rejected/deferred, resets their status to 'enqueued' and re-inserts them into the work queue.
+- The deferred job handling (zenodo) only runs on one server (2c), but the Merritt submission queue currently runs on both servers inside the web server processes, so it is more complicated.
+- For Merritt queues:
+  - Be sure the `~/app/ui/releases/hold-submissions.txt` files are deleted.
+  - Look at the Merritt queue page in the UI and note which server you're accessing.
+  - For the 'rejected shutting down' jobs showing as the other server (in `stash_engine_repo_queue_states`), manipulate the database to set the latest state's hostname to the server you're on.
+  - Click "Restart submissions which were shut down gracefully". It should re-enqueue them on your current server.
+
+## Example of manually submitting a 3rd copy from the console
+
+```
+RAILS_ENV=local_dev bin/delayed_job start
+```
+
+from RAILS_ENV=local_dev rails console:
+```
+resource = StashEngine::Resource.find()
+resource.send_to_zenodo
+```
+
+You can now check the stash_engine_zenodo_copies and delayed_jobs tables for status,
+or look up the item on zenodo (the table has their id).
+
+## ActiveJob / Delayed Job
 Background
 The Rails framework ActiveJob libraries are meant to address these issues and are a common standard way to address background processing
@@ -33,11 +69,7 @@
 could run on any server that has access to the database and doesn't even need to run on a UI server and probably wouldn't use most of the application code.
-I've made a PR with the configuration and a simple example of using
-the queue to write to a file text to a file to see how things happen.
-
-Some notes & common commands to manually try out delayed job/ActiveJob
-on this branch.
+Some background info on delayed job
 ```
 https://github.com/collectiveidea/delayed_job
@@ -46,72 +78,16 @@
 https://github.com/collectiveidea/delayed_job/issues/776
 https://github.com/collectiveidea/delayed_job/wiki/Delayed-job-command-details
 https://guides.rubyonrails.org/v4.2/active_job_basics.html
-
+# to start and stop locally
 RAILS_ENV=local_dev bin/delayed_job start
 RAILS_ENV=local_dev bin/delayed_job stop
 -n, --number_of_workers=workers
-
-StashEngine::ZenodoCopyJob.perform_later('my cat has fleas')
-StashEngine::ZenodoCopyJob.perform_later('my dog has fleas')
-StashEngine::ZenodoCopyJob.perform_later('my rat has fleas')
 ```
-We will need to add additional states and other thing to our database
-for tracking since ActiveJob is really just a work queue and doesn't
-automatically track application states.
+ActiveJob is really just a work queue and doesn't automatically track application states.
 We really may want to move our Merritt submissions to use something like this rather than the expansion I made to David's home-baked queueing system which still runs inside the UI server processes and
-can have problems if the UI server goes down at a bad time.
-
-
-To test jobs go through and get processed with ActiveJob
---------------------------------------------------------
-
-```
-RAILS_ENV=local_dev bin/delayed_job start
-```
-
-from RAILS_ENV=local_dev rails console:
-```
-resource = StashEngine::Resource.find()
-resource.send_to_zenodo
-```
-
-You can now check the stash_engine_zenodo_copies and delayed_jobs tables for status
-or if you want to look at the item on zenodo (it has their id in the table).
-
-
-To test draining long running jobs before a restart
---------------------------------------------------- 
-
-In the directory above the application root. (We don't want this in
-the application root since it gets changed with a new deploy on the
-servers.)
-```
-touch defer_jobs.txt
-```
-
-Send another Zenodo job as documented above for testing jobs go
-through.
-
-Wait for it to run and see that it didn't run and status was changed
-to deferred in the zenodo_copies.state field.
-
-
-To test sending through again later
------------------------------------
-
-remove the defer file
-```
-rm defer_jobs.txt
-```
-
-in rails console
-```
-StashEngine::ZenodoCopyJob.enqueue_deferred
-```
+can have problems if the UI server goes down at an inopportune time.
-
-Now wait for it to be enqueued and run and check the tables as above
-for status to see it go through.
diff --git a/spec/lib/stash/zenodo_replicate/metadata_generator_spec.rb b/spec/lib/stash/zenodo_replicate/metadata_generator_spec.rb
index e5b55a3b25..7bb0c885c0 100644
--- a/spec/lib/stash/zenodo_replicate/metadata_generator_spec.rb
+++ b/spec/lib/stash/zenodo_replicate/metadata_generator_spec.rb
@@ -96,9 +96,8 @@ module ZenodoReplicate
         expect(@mg.notes.scan(/Award Number/).count).to eq(1)
       end
 
-      it 'has related_identifiers output for itself' do
-        expect(@mg.related_identifiers).to eq([{ relation: 'isIdenticalTo',
-                                                 identifier: "https://doi.org/#{@resource.identifier.identifier}" }])
+      it 'sets an item to the community' do
+        expect(@mg.communities).to eq([{ identifier: APP_CONFIG.zenodo.community_id }])
       end
 
       it 'has method output' do
diff --git a/stash/stash_engine/app/models/stash_engine/curation_activity.rb b/stash/stash_engine/app/models/stash_engine/curation_activity.rb
index f04bf71d41..578c69e31d 100644
--- a/stash/stash_engine/app/models/stash_engine/curation_activity.rb
+++ b/stash/stash_engine/app/models/stash_engine/curation_activity.rb
@@ -156,7 +156,7 @@ def update_solr
     end
 
     def copy_to_zenodo
-      # resource.send_to_zenodo
+      resource.send_to_zenodo
     end
 
     def remove_peer_review
diff --git a/stash/stash_engine/lib/stash/zenodo_replicate/metadata_generator.rb b/stash/stash_engine/lib/stash/zenodo_replicate/metadata_generator.rb
index 56996ca153..0613f25dcc 100644
--- a/stash/stash_engine/lib/stash/zenodo_replicate/metadata_generator.rb
+++ b/stash/stash_engine/lib/stash/zenodo_replicate/metadata_generator.rb
@@ -15,7 +15,7 @@ def initialize(resource:, use_zenodo_doi: false)
       def metadata
         out_hash = {}.with_indifferent_access
         %i[doi upload_type publication_date title creators description access_right license
-           keywords notes related_identifiers method locations].each do |meth|
+           keywords notes related_identifiers method locations communities].each do |meth|
           next if meth == 'doi' && @use_zenodo_doi
           result = send(meth)
           out_hash[meth] = result unless result.blank?
@@ -87,11 +87,7 @@ def related_identifiers
         end
 
         related ||= []
-
-        # if DOi is different
-        related.push(
-          relation: 'isIdenticalTo', identifier: "https://doi.org/#{@resource.identifier.identifier}"
-        )
+        related
       end
 
       def method
@@ -104,6 +100,10 @@ def locations
         end.compact
       end
 
+      def communities
+        [{ identifier: APP_CONFIG.zenodo.community_id }]
+      end
+
       def location(geolocation)
         # no way to represent boxes in zenodo?
         return nil if geolocation.place_id.nil? && geolocation.point_id.nil?
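
Reviewer note: the drain-and-restart behavior described in `delayed_jobs.md` can be sketched in a few lines of plain Ruby. This is a minimal illustration under stated assumptions, not the application's code. `CopyRecord`, `draining?`, `run_or_defer` and `reenqueue_deferred` are hypothetical names, an in-memory struct stands in for a row of `stash_engine_zenodo_copies`, and `'finished'` is an assumed name for the success state. Only the `defer_jobs.txt` marker file and the `deferred` and `enqueued` states come from the documentation.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for one row of stash_engine_zenodo_copies.
CopyRecord = Struct.new(:id, :state)

# True while the marker file touched by the drain command is present.
def draining?(releases_dir)
  File.exist?(File.join(releases_dir, 'defer_jobs.txt'))
end

# Called when a job's turn comes: defer it while draining, otherwise do the work.
# 'finished' is an assumed name for the success state.
def run_or_defer(record, releases_dir)
  if draining?(releases_dir)
    record.state = 'deferred'
  else
    yield record if block_given?
    record.state = 'finished'
  end
  record
end

# After maintenance: put every deferred record back on the queue.
def reenqueue_deferred(records)
  records.select { |r| r.state == 'deferred' }.each { |r| r.state = 'enqueued' }
end

# Walk one record through drain, restart and a normal run in a scratch directory.
DEMO_STATES = Dir.mktmpdir do |dir|
  marker = File.join(dir, 'defer_jobs.txt')
  record = CopyRecord.new(1, 'enqueued')

  FileUtils.touch(marker)                  # start draining
  during = run_or_defer(record, dir).state

  FileUtils.rm(marker)                     # maintenance is over
  reenqueue_deferred([record])
  after_restart = record.state

  final = run_or_defer(record, dir).state
  [during, after_restart, final]
end

puts DEMO_STATES.inspect
```

Run as a script, this prints `["deferred", "enqueued", "finished"]`: the job is parked instead of started while the marker file exists, and it only runs after the file is removed and the deferred record is re-enqueued. Keeping the state on the record rather than in the queue mirrors the point made in the runbook that the delayed_job queue itself does not track application state.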