diff --git a/.gitignore b/.gitignore index 4fac787581..c4c5e500f6 100644 --- a/.gitignore +++ b/.gitignore @@ -43,7 +43,7 @@ ui-library/bower_components ui-library/css/vendor/ ui-library/js/vendor/ stash/stash_engine/ui-library/ -*.css.map +*.css.map # ###################################################### @@ -104,6 +104,7 @@ build-iPhoneSimulator/ # unless supporting rvm < 1.11.0 or doing something fancy, ignore this: .rvmrc +.nvmrc #####=== Rails ===##### diff --git a/documentation/external_services/salesforce.md b/documentation/external_services/salesforce.md index 8a1dbc31de..88d20eb66f 100644 --- a/documentation/external_services/salesforce.md +++ b/documentation/external_services/salesforce.md @@ -80,7 +80,7 @@ journal settings: rails journals:check_salesforce_sync ``` -To clean up metadata in Salesforce associated with journals, add `DRY_RUN=false` +To clean up metadata in Salesforce associated with journals, add ` -- --dry_run false` to the end of the above command. diff --git a/documentation/reports.md b/documentation/reports.md index 58c630e6c5..2d16d0a16c 100644 --- a/documentation/reports.md +++ b/documentation/reports.md @@ -18,7 +18,7 @@ tracking of payments. 
Run it with a command like: ``` -RAILS_ENV=v3_production bundle exec rails identifiers:shopping_cart_report YEAR_MONTH=2024-01 +RAILS_ENV=v3_production bundle exec rails identifiers:shopping_cart_report -- --year_month 2024-01 ``` Fields in the shopping cart report @@ -37,7 +37,7 @@ To run the report and retrieve the files: ``` # on v3-prod server cd ~/deploy/current -RAILS_ENV=v3_production bundle exec rake identifiers:shopping_cart_report YEAR_MONTH=2024-01 +RAILS_ENV=v3_production bundle exec rake identifiers:shopping_cart_report -- --year_month 2024-01 cp ~/deploy/current/shopping* ~/journal-payments/shoppingcart/ cd ~/journal-payments/shoppingcart git pull @@ -66,7 +66,7 @@ Run the deferred payment reports with a command like: ``` cp ~/journal-payments/shoppingcart/shopping_cart_report_2023* /tmp # This command must be run on the v3-prod server, to access the production database -RAILS_ENV=v3_production bundle exec rails identifiers:deferred_journal_reports SC_REPORT=/tmp/shopping_cart_report_2024-Q1.csv +RAILS_ENV=v3_production bundle exec rails identifiers:deferred_journal_reports -- --sc_report /tmp/shopping_cart_report_2024-Q1.csv ``` ### Tiered payment reports @@ -83,7 +83,8 @@ Run the tiered journal payment reports with a command like: # This command must be run in a personal account with the journal-payments checked out cp ~/journal-payments/shoppingcart/shopping_cart_report_2023* /tmp # This command must be run on the production server, to access the production database -RAILS_ENV=v3_production bundle exec rails identifiers:tiered_journal_reports SC_REPORT=/tmp/shopping_cart_report_2023-Q1.csv BASE_REPORT=/tmp/shopping_cart_report_2023.csv + +RAILS_ENV=v3_production bundle exec rails identifiers:tiered_journal_reports -- --sc_report /tmp/shopping_cart_report_2023-Q1.csv --base_report /tmp/shopping_cart_report_2023.csv ``` For tenant institutions that have a tiered payment plan, a similar secondary task @@ -96,7 +97,8 @@ Run the tiered institution 
payment reports with a command like: # This command must be run in a personal account with the journal-payments checked out cp ~/journal-payments/shoppingcart/shopping_cart_report_2023* /tmp # This command must be run on the production server, to access the production database -RAILS_ENV=v3_production bundle exec rails identifiers:tiered_tenant_reports SC_REPORT=/tmp/shopping_cart_report_2023-Q1.csv BASE_REPORT=/tmp/shopping_cart_report_2023.csv + +RAILS_ENV=v3_production bundle exec rails identifiers:tiered_tenant_reports -- --sc_report /tmp/shopping_cart_report_2023-Q1.csv --base_report /tmp/shopping_cart_report_2023.csv ``` Dataset info report @@ -205,7 +207,7 @@ This also gives institutions when people didn't autocomplete their ROR correctly Run like: ``` -bundle exec rails reports:from_text_institution name="Planck" RAILS_ENV=v3_production +RAILS_ENV=v3_production bundle exec rails reports:from_text_institution -- --name="Planck" ``` Put the string you want to detect in the `name` variable. It shows the matches in `author_affiliations` diff --git a/documentation/ror_transition.md b/documentation/ror_transition.md index b82dd49b55..59b740cf9b 100644 --- a/documentation/ror_transition.md +++ b/documentation/ror_transition.md @@ -12,7 +12,7 @@ Truncate the table to remove all current entries. The table is called `stash_en You can download the latest ROR exports at https://doi.org/10.5281/zenodo.6347574 . ```bash -RAILS_ENV=development bundle exec rails affiliation_import:populate_funder_ror_mapping /path/to/file +RAILS_ENV=development bundle exec rails affiliation_import:populate_funder_ror_mapping -- --path /path/to/file ``` ## task to re-import latest ROR data @@ -75,4 +75,4 @@ Update the DataCite records with updated metadata. 
```bash RAILS_ENV=development bundle exec rails datacite_target:update_dryad -``` \ No newline at end of file +``` diff --git a/documentation/server_maintenance/troubleshooting.md b/documentation/server_maintenance/troubleshooting.md index 812f852648..ed63afaf10 100644 --- a/documentation/server_maintenance/troubleshooting.md +++ b/documentation/server_maintenance/troubleshooting.md @@ -267,7 +267,7 @@ embargo date. You can find the deposition_id in the stash_engine_zenodo_copies table. The zenodo_copy_id is the id from that same table. ``` # the arguments are 1) resource_id, 2) deposition_id at zenodo, 3) date, 4) zenodo_copy_id -RAILS_ENV=production bundle exec rake dev_ops:embargo_zenodo 97683 4407065 2021-12-31 12342 +RAILS_ENV=production bundle exec rake dev_ops:embargo_zenodo -- --resource_id 97683 --deposition_id 4407065 --date 2021-12-31 --zenodo_copy_id 12342 ``` **You must login to Zenodo and "publish" the new version of the dataset; otherwise the embargo @@ -300,7 +300,8 @@ table. The zenodo_copy_id is the `stash_engine_zenodo_copies.id` from that same ``` # the arguments are 1) resource_id, 2) deposition_id at zenodo, 3) date, 4) zenodo_copy_id -RAILS_ENV=production bundle exec rake dev_ops:embargo_zenodo 97683 4407065 2023-07-25 1234 +RAILS_ENV=production bundle exec rake dev_ops:embargo_zenodo -- --resource_id 97683 --deposition_id 4407065 --date 2023-07-25 --zenodo_copy_id 1234 + ``` **You must login to Zenodo and "publish" the new version of the dataset; otherwise the embargo will not take effect. This is probably something we can fix in the code, but it is waiting for us @@ -386,13 +387,13 @@ it is harder to do after removal. 
``` # the parameters are 1) resource_id, 2) deposition_id (see in stash_engine_zenodo_copies), 3) date far in the future -RAILS_ENV=production bundle exec rails dev_ops:embargo_zenodo 2200-12-31 +RAILS_ENV=production bundle exec rails dev_ops:embargo_zenodo -- --resource_id --deposition_id --date --zenodo_copy_id ``` If you need to completely remove a dataset from existence, you can run ``` -rails dev_ops:destroy_dataset 10.27837/dryad.catfood +rails dev_ops:destroy_dataset -- --doi 10.27837/dryad.catfood ``` This command will remove the dataset from Dryad, and give instructions to remove @@ -484,7 +485,7 @@ end To update anything published between a set of dates using a task, you can use: ``` -RAILS_ENV=production bundle exec rails datacite_target:update_by_publication YYYY-MM-DD YYYY-MM-DD +RAILS_ENV=production bundle exec rails datacite_target:update_by_publication -- --start YYYY-MM-DD --end YYYY-MM-DD ``` If you need to update DataCite for *all* items in Dryad, you can use: diff --git a/documentation/sql_queries/fundref_to_ror_comparisons.md b/documentation/sql_queries/fundref_to_ror_comparisons.md index 6c3bcafc96..112701c032 100644 --- a/documentation/sql_queries/fundref_to_ror_comparisons.md +++ b/documentation/sql_queries/fundref_to_ror_comparisons.md @@ -13,7 +13,7 @@ RAILS_ENV=production bundle exec rails affiliation_import:update_ror_orgs Download the zip and extract the json file from https://doi.org/10.5281/zenodo.6347574: ```bash -RAILS_ENV= bundle exec rails affiliation_import:populate_funder_ror_mapping +RAILS_ENV= bundle exec rails affiliation_import:populate_funder_ror_mapping -- --path ``` After the imports you can run the following query to see how items in the database map and that @@ -66,4 +66,4 @@ Items that are funders but don't match a fundref ID: SELECT DISTINCT contributor_name, contributor_type, identifier_type, name_identifier_id FROM dcs_contributors WHERE contributor_type = 'funder' AND name_identifier_id =''; -``` \ No newline at 
end of file +``` diff --git a/lib/tasks/affiliation_import.rake b/lib/tasks/affiliation_import.rake index 60b632db68..8b43e47ade 100644 --- a/lib/tasks/affiliation_import.rake +++ b/lib/tasks/affiliation_import.rake @@ -22,21 +22,23 @@ namespace :affiliation_import do Stash::Organization::RorUpdater.perform end + # example: RAILS_ENV=development bundle exec rake affiliation_import:process_ror_csv -- --affiliation_mode true desc 'Process all of the CSV files' task process_ror_csv: :environment do start_time = Time.now @dois_to_skip = [] @live_mode = false @last_resource = nil + args = Tasks::ArgsParser.parse(:affiliation_mode) - case ENV.fetch('AFFILIATION_MODE', nil) + case args.affiliation_mode when nil - puts 'Environment variable AFFILIATION_MODE is blank, assuming test mode.' + puts '--affiliation_mode argument is blank, assuming test mode.' when 'live' - puts 'Starting live processing due to environment variable AFFILIATION_MODE.' + puts 'Starting live processing due to --affiliation_mode argument.' @live_mode = true else - puts "Environment variable AFFILIATION_MODE is #{ENV.fetch('AFFILIATION_MODE', nil)}, entering test mode." + puts "--affiliation_mode argument is #{args.affiliation_mode}, entering test mode." end puts 'Loading affiliation info from CSV files in /tmp/dryad_affiliations*' @@ -50,25 +52,28 @@ namespace :affiliation_import do end puts "DONE! Elapsed time: #{Time.at(Time.now - start_time).utc.strftime('%H:%M:%S')}" + exit end + # example: RAILS_ENV=development bundle exec rake affiliation_import:merge_duplicate_authors -- --author_merge_mode true --start 0 desc 'Merge duplicate authors' task merge_duplicate_authors: :environment do start_time = Time.now @live_mode = false + args = Tasks::ArgsParser.parse(:author_merge_mode, :start) - case ENV.fetch('AUTHOR_MERGE_MODE', nil) + case args.author_merge_mode when nil - puts 'Environment variable AUTHOR_MERGE_MODE is blank, assuming test mode.' 
+ puts '--author_merge_mode argument is blank, assuming test mode.' when 'live' - puts 'Starting live processing due to environment variable AUTHOR_MERGE_MODE.' + puts 'Starting live processing due to --author_merge_mode argument.' @live_mode = true else - puts "Environment variable AUTHOR_MERGE_MODE is #{ENV.fetch('AUTHOR_MERGE_MODE', nil)}, entering test mode." + puts "--author_merge_mode argument is #{args.author_merge_mode}, entering test mode." end start_from = 0 - start_from = ENV['START'].to_i unless ENV['START'].blank? + start_from = args.start.to_i unless args.start.blank? stash_ids = StashEngine::Identifier.all.order('stash_engine_identifiers.id').distinct stash_ids.each_with_index do |i, idx| @@ -94,35 +99,40 @@ namespace :affiliation_import do end end puts "DONE! Elapsed time: #{Time.at(Time.now - start_time).utc.strftime('%H:%M:%S')}" + exit end + # example: rake affiliation_import:populate_ror_db -- --path /path/to/json_file desc 'Populate our ROR database manually from the ROR dump json file because the Zenodo API not working' task populate_ror_db: :environment do $stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f - if ARGV.length != 1 + args = Tasks::ArgsParser.parse(:path) + unless args.path puts 'Please enter the path to the ROR dump json file as an argument' puts 'You can get the latest dump from https://doi.org/10.5281/zenodo.6347574 (get json file for last version in zip)' exit end - ror_dump_file = ARGV[0] + ror_dump_file = args.path exit unless File.exist?(ror_dump_file) Stash::Organization::RorUpdater.process_ror_json(json_file_path: ror_dump_file) end + # example: rake affiliation_import:populate_funder_ror_mapping -- --path /path/to/json_file desc 'Populate fundref_id to ror_id mapping table' task populate_funder_ror_mapping: :environment do $stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f - if ARGV.length != 1 + args = 
Tasks::ArgsParser.parse(:path) + unless args.path puts 'Please enter the path to the ROR dump json file as an argument' puts 'You can get the latest dump from https://doi.org/10.5281/zenodo.6347574 (get json file for last version in zip)' exit end - ror_dump_file = ARGV[0] + ror_dump_file = args.path exit unless File.exist?(ror_dump_file) ActiveRecord::Base.connection.truncate(StashEngine::XrefFunderToRor.table_name) diff --git a/lib/tasks/args_parser.rb b/lib/tasks/args_parser.rb new file mode 100644 index 0000000000..2417e4d6e2 --- /dev/null +++ b/lib/tasks/args_parser.rb @@ -0,0 +1,23 @@ +require 'optparse' +# rubocop:disable Lint/EmptyBlock +module Tasks + module ArgsParser + + def self.parse(*attributes) + options = OpenStruct.new + return options if attributes.blank? + + opts = OptionParser.new + opts.banner = 'Usage: rake add [options]' + attributes.each do |key| + opts.on('-o', "--#{key}=value", String) { |value| options[key] = value } + end + + args = opts.order!(ARGV) {} + opts.parse!(args) + + options + end + end +end +# rubocop:enable Lint/EmptyBlock diff --git a/lib/tasks/compressed.rake b/lib/tasks/compressed.rake index 99e0219c02..73f6686f4c 100644 --- a/lib/tasks/compressed.rake +++ b/lib/tasks/compressed.rake @@ -1,5 +1,7 @@ # :nocov: + namespace :compressed do + # example: rails/rake compressed:update_contents task update_contents: :environment do $stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f @@ -52,15 +54,17 @@ namespace :compressed do end # a simplified version of the above task that only updates one resource for testing and doesn't catch errors + # example: rails/rake compressed:update_one -- --file_id 10 task update_one: :environment do $stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f + args = Tasks::ArgsParser.parse(:file_id) - if ARGV.length != 1 + unless args.file_id puts 'Please enter the file id as the only argument to this task' exit end - 
db_file = StashEngine::DataFile.find(ARGV[0]) + db_file = StashEngine::DataFile.find(args.file_id) puts "Updating container_contents for #{db_file.upload_file_name} (id: #{db_file.id}, " \ "resource_id: #{db_file.resource_id})" diff --git a/lib/tasks/counter.rake b/lib/tasks/counter.rake index b07069f9c3..ba4a07ff62 100644 --- a/lib/tasks/counter.rake +++ b/lib/tasks/counter.rake @@ -2,33 +2,40 @@ require 'net/scp' require_relative 'counter/validate_file' require_relative 'counter/log_combiner' require_relative 'counter/json_stats' -namespace :counter do +namespace :counter do + # example: RAILS_ENV=development bundle exec rake counter:combine_files -- --log_directory /user/me/dir --scp_hosts host1,host2 desc 'get and combine files from the other servers' task :combine_files do - lc = Tasks::Counter::LogCombiner.new(log_directory: ENV.fetch('LOG_DIRECTORY', nil), scp_hosts: ENV['SCP_HOSTS'].split, - scp_path: ENV.fetch('LOG_DIRECTORY', nil)) + args = Tasks::ArgsParser.parse(:log_directory, :scp_hosts) + lc = Tasks::Counter::LogCombiner.new(log_directory: args.log_directory, scp_hosts: args.scp_hosts.to_s.split(','), + scp_path: args.log_directory) lc.copy_missing_files lc.combine_logs + exit end + # example: RAILS_ENV=development bundle exec rake counter:remove_old_logs -- --log_directory /user/me/dir --scp_hosts host1,host2 desc 'remove log files we are not keeping because of our privacy policy' task :remove_old_logs do - lc = Tasks::Counter::LogCombiner.new(log_directory: ENV.fetch('LOG_DIRECTORY', nil), scp_hosts: ENV['SCP_HOSTS'].split, - scp_path: ENV.fetch('LOG_DIRECTORY', nil)) + args = Tasks::ArgsParser.parse(:log_directory, :scp_hosts) + lc = Tasks::Counter::LogCombiner.new(log_directory: args.log_directory, scp_hosts: args.scp_hosts.to_s.split(','), + scp_path: args.log_directory) lc.remove_old_logs(days_old: 60) lc.remove_old_logs_remote(days_old: 60) + exit end + # example: rails/rake counter:validate_logs -- --files file_name_1,file_name_2 desc 
'validate counter logs format (filenames come after rake task)' task :validate_logs do - if ARGV.length == 1 - puts 'Please enter the filenames of files to validate, separated by spaces' + args = Tasks::ArgsParser.parse(:files) + unless args.files + puts 'Please enter the filenames of files to validate, separated by comma' exit end - ARGV.each do |filename| - next if filename == 'counter:validate_logs' + args.files.split(',').each do |filename| puts "Validating #{filename}" cv = Tasks::Counter::ValidateFile.new(filename: filename) cv.validate_file @@ -37,19 +44,21 @@ namespace :counter do exit # makes the arguments not be interpreted as other rake tasks end # end of task - # example: JSON_DIRECTORY="/user/me/json-reports" RAILS_ENV=production bundle exec rake counter:cop_manual + # example: RAILS_ENV=production bundle exec rake counter:cop_manual -- --json_directory /user/me/json-reports desc 'manually populate CoP stats from json files' task cop_manual: :environment do # this keeps the output from buffering forever until a chunk fills so that output is timely $stdout.sync = true - puts "JSON_DIRECTORY is #{ENV.fetch('JSON_DIRECTORY', nil)}" + args = Tasks::ArgsParser.parse(:json_directory) + puts "JSON_DIRECTORY is #{args.json_directory}" js = Tasks::Counter::JsonStats.new - Dir.glob(File.join(ENV.fetch('JSON_DIRECTORY', nil), '????-??.json')).each do |f| + Dir.glob(File.join(args.json_directory, '????-??.json')).each do |f| puts f js.update_stats(f) end js.update_database + exit end desc 'pre-populate our COUNTER CoP stats from datacite hub' @@ -95,39 +104,45 @@ namespace :counter do end end + # example: RAILS_ENV=development bundle exec rake counter:test_env -- --log_directory /user/me/dir --scp_hosts host1,host2 desc 'test that environment is passed in' task :test_env do - puts "LOG_DIRECTORY is set as #{ENV['LOG_DIRECTORY']}" if ENV['LOG_DIRECTORY'] - puts "SCP_HOSTS are set as #{ENV['SCP_HOSTS'].split}" if ENV['SCP_HOSTS'] + args = 
Tasks::ArgsParser.parse(:log_directory, :scp_hosts) + + puts "LOG_DIRECTORY is set as #{args.log_directory}" if args.log_directory + puts "SCP_HOSTS are set as #{args.scp_hosts.split(',')}" if args.scp_hosts puts "note: in order to scp, you must add this server's public key to the authorized keys for the server you want to copy from" + exit end + # example: RAILS_ENV=development bundle exec rake counter:datacite_pusher -- --report_dir /user/me/dir --report_ids true desc 'look for missing reports and force send them to datacite' task datacite_pusher: :environment do # something like this will get a list of reports that have been sent to DataCite and their IDs - # RAILS_ENV=production REPORT_DIR="/my/report/dir" REPORT_IDS=true bundle exec rails counter:datacite_pusher + # RAILS_ENV=production bundle exec rails counter:datacite_pusher --report_dir /my/report/dir --report_ids true # # for typical monthly run of submitting missing and forcing last month - # RAILS_ENV=production REPORT_DIR="/my/report/dir" FORCE_SUBMISSION="2021-11" bundle exec rails counter:datacite_pusher + # RAILS_ENV=production bundle exec rails counter:datacite_pusher --report_dir /my/report/dir --force_submission 2021-11 $stdout.sync = true require_relative '../../script/stash/counter-uploader/submitted_reports' require_relative '../../script/stash/counter-uploader/uploader' require_relative '../../script/stash/counter-uploader/utility_methods' + args = Tasks::ArgsParser.parse(:report_dir, :force_submission, :report_ids) - if ENV['REPORT_DIR'].blank? + if args.report_dir.blank? puts 'You must set an environment variable for REPORT_DIR to upload to DataCite.' puts 'Optional environment variables:' - puts "\tREPORT_IDS -- if set, only reports the yyyy-mm and ids that have been sent to DataCite." - puts "\tFORCE_SUBMISSION may be set with a comma separated list of yyyy-mm values and those reports" + puts "\t--report_ids -- if set, only reports the yyyy-mm and ids that have been sent to DataCite." 
+ puts "\t--force_submission may be set with a comma separated list of yyyy-mm values and those reports" puts "\twill be sent again, even if they appear to have already been submitted successfully." next # this is like return but from a rake task end # setup variables needed - report_directory = ENV.fetch('REPORT_DIR', nil) - # if ENV['REPORT_IDS'] is set then just report the IDs for our reports - # if ENV['FORCE_SUBMISSION'] is set with comma separated yyyy-mm values then those reports will be + report_directory = args.report_dir + # if --report_ids is set then just report the IDs for our reports + # if --force_submission is set with comma separated yyyy-mm values then those reports will be # submitted again, even if they already appear to have been submitted force_list = UtilityMethods.force_submission_list @@ -137,9 +152,9 @@ namespace :counter do submitted_reports.process_reports # display submitted report info and exit if that option was chosen - if ENV['REPORT_IDS'] + if args.report_ids UtilityMethods.output_report_table(submitted_reports) - next # ie exit from rake task + exit # ie exit from rake task end # get the json files we have non-zero reports for and are in the correct filename format diff --git a/lib/tasks/datacite_target.rake b/lib/tasks/datacite_target.rake index 0ecdc955ea..1e980da801 100644 --- a/lib/tasks/datacite_target.rake +++ b/lib/tasks/datacite_target.rake @@ -15,14 +15,18 @@ namespace :datacite_target do end end + # example: rails datacite_target:update_by_publication -- --start 2024-06-25 --end 2024-06-26 desc 'update Dryad DOI targets for a specific date range of publication' task update_by_publication: :environment do $stdout.sync = true - unless ARGV.length == 3 + options = Tasks::ArgsParser.parse(:start, :end) + + if !options[:start] || !options[:end] puts 'Takes 2 dates in format YYYY-MM-DD to create a range for DOI updates' - next + exit end - stash_ids = Tasks::DashUpdater.dated_items_to_update(ARGV[1].to_s, ARGV[2].to_s) + + 
stash_ids = Tasks::DashUpdater.dated_items_to_update(options[:start].to_s, options[:end].to_s) stash_ids.each_with_index do |stash_id, idx| puts "#{idx + 1}/#{stash_ids.length}: updating #{stash_id.identifier}" begin @@ -33,16 +37,19 @@ namespace :datacite_target do end sleep 1 end + exit end # this will go through the items in the same order, so if it crashes at a point it can be restarted from that item again # saves errors to a separate errors.txt file so we can handle these separately/manually assuming there are only a few + # example: rails datacite_target:update_dryad -- --start 10 desc 'update Dryad DOI targets to reflect new environment' task update_dryad: :environment do $stdout.sync = true + options = Tasks::ArgsParser.parse(:start) start_from = 0 - start_from = ARGV[1].to_i unless ARGV[1].blank? + start_from = options[:start].to_i if options[:start] stash_ids = Tasks::DashUpdater.all_items_to_update @@ -59,6 +66,7 @@ namespace :datacite_target do end sleep 1 end + exit end end # :nocov: diff --git a/lib/tasks/dev_ops.rake b/lib/tasks/dev_ops.rake index 344c0dece6..16cb85b4ea 100644 --- a/lib/tasks/dev_ops.rake +++ b/lib/tasks/dev_ops.rake @@ -10,7 +10,7 @@ require 'fileutils' # rubocop:disable Metrics/BlockLength namespace :dev_ops do - # use like: bundle exec rake dev_ops:processing RAILS_ENV=development + # example: RAILS_ENV=development bundle exec rake dev_ops:processing desc 'Shows processing submissions' task processing: :environment do unless ENV['RAILS_ENV'] @@ -160,53 +160,52 @@ namespace :dev_ops do # changed to point to dryad, and 2) this script needs to be run against the text file # provided by David Loy in order to update the ARKs in the sword URLs so that downloads # and further version submissions work. 
+ # example: RAILS_ENV="development" bundle exec rake dev_ops:download_uri -- --path /path/to/file.txt desc 'Updates database for Merritt ark changes' task download_uri: :environment do - # example command - # RAILS_ENV="development" bundle exec rake dev_ops:download_uri /path/to/file.txt unless ENV['RAILS_ENV'] puts 'RAILS_ENV must be explicitly set before running this script' - next + exit end + args = Tasks::ArgsParser.parse(:path) - unless ARGV.length == 2 + unless args.path puts 'Please put the path to the file to process' - next + exit end - Tasks::DevOps::DownloadUri.update_from_file(file_path: ARGV[1]) + Tasks::DevOps::DownloadUri.update_from_file(file_path: args.path) puts 'Done' + exit end + # example: RAILS_ENV="development" bundle exec rake dev_ops:version_into_new_dataset -- --doi string --user_id 10 --tenant_id 20 desc 'Takes a DOI, user_id (number), tenant_id and copies the latest submitted version into a new dataset for manual submission' task version_into_new_dataset: :environment do - # apparently I have to do this, at least in some cases because arguments to rake are ugly - # https://www.seancdavis.com/blog/4-ways-to-pass-arguments-to-a-rake-task/ - - ARGV.each { |a| task(a.to_sym {}) } # see comment above unless ENV['RAILS_ENV'] puts 'RAILS_ENV must be explicitly set before running this script' - next + exit end + args = Tasks::ArgsParser.parse(:doi, :user_id, :tenant_id) - unless ARGV.length == 4 + if !args.doi || !args.user_id || !args.tenant_id puts 'takes DOI, user_id (number from db), tenant_id -- please quote the DOI and do only bare DOI like 10.18737/D7CC8B' - next + exit end - identif_str = ARGV[1].strip - user_id = ARGV[2].strip.to_i - tenant_id = ARGV[3].strip - # get the identifier - dryad_id_obj = StashEngine::Identifier.where(identifier: identif_str).first + dryad_id_obj = StashEngine::Identifier.where(identifier: args.doi).first + unless dryad_id_obj + puts 'Invalid DOI' + exit + end # get the the last resource last_res = 
dryad_id_obj.resources.submitted_only.last # duplicate the resource new_res = last_res.amoeba_dup - new_res.tenant_id = tenant_id + new_res.tenant_id = args.tenant_id new_res.identifier_id = nil new_res.save @@ -217,7 +216,7 @@ namespace :dev_ops do db_id_obj = StashEngine::Identifier.create(identifier: id_text, identifier_type: id_type.upcase) # cleanup some old garbage from merritt-sword and reset user - new_res.update(identifier_id: db_id_obj.id, user_id: user_id, current_editor_id: user_id, download_uri: nil, update_uri: nil) + new_res.update(identifier_id: db_id_obj.id, user_id: args.user_id, current_editor_id: args.user_id, download_uri: nil, update_uri: nil) # update the versions to be version 1, since otherwise it will be version number from old resource new_res.stash_version.update(version: 1, merritt_version: 1) @@ -230,6 +229,7 @@ namespace :dev_ops do # delete any file records for deleted items new_res.data_files.deleted_from_version.each(&:destroy!) + exit end # We have a lot of junk identifiers without files that actually work since metadata was imported for testing without @@ -268,23 +268,22 @@ namespace :dev_ops do end end + # example: RAILS_ENV="development" bundle exec rake dev_ops:destroy_dataset -- --doi 20.18737/D7CC8B desc 'Takes a DOI (bare, without doi on front) and destroys it' task destroy_dataset: :environment do - # apparently I have to do this, at least in some cases because arguments to rake are ugly - # https://www.seancdavis.com/blog/4-ways-to-pass-arguments-to-a-rake-task/ + args = Tasks::ArgsParser.parse(:doi) - ARGV.each { |a| task(a.to_sym {}) } # see comment above unless ENV['RAILS_ENV'] puts 'RAILS_ENV must be explicitly set before running this script' - next + exit end - unless ARGV.length == 2 + unless args.doi puts 'Takes a DOI (bare, without doi on front) and destroys it like 10.18737/D7CC8B' - next + exit end - identif_str = ARGV[1].strip + identif_str = args.doi puts "Are you sure you want to delete #{identif_str}? 
(Type 'yes' to proceed)" response = $stdin.gets @@ -329,63 +328,59 @@ namespace :dev_ops do identifier.destroy! end + # example: RAILS_ENV="development" bundle exec rake dev_ops:embargo_zenodo -- --resource_id 5 --deposition_id 10 / + # --date 2024-06-06 --zenodo_copy_id 30 desc 'Updates database for Merritt ark changes' task embargo_zenodo: :environment do - # apparently I have to do this, at least in some cases because arguments to rake are ugly - # https://www.seancdavis.com/blog/4-ways-to-pass-arguments-to-a-rake-task/ - - ARGV.each { |a| task(a.to_sym {}) } # see comment above unless ENV['RAILS_ENV'] puts 'RAILS_ENV must be explicitly set before running this script' - next + exit end - unless ARGV.length == 5 - puts 'Add the following arguments after the rake command ' + args = Tasks::ArgsParser.parse(:resource_id, :deposition_id, :date, :zenodo_copy_id) + if !args.resource_id || !args.deposition_id || !args.date || !args.zenodo_copy_id + puts 'Add the following arguments after the rake command --resource_id 5 --deposition_id 10 --date 2024-06-06 --zenodo_copy_id 30' puts 'The deposition id can be found in the stash_engine_zenodo_copies table' - next + exit end - res_id = ARGV[1].to_s - dep_id = ARGV[2].to_s - emb_date = ARGV[3].to_s - zc_id = ARGV[4].to_s - require 'stash/zenodo_replicate/deposit' - res = StashEngine::Resource.find(res_id) + res = StashEngine::Resource.find(args.resource_id) - dep = Stash::ZenodoReplicate::Deposit.new(resource: res, zc_id: zc_id) + dep = Stash::ZenodoReplicate::Deposit.new(resource: res, zc_id: args.zenodo_copy_id) - resp = dep.get_by_deposition(deposition_id: dep_id) + resp = dep.get_by_deposition(deposition_id: args.deposition_id) meta = resp['metadata'] meta['access_right'] = 'embargoed' - meta['embargo_date'] = emb_date + meta['embargo_date'] = args.date dep.reopen_for_editing dep.update_metadata(manual_metadata: meta) dep.publish + exit end # NOTE: this only downloads the newly uploaded to S3 files since those are the 
only ones to exist there. # The rest that have been previously uploaded are in s#. # # This creates a directory in the Rails.root named after the resource id and downloads the files into that from S3 + # # example: RAILS_ENV="development" bundle exec rake dev_ops:download_s3 -- --resource_id 5 desc 'Download the files someone uploaded to S3, should take one argument which is the resource id' task download_s3: :environment do - ARGV.each { |a| task(a.to_sym {}) } # see comment above - resource_id = ARGV[1].to_i + args = Tasks::ArgsParser.parse(:resource_id) + resource_id = args.resource_id unless ENV['RAILS_ENV'] puts 'RAILS_ENV must be explicitly set before running this script' next end - unless ARGV.length == 2 - puts 'Add the following arguments after the rake command ' + unless resource_id + puts 'Add the following arguments after the rake command --resource_id' next end diff --git a/lib/tasks/ezid_transition.rake b/lib/tasks/ezid_transition.rake index ff2b348e8b..9875f7a49c 100644 --- a/lib/tasks/ezid_transition.rake +++ b/lib/tasks/ezid_transition.rake @@ -58,24 +58,26 @@ namespace :ezid_transition do puts "file written to #{filename}" end + # example: RAILS_ENV=production bundle exec rails ezid_transition:registered -- --doi_file /path/to/file desc 'Updates from list of DOIs to change reserved to registered in EZID' task registered: :environment do $stdout.sync = true + args = Tasks::ArgsParser.parse(:doi_file) - unless ENV['RAILS_ENV'] && ENV['DOI_FILE'] + if ENV['RAILS_ENV'].blank? || args.doi_file.blank? puts 'RAILS_ENV must be explicitly set before running this script (such as production)' - puts 'Also set environment variable DOI_FILE to the file that contains the DOIs to update with placeholder data.' + puts 'Also set --doi_file argument to the file that contains the DOIs to update with placeholder data.' puts 'These should be EZID DOIs to pre-populate with data so they are more than reserved and can be moved to DataCite.' 
puts '' puts 'Example:' - puts 'RAILS_ENV=development DOI_FILE="spec/fixtures/ezid_doi_examples.txt" bundle exec rails ezid_transition:registered' + puts 'RAILS_ENV=development bundle exec rails ezid_transition:registered -- --doi_file spec/fixtures/ezid_doi_examples.txt' puts '' puts 'The file with lists of DOIs should be one per line and be bare DOIs like 10.5072/FK2HT2SM2K.' puts 'You can check DOIs at urls like https://ezid.cdlib.org/id/doi:10.5072/FK2HT2SM2K' - next + exit end - File.foreach(ENV.fetch('DOI_FILE', nil)).with_index do |doi, idx| + File.foreach(args.doi_file).with_index do |doi, idx| doi.strip! next if doi.blank? @@ -85,6 +87,7 @@ namespace :ezid_transition do sleep 1 end puts 'Done' + exit end end # :nocov: diff --git a/lib/tasks/keywords.rake b/lib/tasks/keywords.rake index ccb552c9c2..9cef97a24f 100644 --- a/lib/tasks/keywords.rake +++ b/lib/tasks/keywords.rake @@ -1,11 +1,13 @@ # :nocov: namespace :keywords do + # example: RAILS_ENV=production bundle exec rails keywords:update_plos -- --plos_path /path/to/file task update_plos: :environment do $stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f + args = Tasks::ArgsParser.parse(:plos_path) - if ENV['PLOS_PATH'].blank? || ENV['RAILS_ENV'].blank? - puts 'Please enter the path to the PLoS keywords as the PLOS_PATH environment variable' - puts 'For example: PLOS_PATH="/my/path/to/plosthes.2020-1.full.tsv"' + if args.plos_path.blank? || ENV['RAILS_ENV'].blank? + puts 'Please enter the path to the PLoS keywords as the --plos_path argument' + puts 'For example: --plos_path "/my/path/to/plosthes.2020-1.full.tsv"' puts '' puts 'You can get the Excel files from https://github.com/PLOS/plos-thesaurus.' puts 'then open in a program that can convert to tab separated values such as Google docs.' 
@@ -23,7 +25,7 @@ namespace :keywords do # without silencing this, all I saw was ActiveRecord SQL logging and it was hard to see the progress Rails.logger.silence do - plos = Tasks::Keywords::Plos.new(fn: ENV.fetch('PLOS_PATH', nil)) + plos = Tasks::Keywords::Plos.new(fn: args.plos_path) plos.populate end diff --git a/lib/tasks/related_identifiers.rake b/lib/tasks/related_identifiers.rake index d1038ff4ce..93da007d3c 100644 --- a/lib/tasks/related_identifiers.rake +++ b/lib/tasks/related_identifiers.rake @@ -4,6 +4,7 @@ require 'csv' namespace :related_identifiers do + # example: rake related_identifiers:fix_common_doi_problems desc 'update all the DOIs I can into correct format (in separate field)' task fix_common_doi_problems: :environment do Tasks::RelatedIdentifiers::Replacements.update_doi_prefix @@ -17,18 +18,20 @@ namespace :related_identifiers do end # not sure we'll ever see this format again, a one-off spreadsheet from Ted + # example: RAILS_ENV=development bundle exec rake related_identifiers:ted_preprint_csv -- --path /path/to/csv_file desc 'An ephemeral csv from Ted with our doi, preprint doi and primary article doi' task ted_preprint_csv: :environment do unless ENV['RAILS_ENV'] puts 'RAILS_ENV must be explicitly set before running this script' - next + exit end - unless ARGV.length == 2 + args = Tasks::ArgsParser.parse(:path) + unless args.path puts 'Please put the path to the file to process' next end - rows = CSV.read(ARGV[1]) + rows = CSV.read(args.path) rows.each do |row| stash_id = StashEngine::Identifier.where(identifier: row[0]).first @@ -38,6 +41,7 @@ namespace :related_identifiers do StashDatacite::RelatedIdentifier.upsert_simple_relation(doi: row[2], resource_id: res.id, work_type: 'primary_article') end puts 'done' + exit end end # :nocov: diff --git a/lib/tasks/reports.rake b/lib/tasks/reports.rake index a2c175813e..598ceec8b8 100644 --- a/lib/tasks/reports.rake +++ b/lib/tasks/reports.rake @@ -2,33 +2,39 @@ require_relative 
'reports/ror_author_datasets' require_relative 'reports/institution_datasets' namespace :reports do - # use like: bundle exec rake reports:ror_author_submitted tenant=ucop RAILS_ENV=production + # example: RAILS_ENV=production bundle exec rake reports:ror_author_submitted -- --tenant ucop # Not using Rake standard way to do arguments because it's ridiculous desc 'Shows information about datasets and authors for an institution via ror IDs defined in the tenant' task ror_author_submitted: :environment do - unless ENV['RAILS_ENV'] && ENV['tenant'] + args = Tasks::ArgsParser.parse(:tenant) + + unless ENV['RAILS_ENV'] && args.tenant puts 'RAILS_ENV and tenant bash variables must be explicitly set before running this script' - puts 'example: bundle exec rake reports:ror_author_submitted tenant=ucop RAILS_ENV=production' + puts 'example: RAILS_ENV=production bundle exec rake reports:ror_author_submitted -- --tenant ucop' next end - Tasks::Reports::RorAuthorDatasets.submitted_report(tenant: ENV['tenant'].strip) + Tasks::Reports::RorAuthorDatasets.submitted_report(tenant: args.tenant.strip) + exit end + # example: RAILS_ENV=production bundle exec rails reports:from_text_institution -- --name "Max Planck" # gets all from an institution by author or contributor (funder) task from_text_institution: :environment do - unless ENV['RAILS_ENV'] && ENV['name'] + args = Tasks::ArgsParser.parse(:name) + pp args.name + unless ENV['RAILS_ENV'] && args.name puts 'RAILS_ENV and name bash variables must be explicitly set before running this script' - puts 'example: bundle exec rails reports:from_text_institution name="Max Planck" RAILS_ENV=production' + puts 'example: RAILS_ENV=production bundle exec rails reports:from_text_institution -- --name "Max Planck"' next end - puts "Creating dataset report for items with author or contributor affiliation like \"#{ENV.fetch('name', nil)}\"" - Tasks::Reports::InstitutionDatasets.datasets_by_name(name: ENV.fetch('name', nil)) - puts "Done, see 
#{ENV.fetch('name', nil)}-#{Time.now.strftime('%Y-%m-%d')}.tsv" + puts "Creating dataset report for items with author or contributor affiliation like \"#{args.name}\"" + Tasks::Reports::InstitutionDatasets.datasets_by_name(name: args.name) + puts "Done, see #{args.name}-#{Time.now.strftime('%Y-%m-%d')}.tsv" + exit end desc 'Generates a PDF report with monthly stats for GREI' task grei_monthly_report: :environment do Tasks::Reports::GREI.generate_monthly_report - end end diff --git a/lib/tasks/stash_engine_tasks.rake b/lib/tasks/stash_engine_tasks.rake index 191f3cc8e6..93c023f815 100644 --- a/lib/tasks/stash_engine_tasks.rake +++ b/lib/tasks/stash_engine_tasks.rake @@ -131,10 +131,12 @@ namespace :identifiers do end end + # example: RAILS_ENV=production bundle exec rails identifiers:remove_abandoned_datasets -- --dry_run true desc 'remove abandoned, unpublished datasets that will never be published' task remove_abandoned_datasets: :environment do + args = Tasks::ArgsParser.parse :dry_run # This task cleans up datasets that may have had some activity, but they have no real chance of being published. - dry_run = ENV['DRY_RUN'] == 'true' + dry_run = args.dry_run == 'true' if dry_run puts ' ##### remove_abandoned_datasets DRY RUN -- not actually running delete commands' else @@ -191,13 +193,17 @@ namespace :identifiers do end end end + exit end + # example: RAILS_ENV=production bundle exec rails identifiers:remove_old_versions -- --dry_run true desc 'clean up in_progress versions and temporary files that are disconnected from datasets' task remove_old_versions: :environment do # This task cleans up garbage versions of datasets, which may have been abandoned, but they may also have been accidentally created # and not properly connected to an Identifier object - dry_run = ENV['DRY_RUN'] == 'true' + args = Tasks::ArgsParser.parse :dry_run + # This task cleans up datasets that may have had some activity, but they have no real chance of being published. 
+ dry_run = args.dry_run == 'true' if dry_run puts ' ##### remove_old_versions DRY RUN -- not actually running delete commands' else @@ -248,6 +254,7 @@ namespace :identifiers do Stash::Aws::S3.new.delete_dir(s3_key: id_prefix) unless dry_run end end + exit end # This task is deprecated, since we no longer want to automatically expire the review date, @@ -690,16 +697,18 @@ namespace :identifiers do end end + # example: RAILS_ENV=production bundle exec rails identifiers:shopping_cart_report -- --year_month 2024-05 desc 'Generate a report of items that have been published in a given month' task shopping_cart_report: :environment do - # Get the year-month specified in YEAR_MONTH environment variable. + args = Tasks::ArgsParser.parse(:year_month) + # Get the year-month specified in --year_month argument. # If none, default to the previously completed month. - if ENV['YEAR_MONTH'].blank? - p 'No month specified, assuming last month.' - year_month = 1.month.ago.strftime('%Y-%m') - else - year_month = ENV['YEAR_MONTH'] - end + year_month = if args.year_month.blank? + p 'No month specified, assuming last month.' + 1.month.ago.strftime('%Y-%m') + else + args.year_month + end p "Writing Shopping Cart Report for #{year_month} to file..." CSV.open("shopping_cart_report_#{year_month}.csv", 'w') do |csv| @@ -729,17 +738,19 @@ namespace :identifiers do exit end + # example: RAILS_ENV=production bundle exec rails identifiers:deferred_journal_reports -- --sc_report /path/to/file desc 'Generate reports of items that should be billed for deferred journals' task deferred_journal_reports: :environment do - # Get the input shopping cart report in SC_REPORT environment variable. - if ENV['SC_REPORT'].blank? - puts 'Usage: deferred_journal_reports SC_REPORT=' + args = Tasks::ArgsParser.parse(:sc_report) + # Get the input shopping cart report in --sc_report argument. + if args.sc_report.blank? 
+ puts 'Usage: rails deferred_journal_reports -- --sc_report ' exit - else - sc_report_file = ENV['SC_REPORT'] - puts "Producing deferred journal reports for #{sc_report_file}" end + sc_report_file = args.sc_report + puts "Producing deferred journal reports for #{sc_report_file}" + sc_report = CSV.parse(File.read(sc_report_file), headers: true) md = /(.*)shopping_cart_report_(.*).csv/.match(sc_report_file) @@ -779,18 +790,20 @@ namespace :identifiers do exit end + # example: RAILS_ENV=production bundle exec rails identifiers:tiered_journal_reports -- --base_report /path/to/base_report --sc_report /path/to/file desc 'Generate reports of items that should be billed for tiered journals' task tiered_journal_reports: :environment do - # Get the input shopping cart report in BASE_REPORT and SC_REPORT environment variables. - if ENV['SC_REPORT'].blank? || ENV['BASE_REPORT'].blank? - puts 'Usage: tiered_journal_reports BASE_REPORT= SC_REPORT=' + args = Tasks::ArgsParser.parse(:sc_report, :base_report) + # Get the input shopping cart report in --base_report and --sc_report arguments. + if args.sc_report.blank? || args.base_report.blank? 
+ puts 'Usage: tiered_journal_reports -- --base_report --sc_report ' exit - else - base_report_file = ENV.fetch('BASE_REPORT', nil) - sc_report_file = ENV.fetch('SC_REPORT', nil) - puts "Producing tiered journal reports for #{sc_report_file}, using base in #{base_report_file}" end + base_report_file = args.base_report + sc_report_file = args.sc_report + puts "Producing tiered journal reports for #{sc_report_file}, using base in #{base_report_file}" + base_values = tiered_base_values(base_report_file) puts "Calculated base values #{base_values}" @@ -975,18 +988,20 @@ namespace :identifiers do end # rubocop:enable Metrics/MethodLength + # example: RAILS_ENV=production bundle exec rails identifiers:tiered_tenant_reports -- --base_report /path/to/base_report --sc_report /path/to/file desc 'Generate reports of items that should be billed for tiered tenant institutions' task tiered_tenant_reports: :environment do - # Get the input shopping cart report in BASE_REPORT and SC_REPORT environment variables. - if ENV['SC_REPORT'].blank? || ENV['BASE_REPORT'].blank? - puts 'Usage: tiered_tenant_reports BASE_REPORT= SC_REPORT=' + args = Tasks::ArgsParser.parse(:sc_report, :base_report) + # Get the input shopping cart report in --base_report and --sc_report arguments. + if args.sc_report.blank? || args.base_report.blank? 
+ puts 'Usage: tiered_tenant_reports -- --base_report --sc_report ' exit - else - base_report_file = ENV.fetch('BASE_REPORT', nil) - sc_report_file = ENV.fetch('SC_REPORT', nil) - puts "Producing tiered tenant reports for #{sc_report_file}, using base in #{base_report_file}" end + base_report_file = args.base_report + sc_report_file = args.sc_report + puts "Producing tiered tenant reports for #{sc_report_file}, using base in #{base_report_file}" + base_values = tiered_tenant_base_values(base_report_file) puts "Calculated base values #{base_values}" @@ -1061,16 +1076,18 @@ namespace :identifiers do base_values end + # example: RAILS_ENV=production bundle exec rails identifiers:geographic_authors_report -- --year_month 2024-05 desc 'Generate a report of Dryad authors and their countries' task geographic_authors_report: :environment do - # Get the year-month specified in YEAR_MONTH environment variable. + args = Tasks::ArgsParser.parse(:year_month) + # Get the year-month specified in --year_month argument. # If none, default to the previously completed month. - if ENV['YEAR_MONTH'].blank? - p 'No month specified, assuming last month.' - year_month = 1.month.ago.strftime('%Y-%m') - else - year_month = ENV['YEAR_MONTH'] - end + year_month = if args.year_month.blank? + p 'No month specified, assuming last month.' + 1.month.ago.strftime('%Y-%m') + else + args.year_month + end p "Writing Geographic Authors Report for #{year_month} to file..." CSV.open('geographic_authors_report.csv', 'w') do |csv| @@ -1095,16 +1112,19 @@ namespace :identifiers do exit end + # example: RAILS_ENV=production bundle exec rails identifiers:dataset_info_report -- --year_month 2024-05 desc 'Generate a summary report of all items in Dryad' task dataset_info_report: :environment do - # Get the year-month specified in YEAR_MONTH environment variable. + args = Tasks::ArgsParser.parse(:year_month) + # Get the year-month specified in --year_month argument. 
# If none, default to the previously completed month. - if ENV['YEAR_MONTH'].blank? - p 'No month specified, assuming all months.' - year_month = nil + + if args.year_month.blank? + p 'No month specified, assuming last month.' + year_month = 1.month.ago.strftime('%Y-%m') filename = "dataset_info_report-#{Date.today.strftime('%Y-%m-%d')}.csv" else - year_month = ENV['YEAR_MONTH'] + year_month = args.year_month filename = "dataset_info_report-#{year_month}.csv" end @@ -1394,13 +1414,14 @@ namespace :journals do nil end + # example: RAILS_ENV=production bundle exec rails journals:check_salesforce_sync -- --dry_run true desc 'Compare journal differences between Dryad and Salesforce' task check_salesforce_sync: :environment do - - dry_run = if ENV['DRY_RUN'].blank? + args = Tasks::ArgsParser.parse(:dry_run) + dry_run = if args.dry_run.blank? true else - ENV['DRY_RUN'] != 'false' + args.dry_run != 'false' end puts 'Processing with DRY_RUN' if dry_run @@ -1432,7 +1453,7 @@ namespace :journals do sf_parent = Stash::Salesforce.find(obj_type: 'Account', obj_id: sf_parent_id) puts "SPONSOR MISMATCH for #{j.single_issn} -- #{j.sponsor&.name} -- #{sf_parent['Name']}" if j.sponsor&.name != sf_parent['Name'] end - nil + exit end end diff --git a/lib/tasks/users.rake b/lib/tasks/users.rake index 3c4cb1c377..6f11410b4d 100644 --- a/lib/tasks/users.rake +++ b/lib/tasks/users.rake @@ -1,19 +1,21 @@ # :nocov: require 'byebug' namespace :users do + # example: RAILS_ENV= bundle exec rake users:merge_users -- --old_id 12345 --new_id 4321 desc 'Merge old and new users (into old account like it works in the UI)' task merge_users: :environment do - if ENV['RAILS_ENV'].blank? || ARGV.length != 3 || ARGV[1].to_i == 0 || ARGV[2].to_i == 0 + args = Tasks::ArgsParser.parse(:old_id, :new_id) + if ENV['RAILS_ENV'].blank? || args.old_id.blank? || args.new_id.blank? 
puts "Merges two users together, the old user (old datasets) has new user/datasets merged into it and new ORCID copied to it\n\n" puts 'Run this script with the line:' - puts " RAILS_ENV= bundle exec rake users:merge_users \n\n" - puts 'Example: RAILS_ENV=development bundle exec rake users:merge_users 645 8037' + puts " RAILS_ENV= bundle exec rake users:merge_users -- --old_id --new_id \n\n" + puts 'Example: RAILS_ENV=development bundle exec rake users:merge_users -- --old_id 12345 --new_id 4321' puts "\nThe user ids should be obtained by looking at id field in the stash_engine_users" exit end - old_user = StashEngine::User.find(ARGV[1].to_i) - new_user = StashEngine::User.find(ARGV[2].to_i) + old_user = StashEngine::User.find(args.old_id.to_i) + new_user = StashEngine::User.find(args.new_id.to_i) puts 'old user' puts '--------' pp(old_user) diff --git a/lib/tasks/zenodo.rake b/lib/tasks/zenodo.rake index b53a0f0b3a..916e1966aa 100644 --- a/lib/tasks/zenodo.rake +++ b/lib/tasks/zenodo.rake @@ -1,3 +1,4 @@ +require_relative 'args_parser' require_relative 'zenodo/stats' require_relative 'zenodo/metadata' require 'byebug' @@ -66,6 +67,7 @@ namespace :zenodo do puts "Optimistic completion date: #{(Time.new + time_remaining).strftime('%Y-%m-%d')}" end + # example: rails/rake zenodo:update_metadata -- --start_id 10 desc 'Update metadata at zenodo latest version for datasets' task update_metadata: :environment do $stdout.sync = true # keeps the output from buffering and delaying output @@ -74,13 +76,11 @@ namespace :zenodo do puts 'Exiting metadata update' exit end - - start_num = ARGV[1].to_i + args = Tasks::ArgsParser.parse(:start_id) + start_num = args.start_id.to_i identifiers = StashEngine::Identifier.joins(:zenodo_copies).distinct.order(:id).offset(start_num) - ARGV.each { |a| task(a.to_sym {}) } # prevents rake from interpreting addional args as other rake tasks puts "Updating zenodo metadata starting at record #{start_num}" - # this stops spamming of activerecord 
query logs in dev environment ActiveRecord::Base.logger.silence do identifiers.each_with_index do |identifier, idx| @@ -106,5 +106,6 @@ namespace :zenodo do sleep 1 end end + exit # prevents rake from interpreting additional args as other rake tasks end end diff --git a/script/stash/counter-uploader/readme.md b/script/stash/counter-uploader/readme.md index 66fd352587..6ae78b5e6f 100644 --- a/script/stash/counter-uploader/readme.md +++ b/script/stash/counter-uploader/readme.md @@ -16,7 +16,7 @@ that DataCite does not have or that have a suspiciously small number of results. Use a command like this: ```shell script -RAILS_ENV=production REPORT_DIR="/my/report/dir" bundle exec rails counter:datacite_pusher +RAILS_ENV=production bundle exec rails counter:datacite_pusher -- --report_dir /my/report/dir ``` ## Force upload of JSON reports (even if they're not suspicious) @@ -27,7 +27,7 @@ appear suspicious at DataCite. Use a command like this: ```shell script -RAILS_ENV=production REPORT_DIR="/my/report/dir" FORCE_SUBMISSION="2021-11" bundle exec rails counter:datacite_pusher +RAILS_ENV=production bundle exec rails counter:datacite_pusher -- --report_dir /my/report/dir --force_submission 2021-11 ``` ## Get information about the reports at DataCite, but don't upload any reports @@ -37,9 +37,9 @@ with the number of pages of results each of those months has. It's useful for tr down submission problems. ```shell script -RAILS_ENV=production REPORT_DIR="/my/report/dir" REPORT_IDS=true bundle exec rails counter:datacite_pusher +RAILS_ENV=production bundle exec rails counter:datacite_pusher -- --report_dir /my/report/dir --report_ids true ``` At the end of the output it will print out report months, DataCite report identifiers and the number of pages of results. You'll need report identifiers if you need to update -a report or ask DataCite to investigate ingest problems. \ No newline at end of file +a report or ask DataCite to investigate ingest problems. 
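The commands above pass task options after a bare `--` separator (for example `-- --report_dir /my/report/dir`), and the rake tasks in this change read them through `Tasks::ArgsParser.parse`. That class's implementation is not shown in this part of the diff, so the following is only a minimal sketch of how such a parser could work, assuming it reads `ARGV` and is built on Ruby's standard `OptionParser`. `SketchArgsParser` and its `argv:` keyword are hypothetical names used for illustration, not the project's actual code.

```ruby
require 'optparse'

# Hypothetical stand-in for Tasks::ArgsParser (the real class is not shown in this diff).
module SketchArgsParser
  # Parses `--name value` pairs that follow the bare `--` separator in argv.
  # Expects at least one option name, given as a symbol.
  # Returns a Struct instance, so options that were not passed read as nil.
  def self.parse(*names, argv: ARGV)
    options = Struct.new(*names).new

    parser = OptionParser.new
    names.each do |name|
      # Each option takes one required value, e.g. `--report_dir /my/report/dir`.
      parser.on("--#{name} VALUE") { |value| options[name] = value }
    end

    # Everything before `--` (the task name itself) is left for rake to handle.
    separator = argv.index('--')
    task_args = separator ? argv[(separator + 1)..] : []
    parser.parse(task_args)

    options
  end
end

# Example: the argv that rake would see for the first README command above.
args = SketchArgsParser.parse(
  :report_dir, :force_submission,
  argv: ['counter:datacite_pusher', '--', '--report_dir', '/my/report/dir']
)
puts args.report_dir
puts args.force_submission.inspect
```

With a parser shaped like this, an option that is not supplied reads as `nil`, which is what the `args.some_option.blank?` guards in the tasks rely on. The spec change in this diff (`ARGV.replace([...])`) suggests the real parser also reads `ARGV` directly, but that is an inference rather than something confirmed here.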
diff --git a/spec/tasks/counter_spec.rb b/spec/tasks/counter_spec.rb
index 69d1f0be20..f0eaa99d61 100644
--- a/spec/tasks/counter_spec.rb
+++ b/spec/tasks/counter_spec.rb
@@ -20,6 +20,7 @@
   end
 
   it 'executes the task and creates the stats in the database based on json files' do
+    ARGV.replace(['counter:cop_manual', '--', '--json_directory', @path.to_s])
     task.execute
     @test_items.each_pair do |k, v|
       doi_obj = StashEngine::Identifier.find_by_identifier(k)