Standardized rake tasks argument handling #1734

Merged · 7 commits · Jul 2, 2024
3 changes: 2 additions & 1 deletion .gitignore
@@ -43,7 +43,7 @@ ui-library/bower_components
ui-library/css/vendor/
ui-library/js/vendor/
stash/stash_engine/ui-library/
*.css.map
*.css.map

# ######################################################

@@ -104,6 +104,7 @@ build-iPhoneSimulator/

# unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
.rvmrc
.nvmrc

#####=== Rails ===#####

2 changes: 1 addition & 1 deletion documentation/external_services/salesforce.md
@@ -80,7 +80,7 @@ journal settings:
rails journals:check_salesforce_sync
```

To clean up metadata in Salesforce associated with journals, add `DRY_RUN=false`
To clean up metadata in Salesforce associated with journals, add ` -- --dry_run false`
to the end of the above command.
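
Because options passed after `--` arrive as strings, a task has to compare the text `false` rather than a boolean. The following is a minimal sketch of that check, assuming the task defaults to a dry run; the helper name `dry_run?` is hypothetical and is not taken from the Salesforce task itself.

```ruby
# Hypothetical helper: option values arrive as strings, so "false" has to be
# compared as text. Anything other than an explicit "false" (including a
# missing option) keeps the dry run, which is assumed to be the safe default.
def dry_run?(value)
  value.to_s.strip.downcase != 'false'
end

puts dry_run?(nil)
puts dry_run?('false')
```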


14 changes: 8 additions & 6 deletions documentation/reports.md
@@ -18,7 +18,7 @@ tracking of payments.

Run it with a command like:
```
RAILS_ENV=v3_production bundle exec rails identifiers:shopping_cart_report YEAR_MONTH=2024-01
RAILS_ENV=v3_production bundle exec rails identifiers:shopping_cart_report -- --year_month 2024-01
```

Fields in the shopping cart report
@@ -37,7 +37,7 @@ To run the report and retrieve the files:
```
# on v3-prod server
cd ~/deploy/current
RAILS_ENV=v3_production bundle exec rake identifiers:shopping_cart_report YEAR_MONTH=2024-01
RAILS_ENV=v3_production bundle exec rake identifiers:shopping_cart_report -- --year_month 2024-01
cp ~/deploy/current/shopping* ~/journal-payments/shoppingcart/
cd ~/journal-payments/shoppingcart
git pull
@@ -66,7 +66,7 @@ Run the deferred payment reports with a command like:
```
cp ~/journal-payments/shoppingcart/shopping_cart_report_2023* /tmp
# This command must be run on the v3-prod server, to access the production database
RAILS_ENV=v3_production bundle exec rails identifiers:deferred_journal_reports SC_REPORT=/tmp/shopping_cart_report_2024-Q1.csv
RAILS_ENV=v3_production bundle exec rails identifiers:deferred_journal_reports -- --sc_report /tmp/shopping_cart_report_2024-Q1.csv
```

### Tiered payment reports
@@ -83,7 +83,8 @@ Run the tiered journal payment reports with a command like:
# This command must be run in a personal account with the journal-payments checked out
cp ~/journal-payments/shoppingcart/shopping_cart_report_2023* /tmp
# This command must be run on the production server, to access the production database
RAILS_ENV=v3_production bundle exec rails identifiers:tiered_journal_reports SC_REPORT=/tmp/shopping_cart_report_2023-Q1.csv BASE_REPORT=/tmp/shopping_cart_report_2023.csv

RAILS_ENV=v3_production bundle exec rails identifiers:tiered_journal_reports -- --sc_report /tmp/shopping_cart_report_2023-Q1.csv --base_report /tmp/shopping_cart_report_2023.csv
```

For tenant institutions that have a tiered payment plan, a similar secondary task
@@ -96,7 +97,8 @@ Run the tiered institution payment reports with a command like:
# This command must be run in a personal account with the journal-payments checked out
cp ~/journal-payments/shoppingcart/shopping_cart_report_2023* /tmp
# This command must be run on the production server, to access the production database
RAILS_ENV=v3_production bundle exec rails identifiers:tiered_tenant_reports SC_REPORT=/tmp/shopping_cart_report_2023-Q1.csv BASE_REPORT=/tmp/shopping_cart_report_2023.csv

RAILS_ENV=v3_production bundle exec rails identifiers:tiered_tenant_reports -- --sc_report /tmp/shopping_cart_report_2023-Q1.csv --base_report /tmp/shopping_cart_report_2023.csv
```

Dataset info report
@@ -205,7 +207,7 @@ This also gives institutions when people didn't autocomplete their ROR correctly
Run like:

```
bundle exec rails reports:from_text_institution name="Planck" RAILS_ENV=v3_production
RAILS_ENV=v3_production bundle exec rails reports:from_text_institution -- --name="Planck"
```

Put the string you want to detect in the `name` variable. It shows the matches in `author_affiliations`
4 changes: 2 additions & 2 deletions documentation/ror_transition.md
@@ -12,7 +12,7 @@ Truncate the table to remove all current entries. The table is called `stash_en
You can download the latest ROR exports at https://doi.org/10.5281/zenodo.6347574 .

```bash
RAILS_ENV=development bundle exec rails affiliation_import:populate_funder_ror_mapping /path/to/file
RAILS_ENV=development bundle exec rails affiliation_import:populate_funder_ror_mapping -- --path /path/to/file
```

## task to re-import latest ROR data
@@ -75,4 +75,4 @@ Update the DataCite records with updated metadata.

```bash
RAILS_ENV=development bundle exec rails datacite_target:update_dryad
```
```
11 changes: 6 additions & 5 deletions documentation/server_maintenance/troubleshooting.md
@@ -267,7 +267,7 @@ embargo date. You can find the deposition_id in the stash_engine_zenodo_copies
table. The zenodo_copy_id is the id from that same table.
```
# the arguments are 1) resource_id, 2) deposition_id at zenodo, 3) date, 4) zenodo_copy_id
RAILS_ENV=production bundle exec rake dev_ops:embargo_zenodo 97683 4407065 2021-12-31 12342
RAILS_ENV=production bundle exec rake dev_ops:embargo_zenodo -- --resource_id 97683 --deposition_id 4407065 --date 2021-12-31 --zenodo_copy_id 12342
```

**You must log in to Zenodo and "publish" the new version of the dataset; otherwise the embargo
@@ -300,7 +300,8 @@ table. The zenodo_copy_id is the `stash_engine_zenodo_copies.id` from that same

```
# the arguments are 1) resource_id, 2) deposition_id at zenodo, 3) date, 4) zenodo_copy_id
RAILS_ENV=production bundle exec rake dev_ops:embargo_zenodo 97683 4407065 2023-07-25 1234
RAILS_ENV=production bundle exec rake dev_ops:embargo_zenodo -- --resource_id 97683 --deposition_id 4407065 --date 2023-07-25 --zenodo_copy_id 1234

```
**You must log in to Zenodo and "publish" the new version of the dataset; otherwise the embargo
will not take effect. This is probably something we can fix in the code, but it is waiting for us
@@ -386,13 +387,13 @@ it is harder to do after removal.

```
# the parameters are 1) resource_id, 2) deposition_id (see in stash_engine_zenodo_copies), 3) date far in the future
RAILS_ENV=production bundle exec rails dev_ops:embargo_zenodo <resource-id> <deposition-id> 2200-12-31
RAILS_ENV=production bundle exec rails dev_ops:embargo_zenodo -- --resource_id <resource-id> --deposition_id <deposition-id> --date <YYYY-MM-DD> --zenodo_copy_id <zenodo_copy_id>
```


If you need to completely remove a dataset from existence, you can run
```
rails dev_ops:destroy_dataset 10.27837/dryad.catfood
rails dev_ops:destroy_dataset -- --doi 10.27837/dryad.catfood
```

This command will remove the dataset from Dryad, and give instructions to remove
@@ -484,7 +485,7 @@ end

To update anything published between a set of dates using a task, you can use:
```
RAILS_ENV=production bundle exec rails datacite_target:update_by_publication YYYY-MM-DD YYYY-MM-DD
RAILS_ENV=production bundle exec rails datacite_target:update_by_publication -- --start YYYY-MM-DD --end YYYY-MM-DD
```
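
The `--start` and `--end` values reach the task as plain strings. As a rough illustration only, the sketch below shows one way such strings could be validated and turned into a date range; `parse_date_range` is a hypothetical helper, and the real `datacite_target:update_by_publication` task may handle its input differently.

```ruby
require 'date'

# Hypothetical helper: turns the --start and --end strings into Date objects.
# Date.iso8601 raises an error for anything that is not a valid YYYY-MM-DD date.
def parse_date_range(start_str, end_str)
  start_date = Date.iso8601(start_str)
  end_date = Date.iso8601(end_str)
  raise ArgumentError, '--start must not be after --end' if start_date > end_date

  start_date..end_date
end

range = parse_date_range('2024-01-01', '2024-01-31')
puts "Updating items published from #{range.first} to #{range.last}"
```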

If you need to update DataCite for *all* items in Dryad, you can use:
4 changes: 2 additions & 2 deletions documentation/sql_queries/fundref_to_ror_comparisons.md
@@ -13,7 +13,7 @@ RAILS_ENV=production bundle exec rails affiliation_import:update_ror_orgs

Download the zip and extract the json file from https://doi.org/10.5281/zenodo.6347574:
```bash
RAILS_ENV=<environment> bundle exec rails affiliation_import:populate_funder_ror_mapping <path/to/ror/dump.json>
RAILS_ENV=<environment> bundle exec rails affiliation_import:populate_funder_ror_mapping -- --path <path/to/ror/dump.json>
```

After the imports you can run the following query to see how items in the database map and that
@@ -66,4 +66,4 @@ Items that are funders but don't match a fundref ID:
SELECT DISTINCT contributor_name, contributor_type, identifier_type, name_identifier_id
FROM dcs_contributors
WHERE contributor_type = 'funder' AND name_identifier_id ='';
```
```
36 changes: 23 additions & 13 deletions lib/tasks/affiliation_import.rake
@@ -22,21 +22,23 @@ namespace :affiliation_import do
Stash::Organization::RorUpdater.perform
end

# example: RAILS_ENV=development bundle exec rake affiliation_import:process_ror_csv -- --affiliation_mode live
# (any --affiliation_mode value other than 'live', or no value at all, runs in test mode)
desc 'Process all of the CSV files'
task process_ror_csv: :environment do
start_time = Time.now
@dois_to_skip = []
@live_mode = false
@last_resource = nil
args = Tasks::ArgsParser.parse(:affiliation_mode)

case ENV.fetch('AFFILIATION_MODE', nil)
case args.affiliation_mode
when nil
puts 'Environment variable AFFILIATION_MODE is blank, assuming test mode.'
puts '--affiliation_mode argument is blank, assuming test mode.'
when 'live'
puts 'Starting live processing due to environment variable AFFILIATION_MODE.'
puts 'Starting live processing due to --affiliation_mode argument.'
@live_mode = true
else
puts "Environment variable AFFILIATION_MODE is #{ENV.fetch('AFFILIATION_MODE', nil)}, entering test mode."
puts "--affiliation_mode argument is #{args.affiliation_mode}, entering test mode."
end

puts 'Loading affiliation info from CSV files in /tmp/dryad_affiliations*'
@@ -50,25 +52,28 @@ namespace :affiliation_import do
end

puts "DONE! Elapsed time: #{Time.at(Time.now - start_time).utc.strftime('%H:%M:%S')}"
exit
end

# example: RAILS_ENV=development bundle exec rake affiliation_import:merge_duplicate_authors -- --author_merge_mode live --start 0
# (any --author_merge_mode value other than 'live', or no value at all, runs in test mode)
desc 'Merge duplicate authors'
task merge_duplicate_authors: :environment do
start_time = Time.now
@live_mode = false
args = Tasks::ArgsParser.parse(:author_merge_mode, :start)

case ENV.fetch('AUTHOR_MERGE_MODE', nil)
case args.author_merge_mode
when nil
puts 'Environment variable AUTHOR_MERGE_MODE is blank, assuming test mode.'
puts '--author_merge_mode argument is blank, assuming test mode.'
when 'live'
puts 'Starting live processing due to environment variable AUTHOR_MERGE_MODE.'
puts 'Starting live processing due to --author_merge_mode argument.'
@live_mode = true
else
puts "Environment variable AUTHOR_MERGE_MODE is #{ENV.fetch('AUTHOR_MERGE_MODE', nil)}, entering test mode."
puts "--author_merge_mode argument is #{args.author_merge_mode}, entering test mode."
end

start_from = 0
start_from = ENV['START'].to_i unless ENV['START'].blank?
start_from = args.start.to_i unless args.start.blank?

stash_ids = StashEngine::Identifier.all.order('stash_engine_identifiers.id').distinct
stash_ids.each_with_index do |i, idx|
@@ -94,35 +99,40 @@ namespace :affiliation_import do
end
end
puts "DONE! Elapsed time: #{Time.at(Time.now - start_time).utc.strftime('%H:%M:%S')}"
exit
end

# example: rake affiliation_import:populate_ror_db -- --path /path/to/json_file
desc 'Populate our ROR database manually from the ROR dump json file because the Zenodo API is not working'
task populate_ror_db: :environment do
$stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f

if ARGV.length != 1
args = Tasks::ArgsParser.parse(:path)
unless args.path
puts 'Please enter the path to the ROR dump json file as an argument'
puts 'You can get the latest dump from https://doi.org/10.5281/zenodo.6347574 (get json file for last version in zip)'
exit
end

ror_dump_file = ARGV[0]
ror_dump_file = args.path
exit unless File.exist?(ror_dump_file)

Stash::Organization::RorUpdater.process_ror_json(json_file_path: ror_dump_file)
end

# example: rake affiliation_import:populate_funder_ror_mapping -- --path /path/to/json_file
desc 'Populate fundref_id to ror_id mapping table'
task populate_funder_ror_mapping: :environment do
$stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f

if ARGV.length != 1
args = Tasks::ArgsParser.parse(:path)
unless args.path
puts 'Please enter the path to the ROR dump json file as an argument'
puts 'You can get the latest dump from https://doi.org/10.5281/zenodo.6347574 (get json file for last version in zip)'
exit
end

ror_dump_file = ARGV[0]
ror_dump_file = args.path
exit unless File.exist?(ror_dump_file)

ActiveRecord::Base.connection.truncate(StashEngine::XrefFunderToRor.table_name)
23 changes: 23 additions & 0 deletions lib/tasks/args_parser.rb
@@ -0,0 +1,23 @@
require 'optparse'
require 'ostruct'
# rubocop:disable Lint/EmptyBlock
module Tasks
module ArgsParser

def self.parse(*attributes)
options = OpenStruct.new
return options if attributes.blank?

opts = OptionParser.new
opts.banner = 'Usage: rake <task> -- [options]'
attributes.each do |key|
opts.on("--#{key}=value", String) { |value| options[key] = value }
end

args = opts.order!(ARGV) {}
opts.parse!(args)

options
end
end
end
# rubocop:enable Lint/EmptyBlock
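
To make the parsing flow easier to follow, here is a minimal stand-alone sketch of the same idea. It is not the code from this PR: `parse_task_args` is a hypothetical variant that takes the argument list explicitly instead of reading the global `ARGV`, so it can be run outside rake.

```ruby
require 'optparse'
require 'ostruct'

# Hypothetical stand-alone variant of the parser: it receives argv explicitly
# so it can be exercised without rake or Rails.
def parse_task_args(argv, *attributes)
  options = OpenStruct.new
  return options if attributes.empty?

  parser = OptionParser.new
  attributes.each do |key|
    # Each requested attribute becomes a long option that requires a value.
    parser.on("--#{key}=value", String) { |value| options[key] = value }
  end

  # order! hands leading non-option words (the task name) to the empty block,
  # which discards them, and stops at the `--` separator.
  remaining = parser.order!(argv.dup) { |_task_name| }
  # parse! then reads the named options that followed the separator.
  parser.parse!(remaining)
  options
end

args = parse_task_args(%w[compressed:update_one -- --file_id 10], :file_id)
puts args.file_id.inspect
```

Options that were declared but not supplied simply read back as `nil`, which is what checks such as `unless args.path` in the tasks rely on.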
8 changes: 6 additions & 2 deletions lib/tasks/compressed.rake
@@ -1,5 +1,7 @@
# :nocov:

namespace :compressed do
# example: rails/rake compressed:update_contents
task update_contents: :environment do
$stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f

@@ -52,15 +54,17 @@ namespace :compressed do
end

# a simplified version of the above task that only updates one resource for testing and doesn't catch errors
# example: rails/rake compressed:update_one -- --file_id 10
task update_one: :environment do
$stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f
args = Tasks::ArgsParser.parse(:file_id)

if ARGV.length != 1
unless args.file_id
puts 'Please enter the file id as the only argument to this task'
exit
end

db_file = StashEngine::DataFile.find(ARGV[0])
db_file = StashEngine::DataFile.find(args.file_id)

puts "Updating container_contents for #{db_file.upload_file_name} (id: #{db_file.id}, " \
"resource_id: #{db_file.resource_id})"