Merge pull request #67 from lyrasis/v4-0-0
Release v4.0.0
kspurgin authored Apr 15, 2024
2 parents d7faec2 + 850259c commit c965b1a
Showing 77 changed files with 1,170 additions and 577 deletions.
22 changes: 16 additions & 6 deletions CHANGELOG.adoc
@@ -12,20 +12,30 @@ endif::[]

= Changelog

== Unreleased

These changes are merged into the `main` branch but have not been released. After merging a pull request (PR) into `main` without an immediate release, a tag is added appending the PR number to the current release. For example, if the release version/tag is `v0.0.0` and PR 15 is merged without a new release, the state of the codebase after that merge is tagged `v0.0.0.pr15`.

== 4.0.0 (2024-04-15)

=== Breaking

* `done` as an explicit batch status is removed from the application. Commands to mark batches as done are removed. `batches:done` now displays all batches with batch_status = ingested.
* `batches:to_ingcheck` command renamed to `batches:to_ingstat`, for consistency with the `batches:ingstat` and `batch:ingstat` commands.
* Columns in the batches CSV have changed. You will be prompted to run `thor batches:fix_csv` at some point, or you can run it proactively.

=== Added

* `batch_status` column to batches CSV
* `batch_mode` column to batches CSV (added in mapping step)
* `ingest duration` calculated, shown in STDOUT, and recorded in batches CSV if ingest is complete when `batch:ingstat` is run
* Batch archiving is enabled by default

=== Changed

* `csid delete --csv` option now uses `ingest_dir` if configured and only a file name is given
* Reset default client config `media_with_blob_upload_delay` value to 500ms, in accordance with the sample client config
* `batches show` now shows headers and batch status

=== Bugfixes

* Fix issue with `vt add` command, in which ampersands in terms were not escaped in the created XML, causing errors (https://github.com/lyrasis/collectionspace_migration_tools/issues/39[#39])

== 3.0.1 (2024-03-11)

1 change: 1 addition & 0 deletions Gemfile
@@ -24,6 +24,7 @@ gem "pg", "~> 1.4"
gem "redis", "~> 4.2.1"
gem "refinements", "~> 9.1"
gem "smarter_csv", "~> 1.7.4"
gem "tabulo", "~> 3"
gem "thor", "~> 1"
gem "thor-hollaback", "~> 0"
gem "zeitwerk", "~> 2.5"
5 changes: 5 additions & 0 deletions Gemfile.lock
@@ -270,11 +270,15 @@ GEM
standard-performance (1.1.2)
lint_roller (~> 1.1)
rubocop-performance (~> 1.18.0)
tabulo (3.0.1)
tty-screen (= 0.8.2)
unicode-display_width (~> 2.5)
thor (1.3.0)
thor-hollaback (0.2.1)
hollaback (~> 0.1)
thor (>= 0.19.1)
time_up (0.0.7)
tty-screen (0.8.2)
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
unicode-display_width (2.5.0)
@@ -315,6 +319,7 @@ DEPENDENCIES
rspec
simplecov (~> 0.21)
smarter_csv (~> 1.7.4)
tabulo (~> 3)
thor (~> 1)
thor-hollaback (~> 0)
time_up
35 changes: 24 additions & 11 deletions doc/workflows.adoc
@@ -36,8 +36,8 @@ NOTE: strike-through formatting indicates the option/command is not yet implemented
** `thor batch:ingstat obj12`
* If any records failed to ingest, run `batch:cb BATCHID` (cb = clear bucket) after doing any troubleshooting/etc
* If there were any duplicates added, run `duplicates:delete RECTYPE` (See important notes on this in step details below)
* Once `batch:ingstat` has indicated the ingest is complete, the batch status is set to `ingested`; the batch is done.
* Delete completed batches (`batch:delete BATCHID` or `batches:delete_done`)

[TIP]
====
@@ -67,28 +67,41 @@ I've found that if an ingest initially appears to be complete with errors, if I

* `thor batches:to_map`
* `thor batches:to_upload`
* `thor batches:to_ingstat`
* `thor batches:done`

==== Running `map`, `upload`, or `ingstat` on all batches ready

* `thor batches:map`
* `thor batches:upload`
* `thor batches:ingstat`

==== Cleaning out finished batches
The batches CSV isn't intended as a permanent record of all work done in a given instance. It can be useful to keep some completed batches with errors or warnings in the batches CSV as a reminder to check/handle the records that didn't work as expected the first time. But, once you are done with a batch and tasks related to it, it is best to delete the batch.


There are two commands for deleting batches:

`thor batch:delete BATCHID`:: Delete the specified batch. *This command deletes the batch regardless of batch status*
`thor batches:delete_done`:: Delete all batches with `ingested` batch status

The following happens when a batch is deleted:

* The batch directory containing XML records and reports is deleted.
* The batches CSV row for the batch is deleted.
* If the batch status was `ingested` and you are archiving batches, the batches CSV row for the batch is written to the batch archive CSV.

IMPORTANT: A batch's source CSV may live anywhere on the user's machine. It is not copied into the batch directory, so deleting the batch does not delete the original file from which the ingested XML was derived. The mapper, upload, and ingest reports generated by this tool are written to the batch directory, and will be deleted with the batch.

==== Batch archiving

Unless you add `archive_batches: false` in your client config, batch archiving is enabled.

Batch archiving means: when an *ingested* batch is deleted, the row for that batch is removed from the batches CSV and written to the batch archive CSV. Deleted batches with `added`, `mapped`, or `uploaded` status are not archived.

*Now that running `batch:ingstat BATCHID` calculates ingest duration, retaining all ingested batches in the batch archive CSV lets us capture data on ingest performance with no effort.* It behooves us to retain this information, because "well, how did it used to perform?" is often the first question when we raise a performance complaint.

The batch archive CSV is always created in your client base directory. The default file name is `batches_archive.csv`, but you can change it via the `batch_archive_filename` setting in the client config.
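The delete-and-archive flow described above can be sketched in plain Ruby. This is an illustrative stand-in, not the tool's actual API: `delete_batch`, the in-memory `batches` array, and the column names are all invented for the example.

```ruby
require "fileutils"
require "tmpdir"

# Illustrative sketch (not the tool's actual API) of the delete + archive
# flow: the batch directory and batches CSV row always go away; the row is
# copied to the archive only for ingested batches with archiving enabled.
def delete_batch(batches, archive, batch_id, batch_dir, archive_enabled: true)
  row = batches.find { |r| r["id"] == batch_id }
  return unless row

  FileUtils.rm_rf(batch_dir)  # batch directory is always deleted
  batches.delete(row)         # row leaves the batches CSV
  archive << row if archive_enabled && row["batch_status"] == "ingested"
end

batches = [
  {"id" => "obj12", "batch_status" => "ingested"},
  {"id" => "obj13", "batch_status" => "mapped"}
]
archive = []

Dir.mktmpdir do |dir|
  delete_batch(batches, archive, "obj12", File.join(dir, "obj12"))
  delete_batch(batches, archive, "obj13", File.join(dir, "obj13"))
end

puts archive.map { |r| r["id"] }.inspect  # ["obj12"] -- only the ingested batch
```

Both batches leave the batches CSV, but only the `ingested` one is written to the archive, matching the status rule above.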

== Autocaching
IMPORTANT: This behavior is configurable and only applies when mapping is done as part of batch workflow.
15 changes: 13 additions & 2 deletions lib/collectionspace_migration_tools.rb
@@ -66,8 +66,10 @@ def csid_cache

def get_csv_path(csv)
config = CMT.config.client
return get_full_path(csv) unless config.respond_to?(:ingest_dir)
if ["~", "/"].any? { |char| csv.start_with?(char) }
  return get_full_path(csv)
end

File.join(config.ingest_dir, csv)
end
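The resolution rules in the hunk above can be exercised in isolation. This is a simplified stand-in, not the real method (which pulls `ingest_dir` from the client config object); the method name is invented for the example.

```ruby
# Standalone mimic of get_csv_path/get_full_path above: absolute and
# ~-prefixed paths are returned (with ~ expanded); a bare file name is
# joined onto the configured ingest directory, if there is one.
def resolve_csv_path(csv, ingest_dir: nil)
  expand = ->(path) { path.start_with?("~") ? File.expand_path(path) : path }
  return expand.call(csv) if ingest_dir.nil?
  return expand.call(csv) if ["~", "/"].any? { |char| csv.start_with?(char) }

  File.join(ingest_dir, csv)
end

puts resolve_csv_path("/data/objects.csv", ingest_dir: "/ingest")  # /data/objects.csv
puts resolve_csv_path("objects.csv", ingest_dir: "/ingest")        # /ingest/objects.csv
puts resolve_csv_path("objects.csv")                               # objects.csv
```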
@@ -107,6 +109,15 @@ def tunnel=(tunnel_obj)

# to identify CMT processes in `top`, `ps`, etc.
Process.setproctitle("CMT")

private

def get_full_path(csv)
return File.expand_path(csv) if csv.start_with?("~")

csv
end
module_function :get_full_path
end

CMT.loader
45 changes: 45 additions & 0 deletions lib/collectionspace_migration_tools/archive_csv.rb
@@ -0,0 +1,45 @@
# frozen_string_literal: true

module CollectionspaceMigrationTools
  # Namespace for the batch archive CSV, if enabled in client config.
  #
  # The archive CSV contains data from the batches CSV for batches deleted after
  # completed ingest (with or without errors). Rows for batches deleted at an
  # earlier workflow stage are not added to the archive CSV.
  module ArchiveCsv
    extend Dry::Monads[:result, :do]

    module_function

    # @return [String]
    def path
      File.join(CMT.config.client.base_dir,
        CMT.config.client.batch_archive_filename)
    end

    # @return [Boolean]
    def present? = File.exist?(path)

    def file_check
      case present?
      when true
        Success()
      when false
        Failure(file_check_failure_msg)
      end
    end

    def parse
      data = File.read(path)
      table = CSV.parse(data, headers: true)
    rescue => err
      msg = "#{err.message} IN #{err.backtrace[0]}"
      # NOTE: `name`, not `self.class.name` -- in a module_function context
      # self is the module, so self.class.name would report "Module"
      Failure(CMT::Failure.new(context: "#{name}.#{__callee__}",
        message: msg))
    else
      Success(table)
    end

    def file_check_failure_msg = "No archives CSV file present"
  end
end
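What `file_check` and `parse` accomplish together can be sketched with the stdlib alone, using `[:success, ...]`/`[:failure, ...]` tuples as stand-ins for dry-monads `Success`/`Failure` (the function name and file contents here are illustrative):

```ruby
require "csv"
require "tmpdir"

# Stand-in for ArchiveCsv.file_check + ArchiveCsv.parse using plain Ruby:
# return a failure tuple if the file is missing or unparseable, otherwise
# a success tuple wrapping the parsed CSV::Table.
def parse_archive(path)
  return [:failure, "No archives CSV file present"] unless File.exist?(path)

  [:success, CSV.parse(File.read(path), headers: true)]
rescue => err
  [:failure, err.message]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "batches_archive.csv")
  File.write(path, "id,batch_status\nobj12,ingested\n")
  status, table = parse_archive(path)
  puts status                       # success
  puts table.first["batch_status"]  # ingested
end
```

The tuple shape mirrors the monadic flow: callers branch on the first element the way `yield` short-circuits on `Failure` in the real code.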
54 changes: 54 additions & 0 deletions lib/collectionspace_migration_tools/archive_csv/archiver.rb
@@ -0,0 +1,54 @@
# frozen_string_literal: true

require "csv"

module CollectionspaceMigrationTools
  module ArchiveCsv
    class Archiver
      include Dry::Monads[:result]
      include Dry::Monads::Do.for(:call)

      class << self
        def call(...)
          new.call(...)
        end
      end

      def initialize(
        path: CMT::ArchiveCsv.path,
        headers: CMT::Batch::Csv::Headers.all_headers
      )
        @path = path
        @headers = headers
      end

      # @param batch [CMT::Batch::Batch]
      def call(batch)
        _present = if CMT::ArchiveCsv.present?
          yield CMT::ArchiveCsv.file_check
        else
          yield CMT::ArchiveCsv::Creator.call
        end
        _write = yield write_row(batch)

        Success(batch)
      end

      private

      attr_reader :path, :headers

      def write_row(batch)
        CSV.open(path, "a", headers: true) do |csv|
          csv << headers.map { |hdr| batch.send(hdr.to_sym) }
        end
      rescue => err
        msg = "#{err.message} IN #{err.backtrace[0]}"
        Failure(CMT::Failure.new(context: "#{self.class.name}.#{__callee__}",
          message: msg))
      else
        Success()
      end
    end
  end
end
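`write_row` reduces to a stdlib `CSV.open` append with values pulled in header order. This sketch swaps the batch object for a hash (the header list and values are illustrative, not the tool's real column set):

```ruby
require "csv"
require "tmpdir"

# Mirrors Archiver#write_row: append one row to the archive CSV, with
# values pulled in header order. A hash stands in for the batch object
# the real code calls reader methods on.
HEADERS = %w[id batch_status ingest_duration]

def append_archive_row(path, batch)
  CSV.open(path, "a") { |csv| csv << HEADERS.map { |hdr| batch[hdr] } }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "batches_archive.csv")
  CSV.open(path, "wb") { |csv| csv << HEADERS }  # Creator's job
  append_archive_row(path,
    {"id" => "obj12", "batch_status" => "ingested", "ingest_duration" => "42s"})
  puts File.read(path)
end
```

Mapping over the header list, rather than dumping the hash, is what keeps column order stable no matter how the batch data is stored.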
51 changes: 51 additions & 0 deletions lib/collectionspace_migration_tools/archive_csv/checker.rb
@@ -0,0 +1,51 @@
# frozen_string_literal: true

require "csv"

module CollectionspaceMigrationTools
  module ArchiveCsv
    class Checker
      include Dry::Monads[:result]
      include Dry::Monads::Do.for(:call)

      class << self
        def call(...)
          new(...).call
        end
      end

      # @param path [String]
      # @param headers [Array<String>]
      def initialize(
        path: CMT::ArchiveCsv.path,
        headers: CMT::Batch::Csv::Headers.all_headers
      )
        @path = path
        @headers = headers
      end

      def call
        _exist = yield CMT::ArchiveCsv.file_check
        table = yield CMT::ArchiveCsv.parse
        _hdrs = yield header_check(table)

        Success(table)
      end

      private

      attr_reader :path, :headers

      def header_check(table)
        return Success() if table.headers == headers

        Failure(header_check_failure_msg)
      end

      def header_check_failure_msg
        "Archive CSV headers are not up-to-date, so archiving may " \
          "fail unexpectedly. Run `thor archive:fix_csv` to fix"
      end
    end
  end
end
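The `header_check` above is a strict, order-sensitive comparison of parsed headers against the expected list. A minimal stand-in (header list and function name are illustrative):

```ruby
require "csv"

# Mirrors Checker#header_check: the parsed table's headers must match the
# expected list exactly, order included.
EXPECTED_HEADERS = %w[id batch_status ingest_duration]

def headers_current?(csv_string)
  CSV.parse(csv_string, headers: true).headers == EXPECTED_HEADERS
end

puts headers_current?("id,batch_status,ingest_duration\n")  # true
puts headers_current?("id,batch_status\n")                  # false -- stale CSV
```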
48 changes: 48 additions & 0 deletions lib/collectionspace_migration_tools/archive_csv/creator.rb
@@ -0,0 +1,48 @@
# frozen_string_literal: true

require "csv"

module CollectionspaceMigrationTools
  module ArchiveCsv
    # Creates new batches archive CSV
    class Creator
      include CMT::Batch::Csv::Headers
      include Dry::Monads[:result]

      class << self
        def call
          new.call
        end
      end

      def initialize
        @path = CMT::ArchiveCsv.path
        @headers = all_headers
      end

      def call
        if File.exist?(path)
          puts "#{path} already exists; leaving it alone"

          Success()
        else
          build_archive_csv
        end
      end

      private

      attr_reader :path, :headers

      def build_archive_csv
        CSV.open(path, "wb") { |csv| csv << headers }
      rescue => err
        msg = "#{err.message} IN #{err.backtrace[0]}"
        Failure(CMT::Failure.new(context: "#{self.class.name}.#{__callee__}",
          message: msg))
      else
        File.exist?(path) ? Success() : Failure()
      end
    end
  end
end
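`Creator` boils down to an idempotent "create with headers unless present". Sketched with stdlib CSV (the function name and symbol return values are illustrative, standing in for the monadic results):

```ruby
require "csv"
require "tmpdir"

# Mirrors Creator#call: write a header-only archive CSV, but leave any
# existing file untouched so repeated runs are safe.
def ensure_archive_csv(path, headers)
  return :exists if File.exist?(path)

  CSV.open(path, "wb") { |csv| csv << headers }
  :created
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "batches_archive.csv")
  puts ensure_archive_csv(path, %w[id batch_status])  # created
  puts ensure_archive_csv(path, %w[id batch_status])  # exists
end
```

The existence guard is what lets `Archiver#call` unconditionally invoke `Creator.call` when no archive CSV is present, without risking truncation of accumulated archive data.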