Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow CharacterizeJob to receive either an AF object or a Valkyrie resource as the FileSet #4100

Closed
elrayle opened this issue Oct 21, 2019 · 3 comments
Labels

Comments

@elrayle
Copy link
Contributor

elrayle commented Oct 21, 2019

Descriptive summary

Convert CharacterizeJob to allow it to receive either an ActiveFedora based FileSet or a Valkyrie::Resource equivalent FileSet.

Rationale

Allow jobs to execute correctly with existing AF implementation or the wings Valkyrie implementation.

Expected behavior

CharacterizeJob receives a Valkyrie::Resource and performs the characterization.

Actual behavior

CharacterizeJob raises an exception when passed a Valkyrie::Resource.

Approach for Conversion

See Pattern for Jobs Receiving Parameter that may be an AF Object or a Valkyrie Resource

Steps to reproduce the behavior

  1. Need to add tests that validate that the same tests pass whether a Valkyrie::Resource or an AF object is passed.

See example of processing both objects in... file_set_actor_spec.rb

For testing this job, you can use something like...

RSpec.describe CharacterizeJob do
  [true, false].each do |test_valkyrie|
    context "when test_valkyrie is #{test_valkyrie}" do
      let(:af_file_set) { create(:file_set) }
      let(:valk_file_set) { af_file_set.valkyrie_resource }
      let(:file_set) { test_valkyrie ? valk_file_set : af_file_set }

OR you can write tests that are specific to each object type, similar to the tests in spec/jobs/hyrax/revoke_edit_job_spec.rb. This approach doubles the number of tests and doesn't provide certainty that the new code works exactly the same as the old code if the two versions of the tests diverge. The approach to take is at the discretion and judgement of the developer.

Related work

These have all be modified to support AF object or Valkyrie resource using the [established pattern]((https://docs.google.com/document/d/1f9yDjtkXapFf1DYLyPhprnB4CPGaZw9p_k8T_tcor-U/edit#heading=h.qt9rxydfd727).

@mjgiarlo
Copy link
Member

Blocked by #4060. Should be unblocked by #4055.

mjgiarlo added a commit that referenced this issue Jan 23, 2020
While looking into #4100, we noticed that `CharacterizeJob` not only saves a file set's `original_file`, but also reindexes the file set, *and* reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository—iterating over all files in all works in a collection—in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in `CharacterizeJob` introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.
mjgiarlo added a commit that referenced this issue Jan 23, 2020
While looking into #4100, we noticed that `CharacterizeJob` not only saves a file set's `original_file`, but also reindexes the file set, *and* reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository—iterating over all files in all works in a collection—in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in `CharacterizeJob` introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.
mjgiarlo added a commit that referenced this issue Jan 23, 2020
While looking into #4100, we noticed that `CharacterizeJob` not only saves a file set's `original_file`, but also reindexes the file set, *and* reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository—iterating over all files in all works in a collection—in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in `CharacterizeJob` introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.
no-reply pushed a commit that referenced this issue Jan 24, 2020
While looking into #4100, we noticed that `CharacterizeJob` not only saves a file set's `original_file`, but also reindexes the file set, *and* reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository—iterating over all files in all works in a collection—in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in `CharacterizeJob` introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.
fritzfreiheit pushed a commit that referenced this issue Jan 30, 2020
While looking into #4100, we noticed that `CharacterizeJob` not only saves a file set's `original_file`, but also reindexes the file set, *and* reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository—iterating over all files in all works in a collection—in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in `CharacterizeJob` introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.
fritzfreiheit pushed a commit that referenced this issue Jan 31, 2020
While looking into #4100, we noticed that `CharacterizeJob` not only saves a file set's `original_file`, but also reindexes the file set, *and* reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository—iterating over all files in all works in a collection—in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in `CharacterizeJob` introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.
@mjgiarlo mjgiarlo removed their assignment Apr 29, 2020
@orangewolf orangewolf self-assigned this May 3, 2021
@no-reply
Copy link
Contributor

I think this might be a wontfix, since there's now a characterization infrastructure that works with Valkyrie and is used for Valkyrie driven objects by default

@hackartisan
Copy link
Contributor

This work was done in #5321

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants