Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove collection size feature #4207

Merged
merged 2 commits into from
Jan 24, 2020
Merged

Remove collection size feature #4207

merged 2 commits into from
Jan 24, 2020

Conversation

mjgiarlo
Copy link
Member

@mjgiarlo mjgiarlo commented Jan 23, 2020

While looking into #4100, we noticed that CharacterizeJob not only saves a file set's original_file, but also reindexes the file set, and reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository—iterating over all files in all works in a collection—in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in CharacterizeJob introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.

jcoyne
jcoyne previously approved these changes Jan 23, 2020
Copy link
Contributor

@jeremyf jeremyf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add logic to deprecate the #size method usage? (e.g. have a #size method for collections that returns n/a)

@mjgiarlo
Copy link
Member Author

mjgiarlo commented Jan 23, 2020

@jeremyf 💬

Do we need to add logic to deprecate the #size method usage? (e.g. have a #size method for collections that returns n/a)

I defer to @no-reply and others who have a hyrax in this hunt.

While looking into #4100, we noticed that `CharacterizeJob` not only saves a file set's `original_file`, but also reindexes the file set, *and* reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository—iterating over all files in all works in a collection—in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in `CharacterizeJob` introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.
@no-reply
Copy link
Contributor

💯 to this! thanks.

i do think it would be nice to make the public methods emit deprecation warnings and return hard-coded & compatible null-ish values

Copy link
Member Author

@mjgiarlo mjgiarlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting some small tweaks, and some questions.

app/models/concerns/hyrax/collection_behavior.rb Outdated Show resolved Hide resolved
app/models/concerns/hyrax/collection_behavior.rb Outdated Show resolved Hide resolved
app/presenters/hyrax/collection_presenter.rb Outdated Show resolved Hide resolved
app/presenters/hyrax/collection_presenter.rb Outdated Show resolved Hide resolved
@elrayle
Copy link
Contributor

elrayle commented Jan 30, 2020

I know this is already merged, but I just want to add my kudos to @mjgiarlo (and any others working on this PR). Always happy to see performance enhancements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants