Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iiif manifiest backport #4412

Merged
merged 8 commits into from
Jul 8, 2020
Merged

Iiif manifiest backport #4412

merged 8 commits into from
Jul 8, 2020

Conversation

no-reply
Copy link
Contributor

@no-reply no-reply commented Jul 8, 2020

backports a series of improvements to iiif manifest generation from the mainline to 2.x.

introduces draper as an explicit dependency to support a new, narrowly scoped manifest presenter.

allows caching for manifests using a FlipFlop feature flipper, so this feature is configurable by administrators.

@samvera/hyrax-code-reviewers

jeremyf and others added 8 commits July 7, 2020 17:53
This commit is to push something up for conversation. I don't want to
get too far in the solution without community feedback.

There are quite a few things that I've discovered in this PR:

1) The routes for a `Hyrax::Test::SimpleWork` do not work. I know that
   @no_reply was working through this at the [Solar Vortex][1] but has
   not had space to get back to that work.
2) There is are presumptive metadata fields for a work show presenter.
   These metadata fields are not part of `Hyrax::Test::SimpleWork`.
   This is further complicated by the fact that we have [configurable
   IIIF fields][4] which are currently congruent with the assumptive
   metadata fields but likely inadequate going forward (see below).

Given that the `Hyrax::WorksControllerBehavior#metadata` has
previously been untested, I do think that what I've written maps to
prior behavior with the ability to turn on caching.

Below is a [samvera.slack.com#hyrax channel][2] discussion about the
above presumptions.

@jeremyf (20)
> The Hyrax::Test::SimpleWork does not have the presumptive metadata
> fields in its [SolrDocument][3]

@no_reply
> the existence of "presumptive metadata fields" here is problematic.
> this is one of a couple of places where BasicMetadata is assumed,
> even though it's not supposed to be

@jeremyf
> The Hyrax::WorkShowPresenter has complications in its intersection
> of presumptive metadata (as listed in the delegate) and variance of
> [Hyrax.config.iiif_metadata_fields][4]

[1]:https://wiki.lyrasis.org/display/samvera/Developer+Congress+-+Solar+Vortex+-+January+2020
[2]:https://samvera.slack.com/archives/CA8ANGLEL/p1583447771028100
[3]:https://github.com/samvera/hyrax/blob/f102e0c43c90baa0efe44f6127134429edf34d51/app/presenters/hyrax/work_show_presenter.rb#L38-L42
[4]:https://github.com/samvera/hyrax/blob/f102e0c43c90baa0efe44f6127134429edf34d51/lib/hyrax/configuration.rb#L447-L453
add `WorksControllerBehavior.iiif_manifest_builder` and a private
`WorksControllerBehavior#iiif_manifest_builder`. these must return an object
that responds to `#as_json(presenter:)`.
this test doesn't need the Hyrax shared specs, so don't bother loading them.
use a more OO style for the dependency injection on `iiif_manifest_builder`.

makes the caching behavior a subclass, instead of a branch, and places the
configuration in the controller. this should make the manifest builders more
reusable for other controllers, which might decide on caching behavior in a
different way.

renames `#as_json` (which used to return a hash!) to `#manifest_for` and usees
the instance method in callers.
we're going to end up requiring `iiif_manifest` so we might as well have it
available application wide. putting the `require` in app only seems to muddy the
waters.
this complete rewrite of the manifest generation process reduces the footprint
of database (Fedora/Solr) calls involved in the manifest generation process,
fixes a bug in sequence/canvas generation for manifests for nested works, and
generally separates the issue of manifest generation from the
`WorkShowPresenter` (existing code is left in place, to be deprecated and
removed as follow-up).

[`iiif_manifest`](https://www.rubydoc.info/gems/iiif_manifest/1.0.1) expects a
"presenter", with `#manifest_url`, `#manifest_metadata` `#description`,
`#sequence_rendering`, and `#work_presenters`/`#file_set_presenters`. since we
know the presenter provided will be used for the duration of a single request,
we can make heavy use of instance variables to cache children and avoid making
multiple requests for any particular data.

the rewrite also does some work to help clarify the two peices of information
the presenter needs in the request context: the `Ability` and the hostname to
use for URIs within the manifest. we provide naive defaults for both of these,
and implement the behavior more or less without regard for their
context---i.e. the whole presenter infrastructure works without them. when
calling from a Controller, we pass them in explicitly to augment the behavior
for the context. this is important because we may (do) want to reuse these
presenters in other contexts (a Job for cache pre-warming) where this data isn't
available.

the presenters themselves use `Draper::Decorator`'s `delegate_all`, so they're
basically glorified wrappers for a model. they don't care whether the model is a
SolrDocument, or powered by a primary database. a small amount of work is done
to align Valkyrie, ActiveFedora, and SolrDocument-style models, but mostly
things were already interoperable.

a refactor in `DisplaysImage` rearranges some instance-level caching there, for
easier reuse in the new presenter infrastructure.

the short story is: between this and a previous optimization in
`DisplaysImage` (#4344), manifest generation in a makeshift controlled
benchmarking environment sped up 10-fold for works of modest size. it seems
roughly linear between 10 and 750 child FileSets. i'm not confident enough in
that environment (especially its behavior WRT caching) to call this "data" or to
provide anything more solid than this statement, but there's good reason to
believe this will be quite a bit faster/more scalable than the status-quo.
this job hits the manifest cache to prewarm; at least for now it's a (fast) no-op if
there is already something in the cache for the key.

reconstruct the cache keys to use `etag`. this isn't perfect, but it correlates
more closely to object changes than the solr `_version_` field, and avoids worry
about cases where a `SolrDocument#[]` lookup on `_version_` is `nil` (which are
common). for AF models, `#etag` is populated as long as the model isn't a
`#new_record?`.

we also add a prefix for the cache key, this is meant to be specific to the
caching implementation. if we change the implementation in the future in a way
that would benefit from ignoring old cache entries, we can simply change or
increment the prefix. this also serves as a namespace, which should avoid cache
key collisions.
@jeremyf jeremyf merged commit 28a5e1f into 2.x-stable Jul 8, 2020
@jeremyf jeremyf deleted the iiif-manifiest-backport branch July 8, 2020 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants