Issue #946: IIIF Manifest can Retrieve hOCR from a sibling media via Media Use tag #949

Merged (10 commits), Jun 15, 2023

alxp commented Jun 12, 2023

GitHub Issue: (IIIF manifests should get hOCR from related media with a given tag #

Related PR: #945

What does this Pull Request do?

Allows the IIIF Manifest to retrieve hOCR from a sibling media based on its media use tag or any other Views relationship.

What's new?

Currently hOCR can be retrieved from a field on the same media as the canvas image. This PR removes the assumption that both fields come from the same media.

  • Does this change add any new dependencies? No
  • Does this change require any other modifications to be made to the repository? No
  • Could this change impact execution of existing code? No

How should this be tested?

From a new instance of the starter site, e.g. 'make starter_dev' in ISLE-DC:

  1. Back up your existing site config if you want to return to it with 'make config:export', then copy the files in codebase/config/sync to a temporary folder.
  2. Run composer require islandora/islandora_mirador "islandora/islandora:dev-946-hocr-media as 2.7.x-dev". If composer gives you trouble, you may need to chmod -R u+w web/sites/default and/or generate a GitHub API token. Alternatively, you can just check out this issue branch manually.
  3. I've attached a zip file with configs for setting up what you need to test this change. Import the attached configs:
    config-import.zip
    3.1. Unzip the file into your codebase folder so it is accessible inside ISLE-DC, e.g., to codebase/config. The files will be in a folder called 'config-import'.
    3.2. Inside Docker's Drupal image, run the command: drush config:import --partial --source=/var/www/drupal/config/config-import
  4. Create a new term in the Media Use taxonomy with its URL set to https://discoverygarden.ca/use#hocr (likely to be the one we use in the future).

We should now be able to test hOCR generation:

  1. Add a Repository Item with Model 'Paged Content'
  2. Upload one or more TIFFs with text on them as children, using Model 'Page', media type 'File' and media use 'Original File'.
  3. You can monitor hOCR derivative generation with the logger, docker compose logs -f hypercube
  4. After derivatives are generated you should see 'hOCR Extracted Text' media as part of the items on the Media tab.
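
To sanity-check a generated hOCR file outside of Drupal, something like the following sketch may help. It assumes the derivative follows the usual hOCR layout, where each word is an element with class 'ocrx_word' whose title attribute carries a 'bbox x0 y0 x1 y1' property. The names HocrWordParser, parse_bbox, extract_words, and the SAMPLE markup are made up for illustration and are not part of Islandora.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for elements with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []        # finished (text, bbox) tuples
        self._in_word = False  # True while inside a word element
        self._depth = 0        # nesting depth of tags inside the word
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._in_word:
            # A nested tag such as <em> inside the word element.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "ocrx_word" in classes:
            self._in_word = True
            self._depth = 0
            self._bbox = parse_bbox(dict(attrs).get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        if not self._in_word:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Closing tag of the word element itself.
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._in_word = False


def extract_words(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hypothetical hOCR fragment, for illustration only.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 640 480'>
  <span class='ocr_line' title='bbox 36 92 618 116'>
    <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
    <span class='ocrx_word' title='bbox 109 92 214 116; x_wconf 91'><em>world</em></span>
  </span>
</div>
"""

if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

If the word list comes back empty for a page that clearly has text, the derivative is probably not hOCR or uses different class names than assumed here.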

Next test the IIIF Manifest:

  1. Go to Admin > Structure > Views and edit the IIIF Manifest view
  2. Look at the relationships to see how an hOCR media is retrieved based on its relationship to a term and thus associated with the media that has the canvas image. Take a minute to admire how powerful Views is.
  3. Append '/manifest' to the URL of the Paged Content node you created earlier. This should print a manifest including "seeAlso" entries where the hOCR URLs are included.
  4. The node page itself should include a Mirador viewer with the Text Overlay plugin enabled. Text should be selectable. You can turn this off via Mirador's UI in the top-right. If the text selection buttons don't appear, try clearing Drupal's cache.
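
The manifest check in step 3 can also be scripted. The sketch below is a rough helper rather than anything shipped with this PR. It assumes the manifest uses the IIIF Presentation API 2 layout (sequences containing canvases, identified by "@id") and that hOCR entries are marked with the format "text/vnd.hocr+html". Both are assumptions to verify against your own manifest output, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed format label for hOCR seeAlso entries; adjust if yours differs.
HOCR_FORMAT = "text/vnd.hocr+html"


def load_manifest(url):
    """Fetch and decode a manifest, e.g. the node URL plus '/manifest'."""
    with urlopen(url) as response:
        return json.load(response)


def hocr_urls_by_canvas(manifest):
    """Map each canvas id to the hOCR URLs found in its seeAlso entries.

    A canvas that maps to an empty list has no hOCR attached.
    """
    result = {}
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            see_also = canvas.get("seeAlso", [])
            # seeAlso may be a single object or a list of objects.
            if isinstance(see_also, dict):
                see_also = [see_also]
            result[canvas.get("@id")] = [
                entry.get("@id")
                for entry in see_also
                if isinstance(entry, dict) and entry.get("format") == HOCR_FORMAT
            ]
    return result
```

Calling hocr_urls_by_canvas(load_manifest(url)) on a Paged Content node's manifest URL should then show at a glance which pages picked up an hOCR sibling and which did not.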

The part of this PR that is new is the Views Plugin setting. The rest is already in Islandora.
The exported views config linked above is a good starting point for a starter site example.

Documentation Status

  • Does this change existing behaviour that's currently documented?
  • Does this change require new pages or sections of documentation? README updated
  • Who does this need to be documented for?
  • Associated documentation pull request(s): ___ or documentation issue ___

Additional Notes:

Any additional information that you think would be helpful when reviewing this
PR.

Interested parties

Tag (@ mention) interested parties or, if unsure, @Islandora/committers

rosiel (Member) commented Jun 13, 2023

Views already lets you do this by adding a relationship to Content (via field_media_of) then creating another relationship to a media (via field_media_of) which you can then filter on to have a specific taxonomy term. Then you can expose the field containing the text (likely field_media_file or something) using the previous "Structured OCR Data file field" mechanism.

The reason I'm less hyped about this is that it introduces a dependency on the so-called islandora object model, making it unusable by anyone who isn't linking parent nodes with child media in our method. Before this PR, the code was virtually drupal-ready, meaning we can expose some of our awesomest modules - Openseadragon and Mirador - to other drupal sites - gaining visibility, interest, usage, and potentially assistance on them.
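
For readers less familiar with Views, the relationship chain described in this comment amounts to a two-step join followed by a filter. The sketch below models it with plain Python dictionaries. It is not Drupal code: the records, the sibling_files function, and the flattening of the media use term into a URI string are all simplifications made up for illustration.

```python
# Media use URIs. The hOCR one comes from the test instructions; the
# "Original File" URI is assumed to be the stock Islandora value.
HOCR_USE = "https://discoverygarden.ca/use#hocr"
ORIGINAL_USE = "http://pcdm.org/use#OriginalFile"

# Hypothetical media records. "field_media_of" points at the parent node id.
MEDIA = [
    {"mid": 1, "field_media_of": 10, "field_media_use": [ORIGINAL_USE], "file": "page1.tif"},
    {"mid": 2, "field_media_of": 10, "field_media_use": [HOCR_USE], "file": "page1.hocr"},
    {"mid": 3, "field_media_of": 11, "field_media_use": [ORIGINAL_USE], "file": "page2.tif"},
]


def sibling_files(media, canvas_media, use_uri):
    """Files of sibling media that share a parent node and carry use_uri."""
    # Relationship 1: canvas media -> parent content (via field_media_of).
    node_id = canvas_media["field_media_of"]
    return [
        m["file"]
        for m in media
        # Relationship 2: parent content -> its media (via field_media_of).
        if m["field_media_of"] == node_id
        and m["mid"] != canvas_media["mid"]
        # Filter: keep only media tagged with the wanted media use term.
        and use_uri in m["field_media_use"]
    ]


if __name__ == "__main__":
    print(sibling_files(MEDIA, MEDIA[0], HOCR_USE))
```

Because the join lives entirely in the view configuration, the manifest code only has to read a file field from whatever row the view returns, which is what keeps it independent of the parent/child media model.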

rosiel (Member) commented Jun 13, 2023

It would be nice for islandora to provide views plugins that got such things as:

  • child media with selected media use
  • sibling media with selected media use
    or, to make things even easier,
  • source file field for child/sibling media with selected media use

(and implement them separately from the IIIF manifest view so they can be used in all views!)

alxp force-pushed the 946-hocr-media branch from 3449b59 to e9e52c7, June 14, 2023 01:10

alxp (Author) commented Jun 14, 2023

@rosiel is a certified Views genius. This change simplifies the PR quite a lot and keeps Islandora IIIF pretty generic and not tied to Islandora. Many thanks.

seth-shaw-asu self-requested a review June 14, 2023 17:13

seth-shaw-asu (Member) left a comment

Works as advertised. We just need to remove the unused IslandoraUtils and this will be good to go. 👍

seth-shaw-asu merged commit da33118 into 2.x Jun 15, 2023
rosiel deleted the 946-hocr-media branch June 15, 2023 18:42