Valkyrie: Item contents snippet bug #769
Comments
Regarding your second point... The Alternative Title term was previously used to store the generated slug. This is no longer the case, but because of that prior data we can't show the alternative title on the works. We could remove the term entirely if you prefer. If we leave it so you can use it in the future, cleaning up the existing data will require a migration. This is why, for now, it is not included on the show views.
@laritakr thanks for that explanation. It makes perfect sense, and it helps me to understand that I am not seeing a bug to be concerned about. I struck out that part of the ticket above. We don't need to modify at this time.
cc @KatharineV: for the acceptance criteria, should item contents not be displayed at all, or what do you expect to see?
@ShanaLMoore The way the feature has worked on production previously is that the Item Contents only show in the catalog search after a keyword search, in which case they display with the keyword highlighted in context. Here's a visual from production after a keyword search for "Duluth": And here's a screenshot of items in the catalog search when I just entered the catalog to browse with no keywords. You'll see there are no Item Contents fields showing. One side note: in the first screenshot, the third result is the only one showing item contents. The first two items have been split by Tesseract, so I would expect them to show Item Contents with keywords highlighted in context. Apparently there is a bug blocking them from displaying the field. So I want to mention that they aren't displaying as intended, but the third result is, so that's the one we'd want to emulate. Thanks!
Dev notes: "Item contents" is the label for the file_set_text_tsimv index field. Locally it is not displaying in the catalog search when there is a match.
Original implementation (knapsack may be missing a few things?): scientist-softserv/adventist-dl@a5de938#diff-bd4eb77984b740347ff2aa902be664d1aa01addc73964897450d5bd3ff09b3c6
Resources aren't using the app indexer.
To confirm: are resources getting the following indexed?
Hyku has an IIIF Print helper but isn't including it anywhere. Do we need it for the render_ocr_snippet method? Does the application controller need it and a render_ocr helper method?
This commit will introduce the Hyku::Indexers::FileSetIndexer to add indexing logic for born digital PDFs when using PDF.js. We also change the works' indexing field to match the file sets' indexing field (all_text_tsimv). We also "valyrized" the logic in the HykuIndexing module to accomplish this. Ref: - scientist-softserv/adventist_knapsack#769
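The indexing change in this commit can be pictured with a small plain-Ruby sketch. It is not the Hyku::Indexers::FileSetIndexer code: the FileSet and Work structs, the WorkFullTextIndexer class, and its to_solr method are hypothetical stand-ins, and only the all_text_tsimv field name comes from the commit message. It shows the idea of collecting each file set's extracted text into the same field on the parent work's Solr document, so a full-text query matches the work as well as its file sets.

```ruby
# Hypothetical stand-ins for Valkyrie resources; real Hyrax/Hyku models differ.
FileSet = Struct.new(:id, :extracted_text, keyword_init: true)
Work    = Struct.new(:id, :file_sets, keyword_init: true)

# Illustrative indexer (not the Hyku implementation): builds a Solr-style hash
# for a work and copies its file sets' extracted text into all_text_tsimv,
# the same field name the file sets use.
class WorkFullTextIndexer
  FULL_TEXT_FIELD = "all_text_tsimv"

  def initialize(resource:)
    @resource = resource
  end

  def to_solr
    doc = { "id" => @resource.id }
    texts = Array(@resource.file_sets)
              .map(&:extracted_text)
              .compact
              .reject { |text| text.strip.empty? }
    # Omit the field entirely when there is no text, rather than indexing blanks.
    doc[FULL_TEXT_FIELD] = texts unless texts.empty?
    doc
  end
end

# Usage
work = Work.new(id: "w1", file_sets: [FileSet.new(id: "fs1", extracted_text: "Duluth harbor")])
puts WorkFullTextIndexer.new(resource: work).to_solr.inspect
```

Leaving the field out when no text exists is a design choice in this sketch; it keeps works without OCR from carrying an empty full-text value.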
QA RESULTS: ✅ PASS
Acceptance Criteria
EMPTY CATALOG SEARCH: tested on STAGING
Note: the blank values will be handled in another ticket, #819.
CATALOG SEARCH OCR MATCH: tested on STAGING
This work produced the following OCR: 2d-txt (2).txt
CATALOG SEARCH NO MATCH: tested on STAGING
I searched for RAINBOW.
Blocked until the following error is resolved:
This commit will add the index field for snippets to the CatalogControllerDecorator so ADL can see snippets. We had to add this because we remove all of the previously added index fields and only add back selected ones. That means we have to add this one manually. Ref: - #769
Appears to be working as expected in several staging tenants.
Team, I uploaded a set of PDFs to test the UV with a compound work, and this particular work is regressing to the OCR-in-search-results bug. I did a keyword search for "compound work" and this work came up, of course, because that's the title. However, the keyword match section is displaying a huge block of irrelevant text. As I understand the feature, it is supposed to show a limited number of characters, so this string is too long to begin with. Second, the keyword match field should only show if there is a keyword match, and IF there is a match, THEN the search terms should appear in the snippet, highlighted. This work doesn't have useful OCR because the PDFs are handwritten, so I know the OCR cannot actually contain the words "compound" and "work". That's why nothing is highlighted in the snippet. The behavior I would expect from this work is either a) no keyword match field, OR b) a keyword match snippet with fewer characters surrounding the highlighted search terms.
Another one with an issue is this one: https://adl.s2.adventistdigitallibrary.org/catalog?utf8=%E2%9C%93&search_field=all_fields&q=20088972 It looks like when there is any match to the search term, the results include the snippet text, but the length isn't limited unless there are matches in the text itself. The snippet text should only be shown if there is a match IN the text.
TODO: update the logic so that we don't show snippets when they're not supposed to be shown.
Dev notes: This example should show no snippets. I uploaded it locally; the search string doesn't match anything in the snippet text.
Relevant PR, which is supposed to return when there are no snippets: scientist-softserv/iiif_print#260
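The behavior requested in the comments above (no snippet for a metadata-only match, and a short excerpt around the highlighted term when the OCR text does match) can be sketched in plain Ruby. This is a hypothetical helper, not the iiif_print render_ocr_snippet implementation and not Solr highlighting; the SnippetSketch module, the ocr_snippet method, its radius parameter, and the <em> markup are all assumptions for illustration.

```ruby
# Hypothetical helper: returns a short, highlighted excerpt only when the
# query actually appears in the OCR text; otherwise nil, so a view can skip
# the "Item contents" field entirely.
module SnippetSketch
  module_function

  def ocr_snippet(text, query, radius: 60)
    return nil if text.nil? || query.to_s.strip.empty?

    terms = query.split.map { |term| Regexp.escape(term) }
    pattern = Regexp.new("\\b(#{terms.join('|')})\\b", Regexp::IGNORECASE)
    match = pattern.match(text)
    return nil unless match # metadata-only match: show no snippet

    # Keep at most `radius` characters on each side of the first match.
    start  = [match.begin(0) - radius, 0].max
    finish = [match.end(0) + radius, text.length].min
    excerpt = text[start...finish]

    # NOTE: not HTML-escaped; a real view helper should sanitize OCR text.
    highlighted = excerpt.gsub(pattern) { "<em>#{Regexp.last_match(1)}</em>" }
    prefix = start.positive? ? "..." : ""
    suffix = finish < text.length ? "..." : ""
    "#{prefix}#{highlighted}#{suffix}"
  end
end

# Usage
text = "The steamer left Duluth harbor at dawn."
puts SnippetSketch.ocr_snippet(text, "duluth", radius: 5)
p SnippetSketch.ocr_snippet(text, "rainbow") # => nil (no match in the text)
```

Returning nil instead of the full field value is what keeps a handwritten PDF with unrelated OCR, like the compound work above, from rendering a large block of text.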
# Story
Refs
- #769

# Expected Behavior Before Changes
Snippets didn't work correctly.

# Expected Behavior After Changes
- [ ] Search on catalog page performs full text search and shows highlighted snippets.
- [ ] Title and thumbnail URLs carry the search terms through to the show pages and perform UV search automatically.
- [ ] Search with no highlighting opens show page normally.

# Screenshots / Video
<details>
<summary></summary>

### Search on catalog page picks up terms in both full text and other metadata
![Screenshot 2024-10-25 at 5 30 51 PM](https://github.com/user-attachments/assets/717d5b0b-92d8-4573-8408-825d7305e86d)

### Clicking on work automatically searches the UV
![Screenshot 2024-10-25 at 5 31 17 PM](https://github.com/user-attachments/assets/d1963379-041e-4a35-bf5b-169782044634)
</details>

# Notes
Summary
This ticket tracks discrepancies with existing works deposited prior to the Valkyrie sprint.
When I open the edit screen for a work, an alternative title is populated even though the original metadata does not include one. I honestly have no idea where the alt title is being pulled from. I'm totally baffled. It doesn't show on the public view before you open the edit screen, where you magically find it in the metadata field. This is happening with existing works created through OAI (see example) and CSV Bulkrax imports.
See and compare this example work:
On Staging: https://adl.s2.adventistdigitallibrary.org/concern/generic_works/20121716_james_white_to_dudley_canright_jul_13_1881/
And in the OAI feed it was imported from, where there is no alt title that I can see: https://oai.adventistdigitallibrary.org/OAI-script?verb=GetRecord&metadataPrefix=oai_adl&identifier=20121716
Acceptance Criteria
Screenshots or Video
Screenshot of a work with "Item contents" showing complete OCR in the catalog search, yikes haha
Testing Instructions
Testing note: the staging site was partially reindexed, so most existing records were updated, but not all. Our assumption is that the cutover will update all works. Creating new works (or collections) should pass QA.
The works must be uploaded with UV turned on in order to get OCR processed.
Turn UV on in the tenant's feature settings within the admin dashboard.
Create a work. Attach a multipage PDF. Wait for all of the jobs to complete.
Once the UV is loaded, scroll down to the Items section. The file sets should have an ACTIONS drop-down where you can select "Download txt file". You may have to click into the child work if you don't see it.
This should be a file of OCR words that you can search.
Pick a word and use it for a catalog search.
If there's a match, the word should be highlighted in a snippet of the catalog search results page.
Notes
Known remaining issues: #863