Link between SE Publication and ORA Object not maintained (was: 2 ORA records linked to one SE object) #145

Open
mrdsaunders opened this issue Nov 28, 2019 · 11 comments

Comments

@mrdsaunders
Collaborator

mrdsaunders commented Nov 28, 2019

We have two issues regarding a probable broken link between SE Publications and ORA Objects.

  • For 1012115, a duplicate record was created in ORA after re-deposit
  • For 1012237, it is not possible to update the record in SE, and there is no active connection between SE and ORA for this object (e.g. there are no harvest requests, even though the object appears in OAI-PMH)

For both issues, the likely cause is that a file was apparently deleted in ORA ('apparently' because the file was not removed, but the fileset id changed [1]). This appears to have severed the link between the SE Publication and the ORA Object, maintained by RT2.

We have not tried re-running the SE Migration tool to re-import a pubs-id/uuid pair into SE to re-establish the link.

[1] We had an issue yesterday where objects moved within the ORA system did not have the ids of their filesets preserved on transfer. This has been fixed in code and awaits deployment to ORA.

@tomwrobel
Collaborator

tomwrobel commented Nov 29, 2019

The issue regarding 1012237 | uuid_74db6297-62a1-41c4-9210-3915a0cb90ff on Oxris-QA

The publication 1012237 on Oxris-QA appears to have a link to ORA object uuid_74db6297-62a1-41c4-9210-3915a0cb90ff. This is visible on the publication page. At some point, the title and file information were harvested from ORA.

The record is present in ORA4-QA-SYMP, and is correctly listed in the OAI-PMH response page. Sword requests to the record return valid and correct responses.
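For anyone repeating the OAI-PMH check, here is a minimal Python sketch that reads a ListIdentifiers (or ListRecords) response and reports whether a uuid is listed as a live record. The namespace and the `status="deleted"` header attribute come from the OAI-PMH 2.0 specification. The function names, and the assumption that the feed's identifier ends with the ORA uuid, are mine and would need checking against the real feed.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_URI = "http://www.openarchives.org/OAI/2.0/"
OAI_NS = {"oai": OAI_URI}


def listed_identifiers(oai_xml):
    """Return the identifiers of all non-deleted record headers in a response."""
    root = ET.fromstring(oai_xml)
    found = set()
    for header in root.iter("{%s}header" % OAI_URI):
        # Deleted records are flagged with status="deleted" on the header.
        if header.get("status") == "deleted":
            continue
        ident = header.findtext("oai:identifier", namespaces=OAI_NS)
        if ident:
            found.add(ident.strip())
    return found


def is_listed(oai_xml, uuid):
    """Assumption: the OAI identifier ends with the ORA uuid."""
    return any(ident.endswith(uuid) for ident in listed_identifiers(oai_xml))
```

Calling `is_listed(response_text, "uuid_74db6297-62a1-41c4-9210-3915a0cb90ff")` on a saved response would confirm the record is listed and not flagged as deleted. It says nothing about whether Elements has queued a request for it.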

However, we cannot harvest the ORA record in Elements. Neither the manual update method (via the Publications page) nor a requested full harvest updates the record.
A full harvest was performed, but the ORA metadata source logs do not show any requests for uuid_74db6297-62a1-41c4-9210-3915a0cb90ff or 1012237. No request was queued for these objects, and there is no error in the logs.

A search of every ORA/Hyrax Repotool2 log from the last three months reveals no entry for either uuid_74db6297-62a1-41c4-9210-3915a0cb90ff or 1012237 in any log. There are entries for 1012230, 1012231, 1012232, 1012234, 1012235 and 1012239.
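A log search like this can be repeated with a short Python sketch. The directory layout and the `*.txt` pattern are assumptions based on the file names quoted in this thread, and `find_mentions` is a hypothetical helper, not part of Repotool2.

```python
from pathlib import Path


def find_mentions(log_dir, needles, pattern="*.txt"):
    """Map each needle to the (file name, line number, line) tuples that mention it."""
    hits = {needle: [] for needle in needles}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has stray bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                for needle in needles:
                    if needle in line:
                        hits[needle].append((path.name, line_no, line.rstrip("\n")))
    return hits
```

An empty list for a needle means no log line mentions it. Matching is by plain substring, so a short numeric id could also match inside a longer number; worth bearing in mind when reading the results.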

The ORA binary file was, for a time, deleted on ORA-SYMP due to an update error. It is possible that Oxris attempted to harvest the record, only to discover that there was no file present, and took some form of action. There is no record of this that I can find. The object has now been fixed and there is now a file attached to the record - although the status of 'complete (file available)' is probably not correct.

The obvious solution is to re-deposit a file and see if that re-establishes the link. However, I don't want to try that in case that transparently fixes the issue (for this record) without us understanding what is happening here. (edited) [Note: it appears that a duplicate record is created in ORA]

tomwrobel changed the title from "2 ORA records linked to one SE object" to "Link between SE Publication and ORA Object not maintained (was: 2 ORA records linked to one SE object)" on Nov 29, 2019
@tomwrobel
Collaborator

tomwrobel commented Nov 29, 2019

Mentions of 1012115 in the logs

./hyrax notes 2019-11-28.txt:2019-11-28 14:53:01 [ORA synchroniser]: Publication record 2934572 (ORA uuid_06a27c8c-bb1d-40f8-adfd-a0762c38d6cb): Updated within Publication 1012115.
./hyrax notes 2019-11-28.txt:2019-11-28 14:53:25 [ORA synchroniser]: Publication record 2934771 (ORA uuid_2485bb48-d745-403b-acb7-e8de984f0065): Injected into Publication 1012115.

Mentions of uuid_06a27c8c-bb1d-40f8-adfd-a0762c38d6cb in the logs

./hyrax warnings 2019-11-28.txt:2019-11-28 14:53:01 [ORA synchroniser]: Record uuid_06a27c8c-bb1d-40f8-adfd-a0762c38d6cb [from Api]: ISBN-10 field:: failed to parse valid ISBN-10 from "0-19-852663-6-TEST".
./hyrax warnings 2019-11-28.txt:2019-11-28 14:53:01 [ORA synchroniser]: Record uuid_06a27c8c-bb1d-40f8-adfd-a0762c38d6cb [from Api]: ISBN-13 field:: failed to parse valid ISBN-13 from "978-3-16-148410-0-TEST".
./hyrax notes 2019-11-28.txt:2019-11-28 14:47:42 [ORA synchroniser]: Publication record (ORA uuid_06a27c8c-bb1d-40f8-adfd-a0762c38d6cb): Enqueued for fetch.
./hyrax notes 2019-11-28.txt:2019-11-28 14:53:01 [ORA synchroniser]: Publication record 2934572 (ORA uuid_06a27c8c-bb1d-40f8-adfd-a0762c38d6cb): Updated within Publication 1012115.

Mentions of uuid_2485bb48-d745-403b-acb7-e8de984f0065 in the logs

./hyrax warnings 2019-11-28.txt:2019-11-28 14:53:25 [ORA synchroniser]: Record uuid_2485bb48-d745-403b-acb7-e8de984f0065 [from Api]: AUTHORS field:: failed to parse valid identifier from "sso:admn3983". No matching identifier scheme exists in Elements..
./hyrax notes 2019-11-28.txt:2019-11-28 14:47:42 [ORA synchroniser]: Publication record (ORA uuid_2485bb48-d745-403b-acb7-e8de984f0065): Enqueued for fetch.
./hyrax notes 2019-11-28.txt:2019-11-28 14:53:25 [ORA synchroniser]: Publication record 2934771 (ORA uuid_2485bb48-d745-403b-acb7-e8de984f0065): Injected into Publication 1012115.
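To spot this pattern (two ORA uuids attached to one publication) without reading the logs by eye, here is a rough Python sketch that parses the 'notes' lines above and groups ORA uuids by publication id. The regular expression is inferred only from the lines quoted in this thread, so other synchroniser messages may not match it. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Inferred from the quoted 'notes' lines only; warning lines are ignored.
LINE_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[ORA synchroniser\]: Publication record (?:(?P<record_id>\d+) )?"
    r"\(ORA (?P<uuid>uuid_[0-9a-f-]+)\): "
    r"(?P<action>Enqueued for fetch|Updated within|Injected into)"
    r"(?: Publication (?P<publication_id>\d+))?"
)


def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


def uuids_by_publication(lines):
    """Group ORA uuids by the publication id they were updated in or injected into."""
    linked = defaultdict(set)
    for line in lines:
        fields = parse_line(line)
        if fields and fields["publication_id"]:
            linked[fields["publication_id"]].add(fields["uuid"])
    return dict(linked)
```

Any publication id that maps to more than one uuid is a candidate duplicate, as with 1012115 here.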

@tomwrobel
Collaborator

For 1012115:

  • The Elements harvest looks at ALL objects in OAI-PMH, and then processes them.

  • If there is a mapping for the object in the 'pid list', it uses that.

  • If there is an ORA object in the OAI-PMH feed that is not in the pid list, it runs a crosswalk and looks for anything that will help it link that record to an SE publication.

  • Either a pubs-id or a DOI may be enough to do this; fuzzy title matching also applies.

  • Given the date (a record from the 27th), I suspect that a record from a previous test was re-activated in some way, perhaps something that had been sitting in a review queue for a while.
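To make the matching order above concrete, here is an illustrative Python sketch. It is not the Elements implementation: the data shapes, the `link_record` name, the use of `difflib` for fuzzy title matching and the 0.9 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def _normalise(title):
    """Lower-case and collapse whitespace before comparing titles."""
    return " ".join(title.lower().split())


def link_record(ora_record, pid_list, publications, title_threshold=0.9):
    """Return (publication id, reason), or (None, 'unmatched').

    ora_record:   dict with 'uuid' and optional 'pubs_id', 'doi', 'title'
    pid_list:     dict mapping ORA uuid -> publication id (the known links)
    publications: dict mapping publication id -> {'doi': ..., 'title': ...}
    """
    # 1. A known mapping in the pid list wins outright.
    uuid = ora_record["uuid"]
    if uuid in pid_list:
        return pid_list[uuid], "pid list"

    # 2. Otherwise use crosswalked metadata: pubs-id first, then DOI.
    pubs_id = ora_record.get("pubs_id")
    if pubs_id in publications:
        return pubs_id, "pubs-id"
    doi = ora_record.get("doi")
    if doi:
        for pub_id, pub in publications.items():
            if (pub.get("doi") or "").lower() == doi.lower():
                return pub_id, "doi"

    # 3. Fall back to fuzzy title matching (threshold is an assumption).
    title = ora_record.get("title")
    if title:
        best_id, best_score = None, 0.0
        for pub_id, pub in publications.items():
            candidate = _normalise(pub.get("title") or "")
            score = SequenceMatcher(None, _normalise(title), candidate).ratio()
            if score > best_score:
                best_id, best_score = pub_id, score
        if best_id is not None and best_score >= title_threshold:
            return best_id, "fuzzy title"

    return None, "unmatched"
```

Under this model, a second ORA object that is missing from the pid list but carries the same pubs-id would still be attached to the existing publication. That would be consistent with the 'Injected into Publication 1012115' log line, though it does not prove that is what happened.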

@tomwrobel
Collaborator

1012237 | uuid_74db6297-62a1-41c4-9210-3915a0cb90ff

  • redepositing file
  • new record created in ORA
  • we fix this by splitting and then removing

@tomwrobel
Collaborator

@AndrewBennet action points

  • Look into improving logging of manual updates using the publications page
  • Investigate why a record or file being seen as deleted blocks Elements from taking future action

@AndrewBennet
Collaborator

@thomas-wrobel

I have reproduced the behaviour where clicking the full text tab will not attempt to re-fetch an item which the system previously thought was deleted. This is questionable behaviour; next I will look into whether we could adjust this. It's too late to make it into our next release, though, which is out next Thursday.

However, I did observe that items are brought back into a non-deleted state after a harvest is run. (At least, that was the case for the DSpace and EPrints repositories I tested against.) Could you trigger some change on the item in ORA, so that it gets picked up in the next request for changes to the OAI-PMH endpoint? Alternatively, you could trigger the Full Harvest (there should be a button to do so in the data source configuration page). Then, let's see whether the deleted item gets woken up.

@tomwrobel
Collaborator

tomwrobel commented Dec 2, 2019

@AndrewBennet

"I have reproduced the behaviour where clicking the full text tab will not attempt to re-fetch an item which the system previously thought was deleted. This is questionable behaviour"

Is there any chance of a patch release to fix this in 5.18? I realise this might be easier to ask for than to do :)

Re: bringing an item into a non-deleted state - do you want me to:

  1. Manually create a Pubs record
  2. Deposit into Hyrax
  3. Delete object in Hyrax
  4. Manually update the Pubs record in the publications page (triggering the deleted behaviour)
  5. Re-instate the object in Hyrax, listing it in the OAI-PMH endpoint
  6. Run a full harvest

@tomwrobel
Collaborator

tomwrobel commented Dec 2, 2019

@tobypitts tests to run (we can do this without WeiHsi's updates)

  • Deposit a file via SE, delete file in Hyrax, what happens after harvest to Pubs record? File disappears in pubs record.
  • Reinstate file in Hyrax, what happens after harvest to Pubs record? File reappears in pubs record
  • Attach new file to record, what happens to pubs record after harvest? File appears in pubs record, publication can be recorded successfully
  • Deposit a file via SE, break object in Hyrax so default record is returned, what happens after harvest to Pubs record?

Error recorded, record is unchanged:

Screenshot from 2019-12-02 16-04-35-cropped

  • Fix above record in Hyrax, what happens after harvest to Pubs record?

Object updates and the link is fixed

  • Deposit a file via SE, delete object in Hyrax, what happens after harvest to Pubs record? pubs record shows as deleted

@AndrewBennet
Collaborator

Tom - your numbered sequence sounds ideal for testing this behaviour. That's essentially the same sequence I ran through (albeit against DSpace and EPrints repositories, since I don't have a test Hyrax), and the harvest did indeed bring the item back into a non-deleted state.

I'll discuss with the team about changing the behaviour of the Full Text tab click - though there may be some technical complications around that. We're also coming up to the final release of the year (5.19 - this Thursday), so we probably won't be able to assess the feasibility of the change until after then.

@tomwrobel
Collaborator

TEST ONE: bringing an item into a non-deleted state

  • Manually create a Pubs record: 1012274
  • Deposit into Hyrax: uuid_2844157d-4668-4df3-9c6d-4328fdae1408 (title: '2019-12-02 Test re-instating object connection in SE')
  • Delete object in Hyrax
  • Manually update the Pubs record in the publications page (triggering the deleted behaviour) [see screenshot]
    Screenshot from 2019-12-02 14-52-30
  • Re-instate the object in Hyrax, listing it in the OAI-PMH endpoint (with changed title, now 'Test title')
  • Run a full harvest with fixed object but broken fileset (link between objects fixed)
  • Run harvest with fixed object and binary file (link between objects fixed)

@tomwrobel
Collaborator

@AndrewBennet

I think we've narrowed this down now. So long as there's something in the harvest that enables the two objects to reconnect, we can fix a deleted link. But we need the harvest to do that.

I'll leave this open until your update regarding making the deletion of the connection less 'touchy', but I'm moving it to post-release development if that's OK with you (given when your update will arrive).
