
Export a repository object (node, media, files) #1096

Open
Natkeeran opened this issue Apr 18, 2019 · 11 comments
Labels
Type: use case — proposes a new feature or function for the software using user-first language.

Comments

@Natkeeran
Contributor

Natkeeran commented Apr 18, 2019

We need to be able to fully export a digital repository object for various use cases, including migration and preservation (AIP/Bags).

  • Ability to export metadata (repository item) and bitstreams (media/files)
  • Option to include versions
  • Option to include derivatives
  • Option to pull in external/redirected bitstreams
  • Some logic to handle conceptual objects (e.g., books, compound objects) so that exporting a book also exports all of its pages
    • Advanced: pull in the full graph of a repository item, i.e., if it has URIs to subjects, pull in those URIs as well

Additional Info:

We probably need a method to ingest the exported object as well.

@Natkeeran Natkeeran changed the title Export repository object (node, media, files) Export a repository object (node, media, files) Apr 18, 2019
@mjordan
Contributor

mjordan commented Apr 18, 2019 via email

@rangel35

What we do at UT-Austin in Islandora 7 is use the bagging feature so our users can request Bags for preservation purposes and offsite vaulting. We use bagging via the interface for Bags under 2 GB; Bags over 2 GB get queued for drush processing and are bagged overnight.

We provide the ability to bag ALL datastreams and metadata of the object, and for paged content it will also bag the "pages" and their datastreams and metadata.

Our users have also requested the ability to bag selected datastreams.
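The Bags described above follow the standard BagIt layout (payload under data/, a checksum manifest, and a bagit.txt declaration). A minimal stdlib-only sketch of that structure — illustrative only; the real tooling uses a proper BagIt library, and the function name and payload shape here are hypothetical:

```python
import hashlib
from pathlib import Path

def make_minimal_bag(bag_dir: str, payload: dict) -> None:
    """Create a minimal BagIt bag: data/ payload, sha256 manifest, bagit.txt.

    `payload` maps datastream-style filenames (e.g. "MODS.xml", "OBJ.tiff")
    to their bytes.
    """
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in payload.items():
        (data / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    # Manifest lists every payload file with its checksum, enabling
    # later validation of the Bag's fixity.
    (bag / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"
    )
```

Usage would look like `make_minimal_bag("/tmp/bag_4", {"MODS.xml": b"<mods/>"})`; validating a Bag is then a matter of re-hashing each payload file against the manifest.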

@Natkeeran
Contributor Author

UTSC has a similar use case and workflow to the one noted by @rangel35. A MODS flag indicates which objects can be bagged, a report is generated with the PIDs, and the objects are exported via the command line using drush.

(We considered adding a PREMIS event on Bag creation, but did not implement it because it seemed to complicate the workflow.)

In Islandora 7.x we bag the full ATOM zip (including versions) with archive context. In one of the storage locations, we aim to do validation of the Bags as well.

In 7.x we run into problems exporting large objects or collections consistently, so the command line seems to work best.

Having the option to bag from the UI and the Islandora API would be nice, as we don't currently have a way to download a whole object.

Also, it would be ideal to have an option to ingest from a Bag or another export format.

@mjordan
Contributor

mjordan commented Apr 25, 2019

Some preliminary thoughts on a Bagging microservice:

  1. Take something like Islandora Bagger and put a REST interface on top of it.
  2. From within Islandora 8, a user chooses to Bag an object via the GUI, which POSTs a message containing the node ID to the microservice; the microservice then creates the Bag, as Islandora Bagger does now, by fetching the various files, metadata, etc. from Islandora via Islandora's REST interface. The module running in Drupal doesn't do any bagging; it just sends the request to create a Bag (and maybe exposes the results of the bagging process back to the user, see next point).
  3. On successful creation of the Bag, the microservice sends an email to the user containing the URL of the Bag to download (or some indication of where the bag can be found); or alternatively, the new Bag's URL is provided via the microservice's REST interface so it can show up in a Drupal View, etc.
  4. The microservice would retain its command-line UI so it can be incorporated into automation scripts, etc.

Having a microservice separate from Drupal do the bagging would allow the jobs to run as long as they need to, eliminating the risk of timing out in front of the user because the bagging is done asynchronously. We'd need to figure out how to allow for different Bag options, but those could possibly be sent as the body of the REST POST request.
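The receiving side of that flow can be very thin: accept the POST, record the node ID, and return immediately, leaving the actual bagging to a separate worker. A minimal stdlib sketch, assuming a hypothetical `Islandora-Node-ID` header and queue-file location (Islandora Bagger's real implementation is a Symfony app, not this):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

QUEUE_FILE = Path("islandora_bagger.queue")  # hypothetical queue location

class BagRequestHandler(BaseHTTPRequestHandler):
    """Accept a Bag-creation request and enqueue it; a worker bags later."""

    def do_POST(self):
        node_id = self.headers.get("Islandora-Node-ID")
        if node_id is None:
            self.send_response(400)
            self.end_headers()
            return
        # Only the node ID is recorded here; the bagging itself happens
        # asynchronously, so this request returns immediately and can
        # never time out in front of the user.
        with QUEUE_FILE.open("a") as q:
            q.write(node_id + "\n")
        body = json.dumps({"queued": node_id}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def serve(port=8001):
    """Run the enqueue endpoint (blocking)."""
    HTTPServer(("127.0.0.1", port), BagRequestHandler).serve_forever()
```

A request like `curl -X POST -H "Islandora-Node-ID: 4" http://127.0.0.1:8001/api/createbag` would then get `{"queued": "4"}` back while the Bag is built later.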

@Natkeeran with regard to ingest from a Bag, that is something users have been requesting for a while. But, with Islandora 8's nice REST interfaces, we can probably figure out how to map the contents of a Bagged object back to the originating components of the node+media fairly easily and push it into Islandora using something like https://github.com/mjordan/claw_rest_ingester. I think using URIs to define what taxonomy terms should be assigned to the reingested object would be useful here as well.

@dannylamb
Contributor

@mjordan @Natkeeran I would love to see Bags (or zipped Bags, really) become the new zip importer format. I don't know how feasible that is given how widely Bags can vary, but it makes sense to move away from a bespoke format toward a more widely adopted one.

@Natkeeran
Contributor Author

Natkeeran commented Apr 26, 2019

@mjordan @dannylamb

The feature set for the microservice looks good. We can extend it later on the Drupal side to have a flag and a queue/cron mechanism.

Ingest would be a neat addition, with use cases such as restore from backup, migration, and batch ingest from zip. Having ingest from zip can theoretically be seen as bootstrapping Drupal from Fedora as well.

Some points to consider:

  • Exporting and importing the full graph is the major challenge. For example, a person has relationships to other people. I don't know enough graph theory to determine how to find the full graph, and how to avoid circular loops!

  • The second, related challenge is persistent identifiers. What is our PID? If the Drupal nid or taxonomy ID is the PID, does Drupal allow us to set it? Do we want to support the use case where people install Islandora 8 in an existing instance of Drupal? Do we have a persistent ID in Fedora?

  • Does the Fedora or Drupal representation provide a logical representation of the full repository object similar to FOXML in 7.x? Maybe via the Portland Common Data Model? Though this adds a level of complexity, do we need such a representation (e.g., METS) for preservation (OAIS AIP compliance) purposes?

  • We should be clear about how we are handling conceptual entities (e.g., books, compound objects).
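On the first point, the standard answer from graph theory is a breadth- or depth-first traversal with a visited set: the set is exactly what prevents circular references from looping forever. A small sketch, where `get_linked_ids` is a hypothetical stand-in for whatever lookup returns the IDs an object references (members, pages, linked agents, subject URIs, ...):

```python
from collections import deque

def collect_export_set(start_id, get_linked_ids):
    """Breadth-first walk of a repository object's reference graph.

    Returns the set of all object IDs reachable from `start_id`.
    The `visited` set guarantees each object is expanded once, so
    cycles (A -> B -> A) terminate instead of looping.
    """
    visited = set()
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        for linked in get_linked_ids(current):
            if linked not in visited:
                queue.append(linked)
    return visited
```

The harder policy question remains where to *stop* (e.g., follow a person's subject URI but not that person's other relationships); that is a cutoff rule layered on top of this traversal, not a traversal problem.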

@mjordan
Contributor

mjordan commented Apr 26, 2019

@Natkeeran yes, those are all significant issues, but I see them as out of scope for the BagIt functionality. They are more data modeling issues, aren't they?

@dannylamb couldn't agree more. Even if an institution hasn't adopted BagIt widely, the tooling is decent, and it is always easier to convert from a standard format than from a bespoke one, especially from a long-term preservation perspective (e.g., when the platform tied to the bespoke format hasn't been in use for 20 years).

@rangel35

Islandora 8 creates a UUID; couldn't we use that as the PID? Or are you thinking more along the lines of a standard namespace-type PID?

@mjordan
Contributor

mjordan commented Apr 30, 2019

In order for the creation of Bags to be truly decoupled from the Drupal module POSTing the request, we either need to issue the request using an asynchronous Guzzle call or an asynchronous JavaScript request, or do something on the microservice side that collects node IDs in a file and then runs as a batched cron job.

One advantage of the batch approach is that since the bagger would be running in a CLI environment, it wouldn't time out like it would if the bags were generated within an HTTP response.
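The batch approach described above amounts to draining a FIFO queue file from cron. A minimal sketch, where `bag_one` is a hypothetical stand-in for invoking the bagger's CLI for a single node:

```python
from pathlib import Path

def process_queue(queue_path, bag_one):
    """Drain a FIFO queue file of node IDs, bagging each in turn.

    `bag_one(node_id)` stands in for running the bag-creation command
    for one node. Because this runs from cron in a CLI context, each
    Bag can take as long as it needs without any HTTP timeout in play.
    """
    queue = Path(queue_path)
    if not queue.exists():
        return []
    node_ids = [line.strip() for line in queue.read_text().splitlines()
                if line.strip()]
    processed = []
    for node_id in node_ids:  # FIFO: oldest requests first
        bag_one(node_id)
        processed.append(node_id)
    queue.write_text("")  # truncate once everything is bagged
    return processed
```

One caveat a real implementation would handle: requests arriving while the queue is being processed, which the truncate-at-the-end step here would silently drop (file locking or a rename-then-process scheme avoids that).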

@mjordan
Contributor

mjordan commented May 6, 2019

Did some work on Islandora Bagger over the weekend. It now has a REST API that lets you add a node ID and settings file to a queue. It also has a simple FIFO queue manager and a console command to process the queue. The original CLI create_bag command still works as it used to.

The README explains how it works: as POST requests like this come in:

curl -v -X POST -H "Islandora-Node-ID: 4" --data-binary "@sample_config.yml" http://127.0.0.1:8001/api/createbag

each request's node ID is added to the queue, along with the path to the settings YAML file (the YAML is the body of the request). In a cronjob, you would run the following to process the queue:

./bin/console app:islandora_bagger:process_queue --queue var/islandora_bagger.queue

which loops through the queue and runs the create_bag CLI command (it does this using internal Symfony methods):

./bin/console app:islandora_bagger:create_bag --settings=sample_config.yml --node=112
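Wired into cron, the queue processor above might be scheduled like this (the install path and schedule are illustrative, not part of Islandora Bagger):

```shell
# m h dom mon dow  command — process the Bag queue nightly at 02:00
0 2 * * * cd /opt/islandora_bagger && ./bin/console app:islandora_bagger:process_queue --queue var/islandora_bagger.queue
```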

@rosiel
Member

rosiel commented May 9, 2020

The Robertson Library's RDM project uses Mark Jordan's Islandora Bagger and integration module.

We have a BagIt Ansible role which installs our forks of islandora_bagger and islandora_bagger_integration.


No branches or pull requests

6 participants