Have a way of specifying the order by which SIP should be ingested in order to support AIP updates via the ingest process #6

Open
jmaferreira opened this issue Sep 26, 2018 · 5 comments
Labels: DILCIS review, feature request

Comments

@jmaferreira
Contributor

jmaferreira commented Sep 26, 2018

Where SIP updates or replacements occur, the order in which the SIPs are applied to the existing AIP is important.

The SIP should be able to identify the AIP and its version/revision so that a verification can be made during ingest.

Ingesting SIPs in the wrong order can produce considerably different results.

@jmaferreira jmaferreira self-assigned this Sep 26, 2018
@jmaferreira jmaferreira changed the title Order of SIP updates How to check the Order of SIP updates Sep 26, 2018
@jmaferreira jmaferreira changed the title How to check the Order of SIP updates Have a way of specifying the order by which SIP should be ingested Dec 19, 2018
@jmaferreira jmaferreira added this to the Unplanned milestone Dec 19, 2018
@jmaferreira jmaferreira removed their assignment Dec 19, 2018
@jmaferreira jmaferreira added the DILCIS review Issues that need to go for discussion within the DILCIS Board meetings label Mar 13, 2019
carlwilson added a commit that referenced this issue Sep 11, 2019
@luis100

luis100 commented Dec 23, 2020

@carlwilson Why is this issue closed? We believe this is still an issue.

The current strategy in RODA is to give each SIP a version number: the create is version 1, the first update is version 2, the second update is version 3. The AIP should also have a version number. When ingesting a SIP update, the SIP version number must be exactly equal to the AIP version plus one. As this cannot currently be defined by SIP and AIP metadata, we are defining it in custom descriptive metadata and enforcing these rules in the ingest procedure.
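
The "AIP version plus one" rule can be sketched as a small check. This is only an illustration of the rule as described, not RODA's actual code; the function and exception names are hypothetical.

```python
class OutOfOrderUpdate(Exception):
    """Raised when a SIP update does not directly follow the AIP's current version."""


def check_update_version(aip_version: int, sip_version: int) -> None:
    # Hypothetical helper: the create SIP is version 1, so an update is
    # accepted only when it is exactly one step ahead of the stored AIP.
    if sip_version != aip_version + 1:
        raise OutOfOrderUpdate(
            f"expected SIP version {aip_version + 1}, got {sip_version}"
        )


check_update_version(aip_version=1, sip_version=2)  # accepted: first update
```

A SIP with version 3 arriving while the AIP is still at version 1 would raise `OutOfOrderUpdate`, which lets the ingest procedure hold or reject it.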

@jmaferreira jmaferreira reopened this Nov 10, 2022
@jmaferreira
Contributor Author

This issue has been discussed on the DILCIS board. It has been agreed that there should be instructions about the UPDATE process both on the SIP and the AIP.

@luis100 Can you provide a few paragraphs about how the SIP needs to be changed to support AIP Updates and what needs to happen at the AIP level so that the AIP spec is updated as well?

The decision to include this change will be discussed on the next DILCIS Board meeting.

@jmaferreira jmaferreira changed the title Have a way of specifying the order by which SIP should be ingested Have a way of specifying the order by which SIP should be ingested in order to support AIP updates via the ingest process Nov 10, 2022
@luis100

luis100 commented Nov 22, 2022

Currently, the E-ARK SIP specification defines RECORDSTATUS as a way to indicate how multiple deliveries should be interpreted by the repository.

RECORDSTATUS (string/O): Specifies the status of the METS document. It is used for internal processing purposes.

SIP3: Package status metsHdr/@RECORDSTATUS
A way of indicating the status of the package and to instruct the OAIS on how to properly handle the package. If not set, the expected behaviour is equal to NEW.

The metsHdr is also used to indicate the type of behaviour to be expected from the OAIS when processing a particular SIP. For example, one might indicate that an SIP should be used to "replace" a particular AIP in the repository or that an SIP is meant for "testing" purposes and therefore it should not create an AIP at the end of the ingest process (see attribute metsHdr/@RECORDSTATUS).
From E-ARK SIP specification v2.0

The values include:

  • NEW: A new delivery.
  • SUPPLEMENT: Extends the previous delivery.
  • REPLACEMENT: Replaces a previous delivery.
  • TEST: A test delivery. No AIP should be created.
  • VERSION: A delivery with same content regarding files but one or more files have a new version.
  • DELETE: An order from the Producer to remove an existing AIP.
  • OTHER: Status not in list.
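
As a rough sketch of how an ingest tool might read this attribute, the snippet below parses a METS document with Python's standard library and falls back to NEW when RECORDSTATUS is not set, as the specification text quoted above describes. The function name and the sample document are illustrative, and returning NEW when the metsHdr element itself is absent is an assumption made for the sketch.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}


def read_record_status(mets_xml: str) -> str:
    # Return metsHdr/@RECORDSTATUS, defaulting to NEW when it is not set.
    root = ET.fromstring(mets_xml)
    header = root.find("mets:metsHdr", METS_NS)
    if header is None:
        # Assumption for this sketch: treat a missing header like an unset status.
        return "NEW"
    return header.get("RECORDSTATUS", "NEW")


sample = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:metsHdr RECORDSTATUS="VERSION"/>'
    "</mets:mets>"
)
print(read_record_status(sample))  # VERSION
```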

Although it is not clear how SUPPLEMENT and VERSION should affect the AIP, such a feature is commonly needed in production systems, especially when the content transferred to the archive continues to live in the production system. New updates, for example to descriptive metadata, need to be carried over to the AIP accordingly.

But when a series of SIP "deliveries" relates to or affects the same AIP, it is of the utmost importance to know whether we have all the deliveries and whether they are submitted in the correct order.

For example, suppose a record in the production system is marked as ready to be transferred to the archive, producing a first "NEW" delivery. Its descriptive metadata is then updated and the record is marked ready again, creating a "VERSION" delivery; a further update to the descriptive metadata spawns a second "VERSION" delivery. It is important to ensure that the first "VERSION" delivery is applied before the second, or we will end up with the wrong version of the descriptive metadata. It is also important to confirm, when applying the second "VERSION" delivery, that it is being applied on top of the first, because the first one might otherwise arrive later.
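
A toy illustration of this scenario follows. The delivery contents and the `apply_deliveries` helper are invented for the example; a real repository would do far more than overwrite dictionary keys, but the ordering problem is the same.

```python
def apply_deliveries(deliveries):
    # Each delivery is (label, descriptive_metadata); later deliveries
    # overwrite the fields set by earlier ones.
    metadata = {}
    for _label, update in deliveries:
        metadata.update(update)
    return metadata


new = ("NEW", {"title": "Report (draft)"})
version_1 = ("VERSION", {"title": "Report (reviewed)"})
version_2 = ("VERSION", {"title": "Report (final)"})

in_order = apply_deliveries([new, version_1, version_2])
first_arrives_late = apply_deliveries([new, version_2, version_1])

print(in_order["title"])            # Report (final)
print(first_arrives_late["title"])  # Report (reviewed)
```

Both runs succeed without error, which is the point: nothing in the deliveries themselves reveals that the second run ended with stale metadata.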

The same applies to more complex use cases that combine "SUPPLEMENT", "REPLACEMENT", "VERSION" and "DELETE" deliveries.

The recommendation is to add an additional field or attribute named "SUBMISSION_NUMBER", where a "NEW" delivery will always get the "SUBMISSION_NUMBER=0" (default), and any following deliveries will need to increment this number.

The AIP should have a record of all submissions incorporated into it and possibly some additional information about what was affected, for example:

  • Submission number
  • Submission record status (from the RECORDSTATUS vocabulary)
  • Files added, updated or deleted. (or we can delegate this to PREMIS events)
  • Date of incorporation of the submission (or we can delegate this to PREMIS events)
  • Agents (human or machine) involved in the incorporation (or we can delegate this to PREMIS events)
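
One possible shape for such a record is sketched below. The class name, field names and file paths are hypothetical, and in practice the last three items could live in PREMIS events instead, as noted in the list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class SubmissionRecord:
    """One entry in the AIP's record of incorporated submissions (hypothetical shape)."""

    submission_number: int
    record_status: str  # a value from the RECORDSTATUS vocabulary
    files_added: List[str] = field(default_factory=list)
    files_updated: List[str] = field(default_factory=list)
    files_deleted: List[str] = field(default_factory=list)
    incorporated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    agents: List[str] = field(default_factory=list)


# Illustrative history for an AIP that received a NEW and one VERSION delivery.
history = [
    SubmissionRecord(0, "NEW", files_added=["metadata/descriptive/ead.xml"]),
    SubmissionRecord(
        1,
        "VERSION",
        files_updated=["metadata/descriptive/ead.xml"],
        agents=["ingest-service"],
    ),
]
```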

@jmaferreira
Contributor Author

jmaferreira commented Dec 15, 2022

During the DILCIS Board meeting (2022-12-15) it was stated that:

  • FGS has the same concept. It is called PACKAGENUMBER.
  • Archivematica also has a similar concept on their SIP.

Action:

  • @luis100 - The SIP has a creation date and a modification date. Could these work as a way to determine the order of ingest?

Decision:

  • This suggestion will remain open until it has more endorsement from the community.

@jmaferreira jmaferreira added the feature request This issue is a feature which will be implemented further on. Used together with a milestone. label Sep 4, 2024
@jmaferreira jmaferreira added this to the Unplanned milestone Sep 4, 2024
@jmaferreira jmaferreira assigned jmaferreira and unassigned luis100 Nov 29, 2024
@jmaferreira
Contributor Author

jmaferreira commented Dec 4, 2024

Sequencing in E-ARK SIPs for ingest

Overview

This proposal addresses the need to ensure the correct sequence of Submission Information Packages (SIPs) affecting the same Archival Information Package (AIP). Current mechanisms in the E-ARK SIP specification, specifically the RECORDSTATUS attribute, do not provide sufficient support for guaranteeing correct ordering when multiple deliveries of SIPs are processed. This change request proposes the addition of a new field, SUBMISSION_NUMBER, to resolve this limitation.

Current Specification Context

The RECORDSTATUS attribute in the metsHdr element is currently used to define the status of a SIP and its intended behaviour within the OAIS environment. The possible values are as follows:

  • NEW: A new delivery.
  • SUPPLEMENT: Extends the previous delivery.
  • REPLACEMENT: Replaces a previous delivery.
  • TEST: A test delivery; no AIP should be created.
  • VERSION: A delivery with updated versions of files but no new content.
  • DELETE: An order to remove an existing AIP.
  • OTHER: A status not explicitly listed.

While RECORDSTATUS is a useful mechanism, it lacks the capability to handle scenarios involving sequencing of deliveries, which is critical for ensuring data consistency in cases where multiple updates are applied to the same AIP. For example:

  • A VERSION delivery might not be applied in the correct order, leading to outdated data being incorporated.
  • Complex workflows involving SUPPLEMENT, REPLACEMENT, or DELETE deliveries require precise sequencing to maintain integrity.

Proposed Change

To address this gap, it is recommended to introduce a new attribute, SUBMISSION_NUMBER, which explicitly defines the sequence of SIPs related to the same AIP. This mechanism ensures that:

  1. The order of deliveries is explicitly recorded and enforced.
  2. All related deliveries can be verified to ensure completeness and consistency before processing.
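
A minimal sketch of the completeness check in point 2, assuming the repository knows the last submission number already incorporated into the AIP and the numbers of the SIPs waiting in the ingest queue. The function name and return shape are hypothetical.

```python
from typing import Iterable, List, Tuple


def plan_ingest(
    last_incorporated: int, pending_numbers: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Return (ready, missing) submission numbers for one AIP."""
    pending = set(pending_numbers)

    # Deliveries are ready only while they form an unbroken run
    # directly after the last incorporated submission.
    ready = []
    expected = last_incorporated + 1
    while expected in pending:
        ready.append(expected)
        expected += 1

    # Anything absent between the end of that run and the highest
    # pending number is a delivery that has not arrived yet.
    highest = max(pending, default=last_incorporated)
    missing = [n for n in range(expected, highest + 1) if n not in pending]
    return ready, missing


print(plan_ingest(0, [3, 1, 2]))  # ([1, 2, 3], [])
print(plan_ingest(0, [1, 3]))     # ([1], [2])
```

In the second call, delivery 3 is held back until delivery 2 arrives, rather than being applied on top of delivery 1.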

Implementation Details

  • Definition:

    • SUBMISSION_NUMBER (integer, mandatory for non-NEW deliveries): Specifies the sequential order of the SIP within a series of deliveries.
    • For the initial NEW delivery, the SUBMISSION_NUMBER is implicitly 0 (default).
    • For subsequent deliveries (SUPPLEMENT, VERSION, REPLACEMENT, etc.), the SUBMISSION_NUMBER must be exactly one greater than that of the previous delivery for the same AIP.
  • AIP Incorporation:

    The AIP must maintain a record of all incorporated submissions, including:

    • Submission number.
    • Submission RECORDSTATUS.
    • Files added, updated, or deleted (preferably tracked via PREMIS events).
    • Date of incorporation (via PREMIS events, if applicable).
    • Agents involved in the submission (via PREMIS events, if applicable).
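
The definition above could translate into an ingest-time validation rule along the following lines. This is a sketch under assumptions, not part of the proposal: the function name is hypothetical, the AIP's record is reduced to a list of incorporated submission numbers, and TEST and OTHER are rejected as out of scope because the proposal does not say how they should be sequenced.

```python
from typing import List, Optional

# Statuses assumed here to take part in sequencing (an assumption of this sketch).
SEQUENCED_STATUSES = {"SUPPLEMENT", "VERSION", "REPLACEMENT", "DELETE"}


def validate_submission(
    record_status: str,
    submission_number: Optional[int],
    incorporated: List[int],
) -> int:
    """Return the number to record in the AIP, or raise ValueError."""
    if record_status == "NEW":
        # SUBMISSION_NUMBER is implicitly 0 for the initial delivery.
        number = 0 if submission_number is None else submission_number
        if number != 0:
            raise ValueError("a NEW delivery must use SUBMISSION_NUMBER 0")
        if incorporated:
            raise ValueError("the AIP already has incorporated submissions")
        return 0

    if record_status not in SEQUENCED_STATUSES:
        raise ValueError(f"sequencing not defined for status {record_status}")
    if submission_number is None:
        raise ValueError("SUBMISSION_NUMBER is mandatory for non-NEW deliveries")
    if not incorporated:
        raise ValueError("no existing AIP to apply this delivery to")

    expected = incorporated[-1] + 1
    if submission_number != expected:
        raise ValueError(
            f"expected SUBMISSION_NUMBER {expected}, got {submission_number}"
        )
    return submission_number


incorporated = [validate_submission("NEW", None, [])]
incorporated.append(validate_submission("VERSION", 1, incorporated))
```

After these two calls the AIP's record holds submissions 0 and 1; a further delivery numbered 3 would be rejected until number 2 has been incorporated.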

Justification

  • Accuracy: Ensures that SIP deliveries are applied in the correct order, avoiding potential data corruption or loss.
  • Auditability: Provides a clear record of all submissions and their effects on the AIP.
  • Interoperability: Supports robust and predictable behaviour in diverse OAIS-compliant systems.

Potential Impact

  • Specification Compliance: Minor changes required to the E-ARK SIP specification to include the SUBMISSION_NUMBER attribute.
  • System Changes: Archives must adapt their ingestion pipelines to validate and enforce SUBMISSION_NUMBER sequencing.
  • Metadata Storage: Increased metadata requirements within AIPs to store submission records.

Next Steps

  1. Approve the addition of SUBMISSION_NUMBER to the E-ARK SIP specification.
  2. Define validation rules for SUBMISSION_NUMBER in SIP deliveries.
  3. Update AIP handling procedures to incorporate submission sequencing metadata.
  4. Communicate changes to stakeholders and update relevant documentation.
