
Manifests

Eric Lopatin edited this page Apr 22, 2022 · 21 revisions

There are a variety of ways to submit digital objects to Merritt. The method you choose will depend on the nature of the digital objects and how many objects you have to submit. One option is to use a manifest to add either complex objects consisting of many files or a large batch of objects. A manifest is a simple pipe-delimited text file containing basic information about the files being submitted. It enables you to post objects to a web server and use the Add Object page to submit the manifest that points to them. Once the manifest is submitted, Merritt accesses the URLs to download and ingest each object.

This guide walks you through using an online manifest tool to prepare manifest files for submitting objects to Merritt, either individually or in batches.

The most common workflow for creating a manifest is:

  1. Create a CSV with columns according to the type of manifest you would like to submit. The columns allow you to add information about each object that will be ingested, including a URL, file name, checksum, etc.
  2. Convert the CSV into a manifest file (a .checkm file) that is then submitted to Merritt.
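As a rough illustration of step 2, the Python sketch below turns a CSV into the text of an object manifest. The manifest tool remains the supported way to do this conversion. The CSV headers (fileUrl, fileName, and so on) are hypothetical. Only the #%profile and #%fields lines are described in this guide; the #%checkm_0.7 line, the two #%prefix lines, the #%eof line, and the mrt-ingest-manifest profile URL are assumptions modeled on typical Merritt sample manifests and should be checked against them.

```python
import csv
import io

# Hypothetical CSV column names, listed in manifest column order.
OBJECT_COLUMNS = ["fileUrl", "hashAlgorithm", "hashValue", "fileSize",
                  "fileLastModified", "fileName", "mimetype"]
FILE_NAME_INDEX = OBJECT_COLUMNS.index("fileName")

# Assumed header block; compare with a Merritt sample object manifest.
HEADER_LINES = [
    "#%checkm_0.7   UTF-8",
    "#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-ingest-manifest",
    "#%prefix | mrt: | http://merritt.cdlib.org/terms#",
    "#%prefix | nfo: | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
    "#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hashValue | nfo:fileSize"
    " | nfo:fileLastModified | nfo:fileName | mrt:mimetype",
]


def csv_to_object_manifest(csv_text: str) -> str:
    """Convert CSV rows (one per file) into checkm manifest text."""
    lines = list(HEADER_LINES)
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=2):  # row 1 is the CSV header
        values = [(row.get(col) or "").strip() for col in OBJECT_COLUMNS]
        if not values[0] or not values[FILE_NAME_INDEX]:
            raise ValueError(f"CSV row {number}: fileUrl and fileName are required")
        # Drop empty trailing cells after fileName (here, an empty MIME type).
        end = len(values)
        while end > FILE_NAME_INDEX + 1 and not values[end - 1]:
            end -= 1
        lines.append(" | ".join(values[:end]))
    lines.append("#%eof")
    return "\n".join(lines) + "\n"
```

Empty cells before fileName are kept, so they appear as " |  | " in the output, which matches the one-or-two-space convention described in the Tips section.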

You can also use a text editor to create manifest files if you prefer. Details are included in the Tips for Creating Text-based Manifests section of this guide. The Batch Manifest Specification for Merritt is available as a PDF.

Why Use a Manifest?

Depending on what you have, the best submission option is:

  • If you have just a few simple objects, each consisting of a single file: Upload each file directly from the Add Object page in Merritt's user interface.
  • If you have just a few complex objects, each consisting of multiple files (these may include metadata files): Create a container file (.zip or .tar) for each object, then upload each one directly from the Add Object page. When you upload a .zip or .tar file, the object's component files will be extracted and made accessible through the Display Object and Display Version pages. Alternatively, create an object manifest file, then upload the manifest from the Add Object page. When you use an object manifest, every component file must be posted on a web server, and the manifest must include each file's URL.
  • If you have a large number of either simple or complex objects: Create a batch manifest file, then upload that file from the Add Object page.

A batch manifest can point to single-file simple objects, container files, or object manifest files, but each batch manifest can contain only one of these types. If you have more than one type, you will need to create a separate manifest for each type (up to three manifests).

All of the files in your manifest must be posted on a web server; the manifest must include each file's URL.

For a single object: There are two reasons you might prefer to use an object manifest to submit a single object to Merritt:

  1. You have checksum values for every file in a complex object, and you would like each file-level checksum to be verified by Merritt upon ingest.
  2. You have a complex object consisting of many files, but do not have access to a utility to create a .zip or .tar container file.

For a batch of objects: If you have a large number of objects to submit, it may be more efficient to use a batch manifest. If each object consists of many files, you can create container (.zip or .tar) files for each object, post those container files to a web server, then create a batch manifest with URLs for the container files. You will also be able to supply metadata about each object in the batch manifest.
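If you script the container step, Python's standard zipfile module is one way to build one .zip per object. The sketch below assumes a hypothetical local layout in which each object's files sit in their own folder, and it also returns an MD5 checksum you could enter in the batch manifest's hashValue column. Posting the containers to a web server is left to you.

```python
import hashlib
import zipfile
from pathlib import Path


def make_container(object_dir: Path, out_dir: Path) -> tuple[Path, str]:
    """Zip every file under object_dir into out_dir/<folder name>.zip.

    Returns the container path and its MD5 checksum. out_dir should not be
    inside object_dir, or the zip file would try to include itself.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{object_dir.name}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(object_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the object folder, with "/" separators.
                zf.write(path, arcname=path.relative_to(object_dir).as_posix())
    md5 = hashlib.md5(zip_path.read_bytes()).hexdigest()
    return zip_path, md5
```

For example, make_container(Path("objects/letter-001"), Path("upload")) would write upload/letter-001.zip, which you would then post and reference by URL in the batch manifest.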

A batch manifest will be an especially good option if you already have information about the objects available in a spreadsheet format, or can easily export a spreadsheet report from another system.

Merritt Manifest Creator and Validator

To assist with the creation and validation of manifests, CDL provides an online manifest tool that allows you to accomplish several tasks:

  • Validate an existing .checkm manifest file
  • Import a CSV and generate a .checkm manifest file
  • Create a manifest file from a CSV template
  • Create a manifest CSV from a list of URLs

Validate an existing .checkm manifest file

This option allows you to validate the contents of an existing checkm manifest file.

  1. Select the "Validate CheckM Manifest" radio button.
  2. Click the Load Your Own CheckM File button.
  3. Locate a checkm file on your local computer to import.
  4. Click the Parse Manifest button. A CheckM Analysis pane opens and displays validation line items. A status is recorded for each line: Pass, Warn or Error. Any warnings or errors will call out a line number in the checkm file to inspect. If desired, click to open the subsequent "CheckM Data" pane to view a subset of rows from your checkm file in table format.

Import a CSV and generate a .checkm manifest

This option assumes you are familiar with the configuration of a CSV that has the correct columns for the type of manifest being created. To use this option:

  1. Select the "Import Manifest CSV and Generate CheckM Manifest" radio button.
  2. Click the Load Your Own CSV File button.
  3. Locate the CSV file on your computer and click Open. The contents of your CSV file are displayed in the dialog.
  4. Click Parse CSV. The content of the corresponding CheckM file is displayed.
  5. Click Parse Manifest. The manifest file content is analyzed. If no errors are encountered, up to 20 rows will be displayed in a table. If errors are found, they will be highlighted in red and include descriptive information about the nature of the issue(s).
  6. Optionally, click to open the "CheckM Data" pane to view your manifest's data.
  7. Click Download CheckM. A link will be provided that enables you to download the final checkm file.
  8. Once you have the .checkm file on your local computer, log in to Merritt and navigate to the collection you wish to submit new content to.
  9. Click the Add Object tab and then Choose File to select your new manifest.
  10. Once it is loaded, click the Submit button. Merritt will return a batch identifier (this ID is referenced in the ingest notification messages you will receive). Merritt then proceeds to download each file or container noted via URL in your manifest and ingests the associated content into the repository.

Create a manifest from a CSV template

There are four primary types of manifest. This option in the tool allows you to download a template for each type:

  • Object manifest: Ingest multiple files into a single object.
  • Single file batch manifest: Ingest multiple files into Merritt – each file becomes an individual object.
  • Container batch manifest: Ingest the contents of one or more container files (.zip or .tar) into Merritt – the contents of each container become an individual object.
  • Batch manifest, otherwise known as a "manifest-of-manifests": Ingest the contents of one or more checkm files – one object is created from the files referenced in each checkm file.

Click the link for the CSV template you would like to download. Refer to the following Manifest Workflows sections for help adding information to each CSV template. Once you have finished filling out the template, make sure to delete line 2, which is present only to indicate which values are required or optional.
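If you prefer to remove the required/optional row with a script rather than in a spreadsheet program, a minimal Python sketch follows. It assumes the template's first record is the column header row and the second record is the indicator row described above.

```python
import csv
import io


def drop_requirements_row(csv_text: str) -> str:
    """Remove the second CSV record (the required/optional indicator row)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    del rows[1:2]  # no-op if the file has only a header row
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Parsing with the csv module, rather than deleting the second physical line, keeps quoted values that contain commas or line breaks intact.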

Create a CSV from a list of URLs

This option allows you to simply copy a list of URLs that lead to your objects on a web server and paste them into the tool. It will then generate a CSV that can subsequently be made into a checkm manifest.

  1. Select the "Create Manifest CSV from a List of URLs" radio button.
  2. Choose one of the following two radio buttons: Ingest Manifest or Batch Manifest. The Ingest Manifest option incorporates all files or containers at the URLs you provide into a single object, while the Batch Manifest option generates a manifest that creates one object per URL.
  3. Paste your list of URLs into the URL file dialog.
  4. Click the Generate CSV for URLs button. The CSV contents are created. Note how the file name is detected in each URL and added as the next field on the line. If you chose the Batch Manifest option, a title field is introduced as well. To add a title for an object, type the title text after the final comma on that object's line.
  5. Click the Parse CSV button. The contents of a checkm file are now generated and displayed in the next dialog. Note that the fifth line in either manifest type ("#%fields") declares the fields that can be filled out to add additional information to the manifest, such as a hash algorithm (md5, sha-256, sha-512) or a file size. Filling out these additional fields per line is optional.
  6. Click the Parse Manifest button to validate the manifest's contents.
  7. Click to open the Download CheckM pane, where a link is provided to download the final checkm manifest file.
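The file-name detection in step 4 can be approximated in Python as shown below. This is a sketch of the idea, not the tool's code: the column names fileUrl, fileName, and title are hypothetical, and the file name is simply taken as the last segment of the URL path.

```python
import csv
import io
from urllib.parse import urlparse


def urls_to_csv(urls: list[str], batch: bool = False) -> str:
    """Build CSV text with one row per URL, deriving fileName from the URL path."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fileUrl", "fileName"] + (["title"] if batch else []))
    for url in (u.strip() for u in urls):
        if not url:
            continue  # skip blank lines in the pasted list
        name = urlparse(url).path.rsplit("/", 1)[-1]
        if not name:
            raise ValueError(f"no file name at the end of URL: {url}")
        # In batch mode the title starts empty, to be filled in by hand.
        writer.writerow([url, name] + ([""] if batch else []))
    return out.getvalue()
```

With batch=True each line ends in a comma, so a title can be typed directly after it, as described in step 4.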

Manifest Workflows

A Single Object

To ingest many files into a single object, start with the "mrt-ingest-template" CSV file that is made available via the "Choose CSV Template to Generate Manifest" option in the manifest tool. This CSV includes columns that allow you to specify required and optional fields.

  • Required fields: File URL, File Name
  • Optional fields: Hash algorithm ("md5" or "sha-256"), hash value, file size, file last modified date, MIME type

Fill out the manifest template so that each line in the CSV describes one of the files you would like to include in your digital object. Ensure the corresponding files are on a web server from which Merritt can download them. Once the CSV is complete:

  1. Use the "Import CSV" option of the manifest tool to import your CSV by clicking the "Load Your Own CSV File" button. Locate your CSV file and click Open.
  2. Click the "Parse CSV" button. The next pane in the manifest tool opens.
  3. Now click the "Parse Manifest" button. Your manifest file is analyzed. If the tool finds errors, these are displayed in a table.
  4. Optionally, click to open the "CheckM Data" pane to view your manifest's data.
  5. Then open the "Download CheckM" pane, where a link to your manifest is provided. Click the link to download your new .checkm manifest file.
  6. With the .checkm file on your local computer, log in to Merritt and navigate to the collection you wish to submit new content to.
  7. Click the Add Object tab and then Choose File to select your new manifest.
  8. Once it is loaded, supply object-level metadata such as title, creator etc.
  9. Click the Submit button. Merritt will return a batch identifier (this ID is referenced in the ingest notification messages you will receive). Merritt then proceeds to download each file or container noted via URL in your manifest and ingests the associated content into the repository.

A Batch of Single Files

To ingest a batch of files such that each file creates its own object, start with the "mrt-single-file-batch-manifest" CSV file that is made available via the "Choose CSV Template to Generate Manifest" option in the manifest tool. This CSV includes columns that allow you to specify required and optional fields.

  • Required fields: File URL, File Name
  • Optional fields: Hash algorithm ("md5" or "sha-256"), hash value, file size, file last modified date, Primary Identifier (enter an ARK if updating an existing object), Local Identifier, Creator, Title, Date

A Batch of Container Files

To ingest the contents of one or more container files such that each container file's content is expanded into its own object, start with the "mrt-container-batch-manifest" CSV file that is made available via the "Choose CSV Template to Generate Manifest" option in the manifest tool. This CSV includes columns that allow you to specify required and optional fields.

  • Required fields: File URL, File Name
  • Optional fields: Hash algorithm ("md5" or "sha-256"), hash value, file size, file last modified date, Primary Identifier (enter an ARK if updating an existing object), Local Identifier, Creator, Title, Date

A Batch of Object Manifest Files

To ingest the contents of one or more checkm manifest files such that each manifest creates its own object, start with the "mrt-batch-manifest" CSV file that is made available via the "Choose CSV Template to Generate Manifest" option in the manifest tool. This CSV includes columns that allow you to specify required and optional fields.

  • Required fields: File URL, File Name
  • Optional fields: Hash algorithm ("md5" or "sha-256"), hash value, file size, file last modified date, Primary Identifier (enter an ARK if updating an existing object), Local Identifier, Creator, Title, Date

More on Manifest Types

The Object Manifest

The object manifest contains a separate row for each file that is part of a single object. Each object worksheet or manifest should include information about only one object. Object components can include metadata files pertaining to the object in any format (METS, MARC, etc.). The Object Manifest Specification for Merritt is available as a plain text file (to make the columns and rows clearer).

The information you can provide in an object manifest is: fileUrl | hashAlgorithm | hashValue | fileSize | fileName

Only fileUrl and fileName are required.

The hashAlgorithm column specifies what kind of checksum you are providing, if you have a checksum value for a component file in the object. (Accepted checksum algorithms are: Adler-32, CRC-32, MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512.) If you provide a hashAlgorithm, you must also provide a hashValue, and vice versa. When checksums are provided, Merritt validates the value for each file. If the value provided does not match the value that Merritt calculates, the object submission will fail, and you will be notified by email that the object was not submitted because it did not pass a fixity check.

The fileSize column contains the file size in bytes, and can be left blank.
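The hashValue and fileSize must describe the exact bytes you post to the web server, so it is safest to compute them from those files. A minimal Python sketch, assuming the files are available locally and using the lowercase algorithm spellings that appear elsewhere in this guide (md5, sha-256, sha-512); other accepted algorithms would need their own mapping.

```python
import hashlib
from pathlib import Path

# Manifest spellings used in this guide, mapped to hashlib algorithm names.
HASHLIB_NAMES = {"md5": "md5", "sha-256": "sha256", "sha-512": "sha512"}


def fixity_columns(path: Path, algorithm: str = "md5") -> dict[str, str]:
    """Return hashAlgorithm, hashValue, and fileSize (in bytes) for a local file."""
    digest = hashlib.new(HASHLIB_NAMES[algorithm])
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "hashAlgorithm": algorithm,
        "hashValue": digest.hexdigest(),
        "fileSize": str(path.stat().st_size),
    }
```

Computing the checksum after the file's final version is posted helps avoid the fixity-check failure described above.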

There are no columns for object-level metadata such as title, creator etc. These can be supplied when you upload the manifest by filling out the form on the Add Object screen, or by also submitting a batch manifest.

The Batch Manifest

The batch manifest contains a separate row for each object, and can contain only one row for any complex, multi-file object. Rows for complex objects should point either to container (.zip or .tar) files or to object manifest files.

The information you can provide in a batch manifest is:

fileUrl | hashAlgorithm | hashValue | fileSize | fileName | primaryIdentifier | localIdentifier | creator | title | date

Only fileUrl and fileName are required.

The hashAlgorithm and hashValue columns refer to the checksum of whatever file is referenced in the fileUrl column. If you provide URLs pointing to object manifest files, the hashValue is the checksum of the object manifest itself. The hashAlgorithm column specifies what kind of checksum you are providing for that file. (Accepted checksum algorithms are: Adler-32, CRC-32, MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512.) If you provide a hashAlgorithm, you must also provide a hashValue, and vice versa. When checksums are provided, Merritt validates the value for each file. If the value provided does not match the value that Merritt calculates, the object submission will fail, and you will be notified by email that the object did not pass a fixity check.

The fileSize column is optional and is expressed in bytes.

primaryIdentifier is the identifier that Merritt uses to track the object. If you are submitting new objects, you will very likely not have a primary identifier. Primary identifiers must be ARK format identifiers. You will generally only use this column if you are using a manifest to edit an existing object. You will be able to see the Merritt-supplied primary identifier for any object in Merritt when you display it.

localIdentifier is any identifier you already use to refer to the object. You can provide multiple local identifiers by separating them with a semicolon. The contents of this column will be searchable in Merritt. You will also be able to edit the object by referring to the local identifier in a manifest. This identifier must be unique among all of the objects in all of your collections.

creator is the author or creator of the object itself. There are no format requirements for expressing named persons or entities. Merritt will display the creator exactly as entered. This column will be searchable in Merritt.

title is the title of the object itself. Merritt will display the title exactly as entered. This column will be searchable in Merritt.

date is the publication date of the object itself. If you provide the date in a standard Excel format, it will be submitted in Web UTC datetime format. If you enter a nonstandard date format, the date will be submitted as plain text. Merritt will display the date exactly as entered. This column will be searchable in Merritt.

Special Considerations

  • If you are using either the spreadsheet or a text editor to produce a manifest, you cannot use a pipe (vertical bar) character "|" in any of your fields. If you need a pipe character to appear anywhere in an object record, replace it with: %7C
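When manifests are generated by a script, the substitution can be applied to every value before a row is written. A small Python sketch; representing a row as a dictionary of column name to value is a hypothetical choice.

```python
def escape_pipes(value: str) -> str:
    """Replace the manifest delimiter "|" with its percent-encoded form, %7C."""
    return value.replace("|", "%7C")


def escape_record(record: dict[str, str]) -> dict[str, str]:
    """Apply escape_pipes to every field of one manifest row."""
    return {field: escape_pipes(value) for field, value in record.items()}
```

For example, a title of "Smith | Jones letters" becomes "Smith %7C Jones letters", so the pipe is no longer read as a column break.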

Tips for Creating Text-based Manifests

  • You can also use a text editor or create a script to write manifest files. You can use the sample batch and object manifests and simply edit the area for conveying rows of object information.
  • The contents of the manifest are listed on separate lines, with each column delimited by " | " [space] [pipe] [space]. Empty columns are indicated by a space between column delimiters: " | | ". (There may be one or two blank spaces in the column.)
  • The column heading text in the Excel spreadsheet is slightly different from the headings in the manifest files. The meaning and order of the columns are the same, but the headings in the text of a manifest file begin with either "nfo:" or "mrt:" (example: nfo:fileUrl).
  • Merritt manifests contain placeholders for two columns that do not appear in the spreadsheet described above: nfo:fileLastModified and mrt:mimetype. These columns are not yet implemented in Merritt, but they do need to be included in any manifest that you type by hand, with the columns left empty.
  • Batch manifests must be identified as a batch of containers, batch of single files or a batch of object manifests. The profile line of the manifest identifies the type of batch:

Batch of container files:

#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-container-batch-manifest

Batch of single files:

#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-single-file-manifest

Batch of object manifests:

#%profile | http://uc3.cdlib.org/registry/ingest/manifest/mrt-batch-manifest

When you've finished editing a sample manifest, be sure to save it with a unique name and a ".checkm" file extension. In any batch manifest, empty cells up to the nfo:fileName column must be identified with " | | ". After the nfo:fileName column, empty cells should be identified ONLY if they are followed by another cell that has a value.
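The batch-specific rules above (choosing the profile line, keeping empty cells up to nfo:fileName, and dropping empty trailing cells after it) can be combined in a short Python sketch. The profile URLs are copied from this guide. The #%checkm_0.7, #%prefix, and #%eof lines and the exact #%fields list (including the mrt: prefixes on the metadata columns) are assumptions modeled on typical sample batch manifests, so check them against the samples before relying on the output. The batch-type keys, record keys, and the save_manifest helper are hypothetical.

```python
from pathlib import Path

PROFILE_BASE = "http://uc3.cdlib.org/registry/ingest/manifest/"
# Profile URLs as listed in this guide, keyed by a hypothetical batch-type name.
PROFILES = {
    "container": PROFILE_BASE + "mrt-container-batch-manifest",
    "single-file": PROFILE_BASE + "mrt-single-file-manifest",
    "object-manifest": PROFILE_BASE + "mrt-batch-manifest",
}
# Assumed column order; verify against a sample batch manifest.
BATCH_FIELDS = ["nfo:fileUrl", "nfo:hashAlgorithm", "nfo:hashValue", "nfo:fileSize",
                "nfo:fileLastModified", "nfo:fileName", "mrt:primaryIdentifier",
                "mrt:localIdentifier", "mrt:creator", "mrt:title", "mrt:date"]
FILE_NAME_INDEX = BATCH_FIELDS.index("nfo:fileName")


def cell(record: dict, field: str) -> str:
    """Look up a value by its unprefixed name (e.g. "fileUrl") and escape pipes."""
    value = record.get(field.split(":", 1)[1])
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        value = ";".join(str(v) for v in value)  # e.g. several local identifiers
    return str(value).strip().replace("|", "%7C")


def build_batch_manifest(batch_type: str, records: list[dict]) -> str:
    lines = [
        "#%checkm_0.7   UTF-8",
        "#%profile | " + PROFILES[batch_type],
        "#%prefix | mrt: | http://merritt.cdlib.org/terms#",
        "#%prefix | nfo: | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
        "#%fields | " + " | ".join(BATCH_FIELDS),
    ]
    for record in records:
        values = [cell(record, field) for field in BATCH_FIELDS]
        if not values[0] or not values[FILE_NAME_INDEX]:
            raise ValueError(f"fileUrl and fileName are required: {record}")
        # Keep every cell through nfo:fileName; after it, drop empty cells
        # that are not followed by a cell with a value.
        end = len(values)
        while end > FILE_NAME_INDEX + 1 and not values[end - 1]:
            end -= 1
        lines.append(" | ".join(values[:end]))
    lines.append("#%eof")
    return "\n".join(lines) + "\n"


def save_manifest(text: str, path: Path) -> None:
    """Write the manifest, insisting on a .checkm file extension."""
    if path.suffix != ".checkm":
        raise ValueError("manifest files must use the .checkm extension")
    path.write_text(text, encoding="utf-8")
```

Because each batch type maps to exactly one profile line, mixing containers, single files, and object manifests in one call is not possible, which mirrors the one-type-per-manifest rule.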

To validate the manifest, use the "Validate CheckM Manifest" option in the manifest tool.
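The manifest tool is the place to validate a manifest before submitting it. If you generate many manifests with scripts, a rough local pre-check can catch obvious problems first. The Python sketch below is not the tool's validation logic: it checks only a handful of rules taken from this guide (a #%profile line, a #%fields line before any data rows, no more cells than declared fields, a non-empty nfo:fileName, and hashAlgorithm and hashValue supplied together), plus two assumptions of its own (nfo:fileUrl uses http or https, and fileSize is a whole number). It reports issues using the Warn and Error levels the tool uses.

```python
from urllib.parse import urlparse


def precheck_manifest(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, level, message) tuples; an empty list means no issues found."""
    issues = []
    fields = None
    saw_profile = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines such as #%profile and #%fields; other "#" lines are skipped.
            if line.startswith("#%profile"):
                saw_profile = True
            elif line.startswith("#%fields"):
                fields = [name.strip() for name in line.split("|")[1:]]
            continue
        if fields is None:
            issues.append((number, "Error", "data row appears before the #%fields line"))
            continue
        cells = [c.strip() for c in line.split("|")]
        if len(cells) > len(fields):
            issues.append((number, "Error", f"{len(cells)} cells but {len(fields)} fields declared"))
            continue
        row = dict(zip(fields, cells))
        if urlparse(row.get("nfo:fileUrl", "")).scheme not in ("http", "https"):
            issues.append((number, "Error", "nfo:fileUrl is not an http(s) URL"))
        if not row.get("nfo:fileName"):
            issues.append((number, "Error", "nfo:fileName is missing"))
        if bool(row.get("nfo:hashAlgorithm")) != bool(row.get("nfo:hashValue")):
            issues.append((number, "Error", "hashAlgorithm and hashValue must be given together"))
        size = row.get("nfo:fileSize")
        if size and not size.isdigit():
            issues.append((number, "Warn", "nfo:fileSize should be a whole number of bytes"))
    if not saw_profile:
        issues.append((0, "Error", "no #%profile line found"))
    return issues
```

Line numbers in the results count from 1, like the line references in the tool's CheckM Analysis pane, so a flagged line can be found quickly in a text editor.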