This repository has been archived by the owner on Mar 29, 2023. It is now read-only.

StarfishStorage/bagit-python

 
 


bagit-python

bagit is a Python library and command line utility for working with BagIt style packages.

Installation

bagit.py is a single-file Python module that you can drop into your project as needed, or you can install it globally with:

pip install bagit

Python v2.4+ is required.

Command Line Usage

When you install bagit you should get a command line program called bagit.py which you can use to turn an existing directory into a bag:

bagit.py --contact-name 'John Kunze' /directory/to/bag

You can pass in key/value metadata for the bag using options like --contact-name above, which get persisted to bag-info.txt. For a complete list of bag-info.txt properties you can use as command line arguments, see --help.
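For illustration, bag-info.txt stores each property as a "Label: value" line. The sketch below only shows that layout; format_bag_info is a hypothetical helper, not part of bagit, and the library itself may write additional properties or order them differently.

```python
def format_bag_info(info):
    """Render a metadata dict as 'Label: value' lines, one per key.

    Hypothetical helper for illustration only; not bagit's own writer.
    """
    lines = []
    for label in sorted(info):
        lines.append("%s: %s" % (label, info[label]))
    return "\n".join(lines) + "\n"
```

Calling format_bag_info({"Contact-Name": "John Kunze"}) returns a single line of text, Contact-Name: John Kunze, followed by a newline.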

Since calculating checksums can take a while when creating a bag, you may want to calculate them in parallel if you are on a multicore machine. You can do that with the --processes option:

bagit.py --processes 4 /directory/to/bag

To specify which checksum algorithm(s) to use when generating the manifest, use the --md5, --sha1, --sha256 and/or --sha512 flags (MD5 is generated by default).

bagit.py --sha1 /path/to/bag
bagit.py --sha256 /path/to/bag
bagit.py --sha512 /path/to/bag
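To make the role of these flags concrete, here is a minimal sketch of what a manifest contains: one line per payload file under data/, pairing a hex digest with the file's path relative to the bag. It uses only the standard hashlib module; manifest_lines is a hypothetical name, and this is not how bagit itself is implemented.

```python
import hashlib
import os


def manifest_lines(bag_dir, algorithm="md5"):
    """Return sorted 'digest  path' lines for every file under bag_dir/data.

    Hypothetical helper for illustration; bagit writes real manifests itself.
    """
    entries = []
    data_dir = os.path.join(bag_dir, "data")
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            digest = hashlib.new(algorithm)
            with open(full, "rb") as f:
                # Read in chunks so large payload files are not loaded whole.
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            # Manifest paths are relative to the bag and use forward slashes.
            rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
            entries.append((rel, digest.hexdigest()))
    return ["%s  %s" % (hexdigest, rel) for rel, hexdigest in sorted(entries)]
```

Passing algorithm="sha256" here corresponds roughly to choosing --sha256 on the command line.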

If you would like to validate a bag you can use the --validate flag.

bagit.py --validate /path/to/bag

If you would like a quick check of whether a bag seems valid, one that examines only the bag's structure and compares its Payload-Oxum (byte count and number of files), add the --fast flag:

bagit.py --validate --fast /path/to/bag
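As a rough sketch of why the fast check is cheap: a Payload-Oxum has the form "bytes.files", so recomputing it needs only file sizes and a file count, never file contents. The payload_oxum helper below is hypothetical and assumes the payload lives under data/ inside the bag directory.

```python
import os


def payload_oxum(bag_dir):
    """Return 'total_bytes.file_count' for everything under bag_dir/data.

    Hypothetical helper for illustration; not part of bagit's API.
    """
    total_bytes = 0
    total_files = 0
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
            total_files += 1
    return "%d.%d" % (total_bytes, total_files)
```

A bag whose payload is two files of 5 and 3 bytes would yield "8.2", which can then be compared with the Payload-Oxum value recorded in bag-info.txt.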

Finally, to parallelize validation across multiple CPUs, combine --validate with --processes:

bagit.py --validate --processes 4 /path/to/bag

Python Usage

You can also use bagit programmatically in your own Python programs. To create a bag you would do this:

import bagit
bag = bagit.make_bag('mydir', {'Contact-Name': 'John Kunze'})

make_bag returns a Bag instance. If you have a bag already on disk and would like to create a Bag instance for it, simply call the constructor directly:

import bagit
bag = bagit.Bag('/path/to/bag')

If you would like to see if a bag is valid, use its is_valid method:

bag = bagit.Bag('/path/to/bag')
if bag.is_valid():
    print("yay :)")
else:
    print("boo :(")

If you'd like to get a detailed list of validation errors, call the validate method and catch the BagValidationError exception. If the bag's manifest was invalid (and the problem wasn't caught by the payload oxum check), the exception's details property will contain a list of ManifestErrors that you can inspect. Each ManifestError will be one of three types: ChecksumMismatch, FileMissing, or UnexpectedFile.

So for example if you want to print out checksums that failed to validate you can do this:

import bagit

bag = bagit.Bag("/path/to/bag")

try:
  bag.validate()

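# Python 2 syntax; on Python 3 write "except bagit.BagValidationError as e:"
except bagit.BagValidationError, e:
  for d in e.details:
    # The error classes live on the bagit module, not on the Bag instance.
    if isinstance(d, bagit.ChecksumMismatch):
      # Report the fields of the individual error d, not the exception e.
      print("expected %s to have %s checksum of %s but found %s" %
            (d.path, d.algorithm, d.expected, d.found))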

To iterate through a bag's manifest and retrieve checksums for the payload files use the bag's entries dictionary:

bag = bagit.Bag("/path/to/bag")

for path, fixity in bag.entries.items():
  print("path:%s md5:%s" % (path, fixity["md5"]))
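The entries dictionary can also drive a manual fixity check. The sketch below assumes entries maps each path (taken here to be relative to the bag directory) to a dict of algorithm name to hex digest, as the loop above suggests; find_mismatches is a hypothetical helper and is no substitute for bag.validate().

```python
import hashlib
import os


def find_mismatches(bag_dir, entries, algorithm="md5"):
    """Return (path, expected, found) tuples for files whose digest differs.

    Hypothetical helper; assumes entries looks like {path: {"md5": digest}}
    with paths relative to bag_dir.
    """
    mismatches = []
    for path, fixity in sorted(entries.items()):
        expected = fixity.get(algorithm)
        if expected is None:
            # No digest recorded for this algorithm, so nothing to compare.
            continue
        digest = hashlib.new(algorithm)
        with open(os.path.join(bag_dir, path), "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        found = digest.hexdigest()
        if found != expected:
            mismatches.append((path, expected, found))
    return mismatches
```

With a real bag this might be called as find_mismatches("/path/to/bag", bag.entries); an empty list means every recorded digest matched.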

Development

% git clone git://github.com/LibraryOfCongress/bagit-python.git
% cd bagit-python
% python test.py

If you'd like to see how increasing parallelization of bag creation affects the time it takes to create a bag on your system, try the included bench utility:

% ./bench.py

License

cc0

Note: By contributing to this project, you agree to license your work under the same terms as those that govern this project's distribution.
