ArchivesSpace Data Auditor

Overview

This script is designed to run on a server with access to an ArchivesSpace installation. It runs a series of checks against the ArchivesSpace database, retrieves data through the API, and exports EAD.xml files and evaluates them for content and syntax errors. The script then generates an Excel spreadsheet detailing where data cleanup is needed. For more information about what data is checked, see Workflow.

Getting Started

Dependencies

lxml - Used to parse XML files for evaluating any XML syntax errors and parsing data from downloaded XML files
mysql - Used to import mysql-connector
mysql-connector-python - Used to connect and detect any connection errors to the ArchivesSpace MySQL database
openpyxl - Used to create and write an Excel spreadsheet to document data audit report
requests - Used to check URLs and get their status codes

Installation

Download the repository by cloning it to your local IDE or by using GitHub's Code button and selecting Download ZIP
Run pip install -r requirements.txt
Create a secrets.py file with the following information:
1. An ArchivesSpace admin username (as_un = "") and password (as_pw = "")
2. The URLs to your ArchivesSpace staging (as_api_stag = "") and production (as_api = "") API instances
3. The ArchivesSpace data_auditor account username (as_auditor_un = "") and password (as_auditor_pw = "")
4. Variables with their values set to user emails you want to send the report to
  1. sendfrom_email = "<send_from_email>"
  2. sendto_emails = ["<send_to_email>", "<send_to_email>", "<send_to_email>"]
  3. senderror_emails = ["<send_to_email>", "<send_to_email>"]
5. The email server from which you send your email report (email_server = "")
6. Your ArchivesSpace's staging database credentials, including username (as_dbstag_un = ""), password (as_dbstag_pw = ""), hostname (as_dbstag_host = ""), database name (as_dbstag_database = ""), and port (as_dbstag_port = "")
Run the script as python3 ASpace_Data_Audit.py
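
As a rough sketch, a secrets.py built from the variables listed above could look like the following. Every value is a placeholder to be replaced with your own settings, and the port is kept as a string only because the list above shows it that way.

```python
# secrets.py - placeholder values only; replace each with your own settings.
# Keep this file out of version control.

# ArchivesSpace admin account
as_un = ""
as_pw = ""

# ArchivesSpace API URLs
as_api_stag = ""  # staging instance
as_api = ""       # production instance

# data_auditor account
as_auditor_un = ""
as_auditor_pw = ""

# Email settings
sendfrom_email = "<send_from_email>"
sendto_emails = ["<send_to_email>", "<send_to_email>", "<send_to_email>"]
senderror_emails = ["<send_to_email>", "<send_to_email>"]
email_server = ""

# Staging database credentials
as_dbstag_un = ""
as_dbstag_pw = ""
as_dbstag_host = ""
as_dbstag_database = ""
as_dbstag_port = ""
```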

Script Arguments

Open the console of your choice and navigate to the project directory. Type python3 ASpace_Data_Audit.py to run the script. To run the audit without emailing the results to users, add -t or --test, for example python3 ASpace_Data_Audit.py -t. The testing functionality is still being developed and may not function properly.
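
For illustration, a -t/--test switch like the one described above is commonly handled with argparse from the standard library. This is a minimal sketch under that assumption; the actual script may parse its arguments differently, and build_parser and should_email are hypothetical names.

```python
import argparse


def build_parser():
    """Build a parser with a single optional -t/--test switch."""
    parser = argparse.ArgumentParser(
        description="Run the ArchivesSpace data audit."
    )
    # store_true makes the flag False by default and True when supplied.
    parser.add_argument(
        "-t", "--test",
        action="store_true",
        help="run the audit without emailing the report",
    )
    return parser


def should_email(argv):
    """Return True when the report should be emailed (no -t/--test given)."""
    args = build_parser().parse_args(argv)
    return not args.test
```

Calling should_email(["-t"]) returns False, while should_email([]) returns True, so the email step can be skipped whenever the test switch is present.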

Testing

A series of unit tests check various functions in ASpace_Data_Audit.py. They are still being developed, and any test should be run with the -t or --test argument described in Script Arguments.

Workflow

Generate an Excel spreadsheet to use for our report
Begin running the audit. The audit checks for the following:
1. Any new controlled vocabulary terms for the following (the row is highlighted in red):
  1. Subject_Term_Type
  2. Subject_Sources
  3. Finding_Aid_Status_Terms
  4. Name_Sources
  5. Instance_Types
  6. Extent_Types
  7. Digital_Object_Types
  8. Container_Types
  9. Accession_Resource_Types
2. Any archival objects with component unique identifiers
3. Any top containers without barcodes
4. Any top containers without indicators
5. A list of all current users
6. Any archival objects with multiple top containers
7. Any archival objects with multiple digital objects
8. Any archival objects listed as level of description == collection
9. Any resources with EAD IDs
10. Any duplicate subjects
11. Any duplicate agent-persons
12. Any resources without Creator agents
13. Any XML syntax errors in exported EAD.xml files
14. Any broken URLs in EAD.xml exports
15. Any top containers not linked to any resources or archival objects
16. Any archival objects with "otherlevel" and "unspecified" level of description
Save the spreadsheet and send an email using email_users(). If an error occurs, send a message to the specified user
Delete the spreadsheet and the exported EAD.xml folder and files from the server, and send an email if there is an error
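
To make the XML syntax check in the list above more concrete, here is a minimal sketch of how malformed exports could be detected. The project lists lxml for this step; the sketch swaps in xml.etree.ElementTree from the standard library so it runs without extra packages (with lxml, the analogous exception would be lxml.etree.XMLSyntaxError). The function name, the sample file names, and the XML snippets are all hypothetical.

```python
import xml.etree.ElementTree as ET


def find_syntax_errors(xml_documents):
    """Return {name: error message} for each document that fails to parse.

    xml_documents is a hypothetical mapping of file name -> XML text.
    """
    errors = {}
    for name, text in xml_documents.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as exc:
            # Record the parser's message so it can be written to the report.
            errors[name] = str(exc)
    return errors


# Hypothetical sample data: one well-formed document, one with an unclosed tag.
samples = {
    "good_ead.xml": "<ead><eadheader><eadid>ms1</eadid></eadheader></ead>",
    "bad_ead.xml": "<ead><eadheader><eadid>ms1</eadheader></ead>",
}
report = find_syntax_errors(samples)
```

Here report contains an entry only for bad_ead.xml. Each entry could then become a row in the audit spreadsheet.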

Author

Corey Schmidt - Project Management Librarian/Archivist at the University of Georgia Libraries

Acknowledgements

Kevin Cottrell - GALILEO/Library Infrastructure Systems Architect at the University of Georgia Libraries
ArchivesSpace Community
