This project is part of Transfer Digital Records (TDR). It is a prototype of an application that a Digital Archivist might use to export files from S3 once a transfer has been finalized.
The full export has several steps:
- Download files from S3, and create a directory of files to export
- Zip the files
- Upload the zipped file to a different S3 bucket
You can run the steps separately, or run them together with Docker.
Configure your AWS credentials in the `~/.aws/credentials` file. The download step will use this configuration to authenticate requests to S3.
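For reference, a minimal `~/.aws/credentials` file looks like this (the key values are placeholders):

```
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```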
Set the mandatory environment variables on the command line or in IntelliJ:

- `GRAPHQL_SERVER`: the hostname of the API, e.g. `http://localhost:8080` in development
- `GRAPHQL_PATH`: the path of the GraphQL API endpoint, e.g. `graphql` in development
- `CONSIGNMENT_ID`: the database ID of the consignment to export

Then run `sbt download/run`.
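For example, in a shell the mandatory variables could be set like this (the first two values are the development examples above; the consignment ID is a placeholder):

```shell
export GRAPHQL_SERVER=http://localhost:8080
export GRAPHQL_PATH=graphql
export CONSIGNMENT_ID=some-consignment-id   # placeholder: use a real database ID
```

With these set, `sbt download/run` will pick them up from the environment.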
By default, this will download the contents of a specific S3 bucket to a temporary directory, and create a BagIt bag in another temporary directory.
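As a rough sketch (assuming a typical BagIt layout; the exact tag and manifest files depend on the bagging library used), the bag directory will look something like:

```
bag-directory/
├── bagit.txt
├── bag-info.txt
├── manifest-sha256.txt
└── data/
    └── (the downloaded files)
```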
You can also set some optional environment variables to configure the download:

- `INPUT_BUCKET_NAME`: name of the S3 bucket to download files from
- `INPUT_FOLDER_NAME`: name of the parent S3 folder (defaults to the consignment ID)
- `FILE_DOWNLOAD_DIR`: the local folder to download files to
- `BAG_DIR`: the local folder to save the BagIt bag to
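For instance, to download from a specific bucket into scratch directories of your choosing (the bucket and folder names here are assumptions):

```shell
export INPUT_BUCKET_NAME=my-input-bucket   # assumption: your source bucket
export INPUT_FOLDER_NAME=my-folder         # assumption: defaults to the consignment ID if unset
export FILE_DOWNLOAD_DIR="$(mktemp -d)"    # fresh temporary directory for downloads
export BAG_DIR="$(mktemp -d)"              # fresh temporary directory for the bag
```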
Use `tar` to create a .tar.gz file:

```
tar -zcvf name-of-output-file.tar.gz /path/of/directory/to/zip
```
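For example, to package a bag directory end to end (the paths here are illustrative):

```shell
# Create a small directory to stand in for a bag
mkdir -p /tmp/demo-bag/data
echo "example content" > /tmp/demo-bag/data/file.txt

# -C changes into /tmp first, so the archive stores paths relative to demo-bag
tar -zcvf /tmp/demo-bag.tar.gz -C /tmp demo-bag

# List the archive contents to check the result
tar -ztf /tmp/demo-bag.tar.gz
```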
Run:

```
ARCHIVE_FILEPATH=/path/of/file/to/upload \
  sbt exportZip/run
```

setting the `ARCHIVE_FILEPATH` variable to the file to be uploaded.

You can also set the S3 bucket to upload the file to with an optional variable: `EXPORT_BUCKET`.
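For example, both variables could be set ahead of the run (the bucket name is an assumption; substitute your own):

```shell
export ARCHIVE_FILEPATH=/path/of/file/to/upload
export EXPORT_BUCKET=my-export-bucket   # assumption: optional override of the default bucket
```

then run `sbt exportZip/run` as above.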
- Build the jar files with `sbt clean assembly`
- Build the image with `docker build . --tag exportfiles`
- Run the Docker image, setting environment variables:

  ```
  docker run \
    --env ACCESS_KEY_ID=your_aws_key_id \
    --env SECRET_ACCESS_KEY=your_aws_secret_key \
    --env GRAPHQL_SERVER=https://graphql-api-hostname.amazonaws.com \
    --env GRAPHQL_PATH=some/api/path \
    --env CONSIGNMENT_ID=1234 \
    --env EXPORT_BUCKET=name-of-s3-bucket \
    exportfiles:latest
  ```
You can also set `INPUT_BUCKET_NAME` and `INPUT_FOLDER_NAME` to specify the S3 bucket and folder to download.