This document contains an overview of the various post-processing scripts we use here in the AVLab at UCSB's SRC. Please see AVLab Utility Software List & Installation Instructions (on the wiki) for more info on the software used, dependencies, installation instructions, etc. The scripts described here build off of those software functionalities.
- Intro
- config
- makesomethings
- [library scripts](#library-scripts)
- avlab
- National Jukebox / PHI
- Tips
- hashmove
We use the microservices structure here, which breaks down complicated programs and processing steps into small, discrete, modular scripts, which can then be daisy-chained to fit particular applications.
To run these scripts, type python [path of python file] [arguments]
Every script can also be run with python script.py -h for more info
The scripts rely on a configuration file named microservices-config.ini. Because this file contains passwords, and because UCSB IT policy doesn't allow for .jar files (like BFG) to run, and because I don't actually know how to use git-filter, it's no longer tracked in this repo. The version here is thus out of date and you'll need to make the necessary changes.
For UCSB users, the "canonical" config file can be found at S:/avlab/microservices/microservices-config.ini
The make-scripts are kind of the atomic units of our microservices. They work on single files and are very dumb but effective.
This script is used to generate barcode files that can be printed by our Zebra barcode printers. It makes a temporary file in your current directory. Then, for each side of a record, it asks for the title of the content, then the barcode: cusb_[label-abbrev][issue-number][copy(optional)][matrix-number][take-number]. Used for patron requests mostly. Once done, follow steps outlined in wiki doc Printing Barcodes
Takes an input file, generally an archival master or raw-broadcast capture, inserts 2s fades, downsamples to 44.1 kHz/16-bit, and embeds ID3 metadata if a source txt file is present.
Has flags for fades (-ff), national jukebox names (-nj), stereo (-s), normalize (-n)
Takes an input string which is the canonical name for our digitized objects [a1234, cusb_col_a12_01_5678_00] and the transaction number from Aeon to which this DIP is linked. Transcodes from source objects if necessary, hashmoves them to DIP directory, zips DIP directory in anticipation of upload via FTP to Aeon server.
Has flags for startObject (-so), transactionNumber (-tn), mode (--tape for tapes, --disc coming soon), zip (-z) to make a .zip file
Takes an input file, generally a broadcast master, transcodes to 320kbps mp3. Embeds ID3 tags if present (either in source file or in sidecar txt file). Embeds png image of "Cover Art" if png or tif present in source directory.
Takes an input video file, and outputs a compressed xml file that can be read by QCTools. It has to transcode it to a raw video format first so this script takes some time and processor space and is generally run Friday afternoon over a week of new captures, and runs into the weekend.
Takes an input file and stream-copies reversed slices of 600 seconds, then concatenates the slices back together. ffmpeg's areverse loads a whole file into memory, so for the large files we deal with we need this workaround.
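As a rough illustration of the slicing arithmetic described above, here is a minimal sketch that only works out which slices to cut and in what order to concatenate them. It does not call ffmpeg, and the function name `reversed_slice_plan` is hypothetical, not something taken from the script itself.

```python
def reversed_slice_plan(duration, slice_len=600.0):
    """Return (start, length) pairs in seconds, last slice first.

    Hypothetical helper: each slice would be reversed on its own (so only
    one slice is in memory at a time), and joining the reversed slices in
    this order gives the whole file played backwards.
    """
    if duration <= 0 or slice_len <= 0:
        raise ValueError("duration and slice_len must be positive")
    slices = []
    start = 0.0
    while start < duration:
        # The final slice is usually shorter than slice_len.
        length = min(slice_len, duration - start)
        slices.append((start, length))
        start += slice_len
    return slices[::-1]
```

For a 1500-second file this gives three slices, starting with the short 300-second tail and ending with the first 600 seconds.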
makesip takes an input folder path and uses our rules to create a Submission Information Package (SIP)
Takes a canonical name for an asset, e.g. a1234, cusb_col_a1234_01_5678_00, and returns a full path to an instance of that object. The instance is prioritized for transcoding in the following order: broadcast master, archival master, un-tagged archival master, access mp3.
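The priority order above could be expressed roughly like this. It is a sketch only: the real script searches the repository itself, whereas this assumes something else has already found the candidate files and passed them in as a dictionary. The names `PRIORITY` and `pick_instance` and the example file names are all hypothetical.

```python
# Order taken from the description above; everything else is assumed.
PRIORITY = (
    "broadcast master",
    "archival master",
    "un-tagged archival master",
    "access mp3",
)


def pick_instance(instances):
    """Return the path of the highest-priority instance, or None.

    `instances` maps an instance kind (one of PRIORITY) to a full path,
    or to None when that kind does not exist for the asset.
    """
    for kind in PRIORITY:
        path = instances.get(kind)
        if path:
            return path
    return None
```

So if only an archival master and an access mp3 exist, the archival master is the one returned for transcoding.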
Slices preservation and access transfers of videos with more than 1 asset on the tape (eg. vm1234 also contains vm5678 and vm9101). Takes no arguments but you have to edit the in and out points in a list in the script, as well as corresponding vNums. Needs to be better, OpenCube editing interface is crappy and Premiere doesn't accept our preservation masters so.....
Still in development, this script makes a youtube-ready video for a digitized 78rpm disc, based on metadata in DAHR
These scripts are basically collections of functions that are invoked by other scripts.
config parses the microservices-config.ini file and returns a dictionary object that is accessible via dot notation, e.g. conf.cylinders.new_ingest rather than conf['cylinders']['new_ingest']. conf is declared globally in every script: this module is imported as rawconfig, and conf = rawconfig.config() actually makes the conf dictionary.
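To show the general idea of dot-notation access, here is a minimal sketch in Python 3 using the standard library's configparser. It is not the module's actual code: `DotDict`, `load_config`, and the sample path are made up for illustration, and the sketch reads from a string rather than from microservices-config.ini.

```python
import configparser


class DotDict(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def load_config(text):
    """Parse ini-formatted text into nested DotDicts (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return DotDict(
        {section: DotDict(parser[section]) for section in parser.sections()}
    )


# Sample section and path are invented for this example.
conf = load_config("[cylinders]\nnew_ingest = /path/to/new_ingest\n")
print(conf.cylinders.new_ingest)
```

Both conf.cylinders.new_ingest and conf['cylinders']['new_ingest'] return the same value here, since DotDict is still an ordinary dictionary underneath.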
ff does everything ffmpeg-related. Principally, it uses the provided dictionary objects to construct strings that can be sent to ffmpeg. It also provides a framework ("ff.go") which runs the supplied ffmpeg string and returns true if successful, or the error if unsuccessful.
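The "true on success, error on failure" pattern could look something like the following. This is a sketch under assumptions, not the real ff.go: it is written for Python 3.7+, takes the command as a list of arguments rather than a single string, and returns the text from stderr as "the error".

```python
import subprocess


def go(command):
    """Run a command; return True on success, otherwise the error text.

    `command` is a list such as ["ffmpeg", "-i", "in.wav", "out.mp3"].
    """
    try:
        subprocess.run(
            command,
            check=True,               # raise if the exit code is non-zero
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        return True
    except subprocess.CalledProcessError as err:
        # The command ran but failed; hand back what it wrote to stderr.
        return err.stderr or str(err)
    except OSError as err:
        # The command could not be started at all (e.g. not on the PATH).
        return str(err)
```

Because a non-empty error string is also truthy, callers of a function shaped like this should compare with `is True` rather than just testing `if result:`.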
logger is the handler for all logging functions. Generally, it makes logs for each script's full execution, linking sub-scripts to their parents via PIDs. Logs are named "user-pid-timestamp.log"
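For illustration, a log name in the "user-pid-timestamp.log" pattern could be built as below. The timestamp format is an assumption (the format logger really uses isn't stated here), and `log_name` is a hypothetical function.

```python
import getpass
import os
import time


def log_name(user=None, pid=None, when=None):
    """Build a "user-pid-timestamp.log" file name.

    Defaults to the current user, this process's PID, and the current
    time. The YYYYmmddHHMMSS timestamp layout is assumed.
    """
    user = user or getpass.getuser()
    pid = os.getpid() if pid is None else pid
    stamp = time.strftime("%Y%m%d%H%M%S", time.localtime(when))
    return "{}-{}-{}.log".format(user, pid, stamp)
```

A sub-script could pass its parent's PID (for example from os.getppid()) to write into the parent's log instead of starting a new one, though whether logger does it this way is not confirmed here.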
mtd is short for "metadata" and it's the handler for all FileMaker and catalog access. It uses pyodbc and an SRU query string to get metadata from external sources, or to insert metadata into FileMaker.
util is the handler for all utility functions. Two noteworthy ones are: desktop() - which returns the current user's desktop folder; and dotdict(dict) - which takes a regular dictionary and makes it accessible via dot notation, e.g. conf.cylinders.new_ingest rather than conf['cylinders']['new_ingest']
These processes deal with all of the audio objects created in the AVLab, not part of the Jukebox
Post-processing for magnetic-tape materials. Gets metadata from Audio Originals FileMaker db
Deletes sidecar files that Wavelab generates
makes a list of files to process based on FileMaker outputs (see staff wiki)
Deletes silences of longer than 30s
Embeds bext info, data chunk md5
hashmoves it to the repo directory, greps the output of hashmove and, if successful, deletes raw capture and associated txt files
Post-processing of cylinders and the creation of derivative files. Gets metadata from Cylinders FileMaker db.
Post-processing for discs that are not part of NJ process. Bundles makemp3 and makebroadcast with configurable filepaths. Used for post-processing of patron requests, mostly, saves files to a QC directory on R:/. Has dependencies for ffmpeg, ffprobe, bwfmetaedit
Post-processing for archival master video files. Bundles makeqctoolsreport; makes a PBCore2.0 compliant xml file of technical metadata.
Here are the scripts we use to process materials for the National Jukebox / PHI digitization project
This script processes the archival and broadcast master files we make for the National Jukebox. For more info on how files are created, see Digitizing 78s for The National Jukebox on the wiki
The processing steps are:
delete the bs sidecar files that Wavelab creates
ending in .bak .gpk .mrk
makebroadcast on the raw-broadcast-master files, inserting fades and renaming for nj
here's what that command would look like if you typed it out for each file
python makebroadcast.py [full path to input file] -ff -nj
hashmove the now-processed broadcast masters to the qc directory/ purgatory
here's what that command would look like if you typed it out for each file
python hashmove.py [full path to input file] [full path to qc directory + /discname]
use bwf metaedit to embed MD5 hashes into the MD5 chunk
here's what that command would look like if you typed it out for each file
bwfmetaedit --MD5-Embed [full path to input file]
hashmove the now-processed archival masters to the qc directory/ purgatory
here's what that command would look like if you typed it out for each file
python hashmove.py [full path to input file] [full path to qc directory + /discname]
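The first step in the list above, clearing out Wavelab's sidecar files, has no typed-out command, so here is a minimal sketch of what it might look like. The three extensions come from the list; the function name and the choice to walk sub-directories are assumptions.

```python
import os

# Extensions named in the processing steps above.
SIDECAR_EXTENSIONS = (".bak", ".gpk", ".mrk")


def delete_sidecars(directory):
    """Delete Wavelab sidecar files under `directory`; return removed paths."""
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # str.endswith accepts a tuple, so one test covers all three.
            if name.lower().endswith(SIDECAR_EXTENSIONS):
                path = os.path.join(root, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

Returning the list of removed paths makes it easy to log what was deleted before the makebroadcast and hashmove steps run.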
This script is used exclusively during disc label imaging and is called by FileMaker. It is run in conjunction with NJ Workflow DB's "discimg-in" script. For more info on the process of capturing disc label images, see Disc Imaging for National Jukebox
It takes arguments for a barcode / filename (which Filemaker provides)
The processing steps are:
do a string substitution for "ucsb" to "cusb"
corrects a long-standing programming error in FM
using the filename provided, check that it doesn't already exist in the capture directory
return an error if it does
sort the capture dir by file creation date
rename the most recently created file to the barcode/filename provided
walk through the capture directory and check to see that no files exist with their raw-capture names
this indicates that we missed scanning a barcode
return an error if true
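The renaming steps above might be sketched as follows. Several things here are assumptions: the function name, that the barcode arrives without a file extension and the capture keeps its own, and the example file names. The final check for leftover raw-capture names is left out because the raw naming pattern isn't given here.

```python
import os


def rename_newest_capture(capture_dir, barcode):
    """Rename the most recently created file in capture_dir to the barcode."""
    # Correct the long-standing "ucsb" prefix error.
    name = barcode.replace("ucsb", "cusb")
    files = [
        f for f in os.listdir(capture_dir)
        if os.path.isfile(os.path.join(capture_dir, f))
    ]
    # Refuse to continue if this barcode has already been used.
    if any(os.path.splitext(f)[0] == name for f in files):
        raise FileExistsError(name + " already exists in " + capture_dir)
    if not files:
        raise FileNotFoundError("no captures found in " + capture_dir)
    # getctime is the creation time on Windows; on Unix it is the time of
    # the last metadata change, which is only an approximation.
    newest = max(
        files, key=lambda f: os.path.getctime(os.path.join(capture_dir, f))
    )
    target = os.path.join(capture_dir, name + os.path.splitext(newest)[1])
    os.rename(os.path.join(capture_dir, newest), target)
    return target
```

Raising an exception stands in for "return an error" here; how the real script reports errors back to FileMaker is not shown in this document.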
This script is used to process an intermediate set of disc label digital images (created with Adobe DNG converter). For more info, see Disc Imaging for National Jukebox
It takes no arguments.
The processing steps are:
for each file in the intermediate directory
call GraphicsMagick to crop, rotate, adjust ppi, and save as tif in a new folder
here's what that command would look like if you typed it out for each file
gm convert [full path to input file] -crop 3648x3648+920 -density 300x300 -rotate 180 [full path to output .tif]
hashmove the raw-capture, dng intermediate, and broadcast tif to the qc directory/ purgatory
here's what that command would look like if you typed it out for each file
python hashmove.py [full path to input file] [full path to qc directory + /discname]
Start -> search "regedit" -> double-click "regedit.exe"
HKEY_CURRENT_USER -> Software -> Microsoft -> Command Processor
right click on the big empty space and select "New" -> "String Value"
type "Autorun" and hit enter
right-click "Autorun" in the regedit window and select "Modify"
type "cd /d [path to repo directory]"
By doing this, you are set to open the cmd window in the directory with all the scripts so you don't have to type their full paths
file movement for people without rsync
General Usage
python hashmove.py [source file or directory full path] [destination parent directory full path] [flags]
to move a file
python hashmove.py C:/path/to/file.ext C:/path/to/parent/dir
to move a directory
python hashmove.py /home/path/to/dir/a /home/path/to/dir/b
to copy a file
python hashmove.py C:/path/to/file.ext C:/path/to/parent/dir -c
log the transfer
python hashmove.py /home/path/to/dir/a /home/path/to/dir/b -l
verify against another hash or set of hashes
python hashmove.py "/home/path to/dir/you question" /home/path/to/dir/with/hashes -v
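For anyone curious what a hash-verified move involves, here is a minimal single-file sketch. It is not hashmove's code: the function names are hypothetical, md5 is an assumed default, and directories, logging (-l) and verification against existing hashes (-v) are not handled.

```python
import hashlib
import os
import shutil


def file_hash(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never sit wholly in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_move(source, dest_dir, copy=False):
    """Copy source into dest_dir, verify the hash, then delete the source.

    With copy=True the source is kept, similar in spirit to the -c flag.
    Returns (destination path, hash).
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(source))
    source_hash = file_hash(source)
    shutil.copy2(source, dest)
    if file_hash(dest) != source_hash:
        # Leave the source untouched if the copy cannot be trusted.
        raise IOError("hash mismatch after copying " + source)
    if not copy:
        os.remove(source)
    return dest, source_hash
```

The key design point is that the source is only deleted after the destination's hash matches, so an interrupted or corrupted transfer never loses the original.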