A project for NaNoGenMo 2022 to produce cut pages in the style of artists' books.
Given the URL of an IIIF manifest pointing at a series of scanned book pages, produce a new book that cuts out words to reveal the pages beneath. (IIIF is an API and data format for describing image sequences for use in academia and research; it can be applied to individual documents, maps, books, or ephemera.)
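To get from a manifest to a list of page images, something has to walk the manifest's canvases. The sketch below is a hypothetical helper, not the project's code: `pageImageUrls` is an illustrative name, and it assumes a manifest in either the IIIF Presentation API 2.x layout (`sequences` → `canvases` → `images` → `resource`) or the 3.0 layout (`items` → annotation page → annotation → `body`), taking the first image on each canvas.

```javascript
// Hypothetical helper: return the first image URL on each page (canvas) of a
// parsed IIIF Presentation manifest, in reading order.
function pageImageUrls(manifest) {
  // Presentation API 2.x: sequences[0].canvases[].images[0].resource["@id"]
  if (Array.isArray(manifest.sequences)) {
    const canvases = manifest.sequences[0]?.canvases ?? [];
    return canvases
      .map((canvas) => canvas.images?.[0]?.resource?.["@id"])
      .filter(Boolean); // skip canvases with no image
  }
  // Presentation API 3.0: items[] (canvases) -> items[0] (annotation page)
  // -> items[0] (painting annotation) -> body.id
  if (Array.isArray(manifest.items)) {
    return manifest.items
      .map((canvas) => canvas.items?.[0]?.items?.[0]?.body?.id)
      .filter(Boolean);
  }
  return []; // not a shape this sketch understands
}
```

In Node 18+ or a browser, the manifest itself could be loaded with `const manifest = await (await fetch(manifestUrl)).json();` before calling `pageImageUrls(manifest)`.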
The webapp can be used to view sample page output. It's preconfigured with an interesting book, or you can paste in your own IIIF manifest. (I have only tried this with Harvard's IIIF content and server, so your mileage may vary with other sources.)
The "official" NaNoGenMo entry is 99 pages produced from Boswell's Life of Johnson (26MB PDF). Boswell, James, 1740-1795. Boswell's Life of Johnson, extra-illustrated, 1464-1897. MS Hyde 76, vol. 2, pt. 1. Houghton Library, Harvard University, Cambridge Mass.
The title comes from output produced by cutting up an edition of Emily Dickinson's poetry (11MB PDF).
You probably don't want to bother!
To generate PDF book output, you'll need to install the project locally. It uses Vite to run a local web server and Playwright to automate a browser that walks through an entire book and saves each page as a screenshot.
Tesseract.js OCR is used to identify the words on each page. You can tune how often words and pages are transformed in `main.js`.
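The tuning described above boils down to two probabilities: how likely a page is to be transformed, and how likely each recognized word on that page is to be cut. The sketch below illustrates that idea only; `chooseCuts`, `PAGE_PROBABILITY`, `WORD_PROBABILITY` and `minConfidence` are hypothetical names, not the settings in `main.js`. It assumes each word looks like Tesseract.js per-word output, `{ text, confidence, bbox: { x0, y0, x1, y1 } }`, with confidence on a 0 to 100 scale.

```javascript
// Hypothetical tuning knobs (illustrative names, not taken from main.js).
const PAGE_PROBABILITY = 0.5; // chance that a page is transformed at all
const WORD_PROBABILITY = 0.2; // chance that each eligible word on that page is cut

// Pick the words to cut out of one page and return their rectangles.
// `random` is injectable so the selection can be made deterministic.
function chooseCuts(words, {
  pageProbability = PAGE_PROBABILITY,
  wordProbability = WORD_PROBABILITY,
  minConfidence = 60, // assumed threshold to skip OCR noise
  random = Math.random,
} = {}) {
  if (random() >= pageProbability) return []; // leave this page untouched
  return words
    // only cut words OCR is reasonably sure about, and never blank tokens
    .filter((w) => w.confidence >= minConfidence && w.text.trim() !== "")
    // then keep each remaining word with probability `wordProbability`
    .filter(() => random() < wordProbability)
    // convert corner coordinates to an x/y/width/height rectangle
    .map(({ text, bbox }) => ({
      text,
      x: bbox.x0,
      y: bbox.y0,
      width: bbox.x1 - bbox.x0,
      height: bbox.y1 - bbox.y0,
    }));
}
```

Each returned rectangle marks a hole in the current page through which the page beneath would show; how the project actually renders those holes is not shown here.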
npm install
# Configure `playwright.config.js` to capture your IIIF manifest and tweak the PDF output
# Then run the browser automation
npm run test
Individual screenshots are saved in `output/`. The simplest way to combine them into a PDF is with ImageMagick:
cd output/
convert *.png your-file.pdf
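`convert *.png` relies on the shell to expand the glob, and shells sort the matches lexicographically, so unpadded names would put `page-10.png` before `page-2.png`. If your screenshots come out in the wrong order, zero-padding the page number fixes it. The helper below is a hypothetical sketch (`screenshotName` and the `page-` prefix are illustrative; the project's own file naming may differ).

```javascript
// Hypothetical helper: build a screenshot filename whose page number is
// zero-padded to the width of the total page count, so lexicographic
// (glob) order matches page order.
function screenshotName(pageIndex, pageCount, prefix = "page") {
  const width = String(pageCount).length; // e.g. 99 pages -> 2 digits
  const pageNumber = String(pageIndex + 1).padStart(width, "0"); // 1-based
  return `${prefix}-${pageNumber}.png`;
}
```

In a Playwright test this could be used as `await page.screenshot({ path: \`output/${screenshotName(i, total)}\` })`.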
A large collection of sample output is in `examples/`.