Welcome! The following tutorial provides instructions for building an image collage using Juxta and an image information dataset generated through the Archives Research Compute Hub (ARCH) platform.
- Overview
- What is a Juxta?
- Considerations
- Pre-requisites
- Creating a Juxta Image Collage
- Feedback
- References
- License
We recognize the following work and contributions which have made this tutorial possible.
Toke Eskildsen is the creator of the Juxta shell script.
The following tutorial was collaboratively designed by Nick Ruest and Samantha Fritz. Many thanks to Ian Milligan for his testing and editorial support.
The Google Colab notebook was built by Nick Ruest, with dataset examples generated using the Archives Research Compute Hub (ARCH).
Web archive data is a rich source for studying the recent past. Web archives preserve a variety of information formats, from the full text of websites, to image and video information, to network links among websites in a collection. As such, they offer many opportunities to explore web archive data.
The following tutorial will outline how to create a Juxta collage using an image dataset generated through the Archives Research Compute Hub (ARCH). By transforming image data, researchers have an opportunity to explore a web archive collection interactively.
Juxta is a shell script which generates a collage of images for display on a webpage. You can learn more about the Juxta script through its GitHub page: https://github.com/tokee/juxta.
Web archive data tends to be quite large and often outpaces the capacity of local storage on your computer. You may want to consider working with dedicated storage (HDD, SSD) or servers.
This tutorial was created using macOS.
To complete this tutorial, you will need three things:
- The ARCH “image information” derivative
- The “ImageMagick” package. To check if it is installed, run the following in your console:
convert -v
- The “jq” package. To check if it is installed, run the following in your console:
jq
On macOS, both ImageMagick and jq can be installed using brew.
- For ImageMagick:
brew install imagemagick
brew install ghostscript
- For jq:
brew install jq
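As a quick way to check every command-line tool this tutorial uses at once, the sketch below reports which ones are missing from your PATH. The function name `missing_tools` is hypothetical, and the tool list is an assumption based on the commands that appear in this tutorial.

```shell
# Print the names of any tools that cannot be found on PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space before printing.
  echo "${missing# }"
}

# Tools assumed from this tutorial: convert (ImageMagick), jq, wget, git, python3.
result=$(missing_tools convert jq wget git python3)
if [ -z "$result" ]; then
  echo "All tools found."
else
  echo "Missing: $result"
fi
```

Any name printed after "Missing:" needs to be installed before continuing.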
Within ARCH, select a collection and generate a new dataset from the "File Formats" category called "Extract Image Information".
Once the dataset has been generated, click on "View Dataset" to navigate to the summary page. Scroll down to the download icon, right-click and select “copy link.” This will be the dataset URL needed for working with images in the notebook.
Create a copy of the Image Information Download Urls notebook via Google Colab.
Working from the copied notebook now, you will start off by changing the title. You may find it easiest to note the collection number in the title if you plan to work with multiple copies of the template.
A few cells need updated information:
- In the first cell, change the URL listed to the URL of the image information dataset we copied in the previous step. The curl command is used to transfer data to and from a server. In this case, the notebook calls out to the extracted image information dataset from ARCH.
- In cell six, which identifies the Wayback URL, change the collection id to match the collection we are currently working with.
- Finally, in the last cell, change the collection id in the CSV title. This title could be anything meaningful to you as a researcher, but we suggest maintaining consistency using the collection id.
Click on the Runtime menu at the top of the notebook and select Run All. Alternatively, you can manually click on each play button. The pre-scripted actions in this notebook will ultimately generate a .txt file with formatted image URLs, which can then be used to download the images from this web archive collection with a single command.
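To give a rough idea of what a formatted image URL looks like, here is a minimal sketch that joins a collection id, a crawl timestamp, and an original URL into a replay URL. It is not the notebook's code. The two-column layout (crawl_date, url), the sample row, the Archive-It Wayback URL pattern, and the function name `build_replay_urls` are all assumptions for illustration. The simple comma split would also break on URLs that contain commas.

```shell
# Build one replay URL per CSV row read from standard input.
# ASSUMPTIONS: column order (crawl_date,url), the sample row below, and the
# Wayback URL pattern are illustrative; check them against your own dataset.
build_replay_urls() {
  collection_id="$1"
  # Skip the header row, then join collection id, timestamp, and original URL.
  awk -F',' -v id="$collection_id" 'NR > 1 && NF >= 2 {
    printf "https://wayback.archive-it.org/%s/%sim_/%s\n", id, $1, $2
  }'
}

# Hypothetical sample input with a header and a single row.
printf 'crawl_date,url\n20200101120000,http://example.com/a.jpg\n' |
  build_replay_urls 13709
```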
In the right-hand pane, which is collapsed by default, click on the file folder icon and download the .txt file to either your desktop or a server.
Create a directory (folder) to house the recently downloaded .txt file. For this example, a directory called 13709Juxta was created on a local desktop.
Next, create a subfolder in the new directory and call it images.
Our example path to our main working directory looks like this:
/Users/fritz/Desktop/13709Juxta
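If you prefer to set up the folders from the terminal, the steps above can be sketched as follows. The function name `setup_workdir` is hypothetical, and the paths in the example call are placeholders for your own.

```shell
# Create the working directory with an images subfolder, then move the
# downloaded URL list into it if one is given and exists.
# setup_workdir is a hypothetical helper name used for illustration.
setup_workdir() {
  workdir="$1"
  url_list="$2"
  mkdir -p "$workdir/images"
  if [ -n "$url_list" ] && [ -f "$url_list" ]; then
    mv "$url_list" "$workdir/"
  fi
}

# Example call (placeholder paths, adjust to your own):
# setup_workdir "$HOME/Desktop/13709Juxta" "$HOME/Downloads/13709_image_urls.txt"
```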
Using your terminal window, navigate to the images directory. Then use the following command to download all the images from the text file to the images folder. This image directory will be used to create the Juxta collage.
wget --random-wait -i ../13709_image_urls.txt
NOTE: Downloading the images will take time! Do not close your terminal window.
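Because the download can run for a long time, it may help to check progress from a second terminal window. The sketch below compares the number of files saved so far with the number of URLs in the list. The function name `download_progress` is hypothetical, and the count is only approximate because failed downloads leave no file behind.

```shell
# Report how many files have been saved relative to the URL list.
# download_progress is a hypothetical helper name used for illustration.
download_progress() {
  url_list="$1"
  image_dir="$2"
  # Count the non-empty lines in the URL list.
  total=$(grep -c . "$url_list")
  # Count the regular files saved so far.
  saved=$(find "$image_dir" -type f | wc -l | tr -d ' ')
  echo "$saved of $total"
}

# Example, run from the main working directory:
# download_progress 13709_image_urls.txt images
```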
Navigate to https://github.com/tokee/juxta. Use the URL provided under the green "Code" button to clone.
Clone Juxta within your main directory by using the following command:
git clone https://github.com/tokee/juxta.git
Note your path to Juxta; for simplicity's sake in this example, we’ve cloned Juxta to our main working directory /Users/fritz/Desktop/13709Juxta
A .dat file is a “generic data file that contains important information about the program used to create the particular file.” [1] For the purposes of generating a Juxta image collage, we will list the image files downloaded from the replay URLs and redirect the output [2] to a .dat file.
From your terminal, navigate to the directory one level above where the images are saved.
For instance: /Users/fritz/Desktop/13709Juxta
Run the following command to find the images and redirect the output to a .dat file:
find images > images.dat
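Note that `find images` also prints the images directory itself, along with any subdirectories. A possible variant, which is not the tutorial's original command, lists only regular files. Whether Juxta needs this filtering is an assumption you should verify against your own results. The function name `list_image_files` is hypothetical.

```shell
# List only regular files under a directory, one path per line, sorted.
# list_image_files is a hypothetical helper name used for illustration.
list_image_files() {
  find "$1" -type f | sort
}

# Example, run from the directory that contains images:
# list_image_files images > images.dat
```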
Next, we will create all of the files and tiles needed to view the collage in a web browser.
We are creating a new directory for all of the files. You will need to make a few modifications to the command below:
THREADS=4 /Users/fritz/Desktop/13709Juxta/juxta/juxta.sh images.dat example
Here’s a quick breakdown of what this line of code does:
| Code Snippet | Code Functionality |
|---|---|
| THREADS=4 | You may need to change the number of threads. This example opts for four threads, with a total of 8 cores available. As this tutorial runs on a local computer, limiting the threads means the laptop processes the Juxta files but can continue to be used for other work. |
| /Users/fritz/Desktop/13709Juxta | Change to the path where you’ve cloned Juxta. |
| example | This will be the name of the directory in which the Juxta-formatted files are created. You can change this to whatever makes sense for you, but be sure to avoid spaces. |
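One way to pick a thread count is to follow the example's ratio of four threads on eight cores and use half of the available cores. The sketch below is only a rule of thumb under that assumption, not a Juxta recommendation. The function name `suggest_threads` is hypothetical.

```shell
# Suggest a thread count: half the given core count, but at least 1.
# suggest_threads is a hypothetical helper name used for illustration.
suggest_threads() {
  cores="$1"
  half=$((cores / 2))
  if [ "$half" -lt 1 ]; then
    half=1
  fi
  echo "$half"
}

# getconf _NPROCESSORS_ONLN reports online cores on macOS and most Linux
# systems; fall back to 1 if it is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
cores=${cores:-1}
echo "Suggested THREADS=$(suggest_threads "$cores")"
```

The printed value can then replace the 4 in THREADS=4 when running juxta.sh.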
Before launching a local web server, navigate to the example directory created with the command above.
cd example
Now that all the files have been created, we will use Python to serve files from a local directory via HTTP. This will allow you to display and explore the image collage through the web browser. Type the following into your terminal:
python3 -m http.server
To view the collage, enter the localhost address as a URL in a browser of your choosing.
localhost:8000
As this is intended to be a stand-alone resource, please let us know how we can improve the experience of using this tutorial, through our feedback survey.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Footnotes
1. Otachi, Elsie. (2020). "How to Read and Open .DAT Files in Windows." https://www.online-tech-tips.com/computer-tips/how-to-open-dat-files/
2. "Using Command Symbols." https://sourcedaddy.com/windows-7/using-command-symbols.html