This repository was forked from the repository that accompanies Pat's tutorial on GNU Make. The original tutorial focuses on replicating an analysis performed by FiveThirtyEight to predict someone's age using their name. This project provides a parallel implementation in Snakemake, a Python-based workflow management system.
The Snakefile in the master branch uses lots of Snakemake features that don't have parallels in Make. See the simple branch for a version that more closely resembles the original Makefile.
The analysis draws names from two sources within the Social Security Administration:
All dependencies are listed in config/env.yaml. You can install them manually using your preferred package manager(s), or use conda.
If you don't already have conda installed, you can download either Anaconda or Miniconda -- your preference. Anaconda3 bundles a large set of preinstalled packages, while Miniconda3 is a minimal installer that is faster to install. Be sure to pick the Python 3 version of the installer for your OS.
Download & install for 64-bit macOS:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
Create an environment called predict-age with the dependencies we need:
conda env create -f config/env.yaml
Or give the environment whatever name you want using the flag --name or -n. (If you give it a different name, be sure to modify the environment name in pbs-torque/pbs-jobscript.sh.)
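If you script the setup, the renaming option above can be sketched in Python as follows. This is only an illustration: the helper conda_create_command and the environment name my-analysis are hypothetical and not part of this repository. The snippet prints the command instead of running it.

```python
import shlex


def conda_create_command(env_file="config/env.yaml", name=None):
    """Build the `conda env create` argument list (hypothetical helper).

    When `name` is given, it is passed with --name so the environment
    gets that name instead of the one declared in the YAML file.
    """
    cmd = ["conda", "env", "create", "-f", env_file]
    if name is not None:
        cmd += ["--name", name]
    return cmd


# "my-analysis" is a made-up example name.
# Print the command; pass the list to subprocess.run(...) to execute it.
print(shlex.join(conda_create_command(name="my-analysis")))
# prints: conda env create -f config/env.yaml --name my-analysis
```

Remember that a custom name also has to be updated in pbs-torque/pbs-jobscript.sh, as noted above.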
Activate the environment before running any code:
conda activate predict-age
See the conda documentation for more information.
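To confirm that the right environment is active before running any code, a small check like the one below may help. It is a minimal sketch that relies on the CONDA_DEFAULT_ENV environment variable, which conda sets on activation. The function names are hypothetical and not part of this repository.

```python
import os

# Environment name used in this README; change it if you chose another name.
EXPECTED_ENV = "predict-age"


def active_conda_env(environ=os.environ):
    """Return the active conda environment name, or None if none is active."""
    return environ.get("CONDA_DEFAULT_ENV")


def is_expected_env(expected=EXPECTED_ENV, environ=os.environ):
    """Return True when the active conda environment matches `expected`."""
    return active_conda_env(environ) == expected


if not is_expected_env():
    # Warn only; the caller decides whether to abort.
    print(
        f"Warning: expected conda environment {EXPECTED_ENV!r}, "
        f"found {active_conda_env()!r}"
    )
```

Passing the environment mapping as an argument keeps the check easy to exercise without changing the real shell environment.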