Machine learning classifiers, such as those built with TensorFlow, are powerful tools for image classification. The MNIST and Fashion-MNIST datasets are commonly used to test new or newly parameterized models and to explore concepts in machine learning. The module below works through an example of machine learning in TensorFlow, using MNIST and Fashion-MNIST to compare model performance on two different datasets. In the future, additional models or datasets could be added to this workflow to compare across more research situations.
Inputs
- data/mnist.txt: tells the system to use the MNIST dataset, accessed via the Keras library
- data/fashion.txt: tells the system to use the Fashion-MNIST dataset, accessed via the Keras library
- run_main.py: runs a sequence of Python scripts and Jupyter notebooks
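As a rough illustration of how an input file name could select a Keras dataset, here is a minimal sketch. The mapping table and the function names `dataset_key` and `load_dataset` are hypothetical and are not taken from the repository; only `keras.datasets.mnist` and `keras.datasets.fashion_mnist` are actual Keras loaders.

```python
from pathlib import Path

# Hypothetical mapping from an input file's stem to a Keras dataset module name.
KERAS_DATASETS = {
    "mnist": "mnist",
    "fashion": "fashion_mnist",
}


def dataset_key(input_file):
    """Return the Keras dataset module name for a file such as data/mnist.txt."""
    stem = Path(input_file).stem
    if stem not in KERAS_DATASETS:
        raise ValueError(f"Unknown dataset file: {input_file}")
    return KERAS_DATASETS[stem]


def load_dataset(input_file):
    """Load (x_train, y_train), (x_test, y_test) through Keras."""
    # Imported here so the name lookup above works without TensorFlow installed.
    from tensorflow import keras

    module = getattr(keras.datasets, dataset_key(input_file))
    return module.load_data()
```

Keeping the name lookup separate from the download makes it easy to add another dataset later by extending the table.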
Outputs
- <data>.html: Jupyter notebook outputs
- <data>_model_results_summary.txt: Summary of the model runs
- <data>_results_summary_plots.pdf: Graphical summary of model results
snakemake will take care of supplying the inputs to the Python code and writing the results to /results, mounted on your Docker volume, so the above is just context for troubleshooting or extending the workflow.
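When troubleshooting, it can help to check which of the output files listed above were actually produced. The sketch below is one way to do that; the function names are hypothetical, and it assumes that `<data>` is replaced by a short dataset name such as `mnist`.

```python
from pathlib import Path

# Output name patterns from the Outputs list; "{data}" stands for the dataset name.
OUTPUT_PATTERNS = (
    "{data}.html",
    "{data}_model_results_summary.txt",
    "{data}_results_summary_plots.pdf",
)


def expected_outputs(dataset):
    """Return the output file names expected for one dataset."""
    return [pattern.format(data=dataset) for pattern in OUTPUT_PATTERNS]


def missing_outputs(dataset, results_dir):
    """Return the expected output names that are not present in results_dir."""
    results = Path(results_dir)
    return [name for name in expected_outputs(dataset)
            if not (results / name).exists()]
```

An empty list from `missing_outputs("mnist", "/home/jovyan/results")` would suggest the MNIST run wrote everything it was expected to.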
Note: for these instructions you will need a Jetstream account. If you do not already have one, you can find instructions for setting up a trial account here
- Log in to Jetstream and click "Start a New Instance."
- Select the Ubuntu 18.04 Devel and Docker instance and press launch.
- Select an m1.xlarge instance in instance size, and then click "Launch Instance."
- Once the instance is "Active," go into the shell either through the Web Shell or via ssh. (Instructions are provided through the links)
-
Clone the github repo with the relevant dockerfile.
git clone https://github.com/cyber-carpentry/group2-machine-learning/
cd group2-machine-learning
-
Now you will run a shell script to build your docker image and create a volume to store your results.
source setup_docker.sh
There are multiple options for using the neural networks. We suggest starting with Option 1 for optimal reproducibility.
- Option 1: Run both mnist and fashion mnist datasets in parallel.
- Option 2: Run mnist or fashion mnist datasets on their own.
- Option 3: Explore the code via jupyter notebooks.
Run both mnist and fashion mnist datasets in parallel.
-
Enter the command below to run the docker image.
docker run --mount source=results,target=/home/jovyan/results -it sprince399/mlnotebook sh
-
Once you are in the shell, run the command below:
cd cyber-carpentry-group2-*
snakemake
The neural network model and classifier have launched!
-
When they are finished, you will find the files summarizing the output and results of the model in the /home/jovyan/results folder within your container.
-
To access the results outside of your container, first exit the container with the command below:
exit
To view the results, enter the command below (again, you should now be OUTSIDE of the container).
sudo cat ${MYVOLDIR}/fileyouwanttolookat
#for example
sudo cat ${MYVOLDIR}/mnist_model_results_summary.txt
- If you would like to move the results to your home folder on your Jetstream instance, follow the commands below.
sudo -i
cp ${MYVOLDIR}/* /home/
exit
Compare your results here!
-
Optional: To rerun the classifier inside the container, first delete the snakemake results with the command below.
snakemake some_target --delete-all-output
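The copy step performed outside the container with `cp ${MYVOLDIR}/* /home/` can also be sketched in Python. This is only an illustration: `copy_results` is a hypothetical helper, and it assumes the process has permission to read the volume directory (the shell commands above use sudo for that reason).

```python
import shutil
from pathlib import Path


def copy_results(source_dir, destination_dir):
    """Copy every regular file in source_dir into destination_dir.

    Subdirectories are skipped, which matches a non-recursive cp.
    Returns the sorted list of copied file names.
    """
    source = Path(source_dir)
    destination = Path(destination_dir)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file():
            shutil.copy2(path, destination / path.name)
            copied.append(path.name)
    return copied
```

Outside the container, the source directory would be the value of the `MYVOLDIR` environment variable, for example `copy_results(os.environ["MYVOLDIR"], "/home")` after `import os`.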
Run mnist or fashion mnist datasets on their own.
NOTE: If you are still in your docker container from the option 1 instructions, please exit now with the command below:
exit
-
Enter the command below to run the docker image. Do NOT change the username on the mlnotebook.
docker run --mount source=results,target=/home/jovyan/results -it sprince399/mlnotebook sh
-
Once you are in the shell, run the commands below. You can specify the dataset you would like to run by writing mnist.txt or fashion.txt as the option after run_main.py.
cd cyber-carpentry-group2-machine-learning-*
python run_main.py mnist.txt
The neural network model and classifier have launched! When they are finished, you will find the files summarizing the output and results of the model in the /home/jovyan/results folder.
-
To access the results outside of your container, first exit the container with the command below:
exit
To view the results, enter the command below (again, you should now be OUTSIDE of the container).
sudo cat ${MYVOLDIR}/fileyouwanttolookat
#for example
sudo cat ${MYVOLDIR}/mnist_model_results_summary.txt
- If you would like to move the results to your home folder on your Jetstream instance, follow the commands below.
sudo -i
cp ${MYVOLDIR}/* /home/
exit
Compare your results here!
Explore the code via jupyter notebooks.
NOTE: If you are still in your docker container from the Option 1 or Option 2 instructions, please exit now with the command below:
exit
-
Enter the command below if you would like to explore the neural network model code via jupyter notebooks.
docker run -p 80:8888 sprince399/mlnotebook
-
You will be given a prompt to access a jupyter notebook. It should look like the example below. Do not follow the instructions provided in the command prompt. Go to Step 3.
To access the notebook, open this file in a browser: file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html Or copy and paste one of these URLs: http://(c26dc6be4e1a or 127.0.0.1):8888/?token=73258a96f4088f042c856a3f24f057be37b5da5d43067754
-
Instead of following the shell output above, go to your Jetstream instance home page. Copy the IP address from the Jetstream instance.
-
In a new browser window, enter the URL below. NOTE: you should fill in the <my.jetstream.IP> section with what you copied above.
http://<my.jetstream.IP>:80
You will get a prompt to enter the token provided in the last line of instruction #2. Now you can look at the code underlying the neural network models! Select any of the files to explore.
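The token can be picked out of the server output by hand, or with a few lines of Python as sketched here. The helper names are hypothetical, and the IP address in the usage note is a placeholder from the documentation range, not a real Jetstream address. Appending the token as a query parameter mirrors the URLs that the notebook server prints itself.

```python
import re


def extract_token(log_text):
    """Return the first token=... value found in the server output, or None."""
    match = re.search(r"[?&]token=([^\s&]+)", log_text)
    return match.group(1) if match else None


def notebook_url(instance_ip, token, port=80):
    """Build the browser URL for the notebook served on the instance."""
    return f"http://{instance_ip}:{port}/?token={token}"
```

For example, `notebook_url("203.0.113.10", extract_token(output))` builds an address that includes the token, where `output` holds the text printed by `docker run`.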
Dockerfile
Defines the Docker image build. This image is also set to autobuild on Docker Hub
Snakefile
Defines the snakemake workflow and iterates over datasets
main.ipynb
Runs train.ipynb, test.ipynb, and output.ipynb
model.ipynb
Builds the tensorflow models
output.ipynb
Produces output summary files
setup_docker.sh
Builds the Docker image and creates the volume that stores the results
test.ipynb
Tests the model
train.ipynb
Trains the model
run_main.py
Runs main.ipynb, which enables running the workflow from snakemake
\archive
Archived example reports and project data.
\hooks
Enables version control on Dockerfiles
\data
Holds data input files.
\example_results
Example output results from running the model.
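To make the role of run_main.py more concrete, here is a minimal sketch of how a script could execute main.ipynb and produce the `<data>.html` output. It is not the repository's actual script: the function name is hypothetical, and using `jupyter nbconvert` for execution is an assumption. How the real script passes the dataset choice into the notebook is not described in this README, so the sketch leaves that out.

```python
from pathlib import Path


def build_nbconvert_command(dataset_file, notebook="main.ipynb"):
    """Build a jupyter nbconvert command that executes the notebook to HTML.

    The output base name is the dataset file's stem, so mnist.txt
    yields mnist.html (nbconvert appends the .html extension).
    """
    dataset = Path(dataset_file).stem
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--execute", notebook,
        "--output", dataset,
    ]
```

The command could then be run with `subprocess.run(build_nbconvert_command("mnist.txt"), check=True)` from the directory that contains the notebook.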