The easiest way to set up a cloud data science environment

mikekosk/Urchin-Data-Cloud

Urchin Labs Data Cloud

I often felt limited by my computer's processing power and memory when working on large data science problems. However, building a data science environment in the cloud (and installing all the packages I needed) was time-consuming and difficult. I wished there were a way to get access to powerful computational resources with a single click. This one-line script spins up an Amazon EC2 instance and turns it into a data science environment that feels no different from working in your local Jupyter notebook.

By default, the script installs and configures:

  • Jupyter Notebook 5.0.x
  • Conda Python 3.x environment
  • pandas, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh
  • Conda R v3.3.x and channel
  • plyr, devtools, shiny, rmarkdown, forecast, rsqlite, reshape2, nycflights13, caret, rcurl, and randomforest pre-installed
  • The tidyverse R packages are also installed, including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, lubridate, and broom

Adding packages is as easy as editing the Dockerfile that comes with this project. Even if you're not interested in data science, this is a great way to try creating and hosting a Docker container.

Installation

See Install Guide

Quick Run

Building and configuring your instance takes a single command:

terraform apply

The command prints a shareable URL for your Jupyter notebook in the cloud!
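For context, a fresh Terraform working directory normally has to be initialized before the first apply. The sketch below wraps the usual sequence in a small helper; `launch_data_cloud` is a hypothetical name that is not part of this project, and it assumes Terraform is installed, AWS credentials are already configured, and you are in the directory that holds the project's Terraform files. The Install Guide may already cover some of these steps.

```shell
# Hypothetical helper, not shipped with this project.
# Assumes AWS credentials are configured and the current directory
# contains the project's Terraform configuration.
launch_data_cloud() {
  # Stop early with a clear message if Terraform is not available.
  command -v terraform >/dev/null 2>&1 || {
    echo "terraform not found on PATH" >&2
    return 1
  }

  terraform init &&   # download the providers the configuration declares
  terraform plan &&   # preview the resources that would be created
  terraform apply     # create and configure the instance (asks for confirmation)
}
```

Call `launch_data_cloud` from the project directory. When you are done with the environment, `terraform destroy` removes the resources Terraform created, so the instance stops accruing charges.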

User Guide

See User Guide

License

MIT
