This repository contains course materials for BIOS 611 (Introduction to Data Science) typically taught during the Fall Semester at UNC Chapel Hill in the Department of Biostatistics.
The intent of the course is to provide an intensive introduction to the technical material and skills that a data scientist needs in order to do repeatable, reliable research.
It covers basic Linux tools such as bash and make, Docker, and git (extensively), and serves as an introduction to R and Python, including how to organize a research project and an R or Python library.
Along the way we will become informally familiar with some analytical techniques: classification, regression and clustering. The emphasis here is practical: how to use the methods while avoiding common pitfalls.
Class meets on Mondays and Wednesdays from 3:35 pm to 4:50 pm. There is a lab session on Tuesdays from 2:00 pm to 3:00 pm.
Class and lab are both held in McGavran-Greenberg PH-Rm 2308.
Date | Course Title | Material | Homework |
---|---|---|---|
Wed 08/18/2021 | Introduction | 1,2 | hw1 due: Wed 08/25/2021 |
Mon 08/23/2021 | Compute Resources | 1,2,3 | hw2 due: Mon 08/30/2021 |
Wed 08/25/2021 | Unix | 1,2,3 | hw3 due: Wed 09/08/2021 |
Mon 08/30/2021 | Docker | 1,2,3,4 | hw4 due: Wed 09/15/2021 |
Wed 09/01/2021 | git basics & github basics | 1,2,3,4 | hw5 due: Mon 09/20/2021 |
Mon 09/06/2021 | Labor Day 🍞🌹 | 1,2 | |
Wed 09/08/2021 | How to Think about Programming & R | 1,2 | hw6 due: Wed 09/27/2021 |
Mon 09/13/2021 | More R | 1,2 | |
Wed 09/15/2021 | Tidyverse for Tidying & GGPlot | 1,2,3,4,5,6 | |
Mon 09/20/2021 | Make and Makefiles | 1,2 | |
Wed 09/22/2021 | git concepts and practices | 1,2,3 | |
Mon 09/27/2021 | Project Organization | 1,2,3 | |
Wed 09/29/2021 | ~~~~ | ||
Mon 10/04/2021 | Dimensionality Reduction | 1,2,3,4 | hw7 due: Mon 10/11/2021 |
Wed 10/06/2021 | Clustering | 1,2,3,4 | hw8 due: Wed 10/13/2021 |
Mon 10/11/2021 | Classification | 1,2,3,4,5,6,7 | hw9 due: Mon 10/18/2021 |
Wed 10/13/2021 | Model Validation and Selection | 1,2 | |
Mon 10/18/2021 | Shiny | 1,2,3,4,5,6 | hw10 due: Mon 10/25/2021 |
Wed 10/20/2021 | Introduction to Scientific Python | 1,2 | hw11 due: Wed 10/27/2021 |
Mon 10/25/2021 | SQL (and pandas, dplyr) | 1,2,3 | |
Wed 10/27/2021 | Pandas & SQL | 1[2] | hw12 due: Wed 11/03/2021 |
Fri 10/29/2021 | Mid Term Project Review | ||
Mon 11/01/2021 | SKLearn Introduction | ||
Wed 11/03/2021 | Training Neural Networks | ||
Mon 11/08/2021 | Bokeh | ||
Wed 11/10/2021 | Browser Based Visualization w/ d3 | 1,2 | |
Mon 11/15/2021 | Data Science Ethics | 1,2 | |
Wed 11/17/2021 | Panel Discussion | ||
Mon 11/22/2021 | Web Scraping | 1 | |
Wed 11/24/2021 | Feedback Day | ||
Mon 11/29/2021 | Class Presentations I | ||
Wed 12/01/2021 | Class Presentations II | ||
There is also a lab every Tuesday. This is generally unstructured time in which you can work on projects and ask me questions, though we will sometimes use it to cover material.
I provide a Docker container which you can use to hack on these lectures and the associated materials. Some lectures may have their own Docker container, but to work on most of them, run:
`./start-env.sh`
This will start an RStudio instance.
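For a rough idea of what a launcher of this kind might do, here is a minimal sketch. It is not the contents of the repository's `start-env.sh`, which may work quite differently. The image name (`rocker/verse`), the placeholder password, the mount point under `/home/rstudio`, and the `DRY_RUN` switch are all assumptions chosen for illustration. By default the sketch only prints the `docker run` command it would execute.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a start-env.sh-style launcher.
# Everything below is an assumption, not the course's actual script.
set -euo pipefail

IMAGE="${IMAGE:-rocker/verse}"     # assumed image: a public RStudio Server image
PORT="${PORT:-8787}"               # host port; RStudio Server listens on 8787 in the container
PASSWORD="${PASSWORD:-changeme}"   # placeholder login password for the rstudio user

# Build the command as an array so each argument keeps its own quoting.
build_cmd() {
  CMD=(docker run --rm
       -p "${PORT}:8787"
       -e "PASSWORD=${PASSWORD}"
       -v "$(pwd):/home/rstudio/work"   # assumed mount point for the current directory
       "${IMAGE}")
}

build_cmd

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: print the command instead of running it.
  printf '%q ' "${CMD[@]}"
  printf '\n'
else
  # Replace this shell with the container process.
  exec "${CMD[@]}"
fi
```

Running the sketch with `DRY_RUN=0` would start the container, after which RStudio would be reachable in a browser at `http://localhost:8787` (or whichever host port `PORT` names).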