The best way to learn is by doing. Improving at data science is an active process, and this repo focuses on resources for hands-on learning. We are building up in-house tutorials on how to do particular tasks, but we don't aim to reinvent lessons that are already done well elsewhere on the web. Hence we mix notebooks checked in here with links to external resources.
- Watch this - it will change your life.
- Keep going. Feeling unhappy with your work is a normal and important part of learning something new. If you can stick with it, you will surprise yourself.
- Modeling and scikit-learn.ipynb
  - Basics of loading a dataset (in this case, one included with sklearn), then creating and evaluating some out-of-the-box sklearn models.
- ROC Curve.ipynb
  - What a ROC curve is, why it's useful, how to build one, and how to translate one into real-world concepts that non-engineers can understand.
- Proof that normalization matters.ipynb
  - The result of an internal debate about the need to normalize data before fitting a regularized logistic regression model, so that the regularization penalty trades off the weights on a comparable scale.
- http://pandas.pydata.org/pandas-docs/stable/visualization.html
  - Plotting with pandas
- http://radimrehurek.com/data_science_python/
  - End-to-end spam filter, beginning with raw data and exploratory analysis, then progressing to overfitting, cross-validation, etc. Uses pandas and sklearn.
- http://filepi.com/i/88U7T4b
  - Python for Data Analysis (free PDF book): IPython notebook, pandas, NumPy, matplotlib
- http://scikit-learn.org/stable/tutorial/index.html
  - Tutorial on machine learning and scikit-learn
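
For a quick taste of what the notebooks above cover, here is a minimal sketch. It is not taken from the notebooks themselves. It loads a dataset bundled with sklearn, fits a regularized logistic regression with and without normalization, and computes a ROC curve and its AUC for each. The dataset (`load_breast_cancer`), the regularization strength `C=0.1`, the 70/30 split, and the `results` dictionary are illustrative assumptions, so swap in whatever fits your problem.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a dataset that ships with sklearn (an illustrative choice).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Two otherwise identical L2-regularized models: one sees raw features,
# the other sees features standardized to zero mean and unit variance.
models = {
    "raw": LogisticRegression(C=0.1, max_iter=5000),
    "normalized": make_pipeline(
        StandardScaler(), LogisticRegression(C=0.1, max_iter=5000)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Predicted probability of the positive class drives the ROC curve.
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, scores)
    auc = roc_auc_score(y_test, scores)
    results[name] = {"fpr": fpr, "tpr": tpr, "auc": auc}
    print(f"{name}: test AUC = {auc:.3f}")
```

Each entry in `results` holds the false positive rates and true positive rates needed to plot the curve, for example with `matplotlib.pyplot.plot(fpr, tpr)`. Comparing the two AUC values, or the fitted coefficients, is one way to see how normalization interacts with the regularization penalty on your own data.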