README for daml
DAML - Lectures in Data Analytics and Machine Learning
Jupyter notebooks, material, code and data for lectures I give about the practice (and some theory) of Data Science in Python. The material may be presented in a different order from the one used in this repository, so please do not rely on the repository's naming convention for the order of the material.
Text, code and examples contained in this repository have been heavily influenced by several other texts. While I use the (more-or-less) exact material in this repository for my own lectures on Data Analytics and Machine Learning, several other materials helped me build it. No single piece of material in modern Data Science can be attributed to the effort of a single person, therefore I want to acknowledge the huge effort made by the many researchers, teachers and presenters whose work allowed me to summarise it into the shape seen in this repository. The main sources I used are:
- Think Stats, 2nd Edition - by Allen B. Downey
- Python Data Science Handbook - by Jake VanderPlas
- Data Analysis with Python and Pandas - by Bill Chambers
- Intro to Information Retrieval - by Manning, Raghavan and Schütze
- Statistics Done Wrong - by Alex Reinhart
This list is by no means exhaustive: where a different reference is used, or where a reference gives extra information on the presented material, that reference is placed directly in the text.
Copyright (C) 2018 Michal Grochmal
This file is part of daml
The text, including code samples when used as presentation text and not
runnable code, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International (CC BY-NC-SA 4.0) license. See the COPYING-TEXT file
for the full text of the license, or read it on the Creative Commons website:
https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.
The code in this repository and any code samples in the notebooks are licensed
under the MIT license. The full text of the license can be found in the
COPYING-CODE file, or read it on the Open Source Initiative website:
https://opensource.org/licenses/MIT.
The copyright of the data files in this repository is covered by several Open Data initiatives, and therefore by several different licenses. Some of these licenses are public domain, or similar to public domain with attribution, whilst others are very specific licenses that allow only non-commercial (or academic-only) use. This repository uses the data in an academic fashion but, in general, I'd advise against using this data for any non-academic purpose.
At the time of writing, the complexities of Open Data policies would require me to track the use of every dataset provided to each user, which is not feasible. Instead, I point to the place from which the data can be retrieved and advise that, if you want to use this data, you download it from there.