Intro to Data Science and Machine Learning

@amitkaps | @bargava







See the world through a data lens

"Data is just a clue to the end truth"

-- Josh Smith

Data Driven Decisions

"Science is knowledge which we understand so well that we can teach it to a computer. Everything else is art"

-- Donald Knuth

Data Science is an Art

Hypothesis Driven Approach


"An approximate answer to the right problem is worth a good deal"


"80% perspiration, 10% great idea, 10% great output"


"All data is messy."


"I don't know, what I don't know."


"All models are wrong, but some are useful"


"The goal is to turn data into insight"

"Doing data analysis requires quite a bit of thinking and we believe that when you’ve completed a good data analysis, you’ve spent more time thinking than doing."

-- Roger Peng

Python Data Stack

Case Studies

Day 1

Peeling the Onion

Time Series Analysis

Day 2


Market Basket Analysis / Collaborative Filtering

Day 2

Bank Marketing

Random Forest and Gradient Boosting

Day 3


Text Analytics

Learning Approach

Do the Exercises

Pair up & Learn

Call for Help

Enjoy the workshop

Workshop Material is available at the Github Repo


1. Time Series Exercise

"Predict the number of tickets that will be raised in the next week"

  • Frame: What to forecast? At what horizon? At what level?
  • Acquire, Refine, Explore: Do EDA to understand the trend and pattern within the data
  • Models: Mean Model, Linear Trend, Random Walk, Simple Moving Average, Exp Smoothing, Decomposition, ARIMA
  • Insight: Share the insight through a datavis of the models
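The simpler models in the list above can be sketched without any library. This is a minimal illustration, not the workshop's own code: the weekly ticket counts are made-up numbers, and the function names, the window of 3 and the smoothing factor `alpha=0.5` are assumptions chosen for the example.

```python
def mean_forecast(series):
    """Mean model: forecast the average of all past values."""
    return sum(series) / len(series)


def random_walk_forecast(series):
    """Random walk: forecast the most recent observed value."""
    return series[-1]


def moving_average_forecast(series, window=3):
    """Simple moving average: forecast the mean of the last `window` values."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def exp_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing: recent values get more weight."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly ticket counts, oldest first.
weekly_tickets = [120, 132, 128, 140, 150, 146]

forecasts = {
    "mean": mean_forecast(weekly_tickets),
    "random walk": random_walk_forecast(weekly_tickets),
    "moving average": moving_average_forecast(weekly_tickets),
    "exp smoothing": exp_smoothing_forecast(weekly_tickets),
}
for name, value in forecasts.items():
    print(f"{name}: {value:.1f}")
```

Comparing these baselines side by side is a reasonable first step before moving on to decomposition or ARIMA, since a more complex model should at least beat them.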

2. Text Analytics Exercise

"Identify the entity, features & topics in the 'Comments' data or 'Twitter #machine learning' data"

  • Frame: What are the comments you are trying to understand?
  • Acquire, Refine, Explore: Do Wordcloud, Lemmatization, Part of Speech Analysis, and Entity Chunking
  • Models: TF-IDF, Topic Modelling, Sentiment Analysis
  • Insight: Share the insight through word cloud and topic visualisation
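As a rough sketch of the TF-IDF step listed above, the snippet below weights each term by its frequency in a comment times the log of its inverse document frequency. The comments are invented for illustration, whitespace splitting stands in for real tokenisation and lemmatization, and the `tf_idf` helper is a hypothetical name; library implementations often use smoothed variants of this formula.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical comments data.
comments = [
    "login page is slow",
    "payment page is broken",
    "login fails after update",
]

scores = tf_idf(comments)
for comment, weights in zip(comments, scores):
    ranked = sorted(weights, key=weights.get, reverse=True)
    print(comment, "->", ranked[:2])
```

Terms that appear in only one comment (such as "slow") score higher than terms shared across comments (such as "page"), which is the property that makes TF-IDF useful for surfacing distinctive words.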




Frame

  • Toy Problems
  • Simple Problems
  • Complex Problems
  • Business Problems
  • Research Problems


Acquire

  • Scraping (structured, unstructured)
  • Files (csv, xls, json, xml, pdf, ...)
  • Database (sqlite, ...)
  • APIs
  • Streaming


Refine

  • Data Cleaning (inconsistent, missing, ...)
  • Data Refining (derive, parse, merge, filter, convert, ...)
  • Data Transformations (group by, pivot, aggregate, sample, summarise, ...)
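The three steps listed above can be illustrated on a tiny in-memory table. This is a dependency-free sketch under stated assumptions: the rows, column names and values are made up, and in the PyData stack the same steps would more typically be done with Pandas.

```python
from collections import defaultdict

# Hypothetical raw records with inconsistent labels and a missing value.
raw_rows = [
    {"city": " Bangalore", "month": "2016-01", "sales": "120"},
    {"city": "bangalore", "month": "2016-02", "sales": ""},
    {"city": "Delhi", "month": "2016-01", "sales": "95"},
    {"city": "delhi ", "month": "2016-02", "sales": "105"},
]

# Clean: normalise inconsistent labels and drop rows with missing sales.
clean_rows = [
    {**row, "city": row["city"].strip().title()}
    for row in raw_rows
    if row["sales"].strip()
]

# Refine: convert text to numbers and derive a new column by parsing.
for row in clean_rows:
    row["sales"] = int(row["sales"])
    row["year"] = int(row["month"].split("-")[0])

# Transform: group by city and aggregate the sales.
totals = defaultdict(int)
for row in clean_rows:
    totals[row["city"]] += row["sales"]

print(dict(totals))
```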


Explore

  • Simple Vis
  • Multi Dimensional Vis
  • Geographic Vis
  • Large Data Vis (Bin - Summarise - Smooth)
  • Interactive Vis
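The "Bin - Summarise - Smooth" idea for large data can be sketched as follows: reduce many points to a few bins, compute one summary per bin, then smooth across neighbouring bins before plotting. The function name, the bin width and the window size are assumptions for illustration, and the points are made up.

```python
from collections import defaultdict


def bin_summarise_smooth(points, bin_width, window=3):
    """Return (bin centres, per-bin means, smoothed means) for (x, y) points."""
    # Bin: assign each point to a fixed-width bin along x.
    bins = defaultdict(list)
    for x, y in points:
        bins[int(x // bin_width)].append(y)

    # Summarise: one mean per non-empty bin, ordered by x (empty bins are skipped).
    keys = sorted(bins)
    means = [sum(bins[k]) / len(bins[k]) for k in keys]

    # Smooth: centred moving average over neighbouring bins.
    half = window // 2
    smoothed = []
    for i in range(len(means)):
        neighbours = means[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))

    centres = [(k + 0.5) * bin_width for k in keys]
    return centres, means, smoothed


# Hypothetical (x, y) observations.
points = [(0, 1), (1, 3), (10, 10), (11, 20), (20, 5), (25, 7)]
centres, means, smoothed = bin_summarise_smooth(points, bin_width=10)
print(centres, means, smoothed)
```

Only the handful of binned values needs to be drawn, which keeps the plot readable and fast regardless of how many raw points there are.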

Model - Supervised Learning

  • Continuous: Regression - Linear, Polynomial, Tree Based Methods - CART, Random Forest, Gradient Boosting Machines
  • Categorical: Classification - Logistic Regression, Tree, KNN, SVM, Naive-Bayes, Bayesian Network
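To make the classification idea concrete, here is a minimal sketch of KNN, one of the methods listed above: a new point gets the majority label of its k closest training points. The training points, labels and the `knn_predict` name are hypothetical, and in practice a library such as Scikit-Learn would normally be used instead.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Sort training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two features per customer, label = responded or not.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["no", "no", "no", "yes", "yes", "yes"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (4.0, 4.0)))
```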

Model - UnSupervised Learning

  • Continuous: Clustering & Dimensionality Reduction like PCA, SVD, MDS, K-means
  • Categorical: Association Analysis
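As an illustration of clustering, the sketch below runs the basic K-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. The points and starting centroids are made up, the starting centroids are passed in explicitly to keep the example deterministic, and `k_means` is a hypothetical helper rather than a library function.

```python
import math


def k_means(points, centroids, iterations=10):
    """Run a fixed number of K-means iterations from the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)

        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    # `clusters` reflects the assignment made in the final iteration.
    return centroids, clusters


# Hypothetical 2-D points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```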

Model - Advanced /

  • Time Series
  • Text Analytics
  • Network / Graph Analytics
  • Optimization

Model - Specialized

  • Reinforcement Learning
  • Online Learning
  • Deep Learning
  • Other Applications: Image, Speech


Insight

  • Narrative Visualisation
  • Dashboard Visualisation
  • Decision Making Tools
  • Automated Decision Tools

PyData Stack

  • Acquire / Refine: Pandas, Beautiful Soup, Selenium, Requests, SQLAlchemy, NumPy, Blaze
  • Explore: Matplotlib, Seaborn, Bokeh, Plotly, Vega, Folium
  • Model: Scikit-Learn, StatsModels, SciPy, Gensim, Keras, TensorFlow, PySpark
  • Insight: Django, Flask







Resources - Statistical Learning

Resources - Time Series

Resources - Text Analytics

Online Course

  • Harvard Data Science Course - CS 109 Course (It is structured in a similar way to the approach we shared)
  • Data Science Specialisation - JHU Data Science (It is a good course, though the material is coded in R)

  • Many more on Coursera & Udacity...

We enjoyed the workshop!

Speak to Us!

Thank you

@amitkaps | @bargava