title | duration | creator |
---|---|---|
APIs / Web Scraping 101 | 3 hr | |
DS | Lesson 4
After this lesson, you will be able to:
- Extract Data Using an API
- Extract Data Using Scraping
- Extract Data Using Regex
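
As a small taste of the regex objective, here is a minimal sketch using Python's built-in `re` module. The sample text, the addresses in it, and the `extract_emails` helper are made up for illustration, and the pattern is deliberately simple rather than a complete email validator.

```python
import re

# Hypothetical sample text; the addresses are invented for this example.
text = "Contact alice@example.com or bob@example.org for details."

# A simple pattern for email-like strings (not a full RFC-compliant matcher).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_emails(s):
    """Return every email-like substring found in s, in order of appearance."""
    return EMAIL_PATTERN.findall(s)


emails = extract_emails(text)
print(emails)
```

`findall` returns all non-overlapping matches as a list, which is usually the quickest way to pull repeated values out of raw text.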
Before this lesson, you should already be able to:
- I will be using a module called `tweepy` today.
  - To install it, type this into your console: `conda install tweepy`
  - OR, if that does not work: `pip install tweepy`
TIMING | TYPE | TOPIC |
---|---|---|
5 min | Opening | Lesson Objectives |
30 min | Introduction | Getting Data |
30 min | Introduction | Regular Expressions |
30 min | Demo | Regular Expressions Demo |
30 min | Introduction | API |
30 min | Demo | API Demo |
30 min | Introduction | Scraping Demo |
35 min | Independent Practice | Regex, Scrape and API Yourself |
15 min | Wrap-up | Review Guided Practice |
####Agenda
####Homework
- Project 2 is due next week!
####Resources:
- Mashape allows you to explore tons of different APIs. Alternatively, a Python API wrapper is available for many popular APIs.
- The Data Science Toolkit is a collection of location-based and text-related APIs.
- API Integration in Python provides a very readable introduction to REST APIs.
- Microsoft's Face Detection API, which powers How-Old.net, is a great example of how a machine learning API can be leveraged to produce a compelling web application.

Web Scraping Resources:
- For a much longer web scraping tutorial covering Beautiful Soup, lxml, XPath, and Selenium, watch Web Scraping with Python (3 hours 23 minutes) from PyCon 2014. The slides and code are also available.
- import.io and Kimono claim to allow you to scrape websites without writing any code. They're alright.
- How a Math Genius Hacked OkCupid to Find True Love and How Netflix Reverse Engineered Hollywood are two fun examples of how web scraping has been used to build interesting datasets.
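
To give a rough idea of what scraping looks like in code, here is a minimal sketch that pulls link text and URLs out of an HTML snippet. It uses the standard library's `html.parser` instead of Beautiful Soup or lxml so that it runs without installing anything. The `LinkExtractor` class, the `extract_links` helper, and the sample HTML are all hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="https://example.com/a">First page</a></li>
  <li><a href="https://example.com/b">Second <b>page</b></a></li>
</ul>
"""


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links as (text, href) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


for text, href in extract_links(SAMPLE_HTML):
    print(text, "->", href)
```

In practice the HTML would come from a real page, and a library such as Beautiful Soup makes the same task shorter. The idea stays the same: parse the markup, then walk it for the tags you care about.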