Reddit Crawler

Python script to crawl Reddit.

About

I wrote this crawler because I wanted to collect all of the /r/DailyProgrammer/ challenges, but couldn't be bothered to go through hundreds of posts by hand, page by page.

Features

  • Crawl any subreddit
  • Choose how many pages to crawl
  • Save the crawled data and do whatever you want with it
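The README doesn't describe how the crawler moves from one page to the next. One common approach, sketched here under assumptions, is Reddit's public JSON listing endpoint: appending `.json` to a subreddit URL returns a listing whose `data.children` holds the posts and whose `data.after` token identifies the next page. The helpers `next_page_url`, `crawl`, `fetch_json` and `save_posts` are hypothetical names, and this is not necessarily how `reddit_crawler.py` itself works. The fetch function is passed in so the pagination logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Optional
from urllib.parse import urlencode


def next_page_url(subreddit_url: str, after: Optional[str]) -> str:
    """Build the JSON listing URL for a subreddit page (hypothetical helper)."""
    base = subreddit_url.rstrip("/") + "/.json"
    if after is None:
        return base  # first page has no "after" token
    return base + "?" + urlencode({"after": after})


def crawl(subreddit_url: str, depth: int,
          fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield post data from up to `depth` listing pages.

    `fetch` takes a URL and returns the decoded JSON listing.
    """
    after = None
    for _ in range(depth):
        listing = fetch(next_page_url(subreddit_url, after))
        for child in listing["data"]["children"]:
            yield child["data"]
        after = listing["data"].get("after")
        if after is None:  # Reddit returns no token on the last page
            break


def fetch_json(url: str) -> dict:
    """Download one listing page; Reddit expects a descriptive User-Agent."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "reddit-crawler-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def save_posts(posts: Iterable[dict], path: str) -> None:
    """Write crawled posts to a JSON file for later processing."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(posts), f, indent=2)
```

For example, `save_posts(crawl("https://www.reddit.com/r/dailyprogrammer/", 10, fetch_json), "posts.json")` would fetch up to ten pages and store every post's data in `posts.json`. Stopping early when `after` is missing avoids requesting pages that don't exist.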

Prerequisites

  • git
  • Python with pip
  • virtualenv

Setup

$ git clone https://github.com/filipkonieczny/reddit-crawler.git
$ cd reddit-crawler/
$ virtualenv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt

Usage

Run the script from the project directory:

$ python reddit_crawler.py SUBREDDIT CRAWLING_DEPTH

where SUBREDDIT is the subreddit's URL and CRAWLING_DEPTH (optional, default 1) is the number of pages to crawl. For example:

$ python reddit_crawler.py http://www.reddit.com/r/dailyprogrammer/ 10

crawls the first 10 pages of the /r/DailyProgrammer/ subreddit.
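The command-line interface above takes one required and one optional positional argument. A minimal sketch of that interface, assuming the standard-library `argparse` module (the actual script may read `sys.argv` differently), is shown below. `parse_args` and `positive_int` are hypothetical names; the argument names and the default depth of 1 follow the usage described above.

```python
import argparse


def positive_int(value: str) -> int:
    """Reject depths below 1; argparse reports non-integers automatically."""
    n = int(value)
    if n < 1:
        raise argparse.ArgumentTypeError("CRAWLING_DEPTH must be at least 1")
    return n


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Crawl a subreddit page by page.")
    parser.add_argument(
        "subreddit", metavar="SUBREDDIT",
        help="subreddit URL, e.g. http://www.reddit.com/r/dailyprogrammer/")
    parser.add_argument(
        "depth", metavar="CRAWLING_DEPTH", nargs="?", type=positive_int,
        default=1, help="number of pages to crawl (default: 1)")
    return parser.parse_args(argv)
```

Using `nargs="?"` makes CRAWLING_DEPTH optional, so `python reddit_crawler.py http://www.reddit.com/r/dailyprogrammer/` would crawl a single page.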