Python script to crawl Reddit.
The story behind this crawler is that I wanted to get all of the /r/DailyProgrammer/ challenges, but couldn't be bothered to go through hundreds of posts by hand, page by page.
- Crawl any subreddit,
- Choose how many pages you wish to crawl,
- Save crawled data and do whatever you want with it
$ git clone https://github.com/filipkonieczny/reddit-crawler.git
$ cd reddit-crawler/
$ virtualenv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
All you have to do is run the script while in the project directory like this:
$ python reddit_crawler.py SUBREDDIT CRAWLING_DEPTH
and supply SUBREDDIT along with CRAWLING_DEPTH (optional, default is 1), for example:
$ python reddit_crawler.py http://www.reddit.com/r/dailyprogrammer/ 10
will crawl the first 10 pages of the /r/DailyProgrammer/ subreddit.
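
If you are curious how a command line like the one above could be handled, here is a minimal sketch using argparse from the standard library. It is an illustration only and not necessarily how reddit_crawler.py does it: the parse_args helper and the attribute names subreddit and depth are assumptions, and only the two arguments and the default of 1 come from the usage described above.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical helper: read SUBREDDIT and the optional CRAWLING_DEPTH."""
    parser = argparse.ArgumentParser(prog="reddit_crawler.py")
    # Required positional argument: the subreddit URL to crawl.
    parser.add_argument("subreddit", metavar="SUBREDDIT",
                        help="URL of the subreddit to crawl")
    # Optional positional argument: number of pages, falling back to 1.
    parser.add_argument("depth", metavar="CRAWLING_DEPTH", nargs="?",
                        type=int, default=1,
                        help="number of pages to crawl (default: 1)")
    args = parser.parse_args(argv)
    # Assumption: a depth below 1 makes no sense, so reject it early.
    if args.depth < 1:
        parser.error("CRAWLING_DEPTH must be a positive integer")
    return args


# Same arguments as in the example command above.
args = parse_args(["http://www.reddit.com/r/dailyprogrammer/", "10"])
print(args.subreddit, args.depth)
```

Leaving out the second argument gives a depth of 1, which matches the default mentioned above.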