Mountain Project Database Builder

This project is a web crawler, built with Scrapy, that pulls metadata about areas and routes from Mountain Project into a database intended to serve as the backbone for a third-party API. The ultimate goal is to let third-party developers provide supplementary services to Mountain Project users.

Prerequisites

  • Python 3.x
  • A MongoDB database

Installation

  1. Clone the repository
  2. Navigate to your cloned repository
  3. Run pip install -r requirements.txt
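
For example, assuming an HTTPS clone of the JacobHearst/mp-crawl repository (the directory name is inferred from the repository name):

git clone https://github.com/JacobHearst/mp-crawl.git
cd mp-crawl
pip install -r requirements.txt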

Usage

The spider can be started with:

scrapy crawl mp -s MONGO_URI=<YOUR_MONGO_URI> -s MONGO_DATABASE=<YOUR_MONGO_DATABASE_NAME>

Note that this command logs debug output to the console and also logs at your configured level to your configured log file.

As Mountain Project contains a very large number of areas and routes, expect a full crawl to take several hours. I highly recommend using Scrapy's built-in job manager so that you can stop the crawl without losing your place. By default the logging level is INFO and the log file is log.txt; both values are configurable in mp_scraper/settings.py. For more information on settings, check Scrapy's docs.
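
For example, Scrapy's job persistence is enabled by pointing the JOBDIR setting at a directory of your choice; the directory name below is just a placeholder:

scrapy crawl mp -s MONGO_URI=<YOUR_MONGO_URI> -s MONGO_DATABASE=<YOUR_MONGO_DATABASE_NAME> -s JOBDIR=crawls/mp-1

Stopping the crawl with a single Ctrl-C lets Scrapy shut down gracefully and persist its state; re-running the same command with the same JOBDIR resumes where the previous run left off.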

What if I don't want to use MongoDB?

If MongoDB doesn't suit your needs, writing a new serialization pipeline is fairly straightforward. I recommend Scrapy's Item Pipeline docs as a starting point. All items coming into the pipeline are defined in mp_scraper/items.py.
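
As a rough sketch (not code from this repository), a pipeline that writes items to a JSON Lines file instead of MongoDB could look like the following. The class name and output file are placeholders, and ItemAdapter comes from the itemadapter package that ships with recent Scrapy releases:

import json

from itemadapter import ItemAdapter


class JsonLinesWriterPipeline:
    """Example pipeline that appends every scraped item to a JSON Lines file."""

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open("items.jsonl", "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: close the output file.
        self.file.close()

    def process_item(self, item, spider):
        # Convert the item to a plain dict and write one JSON object per line.
        self.file.write(json.dumps(ItemAdapter(item).asdict()) + "\n")
        return item

To enable a pipeline like this, register it under ITEM_PIPELINES in mp_scraper/settings.py (the module path assumes the class lives in mp_scraper/pipelines.py):

ITEM_PIPELINES = {
    "mp_scraper.pipelines.JsonLinesWriterPipeline": 300,
}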
