
NREL MIDC Scraper

This is a project to scrape solar irradiance and meteorological data from NREL's Measurement and Instrumentation Data Center (MIDC) records. The specific station being crawled is the Baseline Measurement System in Golden, Colorado.

A Scrapy crawler parses the response from the MIDC site, and a SQLAlchemy connection is then used to push the parsed data to a PostgreSQL instance hosted on Amazon RDS. Using the Serverless framework, the scrape function is deployed to AWS Lambda and scheduled to run daily at 11:59 PM MST (UTC-07:00). The project backlog is tracked on this Kanban board.
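The handoff from the crawler to PostgreSQL follows Scrapy's item-pipeline pattern. Below is a minimal sketch of that pattern, not the repository's actual code: the table name, column names, and DATABASE_URL environment variable are all illustrative assumptions.

    # Hedged sketch of a Scrapy item pipeline that writes parsed MIDC
    # rows to PostgreSQL via SQLAlchemy. All identifiers here
    # (irradiance_readings, DATABASE_URL, the item fields) are
    # illustrative, not the project's real schema.
    import os

    from sqlalchemy import Column, DateTime, Float, MetaData, Table, create_engine

    class PostgresPipeline:
        def open_spider(self, spider):
            # e.g. postgresql://user:password@my-rds-host:5432/midc
            self.engine = create_engine(os.environ["DATABASE_URL"])
            metadata = MetaData()
            self.readings = Table(
                "irradiance_readings", metadata,
                Column("observed_at", DateTime),
                Column("ghi", Float),  # global horizontal irradiance
                Column("dni", Float),  # direct normal irradiance
            )
            metadata.create_all(self.engine)  # no-op if the table already exists

        def process_item(self, item, spider):
            # One INSERT per scraped item, wrapped in a transaction
            with self.engine.begin() as conn:
                conn.execute(self.readings.insert().values(**dict(item)))
            return item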

🚀 Quick start

Fork & Install Dependencies

All the dependencies for this project (e.g. Scrapy, Pandas, psycopg2-binary) are bundled using Pipenv. Install them with pipenv install, or pipenv sync if you already have an activated virtual environment. If you're developing for a different use case, you can also set up your own pipenv with Scrapy as a starting dependency:

pipenv --three

pipenv shell

pipenv install scrapy

Install Serverless

This Scrapy crawler operates on Lambda and RDS via the Serverless framework, so you'll need an AWS account and credentials as prerequisites. You can configure Serverless access to your cloud provider by following the documentation.

npm install -g serverless

serverless plugin install -n serverless-python-requirements
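If you haven't linked your AWS credentials to Serverless yet, one way to do so (assuming you've already created an IAM access key) is:

serverless config credentials --provider aws --key <ACCESS_KEY_ID> --secret <SECRET_ACCESS_KEY>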

Configure and Deploy

Make sure to configure serverless.yml according to your requirements and dependencies, e.g.:

custom:
  pythonRequirements:
    slim: true # Omits tests, __pycache__, *.pyc etc from dependencies
    usePipenv: true
    dockerizePip: non-linux
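The daily run is expressed as a schedule event on the function itself. A sketch of what that section might look like is below; the handler function name is an assumption (the function name nrelScrape matches the invoke step further down), and the cron expression is 11:59 PM MST converted to UTC:

functions:
  nrelScrape:
    handler: handler.main  # entry point in handler.py; 'main' is an assumption
    events:
      - schedule: cron(59 6 * * ? *)  # 06:59 UTC == 11:59 PM MST (UTC-07:00)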

Deploy to your cloud provider. On AWS, you should see the packaged application stored in an S3 bucket and a corresponding function in the Lambda console. Scheduled runs can be monitored from the CloudWatch dashboard.

serverless deploy --verbose

Invoke (optional)

You can schedule job events using cron syntax in serverless.yml (as in the sketch above), but the function can also be invoked manually:

serverless invoke -f nrelScrape -l

🧐 What's inside?

A quick look at the top-level files and directories you'll likely find in this repository:

.
├── nrel_scraper
├── query_history
├── .gitignore
├── Pipfile
├── Pipfile.lock
├── handler.py
├── LICENSE
├── package-lock.json
├── package.json
├── scrapy.cfg
├── serverless.yml
└── README.md
  1. /nrel_scraper: The Scrapy project package, containing the spider that crawls the MIDC records along with the item definitions, pipelines, and settings it uses.

  2. /query_history: A record of past queries run against the scraped data.

  3. .gitignore: This file tells git which files it should not track / not maintain a version history for.

  4. Pipfile / Pipfile.lock: The Pipenv manifest and lock file that pin the project's Python dependencies (Scrapy, Pandas, psycopg2-binary, etc.).

  5. handler.py: The module deployed to AWS Lambda; it exposes the handler that Serverless invokes to kick off a scrape.

  6. LICENSE: The license for this project.

  7. package-lock.json (see package.json below, first): An automatically generated file based on the exact versions of the npm dependencies installed for the project. (You won't change this file directly.)

  8. package.json: A manifest file for Node.js projects, which includes metadata (the project's name, author, etc.) and tells npm which packages to install; here, that's the Serverless plugins.

  9. scrapy.cfg: Scrapy's project configuration file, which points at the nrel_scraper package.

  10. serverless.yml: The Serverless Framework service definition: functions, scheduling, plugins, and deployment settings.

  11. README.md: A text file containing useful reference information about this project.
