National Weather Service Web Scrape, ETL, and Databasing

Drewrwhite/weather_database

National Weather Service Web Scrape and ETL

This project is part of a larger project located here: Team Week 3

Contents

Summary | Technologies Used | Sources | Description | dw_weather_scrape.py | dw_weekly_avg.py | Visualizations | Known Bugs

Links:

Looker Dashboard

Summary:

This data engineering project scrapes weather data from the National Weather Service, transforms it, and stores it in Google BigQuery. It is written in Python and orchestrated with Apache Airflow. BeautifulSoup scrapes the data, Pandas manipulates it, and Google BigQuery serves as the primary data store.

The dw_weather_scrape.py script contains three functions that work together to scrape weather data from the National Weather Service, transform the data, and write it to Google BigQuery on a daily basis.

The dw_weekly_avg.py script pulls data from the daily table in Google BigQuery and calculates weekly averages for select columns. The script then writes the averages to the weekly_avg table on a weekly basis.

Note: For demonstration purposes, the DAGs in this project run hourly and daily rather than daily and weekly. This gathers more data for the project presentation. In a full production environment, the Airflow DAGs would trigger at daily and weekly intervals.
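The switch between demonstration and production intervals described above could be kept in one place. The following is a minimal sketch, not the project's code: the helper name `pick_schedules` and its `demo` flag are hypothetical, and the returned strings are Airflow's standard cron presets.

```python
# Hypothetical helper (not part of the project) that maps each script
# to an Airflow cron preset depending on the environment.
def pick_schedules(demo):
    """Return schedule presets for the scrape DAG and the weekly-average DAG."""
    if demo:
        # Demonstration setup: run more often to collect more data.
        return {"scrape": "@hourly", "weekly_avg": "@daily"}
    # Production setup: the intervals the script names imply.
    return {"scrape": "@daily", "weekly_avg": "@weekly"}


schedules = pick_schedules(demo=True)
# In a DAG file, the chosen preset would be passed to the DAG's schedule
# argument (named `schedule_interval` or `schedule`, depending on the
# Airflow version in use).
```

Keeping both pairs of intervals in a single function means moving to production is a one-flag change rather than an edit in each DAG file.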

Technologies Used

  • Python
  • Apache Airflow
  • Pandas
  • BeautifulSoup
  • Google BigQuery

Sources:

A dictionary of the sources of the city weather data:

Description:

dw_weather_scrape.py

  • scrape_weather_data
    • Uses BeautifulSoup to scrape the National Weather Service and loads the results into a Pandas DataFrame.
  • transform_weather_data
    • Takes the Pandas DataFrame and transforms the data into more usable values.
  • write_weather_data_to_bq
    • Appends the scraped and transformed data to the existing daily table in Google BigQuery each day.
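To make the scrape and transform steps concrete, here is a rough sketch of the idea. It is not the project's implementation: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup and plain dictionaries as a stand-in for a Pandas DataFrame, so that it runs without extra packages. The HTML fragment, the field labels, and the output column names are all hypothetical; only the two function names are borrowed from the script.

```python
from html.parser import HTMLParser

# Hypothetical fragment; the real National Weather Service markup may differ.
SAMPLE_HTML = """
<table>
  <tr><td>Humidity</td><td>65%</td></tr>
  <tr><td>Wind Speed</td><td>NW 8 mph</td></tr>
  <tr><td>Temperature</td><td>54&deg;F</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def scrape_weather_data(html):
    """Parse label/value cell pairs into a dict of raw display strings."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    # Cells alternate: label, value, label, value, ...
    return dict(zip(cells[0::2], cells[1::2]))


def transform_weather_data(raw):
    """Convert display strings into typed values with units in the names."""
    direction, speed = raw["Wind Speed"].split()[:2]
    return {
        "humidity_pct": int(raw["Humidity"].rstrip("%")),
        "wind_direction": direction,
        "wind_speed_mph": int(speed),
        "temperature_f": int(raw["Temperature"].split("\u00b0")[0]),
    }


row = transform_weather_data(scrape_weather_data(SAMPLE_HTML))
```

The write step is left out because it needs BigQuery credentials. One common way to append a DataFrame to an existing table is the BigQuery client's `load_table_from_dataframe` with a `WRITE_APPEND` write disposition, though the README does not say which method the script uses.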

Daily Schema:

dw_weekly_avg.py

  • calculate_weekly_averages
    • Pulls daily data from BigQuery and averages select columns.
  • write_weekly_avg_to_bq
    • Writes the averages to the weekly_avg table in BigQuery on a weekly schedule.
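The averaging step can be illustrated with a small, self-contained sketch. It is only an approximation of the idea: in-memory rows stand in for the result of the BigQuery query, `statistics.mean` stands in for a Pandas aggregation, and the column names, city labels, and per-city grouping are assumptions rather than the project's actual schema.

```python
from statistics import mean

# Hypothetical daily rows; the real daily table's columns may differ.
daily_rows = [
    {"city": "City A", "temperature_f": 50, "humidity_pct": 70},
    {"city": "City A", "temperature_f": 54, "humidity_pct": 60},
    {"city": "City B", "temperature_f": 48, "humidity_pct": 80},
    {"city": "City B", "temperature_f": 52, "humidity_pct": 76},
]

# Columns to average (an illustrative selection).
AVG_COLUMNS = ["temperature_f", "humidity_pct"]


def calculate_weekly_averages(rows, columns=AVG_COLUMNS):
    """Group rows by city and average the selected columns for each group."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r)
    return [
        {"city": city, **{f"avg_{c}": mean(r[c] for r in group) for c in columns}}
        for city, group in sorted(by_city.items())
    ]


weekly = calculate_weekly_averages(daily_rows)
```

Each resulting dictionary corresponds to one row that would be written to the weekly_avg table.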

Weekly Avg Schema:


Visualizations

Known Bugs

  • No known bugs

License

MIT

If you find any issues, please reach out at: d.white0002@gmail.com.

Copyright (c) 2023 Drew White
