A Jupyter notebook to analyze trends in pipeline incidents in the US from 2010 to October 2017. Different aspects of the incidents are considered to answer these five questions:
- How common are spills?
- What are their spatial and temporal distributions?
- What is their scale regarding volume and cost?
- What are the main causes of spills?
- What places have a higher risk?
At the moment, all the code is in the `pipeline.ipynb` file. It mainly performs the following actions:
- Checks when the local dataset was last updated. If that date is more than `number_of_days` days old, it downloads the latest dataset from the server and replaces the old local copy.
- Extracts the required columns from the dataset, cleans the values (both text and `NaN`), and converts the units to more useful ones. Finally, it saves the cleaned dataset locally. It also exports a `json` file containing a summary of the data. This file will be used to create a website that shows the summary using the D3 library.
- Plots multiple figures showing temporal and spatial trends in spills and their financial and environmental damage.
- Some figures are exported to be used in the README file and the final report.
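
The update check in the first step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `is_stale` and `refresh_dataset` are hypothetical, and it assumes the "latest update" is taken from the local file's modification time, which may differ from how `pipeline.ipynb` determines it. Only `number_of_days` comes from the notebook.

```python
import os
import time
import urllib.request

SECONDS_PER_DAY = 24 * 60 * 60


def is_stale(path, number_of_days, now=None):
    """Return True if `path` is missing or older than `number_of_days` days.

    Assumption: the file's modification time stands in for the date of
    the last dataset update.
    """
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    age_in_days = (now - os.path.getmtime(path)) / SECONDS_PER_DAY
    return age_in_days > number_of_days


def refresh_dataset(path, url, number_of_days):
    """Download `url` to `path` only when the local copy is stale.

    Returns True if a download happened, False otherwise.
    """
    if not is_stale(path, number_of_days):
        return False
    # Download to a temporary name first so that a failed download
    # does not destroy the existing local copy.
    tmp_path = path + ".tmp"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return True
```

A call such as `refresh_dataset("incidents.csv", dataset_url, number_of_days=30)` would then re-download the file at most once every 30 days; the file name and `dataset_url` here are placeholders, not values from the notebook.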
- numpy
- pandas
- matplotlib
- plotly
The dataset contains 'Flagged Incidents' from the PHMSA Pipeline Safety website.
Mahdi Sadjadi - http://mahdisadjadi.com/
This repository is also published as a blog post.
This project is licensed under the MIT License - see the LICENSE.md file for details. The dataset is downloaded from PHMSA.