NYC Mobility Survey Analysis

This was an end-to-end data engineering project that was designed to derive insights from the New York City Department of Transportation's 2022 Citywide Mobility Survey. Five NYC DOT datasets were extracted to an s3 bucket, transformed and loaded into a database, and loaded into AWS QuickSight for analysis.

Go here for the analysis report: https://github.com/Tyriek-cloud/NYC-Mobility-Survey-Analysis/blob/main/NYC%20Mobility%20Survey%20Analysis.pdf

Step 1: Identity Access Management (IAM)

Despite the fact that I am the only person on this project, I decided to exercise good Identity Access Management (IAM) practices. I created a user (myself) and attached the permission policies directly. For the most part, the user has full access to all the assigned services (no Policy Boundary). The list of permissions that I assigned to the user are listed below:

Step 2: Creating a s3 Bucket

We begin by creating a s3 Bucket. A standard s3 Bucket will be fine for this use case. It is important to pick an easily recognizable name for this project, but everything else should be the default.

The staging folder will contain all five of the datasets that were extracted into AWS. The warehouse folder will contain the transformed data. The data will be changed into parquet format for easier management (quite helpful because one of the datasets is quite large).

Here is a peak at the datasets in the staging folder. They are all in CSV format (it is possible to extract them into another preferred file format from NYC Open Data).

Step 3: A Complex Join Schema in AWS Glue

Several Full Outer Joins were initiated for easy data retrieval. While it is true that this will result in datapoints that may not be immediately needed analysis, this approaches allows for maximum data retrieval. At times, it is better to have more data than less data. Furthermore, everything is either joined on the hh_id or the person_id.

Step 4: Run the Glue Job

Now we run the AWS Glue job. This is simply a matter of waiting for AWS to do its thing. It is important that we select this GitHub repository for record keeping/in the event that we need to re-run the job. Each time the job was run, it passed in about a minute.

Step 5: Create a Database to be Crawled

An AWS Glue Crawler was created to explore and extract all the data in all the sub-folders of the datawarehouse folder in our s3 Bucket. This also completed in about a minute. It was also important to ensure that proper permissions were configured (IAM full access for the user). Finally, we could use Athena permissions here to easily retrieve our data for analysis.

Step 6: Load data with Athena and Visualize the Data with Quicksight

After the crawler has completed its job and combined the data into a conveniently organized table with a schema, it is time for analysis. Since Athena was connected to Quicksight during this process, it was simple to connect my account and the user account to Quicksight. There were some notable highlights that were found after the data analysis and visualization:

There is a stable number of disabled New Yorkers who are employed (which is likely due to an increase of telework in recent years).
More men than women are likely to purchase an electric vehicle in NYC.
More NYC employees residents have the option to work fully remote for their roles.

Sources:

https://data.cityofnewyork.us/Transportation/Citywide-Mobility-Survey-Vehicle-2022/qyry-gwrj

https://data.cityofnewyork.us/Transportation/Citywide-Mobility-Survey-Day-2022/5njs-bq3c

https://data.cityofnewyork.us/Transportation/Citywide-Mobility-Survey-Trip-2022/x5mc-2gmi

https://data.cityofnewyork.us/Transportation/Citywide-Mobility-Survey-Household-2022/dt3g-khpi

https://data.cityofnewyork.us/Transportation/Citywide-Mobility-Survey-Person-2022/7qdz-u9hr

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
Data Sources		Data Sources
NYC-Mobility-Survey-Analysis-job		NYC-Mobility-Survey-Analysis-job
LICENSE		LICENSE
NYC Mobility Survey Analysis.pdf		NYC Mobility Survey Analysis.pdf
NYC-Mobility-Survey-Analysis-job.py		NYC-Mobility-Survey-Analysis-job.py
README.md		README.md
Table-Schema		Table-Schema

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NYC Mobility Survey Analysis

Step 1: Identity Access Management (IAM)

Step 2: Creating a s3 Bucket

Step 3: A Complex Join Schema in AWS Glue

Step 4: Run the Glue Job

Step 5: Create a Database to be Crawled

Step 6: Load data with Athena and Visualize the Data with Quicksight

About

Languages

License

Tyriek-cloud/NYC-Mobility-Survey-Analysis

Folders and files

Latest commit

History

Repository files navigation

NYC Mobility Survey Analysis

Step 1: Identity Access Management (IAM)

Step 2: Creating a s3 Bucket

Step 3: A Complex Join Schema in AWS Glue

Step 4: Run the Glue Job

Step 5: Create a Database to be Crawled

Step 6: Load data with Athena and Visualize the Data with Quicksight

About

Topics

Resources

License

Stars

Watchers

Forks

Languages