An ETL pipeline that merges two CSV files (converted to JSON) into a Parquet file using Azure Data Factory. The data was extracted from NYC Open Data (https://opendata.cityofnewyork.us/), and I created a Blob Container within an existing storage account.

Tyriek-cloud/NYC-DCA-ETL

NYC DCA ETL

This project was built in Azure Data Factory.

I began by extracting data from New York City Open Data: https://opendata.cityofnewyork.us/

From there, I created a Blob Container within an existing storage account. I then set up Azure Data Factory to run a series of T-SQL transformations on the CSV files, with the goal of loading the merged data into a Parquet file. The data flow looks like this:

image
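
For readers without access to the Data Factory workspace, the merge step can be approximated locally. This is a minimal sketch, not the pipeline itself: it assumes the merge is an inner join on a shared key column, and the column names (`license_id`, `business_name`, `result`) and sample rows are hypothetical placeholders rather than the real NYC Open Data schema. The optional `write_parquet` helper assumes pandas with pyarrow or fastparquet is installed.

```python
import csv
import io
import json


def merge_csv_to_records(left_csv, right_csv, key):
    """Inner-join two CSV texts on `key` and return a list of dict records."""
    right_rows = {}
    for row in csv.DictReader(io.StringIO(right_csv)):
        right_rows[row[key]] = row  # last row wins if the key repeats

    merged = []
    for row in csv.DictReader(io.StringIO(left_csv)):
        match = right_rows.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged


def write_parquet(records, path):
    """Write records to Parquet; needs pandas plus pyarrow or fastparquet."""
    import pandas as pd  # imported lazily so the merge step works without it

    pd.DataFrame(records).to_parquet(path, index=False)


# Hypothetical sample data standing in for the two source CSV files.
licenses_csv = "license_id,business_name\n1,Acme Deli\n2,Bolt Repair\n"
inspections_csv = "license_id,result\n1,Pass\n3,Fail\n"

records = merge_csv_to_records(licenses_csv, inspections_csv, "license_id")
print(json.dumps(records, indent=2))  # the JSON form of the merged rows
```

Calling `write_parquet(records, "merged.parquet")` afterwards would produce a local Parquet file comparable in shape to the pipeline output, under the same assumptions.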

The ETL run produced a Parquet file, stored in a generated blob within the container:

image
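
The same check can be done from code instead of the portal. The sketch below filters a list of blob names down to the Parquet output. The blob paths in the sample listing are hypothetical, and `list_blob_names` assumes the `azure-storage-blob` package is installed and that a connection string for the storage account is available; neither detail comes from this project.

```python
def parquet_blobs(blob_names):
    """Return the blob names that end in .parquet, sorted."""
    return sorted(n for n in blob_names if n.lower().endswith(".parquet"))


def list_blob_names(connection_string, container_name):
    """List every blob name in a container (needs azure-storage-blob)."""
    from azure.storage.blob import BlobServiceClient  # lazy import

    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client(container_name)
    return [blob.name for blob in container.list_blobs()]


# Hypothetical listing; real names depend on the sink settings in the data flow.
sample_names = [
    "input/licenses.csv",
    "input/inspections.csv",
    "output/merged/part-00000.snappy.parquet",
    "output/merged/_SUCCESS",
]
found = parquet_blobs(sample_names)
print(found)
```

Against a real container, `parquet_blobs(list_blob_names(conn_str, "my-container"))` would return the generated Parquet blobs, where `conn_str` and `"my-container"` are placeholders.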

I then went back into my container to look for any issues to troubleshoot. There were none, so I monitored activity in the container from a private dashboard:

image image image
