Movies-ETL

Overview

Amazing Prime loves the dataset and wants to keep it updated on a daily basis. Britta needs your help to create an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables. You’ll need to refactor the code from this module to create one function that takes in the three files—Wikipedia data, Kaggle metadata, and the MovieLens rating data—and performs the ETL process by adding the data to a PostgreSQL database.

ETL_function_test

Here we created a function that takes three arguments, one per source file. It extracts the three files into pandas DataFrames: the Kaggle metadata and the ratings data are read as CSVs, and the Wikipedia JSON file is loaded and converted to a DataFrame.
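The extract step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `extract_movie_data` and its parameter names are hypothetical, and it assumes the Wikipedia file is a JSON array of movie records.

```python
import json

import pandas as pd


def extract_movie_data(wiki_path, kaggle_path, ratings_path):
    """Read the three source files and return them as DataFrames.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Wikipedia data: assumed to be a JSON array of movie records.
    with open(wiki_path, mode="r", encoding="utf-8") as f:
        wiki_movies_raw = json.load(f)
    wiki_df = pd.DataFrame(wiki_movies_raw)

    # Kaggle metadata and MovieLens ratings: plain CSV files.
    kaggle_df = pd.read_csv(kaggle_path, low_memory=False)
    ratings_df = pd.read_csv(ratings_path)

    return wiki_df, kaggle_df, ratings_df
```

Returning the three DataFrames separately keeps the extract step independent of the cleaning and loading steps that follow.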

ETL_clean_wiki_movies

In this Python file we use list comprehensions, functions, and regular expressions to clean the movie data and standardize column names.
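As a rough sketch of how those three tools fit together, the example below filters records with a list comprehension, renames keys with a small function, and parses dollar amounts with a regex. The key names in `COLUMN_RENAMES`, the sample records, and the helper names are all assumptions made for illustration; the actual cleaning rules in this repository may differ.

```python
import re

# Hypothetical mapping of inconsistent Wikipedia keys to one name each.
COLUMN_RENAMES = {
    "Directed by": "Director",
    "Release date": "Release Date",
    "Running time": "Running Time",
}

# Matches amounts such as "$123.4 million" or "$1.2 billion".
DOLLAR_WORDS = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)


def clean_movie(movie):
    """Return a copy of one movie record with standardized column names."""
    return {COLUMN_RENAMES.get(key, key): value for key, value in movie.items()}


def parse_dollars(text):
    """Convert a string like '$1.5 million' to a float, or None if no match."""
    match = DOLLAR_WORDS.search(text)
    if match is None:
        return None
    scale = 1e6 if match.group(2).lower() == "million" else 1e9
    return float(match.group(1)) * scale


# Made-up sample records, for illustration only.
raw_movies = [
    {"title": "Example Film", "Directed by": "A. Director", "Box office": "$1.5 million"},
    {"title": "No Director Listed", "Box office": "$2 billion"},
]

# List comprehension: keep records that list a director, then clean them.
clean_movies = [clean_movie(m) for m in raw_movies if "Directed by" in m]
```

Keeping the renaming and parsing in small functions makes each rule easy to test on its own before applying it to the full dataset.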

ETL_create_database

Lastly, we used PostgreSQL as the database to load the transformed data into. We used the SQLAlchemy library to create an engine that connects to the database.
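A minimal sketch of the load step, under stated assumptions: `build_db_url` and `load_table` are hypothetical helper names, and appending with `if_exists="append"` is one way to add rows to existing tables, not necessarily the one used in this repository.

```python
import pandas as pd
from sqlalchemy import create_engine


def build_db_url(user, password, host, port, database):
    """Assemble a PostgreSQL connection URL in SQLAlchemy's format.

    Hypothetical helper; passwords with special characters would need
    URL-escaping, which this sketch does not handle.
    """
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


def load_table(df, table_name, engine):
    """Append the rows of df to table_name through the given engine."""
    # if_exists="append" adds rows to an existing table instead of
    # replacing it; index=False leaves the DataFrame index out.
    df.to_sql(name=table_name, con=engine, if_exists="append", index=False)
```

With a reachable PostgreSQL server and a driver such as psycopg2 installed, passing the result of `build_db_url(...)` to `create_engine` gives the engine that `load_table` expects.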
