Movies-ETL

Overview

Amazing Prime loves the dataset and wants to keep it updated on a daily basis. Britta needs your help to create an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables. You’ll need to refactor the code from this module into one function that takes in the three files (the Wikipedia data, the Kaggle metadata, and the MovieLens rating data) and performs the ETL process by adding the data to a PostgreSQL database.

ETL_functiion_test

Here we created a function that takes three arguments, one for each source file. The function extracts the data into pandas DataFrames: the Kaggle metadata and the MovieLens ratings are read as CSV files, and the Wikipedia JSON file is loaded and converted into a DataFrame.
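The extract step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `extract_data` and its parameter names are hypothetical, and it assumes the Wikipedia file is a JSON array of movie records.

```python
import json

import pandas as pd


def extract_data(wiki_file: str, kaggle_file: str, ratings_file: str):
    """Read the three source files into pandas DataFrames (hypothetical names)."""
    # Assumption: the Wikipedia data is a JSON array of movie dictionaries.
    with open(wiki_file, mode="r", encoding="utf-8") as f:
        wiki_movies_raw = json.load(f)
    wiki_movies_df = pd.DataFrame(wiki_movies_raw)

    # The Kaggle metadata and MovieLens ratings are plain CSV files.
    # low_memory=False avoids mixed-type warnings on wide metadata files.
    kaggle_metadata = pd.read_csv(kaggle_file, low_memory=False)
    ratings = pd.read_csv(ratings_file)

    return wiki_movies_df, kaggle_metadata, ratings
```

Returning all three DataFrames from one call keeps the later transform and load steps independent of where the files live on disk.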

ETL_clean_wiki_movies

In this Python file we use list comprehensions, helper functions, and regular expressions to clean the Wikipedia movie data and to consolidate its column names.
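A minimal sketch of how those three techniques can fit together. It is an illustration under stated assumptions, not the notebook's exact logic: the rename map, the `"No. of episodes"` key used to spot TV series, the `imdb_link` and `"Box office"` keys, and all function names are hypothetical. The dollar parser only handles the `"$X million"` and `"$X billion"` forms.

```python
import re

import pandas as pd

# Hypothetical map from inconsistent Wikipedia infobox labels to one column name.
COLUMN_RENAMES = {
    "Directed by": "Director",
    "Produced by": "Producer(s)",
    "Country of origin": "Country",
}

# Matches amounts such as "$123.4 million" or "$1.2 billion".
DOLLARS = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)


def clean_movie(movie: dict) -> dict:
    """Return a copy of one movie record with alternate keys renamed."""
    return {COLUMN_RENAMES.get(key, key): value for key, value in movie.items()}


def parse_dollars(text) -> float:
    """Convert '$X million' or '$X billion' to a float; return NaN otherwise."""
    if not isinstance(text, str):
        return float("nan")
    match = DOLLARS.search(text)
    if match is None:
        return float("nan")
    scale = 1e6 if match.group(2).lower() == "million" else 1e9
    return float(match.group(1)) * scale


def clean_wiki_movies(wiki_movies_raw: list) -> pd.DataFrame:
    """Filter, rename, and parse raw Wikipedia movie records."""
    # List comprehension: keep entries that look like films (a director and an
    # IMDb link) and skip TV series, assumed to carry a "No. of episodes" key.
    movies = [
        movie for movie in wiki_movies_raw
        if ("Director" in movie or "Directed by" in movie)
        and "imdb_link" in movie
        and "No. of episodes" not in movie
    ]
    df = pd.DataFrame([clean_movie(movie) for movie in movies])
    if df.empty:
        return df

    # Regex: pull the IMDb ID (e.g. "tt0111161") out of each link.
    df["imdb_id"] = df["imdb_link"].str.extract(r"(tt\d{7})", expand=False)
    if "Box office" in df.columns:
        df["box_office"] = df["Box office"].map(parse_dollars)

    # The same film can appear more than once; keep the first record per ID.
    return df.drop_duplicates(subset="imdb_id").reset_index(drop=True)
```

Renaming keys before building the DataFrame means alternate labels such as "Directed by" and "Director" end up in a single column instead of two sparse ones.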

ETL_create_database

Finally, we used PostgreSQL as the database that receives the transformed data, using the SQLAlchemy library to create an engine that connects to it.
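A minimal sketch of the load step, assuming the transformed data is already in pandas DataFrames. The helper names, the default host, port, and database name (`movie_data`), and the table names in the usage comments are hypothetical; `db_password` is assumed to come from a separate config file. `DataFrame.to_sql` with `if_exists="append"` adds rows to an existing table, which matches the goal of loading into existing tables.

```python
from urllib.parse import quote_plus

import pandas as pd


def build_db_string(user: str, password: str, host: str = "localhost",
                    port: int = 5432, database: str = "movie_data") -> str:
    """Build a PostgreSQL connection URL in the form SQLAlchemy expects."""
    # URL-encode the password so characters such as '@' or ':' do not break the URL.
    return f"postgresql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


def load_table(df: pd.DataFrame, table_name: str, con,
               chunksize: int = 100_000) -> int:
    """Append a DataFrame to a table and return the number of rows written."""
    # if_exists="append" keeps the existing table and adds the new rows;
    # chunksize limits how many rows are written per batch (helpful for ratings).
    df.to_sql(table_name, con, if_exists="append", index=False, chunksize=chunksize)
    return len(df)


# Example usage against a live database (needs SQLAlchemy and a PostgreSQL driver):
# from sqlalchemy import create_engine
# engine = create_engine(build_db_string("postgres", db_password))
# load_table(movies_df, "movies", engine)
# load_table(ratings_df, "ratings", engine)
```

Writing in chunks keeps memory use bounded when the ratings file is large; note that appending daily data without de-duplication will insert repeated rows, so the refresh strategy still needs a decision.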
