Amazing Prime loves the dataset and wants to keep it updated on a daily basis. Britta needs your help to create an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables. You’ll need to refactor the code from this module to create one function that takes in the three files—Wikipedia data, Kaggle metadata, and the MovieLens rating data—and performs the ETL process by adding the data to a PostgreSQL database.
We created a function that takes three arguments, one for each source file. The function extracts the data into pandas DataFrames: the Kaggle metadata and the MovieLens ratings are read as CSV files, and the Wikipedia JSON file is loaded and converted to a DataFrame.
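The extraction step can be sketched roughly as follows. This is a minimal illustration rather than the project's exact code: the function name `extract_movie_data` and its parameter names are assumptions, and the file paths are supplied by the caller.

```python
import json

import pandas as pd


def extract_movie_data(wiki_file, kaggle_file, ratings_file):
    """Read the three source files and return them as pandas DataFrames.

    The function and parameter names are hypothetical; only the file
    formats (one JSON file, two CSV files) come from the description above.
    """
    # The Wikipedia data is JSON, so load it first and then build a DataFrame.
    with open(wiki_file, mode="r", encoding="utf-8") as f:
        wiki_movies_raw = json.load(f)
    wiki_movies_df = pd.DataFrame(wiki_movies_raw)

    # The Kaggle metadata and the MovieLens ratings are plain CSV files.
    # low_memory=False makes pandas infer column types from the whole file.
    kaggle_metadata = pd.read_csv(kaggle_file, low_memory=False)
    ratings = pd.read_csv(ratings_file)

    return wiki_movies_df, kaggle_metadata, ratings
```

Returning the three DataFrames as a tuple keeps the extraction separate from the transformation step, so each stage can be checked on its own.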
In this Python file, we use list comprehensions, functions, and regular expressions to clean the movie data and organize the column names.
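As a rough illustration of how these three tools fit together, the sketch below renames inconsistent columns with a helper function, parses dollar amounts with a regular expression, and applies both with list comprehensions. The column names in `COLUMN_RENAMES`, the helper names, and the sample records are all made up for the example and are not taken from the project's data.

```python
import re

# Hypothetical mapping of inconsistent column names to one standard name.
COLUMN_RENAMES = {
    "Directed by": "Director",
    "Release date": "Release Date",
    "Length": "Running time",
}


def clean_movie(movie):
    """Return a copy of a movie record with standardized column names."""
    movie = dict(movie)  # copy so the caller's record is left untouched
    for old_name, new_name in COLUMN_RENAMES.items():
        if old_name in movie:
            movie[new_name] = movie.pop(old_name)
    return movie


def parse_dollars(text):
    """Convert strings like '$1.5 million' or '$123,456' to a float, else None."""
    if not isinstance(text, str):
        return None
    match = re.match(
        r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(million|billion)?",
        text,
        flags=re.IGNORECASE,
    )
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    scale = {"million": 1e6, "billion": 1e9}.get((match.group(2) or "").lower(), 1)
    return value * scale


# Made-up sample records, used only to exercise the helpers.
raw_movies = [
    {"title": "Example A", "Directed by": "Jane Doe", "Box office": "$1.5 million"},
    {"title": "Example B", "Director": "John Roe", "Box office": "$123,456"},
]

# List comprehensions apply the cleaning functions to every record.
clean_movies = [clean_movie(movie) for movie in raw_movies]
box_office = [parse_dollars(movie.get("Box office")) for movie in clean_movies]
```

Keeping the renaming and the parsing in small functions makes each rule easy to test before it is applied to the full dataset.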
Lastly, we used PostgreSQL as the database to load the transformed data into. We used the SQLAlchemy library to create an engine that connects to the database.
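A minimal sketch of the load step might look like the following. The helper names, the default host, port, database name, and table name are placeholders rather than the project's real settings, and the `postgresql://` URL assumes a PostgreSQL driver such as psycopg2 is installed. The sketch appends rows with `if_exists="append"` on the assumption that the target tables already exist.

```python
from urllib.parse import quote_plus


def build_db_url(user, password, host="localhost", port=5432, database="movie_data"):
    """Build a PostgreSQL connection URL for SQLAlchemy (defaults are placeholders)."""
    # quote_plus escapes characters such as '@' or ':' that would break the URL.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def load_to_postgres(movies_df, db_url, table_name="movies"):
    """Append a cleaned pandas DataFrame to a database table."""
    # Imported here so the URL helper above works without SQLAlchemy installed.
    from sqlalchemy import create_engine

    engine = create_engine(db_url)
    # if_exists="append" adds rows to the existing table instead of replacing it.
    movies_df.to_sql(name=table_name, con=engine, if_exists="append", index=False)
```

In practice the password would come from a config file or environment variable that is kept out of version control, not from a literal in the script.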