Project 1: Data Modeling with PostgreSQL

This project creates a PostgreSQL database, sparkifydb, for a music app called Sparkify. The database models the log and song datasets (JSON format) as a star schema optimised for song play analysis queries.

Schema design and ETL pipeline

The star schema has one fact table (songplays) and four dimension tables (users, songs, artists, time). CRUD queries are defined in sql_queries.py. create_tables.py uses the functions drop_tables, create_database and create_tables to create the sparkifydb database and the required tables.
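The drop-then-create pattern described above can be sketched roughly as follows. This is a minimal illustration, not the contents of sql_queries.py or create_tables.py: it runs against an in-memory SQLite database so it can be tried without a PostgreSQL server, and every column name and type is an assumption. Only the five table names and the function names drop_tables and create_tables come from this README.

```python
import sqlite3

# Hypothetical column definitions; only the table names come from the README.
TABLES = {
    "songplays": "songplay_id INTEGER PRIMARY KEY, start_time TEXT, "
                 "user_id INTEGER, level TEXT, song_id TEXT, artist_id TEXT",
    "users": "user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, level TEXT",
    "songs": "song_id TEXT PRIMARY KEY, title TEXT, artist_id TEXT, duration REAL",
    "artists": "artist_id TEXT PRIMARY KEY, name TEXT",
    "time": "start_time TEXT PRIMARY KEY, hour INTEGER, weekday INTEGER",
}

# One DROP and one CREATE statement per table, kept in lists so they can be looped over.
drop_table_queries = [f"DROP TABLE IF EXISTS {name}" for name in TABLES]
create_table_queries = [
    f"CREATE TABLE IF NOT EXISTS {name} ({columns})" for name, columns in TABLES.items()
]


def drop_tables(cur):
    """Drop every table so the schema can be rebuilt from scratch."""
    for query in drop_table_queries:
        cur.execute(query)


def create_tables(cur):
    """Create the fact table and the four dimension tables."""
    for query in create_table_queries:
        cur.execute(query)


def main():
    # SQLite stand-in (assumption); the project itself targets PostgreSQL.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    drop_tables(cur)
    create_tables(cur)
    conn.commit()
    return conn
```

Keeping the statements in lists means a new table only needs one new entry, and rerunning the script always starts from an empty schema.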

etl.py extracts the data and loads it into the tables. The songs and artists tables are filled with data from the song_data JSON files. Processed data from log_data fills the time and users tables. A select query fetches the song ID and artist ID from the songs and artists tables and combines them with the log data to fill the songplays fact table.
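The log-processing step can be sketched roughly as below. This is not the code in etl.py. It assumes each log event carries a millisecond epoch timestamp under a key named ts, plus userId, level, song and artist keys; all of these field names are hypothetical. A plain dictionary, song_index, stands in for the select query that looks up song and artist IDs.

```python
from datetime import datetime, timezone


def time_row(ts_ms):
    """Break a millisecond epoch timestamp into time-dimension fields (UTC)."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return {
        "start_time": dt.isoformat(),
        "hour": dt.hour,
        "day": dt.day,
        "month": dt.month,
        "year": dt.year,
        "weekday": dt.weekday(),  # Monday is 0, Sunday is 6
    }


def songplay_row(event, song_index):
    """Combine one log event with the matching song and artist IDs, if any.

    song_index maps (song title, artist name) to (song_id, artist_id) and is
    a stand-in for the lookup against the songs and artists tables.
    """
    key = (event["song"], event["artist"])
    song_id, artist_id = song_index.get(key, (None, None))
    return {
        "start_time": time_row(event["ts"])["start_time"],
        "user_id": event["userId"],
        "level": event["level"],
        "song_id": song_id,
        "artist_id": artist_id,
    }
```

Events whose song is not found keep None for both IDs, so the play is still recorded in the fact table.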

Song play example queries

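Simple queries might include the number of users at each membership level.

SELECT level, COUNT(*) FROM users GROUP BY level;

The day of the week on which music is most frequently listened to.

SELECT weekday, COUNT(*) FROM time GROUP BY weekday ORDER BY COUNT(*) DESC LIMIT 1;

Or the hour of the day when music is most often listened to.

SELECT hour, COUNT(*) FROM time GROUP BY hour ORDER BY COUNT(*) DESC LIMIT 1;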
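A per-level user count can be tried from Python with a sketch like the one below. It uses an in-memory SQLite database as a stand-in for sparkifydb, a cut-down users table whose columns are assumptions, and three made-up rows that are not taken from the Sparkify datasets.

```python
import sqlite3


def users_per_level(conn):
    """Return (level, user count) pairs, one per membership level."""
    cur = conn.execute(
        "SELECT level, COUNT(*) FROM users GROUP BY level ORDER BY level"
    )
    return cur.fetchall()


# Made-up sample rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, level TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "free"), (2, "paid"), (3, "free")],
)
print(users_per_level(conn))  # [('free', 2), ('paid', 1)]
```

The GROUP BY clause is what yields one count per level; COUNT on its own returns a single total for the whole table.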

