Sparkify is a startup that wants to analyze the data it has been collecting on user activity and songs in its new streaming app. The purpose of this database is to give Sparkify the ability to query that data easily.
To support Sparkify's analytics requirements, the data engineering team concluded that star schema modeling is the best approach. The star schema uses a single fact table to track users' interactions with the application, surrounded by dimension tables for songs, artists, users, and time; a sketch of what this schema might look like is shown below.
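As an illustration only, here is a minimal sketch of the kind of DDL such a star schema implies, written as Python strings in the style of sql_queries.py. The table and column names below are assumptions for illustration, not taken from this repository.

```python
# Hypothetical star schema DDL (names and types are assumptions).
# Fact table: one row per song-play event.
songplay_table_create = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id SERIAL PRIMARY KEY,
    start_time  TIMESTAMP NOT NULL,
    user_id     INT NOT NULL,
    song_id     VARCHAR,
    artist_id   VARCHAR,
    session_id  INT,
    location    VARCHAR,
    user_agent  VARCHAR
);
"""

# Two of the surrounding dimension tables.
user_table_create = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY,
    first_name VARCHAR,
    last_name  VARCHAR,
    gender     VARCHAR,
    level      VARCHAR
);
"""

song_table_create = """
CREATE TABLE IF NOT EXISTS songs (
    song_id   VARCHAR PRIMARY KEY,
    title     VARCHAR,
    artist_id VARCHAR,
    year      INT,
    duration  FLOAT
);
"""
```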
Important note: The files were run on a local machine with a Postgres server running on port 8080.
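For reference, a psycopg2 connection against that port might look like the sketch below; the database name and credentials are assumptions for illustration, not confirmed by this repository.

```python
import psycopg2

# dbname/user/password are assumed values; adjust to your local setup.
conn = psycopg2.connect(
    host="127.0.0.1",
    port=8080,  # non-default port noted above (Postgres usually listens on 5432)
    dbname="sparkifydb",
    user="student",
    password="student",
)
conn.set_session(autocommit=True)
```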
- etl.py: Orchestrates the entire data pipeline: it extracts data from the JSON source files, transforms it with data quality (DQ) checks, and loads it into the Postgres tables. Note: each time you run 'etl.py', it automatically executes 'create_table.py' via its drop_tables and create_tables functions (see the sketch after this list).
- create_table.py: Creates the database and tables. It must be run before etl.py so the tables are created and reset.
- sql_queries.py: Contains the table-creation DDL and insert DML statements used by create_table.py. Each table is dropped if it already exists before being created again.
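A minimal sketch of how etl.py might tie these pieces together, under the assumptions above (file paths, table names, and the create_table entry point are illustrative, not confirmed by this repository):

```python
import glob
import json

import psycopg2

import create_table  # assumed to expose a main() that drops and recreates the tables


def process_song_files(cur, filepath):
    """Extract JSON song files, apply a basic DQ check, and load the songs dimension."""
    for path in glob.glob(f"{filepath}/**/*.json", recursive=True):
        with open(path) as f:
            record = json.load(f)
        # DQ check: skip records missing the primary key.
        if not record.get("song_id"):
            continue
        cur.execute(
            "INSERT INTO songs (song_id, title, artist_id, year, duration) "
            "VALUES (%s, %s, %s, %s, %s) ON CONFLICT (song_id) DO NOTHING",
            (record["song_id"], record["title"], record["artist_id"],
             record["year"], record["duration"]),
        )


def main():
    # Recreate the schema first, mirroring the note on etl.py above.
    create_table.main()

    conn = psycopg2.connect(
        host="127.0.0.1", port=8080, dbname="sparkifydb",
        user="student", password="student",
    )
    conn.set_session(autocommit=True)
    cur = conn.cursor()

    process_song_files(cur, "data/song_data")  # assumed source directory
    conn.close()


if __name__ == "__main__":
    main()
```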