ETL project using python and postgres. Part of Udacity Data Engineering nanodegree

ldself/dataeng01_datamodelingwithpostgres

ETL using Python and PostgreSQL

Introduction

This project demonstrates an ETL process that uses Python to read song attribute information and log data and write them into a normalized database built in PostgreSQL. The data represents songs in a streaming service called Sparkify, together with log data recording who listened to which songs and other related attributes.
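
As a rough illustration of the transform step, the sketch below turns newline-delimited JSON log records into row tuples suitable for a parameterized insert. The field names (`ts`, `userId`, `song`), the `songplays` table, and the helper `log_lines_to_rows` are assumptions made for illustration and are not taken from the project. The project itself uses pandas and psycopg2; this sketch sticks to the standard library so that it runs without a database.

```python
import json
from datetime import datetime, timezone

# Hypothetical insert statement in psycopg2's %s placeholder style.
# The table and column names are assumptions, not the project's schema.
SONGPLAY_INSERT = (
    "INSERT INTO songplays (start_time, user_id, song) VALUES (%s, %s, %s)"
)


def log_lines_to_rows(text):
    """Parse newline-delimited JSON into (start_time, user_id, song) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Assumed: `ts` is a Unix timestamp in milliseconds.
        start_time = datetime.fromtimestamp(record["ts"] / 1000, tz=timezone.utc)
        rows.append((start_time, int(record["userId"]), record["song"]))
    return rows


# Made-up sample record, used only to exercise the function.
sample = '{"ts": 1541106106000, "userId": "8", "song": "Song A"}\n'
rows = log_lines_to_rows(sample)
```

With a live connection, each tuple could then be passed to psycopg2 as `cursor.execute(SONGPLAY_INSERT, row)`, which keeps the values out of the SQL string itself.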

This project was developed as part of the Udacity Data Engineering nanodegree program.

How to install

There is no installation package. The folder structure and all of the files can be downloaded from the repository and saved directly on a local computer. The code expects the files to be saved in the /home/workspace folder of the local machine.

How to use

  • The database is (re)created and reset by running the create_tables.py script from the command line.
  • The logic of the DDL and DML statements in sql_queries.py can be tested by running the etl.ipynb and test.ipynb notebooks.
  • The etl.py script is the main script; it reads and processes all of the files in the data folder.
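
The last step, reading and processing every file in the data folder, can be sketched as below. This is a minimal sketch and not the project's actual code: it assumes the data files are JSON files nested in subfolders, and the helper names `get_json_files` and `process_all` are hypothetical.

```python
import glob
import os


def get_json_files(root):
    """Return the sorted absolute paths of every .json file under root."""
    matches = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        # Assumed: the data files use the .json extension.
        for path in glob.glob(os.path.join(dirpath, "*.json")):
            matches.append(os.path.abspath(path))
    return sorted(matches)


def process_all(root, handler):
    """Call handler(path) for each file found and return how many were processed."""
    files = get_json_files(root)
    for path in files:
        handler(path)
    return len(files)
```

In a real run, `handler` would be a function that opens the file, transforms its records, and inserts them through a database cursor; passing it in as a parameter keeps the directory walk separate from the database work.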

Technologies used

package (version)
python (3.6.3)
psycopg2 (2.7.4)
pandas (0.23.3)
numpy (1.12.1)
