danielbeach/dataEngineeringTemplate

dataEngineeringTemplate

Template for Data Engineering and Data Pipeline projects

Project Overview

This is a high-level description of the project and what it is trying to accomplish.

  1. Add your requirements to the requirements.txt file for Python pip packages.
  2. Add any necessary installations to the Dockerfile.

Architecture

This is a high-level description of the tool(s) and the decisions around why those tool(s) were chosen.

Testing

These are instructions for testing this repo. All tests are located inside the tests folder and are run with pytest. Run the following steps.

  1. docker build --tag my-project .
  2. docker-compose up test

Add your unit tests to files inside the tests folder, and name your files test_somename.py so that pytest can discover them.
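
As a rough illustration of what such a file could look like, here is a minimal sketch of tests/test_somename.py. The clean_record function is a hypothetical pipeline step invented for this example and is not part of the template; in a real project you would import your own code instead of defining it in the test file.

```python
# tests/test_somename.py
# Minimal sketch only: clean_record is a hypothetical transform,
# defined inline so this example is self-contained.


def clean_record(record: dict) -> dict:
    """Strip whitespace from string values and drop keys whose value is None."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
        if value is not None
    }


def test_clean_record_strips_whitespace():
    # String values lose their leading and trailing whitespace.
    assert clean_record({"name": "  Ada "}) == {"name": "Ada"}


def test_clean_record_drops_none_values():
    # Keys with a None value are removed; other values pass through unchanged.
    assert clean_record({"id": 1, "email": None}) == {"id": 1}
```

pytest collects functions whose names start with test_ from files named test_*.py, so both functions above would be picked up when the test steps are run.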

Data Flow

This is a high-level description of the data source(s) and sink(s), as well as the general pattern and data flow through the pipeline. Discuss any assumptions made.

Hooks

If you have your own hooks, you can add them to git-hooks.

Use this command to copy them to the appropriate folder, then commit.

sh git-hooks/copy_hooks.sh

Whatever git-hooks/copy_hooks.sh copies will replace anything set up using pre-commit.
