
mralmeidars/igti-cloud-data-engineer-1-gcp



IGTI Bootcamp - Cloud Data Engineer - Challenge 1


Description

Create a process that provisions the infrastructure and then ingests, reads, transforms (to Parquet files), processes and manipulates files in the Data Lake. The resources are created and destroyed by a Terraform pipeline.
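The ingest and transform step can be sketched as a small PySpark job like the one below. This is a minimal sketch, not the repository's actual job. It assumes the raw files are CSV with a header row and that the bucket uses hypothetical `raw/` and `processed/` prefixes. `data_lake_paths` and `csv_to_parquet` are illustrative names.

```python
# Minimal sketch of a PySpark job for Cloud Dataproc: read raw CSV files
# from the Data Lake bucket and write them back as Parquet.
# ASSUMPTIONS (hypothetical, not from the repository): raw data is CSV with
# a header row, and the bucket uses "raw/" and "processed/" prefixes.


def data_lake_paths(bucket: str, dataset: str) -> dict:
    """Build input and output URIs for one dataset (hypothetical layout)."""
    return {
        "raw": f"gs://{bucket}/raw/{dataset}/",
        "processed": f"gs://{bucket}/processed/{dataset}/",
    }


def csv_to_parquet(input_path: str, output_path: str) -> int:
    """Read CSV files from input_path, write them as Parquet to output_path,
    and return the number of rows in the input."""
    # Imported here so the path helper above works without Spark installed
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()
    df = (
        spark.read
        .option("header", "true")       # first line holds column names
        .option("inferSchema", "true")  # let Spark guess column types
        .csv(input_path)
    )
    # Overwrite any previous output so the job can be re-run safely
    df.write.mode("overwrite").parquet(output_path)
    return df.count()
```

On Dataproc, the script would typically end with a small entry point that reads the two paths from its arguments, for example `csv_to_parquet(sys.argv[1], sys.argv[2])`. Using `mode("overwrite")` keeps the job idempotent when the pipeline is executed more than once.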

  • Data Lake on GCP Cloud Storage.
  • Spark (PySpark) job on GCP Cloud Dataproc.
  • GCP BigQuery can be used to gain insights by querying the Data Lake (Parquet files).
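Querying the Parquet files from BigQuery can be sketched with an external table, which reads the files in place in Cloud Storage instead of loading them. This is a hedged sketch, not part of the repository: the table ID, bucket and wildcard URI are placeholders, `external_table_ddl` and `create_external_table` are hypothetical helpers, and running the DDL assumes the `google-cloud-bigquery` package and default application credentials.

```python
# Minimal sketch: expose the Parquet files in the Data Lake to BigQuery as an
# external table, so they can be queried without loading them first.
# ASSUMPTIONS (hypothetical): the table ID and gs:// URIs are placeholders,
# and running the DDL requires the google-cloud-bigquery package.


def external_table_ddl(table_id: str, uris: list) -> str:
    """Return BigQuery DDL that defines a Parquet-backed external table."""
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table_id}`\n"
        f"OPTIONS (format = 'PARQUET', uris = [{uri_list}])"
    )


def create_external_table(table_id: str, uris: list) -> None:
    """Run the DDL with the BigQuery client (uses default credentials)."""
    from google.cloud import bigquery  # imported lazily; optional dependency

    client = bigquery.Client()
    # query() starts the job; result() blocks until it finishes
    client.query(external_table_ddl(table_id, uris)).result()
```

For example, `create_external_table("my-project.datalake.sales", ["gs://my-bucket/processed/sales/*.parquet"])` would let a plain `SELECT COUNT(*)` on that table run directly against the files. BigQuery reads the schema from the Parquet files themselves, and the `*.parquet` wildcard skips the `_SUCCESS` marker file that Spark writes next to its output.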

Stack

  • Python
  • PySpark
  • Terraform
  • Google Cloud Platform (Cloud Storage, Cloud Dataproc)

Diagram

[Diagram image]

