# IGTI Bootcamp - Cloud Data Engineer - Challenge 1

[Description](#description) · [Stack](#stack) · [Diagram](#diagram)

## Description

Create a process to provision the infrastructure, then ingest, read, transform (to Parquet files), process, and manipulate files in the Data Lake. The resources are created and destroyed by a Terraform pipeline.

- Data Lake on GCP Cloud Storage.
- Spark job (PySpark) on GCP Cloud Dataproc.
- GCP BigQuery can be used to query the Data Lake (Parquet files) for insights, as sketched after this list.
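
As a rough illustration of the transform step, here is a minimal PySpark sketch of the kind of job Dataproc would run; the bucket names, file layout, and transformation are hypothetical, not taken from this repository.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("igti-challenge-1").getOrCreate()

# Read raw data from the Data Lake landing zone (hypothetical path).
df = spark.read.csv(
    "gs://my-datalake/landing/data.csv", header=True, inferSchema=True
)

# Example transformation: drop fully empty rows and normalize column names.
df = df.na.drop(how="all")
df = df.toDF(*[c.strip().lower().replace(" ", "_") for c in df.columns])

# Persist as Parquet in the processed zone for downstream querying.
df.write.mode("overwrite").parquet("gs://my-datalake/processed/data/")

spark.stop()
```

And a sketch of pulling insights from the Parquet files through BigQuery, assuming an external table (here `my-project.my_dataset.processed_data`, a hypothetical name) has already been defined over the processed Parquet path:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Query the external table that points at the Parquet files in the lake.
sql = """
    SELECT COUNT(*) AS total_rows
    FROM `my-project.my_dataset.processed_data`
"""
for row in client.query(sql).result():
    print(row.total_rows)
```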

## Stack

- Python
- PySpark
- Terraform
- Google Cloud Platform (Cloud Storage, Cloud Dataproc)

## Diagram

*(diagram image)*