# IGTI Bootcamp - Cloud Data Engineer - Challenge 1

[Description](#description) · [Stack](#stack) · [Diagram](#diagram)

## Description

Create a process to provision the infrastructure, then ingest, read, transform (to Parquet files), process, and manipulate files in the Data Lake. The resources are created and destroyed by a Terraform pipeline.

- Data Lake on GCP Cloud Storage.
- Spark job (PySpark) on GCP Cloud Dataproc.
- GCP BigQuery can be used to query the Data Lake (Parquet files) for insights, as sketched after this list.
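
As a rough illustration of the transform step, here is a minimal PySpark sketch of the kind of job Dataproc would run; the bucket names, file layout, and transformation are hypothetical, not taken from this repository.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("igti-challenge-1").getOrCreate()

# Read raw data from the Data Lake landing zone (hypothetical path).
df = spark.read.csv(
    "gs://my-datalake/landing/data.csv", header=True, inferSchema=True
)

# Example transformation: drop fully empty rows and normalize column names.
df = df.na.drop(how="all")
df = df.toDF(*[c.strip().lower().replace(" ", "_") for c in df.columns])

# Persist as Parquet in the processed zone for downstream querying.
df.write.mode("overwrite").parquet("gs://my-datalake/processed/data/")

spark.stop()
```

And a sketch of pulling insights from the Parquet files through BigQuery, assuming an external table (here `my-project.my_dataset.processed_data`, a hypothetical name) has already been defined over the processed Parquet path:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Query the external table that points at the Parquet files in the lake.
sql = """
    SELECT COUNT(*) AS total_rows
    FROM `my-project.my_dataset.processed_data`
"""
for row in client.query(sql).result():
    print(row.total_rows)
```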

## Stack

- Python
- PySpark
- Terraform
- Google Cloud Platform (Cloud Storage, Cloud Dataproc)

## Diagram

*(diagram image)*