Description • Stack • Diagram
Create a process that provisions the infrastructure, then ingests, reads, and transforms files (to Parquet) in the Data Lake, and processes them there. All resources are created and destroyed by a Terraform pipeline.
- Data Lake on GCP Cloud Storage.
- Spark job (PySpark) on GCP Cloud Dataproc (a minimal job sketch follows this list).
- GCP BigQuery can be used to gain insights by querying the Data Lake's Parquet files (see the query sketch at the end of this section).
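For reference, the transform step could look like the PySpark sketch below: it reads raw CSV files from a landing bucket and writes Parquet back to the Data Lake. The bucket paths, column handling, and app name are illustrative assumptions, not the repository's actual job.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical bucket paths; replace with the buckets created by Terraform.
RAW_ZONE = "gs://my-datalake-raw/input/"
CURATED_ZONE = "gs://my-datalake-curated/output/"

spark = (
    SparkSession.builder
    .appName("datalake-transform")
    .getOrCreate()
)

# Read raw CSV files from the landing zone of the Data Lake.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(RAW_ZONE)
)

# Example transformation: normalize column names and add an ingestion timestamp.
df = df.toDF(*[c.strip().lower().replace(" ", "_") for c in df.columns])
df = df.withColumn("ingested_at", F.current_timestamp())

# Write the result back to the Data Lake as Parquet.
df.write.mode("overwrite").parquet(CURATED_ZONE)

spark.stop()
```

On Dataproc, a script like this would be submitted with `gcloud dataproc jobs submit pyspark`, against the cluster and buckets provisioned by the Terraform pipeline.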
- Python
- PySpark
- Terraform
- Google Cloud Platform (Cloud Storage, Cloud Dataproc, BigQuery)
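To illustrate the BigQuery step, the sketch below uses the `google-cloud-bigquery` client to expose the Parquet files as an external table and query them. The project, dataset, table, and bucket names are placeholders, and it assumes the dataset already exists.

```python
from google.cloud import bigquery

# Placeholder identifiers; replace with your own project, dataset, and bucket.
PROJECT = "my-gcp-project"
TABLE_ID = f"{PROJECT}.datalake.curated_events"  # assumes dataset `datalake` exists
PARQUET_URIS = ["gs://my-datalake-curated/output/*.parquet"]

client = bigquery.Client(project=PROJECT)

# Define an external table backed by the Parquet files in Cloud Storage.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = PARQUET_URIS

table = bigquery.Table(TABLE_ID)
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# Query the Data Lake directly through the external table.
query = f"SELECT COUNT(*) AS n FROM `{TABLE_ID}`"
for row in client.query(query).result():
    print(f"rows in data lake table: {row.n}")
```

Because the table is external, BigQuery reads the Parquet files in place, so no data is copied out of the Data Lake.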