Showcase skill in AWS, Big Data warehousing, SPARK, Data Gathering, Data Analysis, Dashboarding and Reporting
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
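Provisioning a cluster like the module above boils down to one EMR API request. A minimal sketch in Python of building the parameters that would be passed to boto3's `run_job_flow` call; the cluster name, release label, instance types, and node counts are illustrative assumptions, and the default EMR IAM roles are assumed to exist.

```python
# Build a run_job_flow-style parameter dict for a Spark-on-EMR cluster.
# Release label, instance types, and role names are illustrative.
def build_emr_cluster_config(name, release="emr-6.15.0", core_nodes=2):
    """Return the parameter dict for EMR's RunJobFlow API."""
    return {
        "Name": name,
        "ReleaseLabel": release,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate when idle
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # EMR service role
    }

config = build_emr_cluster_config("demo-cluster", core_nodes=3)
# In a real run: boto3.client("emr").run_job_flow(**config)
print(config["Name"])
```

A Terraform module encodes the same fields declaratively; the dict above is the imperative equivalent.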
Developing a Flow with EMR and Airflow
This project provides a detailed overview of building an automated data engineering pipeline using Airflow, AWS services, Spark, Snowflake, and Tableau
This project demonstrates data cleaning and processing with Apache Spark and Apache Flink, both locally and on AWS EMR.
👷🌇 Set up and build a big data processing pipeline with Apache Spark, 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), Terraform to set up the infrastructure, and Airflow integration to automate workflows 🥊
Explore and replicate Amazon EMR (Elastic MapReduce) setup and utilization for big data processing and analytics tasks, featuring comprehensive demonstrations from VPC creation to Spark job execution.
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS using a Spotinst AWS MrScaler resource
This project performs feature extraction, followed by a PCA on large datasets, using Spark in the cloud.
Experience with time-series analysis and forecasting models, large data sets, model development and visualisation, statistics.
Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.
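Automating EMR from Lambda, as described above, typically means a handler that submits a Spark step to a running cluster. A sketch of that pattern, with the boto3 call indicated in a comment so the step-building logic stays self-contained; `command-runner.jar` is EMR's standard way to run `spark-submit` as a step, and all other names (event keys, script path) are illustrative assumptions rather than this repo's actual interface.

```python
# Sketch of a Lambda handler that submits a spark-submit step to EMR.
# Event keys and the script path are illustrative assumptions.
def build_spark_step(script_s3_path, step_name="nightly-etl"):
    """Return an EMR step definition that runs spark-submit via command-runner."""
    return {
        "Name": step_name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
        },
    }

def handler(event, context):
    step = build_spark_step(event["script"], event.get("name", "nightly-etl"))
    # In a real Lambda this would be submitted with:
    # boto3.client("emr").add_job_flow_steps(
    #     JobFlowId=event["cluster_id"], Steps=[step])
    return step

print(handler({"script": "s3://my-bucket/etl.py", "cluster_id": "j-XXXX"}, None))
```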
A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing
Classwork projects and homework completed through the Udacity Data Engineering Nanodegree
This project demonstrates the use of Amazon Elastic Map Reduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.
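The extract-transform-load flow that such a Spark script performs can be sketched in plain Python; on EMR the same logic would run as Spark DataFrame operations (e.g. `dropna`, `withColumn`, `groupBy(...).avg(...)`). The records and field names below are illustrative, not this repo's dataset.

```python
# Pure-Python sketch of the ETL steps a Spark job on EMR would run.
# Field names and values are illustrative assumptions.
from collections import defaultdict

# extract: rows as parsed from a source file on S3
rows = [
    {"city": "Austin", "temp_f": 98.0},
    {"city": "Boston", "temp_f": None},  # bad record, dropped in transform
    {"city": "Austin", "temp_f": 90.0},
]

# transform: drop nulls and convert Fahrenheit to Celsius
clean = [
    {**r, "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
    for r in rows if r["temp_f"] is not None
]

# load: aggregate per city (Spark: df.groupBy("city").avg("temp_c"))
totals = defaultdict(list)
for r in clean:
    totals[r["city"]].append(r["temp_c"])
avg_temp = {city: sum(v) / len(v) for city, v in totals.items()}
print(avg_temp)
```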
Elastic Data Factory
Performed business operations using big data technologies: AWS EMR, AWS RDS (MySQL), Hadoop, Apache Sqoop, Apache HBase, MapReduce
Loaded, filtered, and visualized the Google Ngrams dataset (created by Google's research team by analyzing the content of Google Books from the 1800s through the 2000s) in a cloud-based distributed computing environment using Hadoop, Spark, and AWS S3 storage.
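The load-and-filter step on Ngrams data can be sketched without a cluster. Each line of the public Ngrams export is tab-separated as ngram, year, match_count, volume_count (hedged: field order per the published format); on EMR this would be an RDD or DataFrame filter over the files in S3. The sample lines below are made up for illustration.

```python
# Sketch of loading and filtering Ngrams-style TSV records.
# Sample data is illustrative; real input would be read from S3.
sample = """\
data processing\t1950\t12\t4
data processing\t1995\t840\t310
data processing\t2005\t2210\t760"""

records = []
for line in sample.splitlines():
    ngram, year, matches, volumes = line.split("\t")
    records.append((ngram, int(year), int(matches), int(volumes)))

# keep only occurrences from 1990 onward, then total the match counts
modern = [r for r in records if r[1] >= 1990]
total_matches = sum(r[2] for r in modern)
print(total_matches)  # 840 + 2210 = 3050
```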
Bits of code I use during live demos
Implemented the PageRank algorithm in Hadoop MapReduce framework and Spark.
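The contribute-then-aggregate structure shared by the MapReduce and Spark versions of PageRank can be shown in a few lines of plain Python. A minimal sketch, assuming the conventional damping factor of 0.85 and a graph in which every node has at least one out-link (dangling nodes would need extra handling); the three-node graph is illustrative.

```python
# Minimal iterative PageRank over an adjacency list, mirroring the
# map (distribute rank along out-links) / reduce (sum contributions)
# structure of the MapReduce and Spark implementations.
def pagerank(links, iterations=20, d=0.85):
    """links: {node: [outgoing neighbors]} -> {node: rank}."""
    nodes = set(links) | {n for outs in links.values() for n in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        contrib = {n: 0.0 for n in nodes}
        for src, outs in links.items():
            if outs:  # each node splits its rank across its out-links
                share = rank[src] / len(outs)
                for dst in outs:
                    contrib[dst] += share
        rank = {n: (1 - d) / len(nodes) + d * contrib[n] for n in nodes}
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(ranks)
```

In Spark the inner loop becomes a `flatMap` over out-links followed by a `reduceByKey` sum, which is what lets the same algorithm scale out on a cluster.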