Showcase skill in AWS, Big Data warehousing, SPARK, Data Gathering, Data Analysis, Dashboarding and Reporting
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
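Provisioning a cluster like the module above boils down to one EMR API request. A minimal sketch in Python of building the parameters that would be passed to boto3's `run_job_flow` call; the cluster name, release label, instance types, and node counts are illustrative assumptions, and the default EMR IAM roles are assumed to exist.

```python
# Build a run_job_flow-style parameter dict for a Spark-on-EMR cluster.
# Release label, instance types, and role names are illustrative.
def build_emr_cluster_config(name, release="emr-6.15.0", core_nodes=2):
    """Return the parameter dict for EMR's RunJobFlow API."""
    return {
        "Name": name,
        "ReleaseLabel": release,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate when idle
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # EMR service role
    }

config = build_emr_cluster_config("demo-cluster", core_nodes=3)
# In a real run: boto3.client("emr").run_job_flow(**config)
print(config["Name"])
```

A Terraform module encodes the same fields declaratively; the dict above is the imperative equivalent.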
Developing a Flow with EMR and Airflow
This project provides a detailed overview of building an automated data engineering pipeline using Airflow, AWS services, Spark, Snowflake, and Tableau
This project demonstrates data cleaning and processing with Apache Spark and Apache Flink, both locally and on AWS EMR.
👷🌇 Set up and build a big data processing pipeline with Apache Spark, 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), Terraform to set up the infrastructure, and Airflow integration to automate workflows 🥊
Explore and replicate Amazon EMR (Elastic MapReduce) setup and utilization for big data processing and analytics tasks, featuring comprehensive demonstrations from VPC creation to Spark job execution.
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS using a Spotinst AWS MrScaler resource
This project performs feature extraction, followed by a PCA on large datasets, using Spark in the cloud.
Experience with time-series analysis and forecasting models, large data sets, model development and visualisation, statistics.
Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.
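Automating EMR from Lambda, as described above, typically means a handler that submits a Spark step to a running cluster. A sketch of that pattern, with the boto3 call indicated in a comment so the step-building logic stays self-contained; `command-runner.jar` is EMR's standard way to run `spark-submit` as a step, and all other names (event keys, script path) are illustrative assumptions rather than this repo's actual interface.

```python
# Sketch of a Lambda handler that submits a spark-submit step to EMR.
# Event keys and the script path are illustrative assumptions.
def build_spark_step(script_s3_path, step_name="nightly-etl"):
    """Return an EMR step definition that runs spark-submit via command-runner."""
    return {
        "Name": step_name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
        },
    }

def handler(event, context):
    step = build_spark_step(event["script"], event.get("name", "nightly-etl"))
    # In a real Lambda this would be submitted with:
    # boto3.client("emr").add_job_flow_steps(
    #     JobFlowId=event["cluster_id"], Steps=[step])
    return step

print(handler({"script": "s3://my-bucket/etl.py", "cluster_id": "j-XXXX"}, None))
```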
A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing
Classwork projects and homework completed through the Udacity Data Engineering Nanodegree
This project demonstrates the use of Amazon Elastic Map Reduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.
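The extract-transform-load flow that such a Spark script performs can be sketched in plain Python; on EMR the same logic would run as Spark DataFrame operations (e.g. `dropna`, `withColumn`, `groupBy(...).avg(...)`). The records and field names below are illustrative, not this repo's dataset.

```python
# Pure-Python sketch of the ETL steps a Spark job on EMR would run.
# Field names and values are illustrative assumptions.
from collections import defaultdict

# extract: rows as parsed from a source file on S3
rows = [
    {"city": "Austin", "temp_f": 98.0},
    {"city": "Boston", "temp_f": None},  # bad record, dropped in transform
    {"city": "Austin", "temp_f": 90.0},
]

# transform: drop nulls and convert Fahrenheit to Celsius
clean = [
    {**r, "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
    for r in rows if r["temp_f"] is not None
]

# load: aggregate per city (Spark: df.groupBy("city").avg("temp_c"))
totals = defaultdict(list)
for r in clean:
    totals[r["city"]].append(r["temp_c"])
avg_temp = {city: sum(v) / len(v) for city, v in totals.items()}
print(avg_temp)
```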
Elastic Data Factory
Performed business operations using big data technologies: AWS EMR, AWS RDS (MySQL), Hadoop, Apache Sqoop, Apache HBase, MapReduce
Loaded, filtered, and visualized the Google Ngrams dataset (created by Google's research team by analyzing the content of Google Books from the 1800s through the 2000s) in a cloud-based distributed computing environment using Hadoop, Spark, and AWS S3 storage.
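The load-and-filter step on Ngrams data can be sketched without a cluster. Each line of the public Ngrams export is tab-separated as ngram, year, match_count, volume_count (hedged: field order per the published format); on EMR this would be an RDD or DataFrame filter over the files in S3. The sample lines below are made up for illustration.

```python
# Sketch of loading and filtering Ngrams-style TSV records.
# Sample data is illustrative; real input would be read from S3.
sample = """\
data processing\t1950\t12\t4
data processing\t1995\t840\t310
data processing\t2005\t2210\t760"""

records = []
for line in sample.splitlines():
    ngram, year, matches, volumes = line.split("\t")
    records.append((ngram, int(year), int(matches), int(volumes)))

# keep only occurrences from 1990 onward, then total the match counts
modern = [r for r in records if r[1] >= 1990]
total_matches = sum(r[2] for r in modern)
print(total_matches)  # 840 + 2210 = 3050
```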
Bits of code I use during live demos
Implemented the PageRank algorithm in Hadoop MapReduce framework and Spark.
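The contribute-then-aggregate structure shared by the MapReduce and Spark versions of PageRank can be shown in a few lines of plain Python. A minimal sketch, assuming the conventional damping factor of 0.85 and a graph in which every node has at least one out-link (dangling nodes would need extra handling); the three-node graph is illustrative.

```python
# Minimal iterative PageRank over an adjacency list, mirroring the
# map (distribute rank along out-links) / reduce (sum contributions)
# structure of the MapReduce and Spark implementations.
def pagerank(links, iterations=20, d=0.85):
    """links: {node: [outgoing neighbors]} -> {node: rank}."""
    nodes = set(links) | {n for outs in links.values() for n in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        contrib = {n: 0.0 for n in nodes}
        for src, outs in links.items():
            if outs:  # each node splits its rank across its out-links
                share = rank[src] / len(outs)
                for dst in outs:
                    contrib[dst] += share
        rank = {n: (1 - d) / len(nodes) + d * contrib[n] for n in nodes}
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(ranks)
```

In Spark the inner loop becomes a `flatMap` over out-links followed by a `reduceByKey` sum, which is what lets the same algorithm scale out on a cluster.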