emr-cluster

Here are 37 public repositories matching this topic...

san089 / goodreads_etl_pipeline

An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.

Updated Mar 9, 2020
Python

Wittline / pyspark-on-aws-emr

The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can put to use quickly, so you can focus on writing PySpark code.

python aws big-data spark aws-emr pyspark dataengineering big-data-analytics ec2-spot emr-cluster wordcloud-generator ec2-spot-instances

Updated Jun 13, 2022
Python

minhky2185 / healthcare_data_pipeline

An end-to-end data pipeline for building a data lake and supporting reports using Apache Spark.

visualization mysql data big-data spark apache-spark analytics postgresql s3 data-engineering data-lake powerbi emr-cluster spark-cluster data-engineering-pipeline healthcare-data rds-mysql rds-postgres

Updated Jan 31, 2023
Python

anthonywong611 / Batch-ETL-with-AWS-EMR-and-MWAA

Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed Airflow: extract data from S3, transform it using Spark, and load the transformed data back to S3.

airflow s3-bucket aws-cloudformation batch-processing emr-cluster

Updated Jul 12, 2021
Python

airscholar / EMR-for-data-engineers

This project demonstrates the use of Amazon Elastic MapReduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.

aws apache-spark aws-s3 emr-cluster

Updated Nov 12, 2023
Python

HarshadRanganathan / aws-emr-launcher

Generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code).

aws aws-emr emr-cluster

Updated Dec 8, 2022
Python

sepulworld / serverless-aws-emr-boilerplate

Event driven EMR via Serverless

emr aws-lambda serverless aws-apigateway python3 serverless-framework aws-sns emr-cluster

Updated Nov 22, 2017
Python

mikeacosta / florasense

Orchestrating Cloud ETL Workloads

aws cloudformation apache-spark lambda-functions data-warehouse data-lake kinesis-stream redshift step-functions emr-cluster etl-pipeline redshift-spectrum

Updated Sep 19, 2021
Python

JohnnyLVP / Project-Standar-Documentation

This repository contains a definition of a standard structure for Machine Learning and Data Pipeline projects.

python emr aws documentation machine-learning ec2 project pyspark standard redshift boto3 emr-cluster

Updated Apr 28, 2020
Python

longNguyen010203 / Spark-Processing-AWS

👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and Airflow to automate workflows 🥊

aws apache-spark terraform aws-s3 iam pyspark cloud-computing aws-ec2 redshift data-pipeline aws-services apache-airflow emr-cluster spark-cluster spark-master spark-worker

Updated Jul 12, 2024
Python

ucaiado / etl-intraday-bidask

Hosting data lake with bid-ask data in S3 using Spark and Airflow

flask aws airflow spark athena s3 emr-cluster etl-pipeline plotly-dash

Updated Jul 24, 2020
Python

Tanay0510 / Data-Lake-with-Spark

Load data from S3, process it into analytics tables using Spark, and load the tables back into S3. The Spark process is deployed on a cluster using AWS EMR.

spark s3 datalake emr-cluster etl-pipeline

Updated Aug 17, 2021
Python

jpsalado92 / Udacity-DEND_DataLake-AWSEMR

Full code for Udacity's Data Engineer Nanodegree project: implementing a data lake in Amazon's cloud with AWS S3, AWS EMR, and Spark.

s3-bucket data-warehouse aws-emr data-lake emr-cluster

Updated Jul 22, 2020
Python

EddieAmaitum / NYC-Yellow-Taxi-DataOps-with-AWS-Analyzing-TLC-Datasets

Performed business operations using big data technologies: AWS EMR, AWS RDS (MySQL), Hadoop, Apache Sqoop, Apache HBase, and MapReduce.

python aws hadoop hbase linux-shell sqoop mapreduce-jobs rds-database emr-cluster

Updated Sep 20, 2023
Python

arfatmateen / Data_Lake_and_ETL_Pipeline_on_AWS_using_Spark

Database Schema & ETL pipeline for Song Play Analysis | Bosch AI Talent Accelerator Scholarship Program

python aws sql jupyter-notebook s3-bucket pyspark emr-cluster etl-pipeline

Updated Sep 18, 2022
Python

nileshsingal / PUBG-DATA-ANALYSIS

PlayerUnknown's Battlegrounds (PUBG) is a first-person shooter game where the goal is to be the last player standing. You are placed on a giant circular map that shrinks as the game goes on, and you must find weapons, armor, and other supplies in order to kill other players or teams and survive.

spark hive aws-lambda api-gateway bigdata s3-bucket tableau aws-cloudformation emr-cluster

Updated Jan 27, 2021
Python

jpb111 / AWS-EMR-APACHE-SPARK

Guide: Executing a Python script on AWS EMR for big data analysis.

python aws aws-s3 pandas pyspark aws-ec2 emr-cluster emr-serverless

Updated Feb 9, 2023
Python

alikemalocalan / alibaba-cloud-emr-create-examples

Alibaba Cloud EMR Create Example for Python

python3 aliyun alibaba-cloud emr-cluster alibaba-cloud-sdk

Updated Jul 17, 2019
Python

Siddhesh19991 / Automate_EMR_ETL_pipeline_using_Airflow

This project provides a detailed overview of creating an automated data engineering pipeline using Airflow, AWS services, Spark, Snowflake and Tableau

python aws airflow snowflake s3-bucket data-engineering ec2-instance tableau emr-cluster etl-pipeline dags

Updated Aug 2, 2024
Python

ucaiado / etl-spark-aws

Data Modeling with Spark for a data lake hosted on S3

aws spark s3 python3 emr-cluster

Updated Jul 24, 2020
Python
