emr-cluster

Here are 97 public repositories matching this topic...

BiGHeaDMaX / Extraction-features-avec-Spark

This project aims to perform feature extraction followed by PCA on large datasets using Spark in the cloud.

aws spark hadoop bigdata aws-emr pyspark pca transfer-learning emr-cluster features-extraction bigdatacloud elastic-mapreduce

Updated Mar 14, 2024
Jupyter Notebook

abie-ramie / HQL_Cosmetic_ClickStreamData

With online sales gaining popularity, tech companies are exploring ways to improve their sales by analyzing customer behavior and gaining insights into product trends. Furthermore, the websites make it easier for customers to find the products they need without much searching.

aws hive hiveql emr-cluster

Updated Jul 7, 2021

Siddhesh19991 / Automate_EMR_ETL_pipeline_using_Airflow

This project provides a detailed overview of creating an automated data engineering pipeline using Airflow, AWS services, Spark, Snowflake and Tableau

python aws airflow snowflake s3-bucket data-engineering ec2-instance tableau emr-cluster etl-pipeline dags

Updated Aug 2, 2024
Python

rupeshtiwari / learning-apache-spark

Apache Spark

spark apache python3 emr-cluster awsemr

Updated Mar 25, 2022
Jupyter Notebook

ucaiado / etl-spark-aws

Data Modeling with Spark for a data lake hosted on S3

aws spark s3 python3 emr-cluster

Updated Jul 24, 2020
Python

ustcdj / Sparkify_Churn_Analysis

Preventing churn is key to improving revenue for Sparkify, a subscription-based company (fictitious). This project is to analyze data from Sparkify to build a model to predict user churn. First, a sample dataset (128MB) was used on a local machine to explore relevant features and develop a working model. Then similar steps were used to develop a…

aws spark machine-learning-algorithms music-streaming emr-cluster

Updated Sep 10, 2020
Jupyter Notebook

skyler-myers-db / Common-Crawl-Analysis

Parsing the common crawl database using Scala and Spark

emr scala big-data spark s3 s3-bucket common-crawl emr-cluster

Updated Mar 17, 2021
Scala

JennaFar / elastic-data-factory

Elastic Data Factory

aws data-science machine-learning sql presto deployment athena data-acquisition data-visualization pyspark data-processing emr-cluster sagemaker sagemaker-deployment

Updated Oct 26, 2023
Python

adnanrahin / spark-rdd-df-comparison-emr

java aws scala spark dataframe rdd emr-cluster

Updated Dec 23, 2023
Scala

ajinChen / amazon-product-analysis

The goal of this repo is to analyze Amazon's digital product from different perspectives using AWS EMR.

aws big-data emr-cluster

Updated Mar 26, 2022
Jupyter Notebook

ramtekeabhas7 / Hive_Case_Study_using_AWS_Hadoop

The goal is to extract the data and gather insights from a real-life dataset of an e-commerce company, using big data tools such as Hive, Hadoop, and AWS.

aws hive hadoop ec2 s3-bucket emr-cluster

Updated Dec 19, 2022

cloudposse-archives / terraform-aws-spotinst-mrscaler

Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS using a Spotinst AWS MrScaler resource

emr cluster map-reduce spot-instances spotinst emr-cluster hcl2

Updated May 21, 2024

RonnJacob / PageRank-MapReduce-Spark

Implemented the PageRank algorithm in Hadoop MapReduce framework and Spark.

scala big-data apache-spark hadoop hadoop-cluster mapreduce hadoop-mapreduce emr-cluster mapreduce-java hadoop-hdfs

Updated Jul 23, 2023
Java

omarfessi / UDACITY-CapstoneProject

It's just my first repo, feel free to give feedback 😁

sql spark aws-s3 python3 iac aws-ec2 redshift emr-cluster

Updated Jan 19, 2021
Jupyter Notebook

jashshah-dev / AWS-Big-Data-Pipeline-orchestrated-with-Airflow

A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing

distributed-computing snowflake pyspark amazon-s3 emr-cluster airflow-dags transient-cluster

Updated Jan 1, 2024
Python

carlossanchezvega / twitter

This repository aims to capture and clean data from the Twitter API in order to perform sentiment analysis on an EMR cluster.

aws cloud sentiment-analysis twitter-api emr-cluster

Updated Dec 30, 2017
Python

jaquelinecella / jaquelinecella-Bootcamp_modulo1_Eng_Dados_Cloud

Creation of deployment pipelines with GitHub Actions to provision infrastructure on AWS with Terraform under version control. Technologies used: writing in Delta format, Lambda Function, Kinesis Streaming, S3, Athena, Glue, and EMR.

aws aws-lambda athena lambda-functions s3 glue kinesis-stream emr-cluster

Updated Aug 18, 2022
Jupyter Notebook

kevinndungu-source / Amazon_EMR_Demonstration_Resources

Explore and replicate Amazon EMR (Elastic MapReduce) setup and utilization for big data processing and analytics tasks, featuring comprehensive demonstrations from VPC creation to Spark job execution.

python bigdata pyspark aws-ec2 dataprocessing datamanagement emr-cluster juypter-notebook bigdatainfrastructure

Updated Jun 19, 2024
Jupyter Notebook

nogueira-ric / emr-6.4-spark-3.1.2

AWS EMR 6.4 - Spark 3.1.2 - Python3.7.5

spark emr-cluster

Updated Feb 26, 2022
Python

jashshah-dev / Automating-EMR-Cluster-using-AWS-Lambda

Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.

lambda-functions pyspark boto3 pyspark-notebook emr-cluster transient-cluster

Updated Jan 1, 2024
Python
