Event driven EMR via Serverless
Updated Nov 22, 2017 - Python
This repository captures and cleans data from the Twitter API in order to perform sentiment analysis on an EMR cluster.
Running Zeppelin on EMR and launching tasks on it with task runner.
Alibaba Cloud EMR Create Example for Python
Cloud Computing Tutorials for AWS
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
This repository defines a standard structure for machine learning and data pipeline projects.
Full code for Udacity's Data Engineer Nanodegree project: implementing a data lake in Amazon's cloud with AWS S3, AWS EMR and Spark.
Data Modeling with Spark for a data lake hosted on S3
Hosting data lake with bid-ask data in S3 using Spark and Airflow
The EMR Helper library helps with setting up and managing an EMR cluster.
PlayerUnknown's Battlegrounds (PUBG) is a first-person shooter game where the goal is to be the last player standing. You are placed on a giant circular map that shrinks as the game goes on, and you must find weapons, armor, and other supplies in order to kill other players or teams and survive.
This project builds a data lake and an ETL pipeline in Spark that loads data from Amazon S3, processes it into analytics tables, and loads them back into S3.
Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed Airflow: extract data from S3, transform it using Spark, and load the transformed data back to S3.
Implements a data lake using S3 and Spark on an EMR cluster from an AWS Cloud9 environment, and develops an ETL pipeline that extracts data from S3, processes it using Spark, and loads it back into S3 as a set of dimensional tables.
A template for creating Amazon EMR clusters using either Amazon MWAA or a Dockerized Airflow Container as a workflow environment
Load data from S3, process the data into analytics tables using Spark, and load them back into S3. The Spark process is deployed on a cluster using AWS EMR.
Orchestrating Cloud ETL Workloads