liangruibupt/glue-streaming-etl-demo

AWS serverless ETL and streaming demo

Glue Streaming ETL Demo

This demo shows how to use the AWS Glue streaming feature to manage continuous ingestion pipelines and process data on the fly. Glue streaming jobs extend AWS Glue jobs, which are based on Apache Spark, to run continuously and consume data from streaming platforms such as Amazon Kinesis Data Streams and Apache Kafka (including the fully managed Amazon MSK).

Glue can provision, manage, and scale the infrastructure to ingest data to data lakes on Amazon S3, data warehouses such as Amazon Redshift, or store streaming data in a DynamoDB table for quick lookups, or in Elasticsearch to look for specific operation patterns.

Glue Streaming builds on Spark Structured Streaming to implement data transformations such as aggregating, partitioning, and formatting, as well as joining with other data sets to enrich or cleanse the data for easier analysis.

For more details, see the "Adding Streaming ETL Jobs in AWS Glue" guide.
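As a rough illustration of the streaming model described above, the following is a minimal sketch of a Glue streaming job that reads JSON records from a Kinesis stream and appends each micro-batch to S3 as Parquet. It is not taken from the demo code: the function names, the `kinesis_source` context name, the 100-second window, and all paths are assumptions chosen for illustration. The `awsglue` module is only available inside the Glue job runtime, so it is imported inside the function.

```python
def kinesis_source_options(stream_arn, starting_position="earliest"):
    """Build connection options for a Kinesis streaming source (hypothetical helper)."""
    return {
        "typeOfData": "kinesis",
        "streamARN": stream_arn,
        "classification": "json",  # assumes the records are JSON documents
        "startingPosition": starting_position,
        "inferSchema": "true",
    }


def run_streaming_job(stream_arn, output_path, checkpoint_path):
    """Read micro-batches from Kinesis and append them to S3 as Parquet."""
    # These imports resolve only inside the Glue job runtime.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # A streaming Spark DataFrame backed by the Kinesis stream.
    frame = glue_context.create_data_frame.from_options(
        connection_type="kinesis",
        connection_options=kinesis_source_options(stream_arn),
        transformation_ctx="kinesis_source",
    )

    def process_batch(batch_df, batch_id):
        # Skip empty micro-batches; otherwise append the batch to the data lake.
        if batch_df.count() > 0:
            batch_df.write.mode("append").parquet(output_path)

    # Run process_batch once per window until the job is stopped.
    glue_context.forEachBatch(
        frame=frame,
        batch_function=process_batch,
        options={
            "windowSize": "100 seconds",  # assumed batch interval
            "checkpointLocation": checkpoint_path,
        },
    )
```

Aggregations, joins, or partitioned writes would go inside `process_batch`, where each micro-batch is an ordinary Spark DataFrame.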

IoT-Kafka-GlueStreaming-Demo

serverless-etl-diagram-kafka

IoT-Kinesis-GlueStreaming-Demo

serverless-etl-diagram

kinesis-kafka-connector-Demo

kinesis-kafka-connector

Kinesis Data Analytics Streaming Demo

This demo shows how to use Kinesis Data Analytics to manage continuous ingestion pipelines and process data on the fly. Kinesis Data Analytics can run continuously and consume data from streaming platforms such as Amazon Kinesis Data Streams and Apache Kafka (including the fully managed Amazon MSK).

IoT-Kinesis-KinesisDataAnlytics-Demo

kinesis-kda-demo

IoT-Kafka-KinesisDataAnlytics-Demo

kafka-kda-demo

Ingest RDS data with Glue

This demo shows how to use Glue to ingest data from an RDS database.

Architecture

mysql-glue

Ingest MySQL 5.7 via a Glue connector

Ingest MySQL 8.0 via a Glue connector

Connect to an RDS instance with SSL connections enabled
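To make the ingestion step more concrete, here is a minimal sketch of how a Glue job might read one MySQL table over JDBC. It is an illustration under stated assumptions, not the demo's own script: the helper names, host, database, table, and credentials are hypothetical, and the optional custom driver settings are shown only as one way a MySQL 8.0 source could be handled.

```python
def mysql_connection_options(host, database, table, user, password,
                             port=3306, driver_jar_s3_path=None):
    """Build Glue connection options for a MySQL table (hypothetical helper)."""
    options = {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }
    if driver_jar_s3_path:
        # Assumption: a newer MySQL driver JAR has been uploaded to S3,
        # for example to connect to MySQL 8.0.
        options["customJdbcDriverS3Path"] = driver_jar_s3_path
        options["customJdbcDriverClassName"] = "com.mysql.cj.jdbc.Driver"
    return options


def read_table(glue_context, options):
    """Load the table as a Glue DynamicFrame; runs only inside a Glue job."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="mysql",
        connection_options=options,
    )
```

In a real job the credentials would normally come from a Glue connection or a secrets store rather than being passed as plain arguments.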

Data-On-Boarding-End2End-Demo

end2end-data-onboarding

Data On Boarding End2End Demo

Python code that sends records to S3 via Kinesis Firehose

python-firehose-arch

Python-Send-Data-Firehose Demo
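The sending side of this pipeline can be sketched in a few lines. The example below is an assumption-laden sketch rather than the demo's code: `encode_record`, `send_record`, and the payload fields are hypothetical names. It serializes each record as newline-delimited JSON so that objects delivered to S3 can be read line by line, and passes the bytes to the Firehose `put_record` call of a boto3 client supplied by the caller.

```python
import json


def encode_record(payload):
    """Serialize a dict as one newline-terminated JSON line, encoded as UTF-8."""
    return (json.dumps(payload) + "\n").encode("utf-8")


def send_record(client, delivery_stream_name, payload):
    """Send one record to a Firehose delivery stream.

    `client` is expected to be a boto3 Firehose client, for example
    boto3.client("firehose"); it is passed in so the function can be
    exercised without AWS access.
    """
    return client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={"Data": encode_record(payload)},
    )
```

A call might look like `send_record(boto3.client("firehose"), "demo-delivery-stream", {"device_id": "sensor-1", "temperature": 21.5})`, where the stream name and payload are placeholders.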

IoT-Athena-QuickSight

Build a business intelligence capability for streaming IoT device data using AWS IoT Core, Amazon Firehose, Amazon S3, Amazon Athena and Amazon QuickSight

iot-athen-quicksight-achitect IoT-Athena-QuickSight
