Kafka and Spark Streaming Integration

Overview

In this project, we provide a statistical analysis of the data using Apache Spark Structured Streaming. We created a Kafka server to produce data and a test Kafka consumer to consume it, then ingested the data through Spark Structured Streaming. Finally, we applied Spark Streaming windowing and filtering to aggregate the data and extract counts on an hourly basis.
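To make the hourly aggregation concrete, here is a minimal plain-Python sketch of what a one-hour tumbling-window count with a filter computes. It is an illustration only, not the code in data_stream.py: the field names `timestamp` and `category` and the sample records are hypothetical, and the real job would express the same idea with Spark's windowing functions.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(events, wanted_categories=None):
    """Count events per (hour window start, category).

    `events` is an iterable of dicts with hypothetical keys
    "timestamp" (ISO-like string) and "category".
    If `wanted_categories` is given, other categories are filtered out.
    """
    counts = Counter()
    for event in events:
        category = event["category"]
        if wanted_categories is not None and category not in wanted_categories:
            continue  # filtering step
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%S")
        # Truncate to the start of the hour: this is the tumbling window key.
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        counts[(window_start, category)] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    {"timestamp": "2018-12-31T23:57:00", "category": "A"},
    {"timestamp": "2018-12-31T23:05:00", "category": "A"},
    {"timestamp": "2018-12-31T22:59:00", "category": "B"},
    {"timestamp": "2019-01-01T00:01:00", "category": "A"},
]

for (start, category), n in sorted(hourly_counts(sample).items()):
    print(start.isoformat(), category, n)
```

Each record falls into exactly one window, so the two "A" events at 23:05 and 23:57 are counted together, while the one at 00:01 starts a new window.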

Environment

Java 1.8.x
Python 3.6 or above
Zookeeper
Kafka
Scala 2.11.x
Spark 2.4.x
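Before running anything, it can help to confirm that the basic requirements are in place. The following is a small sketch, assuming `java` and `spark-submit` are expected on the PATH (the tool names checked here are an assumption, not something the project enforces). It does not verify the exact versions listed above.

```python
import shutil
import sys

# Assumed executable names; adjust to match your installation.
REQUIRED_TOOLS = ("java", "spark-submit")


def check_environment(tools=REQUIRED_TOOLS, min_python=(3, 6)):
    """Return a dict mapping each requirement to True (found) or False."""
    label = f"python>={min_python[0]}.{min_python[1]}"
    report = {label: sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


for name, ok in check_environment().items():
    print(f"{name:<15} {'ok' if ok else 'MISSING'}")
```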

How to Run?

Start ZooKeeper and the Kafka server

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
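Once both processes are started, a quick way to confirm they are listening is to try a TCP connection to each port. This sketch assumes the default ports from the stock configuration files (2181 for ZooKeeper, 9092 for Kafka) and that both run on localhost. If your config/zookeeper.properties or config/server.properties differ, change the values accordingly.

```python
import socket

# Assumed default ports; not confirmed by this project's config files.
SERVICES = {"ZooKeeper": 2181, "Kafka": 9092}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def service_status(host="localhost", services=SERVICES, timeout=1.0):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port, timeout) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in service_status().items():
        print(f"{name}: {'listening' if up else 'not reachable'}")
```

An open port only shows that something is listening there. It does not prove the broker is healthy.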

Run the Kafka producer server

python kafka_server.py
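For orientation, a producer of this kind typically reads records, serializes each one to JSON, and sends it to a topic. The sketch below shows that loop under stated assumptions: it is not the contents of kafka_server.py, the topic name `demo.events` is hypothetical, and the optional `make_kafka_send` helper assumes the third-party kafka-python package is installed.

```python
import json


def encode_record(record):
    """Serialize one record (a dict) to UTF-8 JSON bytes for Kafka."""
    return json.dumps(record).encode("utf-8")


def produce(records, send, topic):
    """Send each record to `topic` via `send(topic, payload)`; return the count sent."""
    sent = 0
    for record in records:
        send(topic, encode_record(record))
        sent += 1
    return sent


def make_kafka_send(bootstrap_servers="localhost:9092"):
    """Build a sender backed by kafka-python (assumed installed)."""
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    return lambda topic, payload: producer.send(topic, value=payload)


# Usage with a running broker (hypothetical topic name):
#   produce([{"id": 1}], make_kafka_send(), "demo.events")
```

Passing `send` in as a callable keeps the serialization loop testable without a running broker.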

Run the Kafka consumer

python kafka_consumer.py
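A test consumer usually does little more than read messages and decode them. The following sketch illustrates that step and is not the contents of kafka_consumer.py. It works on any iterable of objects with a `.value` attribute holding bytes, which is the shape of the records a kafka-python `KafkaConsumer` yields. The `Message` type here is a stand-in for demonstration.

```python
import json
from collections import namedtuple

# Stand-in for a Kafka record; only the `.value` attribute is used.
Message = namedtuple("Message", "value")


def decode_message(raw):
    """Parse UTF-8 JSON bytes; return None for malformed payloads."""
    try:
        return json.loads(raw.decode("utf-8"))
    except ValueError:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        return None


def consume(messages, limit=None):
    """Yield decoded records, skipping malformed ones, up to `limit` records."""
    if limit is not None and limit <= 0:
        return
    yielded = 0
    for message in messages:
        record = decode_message(message.value)
        if record is None:
            continue
        yield record
        yielded += 1
        # Stop right after the last wanted record, so a blocking
        # consumer is not asked for one more message.
        if limit is not None and yielded >= limit:
            return


demo = [Message(b'{"a": 1}'), Message(b"not json"), Message(b'{"a": 2}')]
for record in consume(demo):
    print(record)
```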

Submit Spark Streaming Job

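spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.4 --master local[*] data_stream.py

Note: the connector version given to --packages (2.3.4 above) should match the installed Spark version. If you run Spark 2.4.x as listed under Environment, use the matching 2.4.x release of spark-sql-kafka-0-10_2.11.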
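Because the connector coordinate encodes both the Scala and the Spark version, it is easy for the two to drift apart. As a sketch, the command can be built from those two values so they stay consistent. The helper names below are illustrative and not part of this project.

```python
def kafka_package(spark_version, scala_version="2.11"):
    """Return the Maven coordinate of the Spark SQL Kafka connector."""
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


def submit_command(script, spark_version, scala_version="2.11", master="local[*]"):
    """Build the spark-submit argument list for `script`."""
    return [
        "spark-submit",
        "--packages", kafka_package(spark_version, scala_version),
        "--master", master,
        script,
    ]


# Reproduces the command shown above when given version 2.3.4.
print(" ".join(submit_command("data_stream.py", "2.3.4")))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with `local[*]`.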

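Kafka consumer console output

Screenshot: kafka-console-consumer-output.PNG

Streaming progress reporter

Screenshot: spark-streaming-progress-report.PNG

Output

Screenshot: output.png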

Reference: https://spark.apache.org/docs/latest/sql-performance-tuning.html
