A Kafka and Spark Streaming Integration project : SF Crime Statistics with Spark Streaming

san089/SF-Crime-Statistics

Kafka and Spark Streaming Integration

Overview

In this project, we perform a statistical analysis of San Francisco crime data using Apache Spark Structured Streaming. We created a Kafka server to produce the data and a test Kafka consumer to verify it, then ingested the stream through Spark Structured Streaming. Finally, we applied Spark Streaming windowing and filtering to aggregate the data and extract counts on an hourly basis.

Environment

  • Java 1.8.x
  • Python 3.6 or above
  • Zookeeper
  • Kafka
  • Scala 2.11.x
  • Spark 2.4.x

How to Run?

Start Zookeeper and Kafka Server

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

Run the Kafka Producer server

python kafka_server.py
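The producer script itself is not reproduced here; below is a minimal sketch of what kafka_server.py could look like, assuming the kafka-python client. The topic name, broker address, and input file path are hypothetical placeholders, not the project's actual values:

```python
import json
import time


def serialize(record):
    """Encode a dict as UTF-8 JSON bytes for the Kafka topic."""
    return json.dumps(record).encode("utf-8")


def run(bootstrap_servers="localhost:9092",
        topic="sf.crime.calls",        # hypothetical topic name
        path="calls.json"):            # hypothetical input file
    # Imported lazily so serialize() can be reused without a broker running.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    with open(path) as f:
        records = json.load(f)
    for record in records:
        producer.send(topic, value=serialize(record))
        time.sleep(0.1)                # throttle to simulate a live stream
    producer.flush()


if __name__ == "__main__":
    run()
```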

Run the Kafka Consumer

python kafka_consumer.py
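A matching sketch of the test consumer, again assuming kafka-python and the same hypothetical topic name and broker address as the producer sketch above:

```python
import json


def decode(raw_bytes):
    """Decode a UTF-8 JSON message from the topic back into a dict."""
    return json.loads(raw_bytes.decode("utf-8"))


def run(bootstrap_servers="localhost:9092",
        topic="sf.crime.calls"):       # hypothetical topic name
    # Imported lazily so decode() is testable without a broker.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(topic,
                             bootstrap_servers=bootstrap_servers,
                             auto_offset_reset="earliest")
    for message in consumer:
        print(decode(message.value))


if __name__ == "__main__":
    run()
```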

Submit Spark Streaming Job

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.4 --master local[*] data_stream.py
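data_stream.py is not shown here either; the following is a minimal sketch of the hourly windowed aggregation described in the overview, assuming PySpark 2.4, a hypothetical topic name, and an assumed two-field message schema (the real payload has more fields):

```python
# Hypothetical topic name; must match the producer's topic.
TOPIC = "sf.crime.calls"
WINDOW_DURATION = "60 minutes"         # hourly counts, as described above


def main():
    # PySpark imports kept inside main() so the module can be imported
    # without a local Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType

    # Assumed message schema for illustration only.
    schema = StructType([
        StructField("original_crime_type_name", StringType(), True),
        StructField("call_date_time", StringType(), True),
    ])

    spark = SparkSession.builder.appName("SF-Crime-Stats").getOrCreate()

    # Read the raw Kafka stream.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", TOPIC)
           .option("startingOffsets", "earliest")
           .load())

    # Parse the JSON payload and derive an event timestamp.
    calls = (raw.selectExpr("CAST(value AS STRING) AS value")
             .select(F.from_json("value", schema).alias("data"))
             .select("data.*")
             .withColumn("ts", F.to_timestamp("call_date_time")))

    # Window the stream into hourly buckets and count calls per crime type.
    hourly_counts = (calls
                     .withWatermark("ts", WINDOW_DURATION)
                     .groupBy(F.window("ts", WINDOW_DURATION),
                              "original_crime_type_name")
                     .count())

    query = (hourly_counts.writeStream
             .outputMode("update")
             .format("console")
             .start())
    query.awaitTermination()


if __name__ == "__main__":
    main()
```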

Kafka consumer console output

[screenshot: consumer console output]

Streaming progress reporter

[screenshot: progress reporter]

Output

[screenshot: application output]
